The language-puppet website.

Work with your manifests!

Hspuppetmaster Alpha Release

Grab the alpha version of Hspuppetmaster (compiled for x64 Linux)! It will become a full-fledged replacement for the default puppetmaster, but is still not ready for prime-time. In order to use it, you must:

  • untar everything on your puppetmaster host

  • run it with ./hspuppetmaster /etc/puppet +RTS -N (the first argument is the location of your puppet repository) as the puppet user

  • modify your web server configuration to redirect requests for /production/catalog to 127.0.0.1:3000. This can be done in Apache by disabling the “High Performance mode” (I wasted a few hours on this one) and adding something like this to your vhost configuration:

ProxyPass /production/catalog http://localhost:3000/production/catalog

There are quite a few things it will not do (such as updating the facts in PuppetDB), but you should experience much better catalog compilation times (some of my catalogs currently take longer than the default two-minute timeout to compile with Puppet), and sometimes much clearer error messages. As it is based on language-puppet, it is generally much stricter than Puppet. For example, it will fail on any variable that cannot be resolved.

Please have a try and let me know how it worked for you (do use --noop!).

Non-strict IO Surprises

The language-puppet library was created when I started to learn Haskell. As a consequence, it uses the dreaded String type to store all kinds of textual values. It also uses the System.IO module for performing I/O. I was aware of the file descriptor leak problem that happens when you use readFile, so I chose the following implementation for the Puppet file function:

file :: [String] -> IO (Maybe String)
file [] = return Nothing
file (x:xs) = catch (fmap Just (withFile x ReadMode hGetContents)) (\_ -> file xs)

This should return Just the content of the first readable file in the parameter list, or Nothing if there are none, and should not leak any file descriptor. Now that I am finalizing the hspuppetmaster binary, I can use my library to (try to) compute catalogs on my production systems, using the standard puppet agent -t --noop. It turned out that the file function was misbehaving. Testing it in GHCi illustrates the problem:

> file ["/nothing"]
Nothing
> file ["/nothing", "/etc/hosts"]
Just ""

It seems to work fine, except all file contents are empty. This behavior seems to be common knowledge among Haskellers, and is due to the fact that the file descriptor is closed before the output is evaluated. This is pretty horrible (and surprising), and what is even worse is my solution:

file (x:xs) = catch
    (fmap Just (withFile x ReadMode (\fh -> do
                                        { y <- hGetContents fh
                                        ; evaluate (length y)
                                        ; return y })))
    ((\_ -> file xs) :: SomeException -> IO (Maybe String))

It is a bit longer because of the use of the non-deprecated version of catch, and because it explicitly forces evaluation of the output of hGetContents. This behavior was extremely surprising to me, and I would like to thank the people on #haskell for their help in devising a correct version (mine was along the lines of !y <- hGetContents, which worked for my simple examples, but was certain to fail at some point). This is the only IRC channel I know of where people are at the same time active, always helpful, and knowledgeable.
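For readers who want to try this outside the library, here is a minimal, self-contained sketch of the same idea using only base. The names readFileStrict and firstReadable are made up for this example and are not part of language-puppet. It also hints at why the bang pattern was not enough: a bang only evaluates the string to its first cons cell, whereas length has to walk the whole list while the handle is still open.

```haskell
import Control.Exception (SomeException, evaluate, try)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- | Read a whole file before withFile closes its handle.
-- (Hypothetical helper name, for illustration only.)
readFileStrict :: FilePath -> IO String
readFileStrict path = withFile path ReadMode $ \fh -> do
    contents <- hGetContents fh
    -- length traverses the entire lazy string, so everything is read
    -- while the handle is still open.
    _ <- evaluate (length contents)
    return contents

-- | Content of the first readable file, or Nothing if none can be read.
firstReadable :: [FilePath] -> IO (Maybe String)
firstReadable [] = return Nothing
firstReadable (p : ps) = do
    r <- try (readFileStrict p) :: IO (Either SomeException String)
    case r of
        Right contents -> return (Just contents)
        Left _         -> firstReadable ps
```

In GHCi, firstReadable ["/nothing", "/etc/hosts"] should now return the actual content of /etc/hosts wrapped in Just, instead of an empty string.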

Testing Whole Puppet Catalogs

The main goal of this project is, for now, to assist sysadmins editing their catalogs. The best illustration so far is the puppetresources application. It can:

  • Check a file's syntax, and print what it thinks it is.
  • Compute a whole catalog and display it in human readable format or JSON.
  • Display details about a specific resource in a catalog, including special support for file contents (useful for debugging templates).
  • And do the two previous items using facts and/or queried data from a real PuppetDB.

It is also fast enough to compute the catalogs of all your nodes in reasonable time, which opens possibilities you would not even dream of in the Ruby Puppet world. One of them is writing “integration tests” that let you check properties related to complex environmental interactions between hosts.

In order to facilitate this, I am in the process of writing a fully fledged testing API (it is still a bit lacking). It is strongly inspired by other testing APIs and should quickly evolve into something that is very easy to use. It is not the current focus (which is to replace an actual Puppet Master with my software), but I have already implemented a test that is built into the puppetresources executable: it now checks that the source parameter of each file resource points to an actual file. This is a common error pattern for me (forgetting to create the file, mistyping its name, or placing it in the wrong directory) that has now disappeared.
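To give an idea of what such a check looks like, here is a rough sketch. It does not use the actual language-puppet types or testing API: the Resource record, fileSources and missingSources are hypothetical names invented for this example, and the way a source is mapped to a path on disk is left as a parameter.

```haskell
import Control.Monad (filterM)

-- Hypothetical, simplified resource; the real catalog types are richer.
data Resource = Resource
    { resType   :: String
    , resTitle  :: String
    , resParams :: [(String, String)]
    } deriving (Show, Eq)

-- | Pair every file resource that has a source parameter with that source.
fileSources :: [Resource] -> [(Resource, String)]
fileSources rs =
    [ (r, src)
    | r <- rs
    , resType r == "file"
    , Just src <- [lookup "source" (resParams r)]
    ]

-- | Keep the file resources whose source does not exist.
-- The existence check and the source-to-path mapping are parameters,
-- so the same function works against a real disk or a fake one.
missingSources
    :: Monad m
    => (FilePath -> m Bool)
    -> (String -> FilePath)
    -> [Resource]
    -> m [(Resource, String)]
missingSources exists toPath rs =
    filterM (\(_, src) -> fmap not (exists (toPath src))) (fileSources rs)
```

Against a real repository, the first argument could be doesFileExist from System.Directory, and the second a function that turns a puppet:// source into a path inside the repository.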

Oh, by the way, a new version is out! Version 0.3.2 mainly changes the license, from GPL3 to BSD3. The choice was dictated by the sudden outburst of horribly uninteresting posts about licensing that has plagued Haskell-cafe during the last few hours. I hope this will end soon, or it will not be possible to differentiate this mailing list from that of Debian.

Language-puppet v0.3.1 - JSON Catalogs

A new version is already out, this time with JSON catalog generation. It is not properly tested, but Puppet seems to accept them. If someone knows how to get puppet catalog apply to download files from a Puppet server, I am interested.

I will probably write a sample application on top of WARP and modify the configuration of my Puppetmaster to redirect catalog requests towards it. This means that there could be an efficient replacement to the Puppetmaster soon.

Language-puppet v0.3.0 - Resource Relationships

This version introduces resource relationships handling. It is also full of nasty bugs :) An improved version is already in the works, along with great features.

First of all, you will now get notifications when a resource is missing or when you have created cycles. There are still some bugs:

  • The aliases are not taken into account.

  • The relationship metaparameters on classes are ignored.

  • With the released version, nothing is actually working. Sorry … I realized too late how broken it was. You might want to check github, or the updated binary packages.

This is the kind of error message you will get when cycles are found:

puppetresources: The following cycles have been found:
  File[/a]
       -> File[/b] ["./manifests/site.pp" (line 557, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 556, column 9))
       -> File[/c] ["./manifests/site.pp" (line 558, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 557, column 9))
       -> File[/d] ["./manifests/site.pp" (line 559, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 558, column 9))
       -> File[/e] ["./manifests/site.pp" (line 560, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 559, column 9))
       -> File[/f] ["./manifests/site.pp" (line 561, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 560, column 9))
       -> File[/g] ["./manifests/site.pp" (line 562, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 561, column 9))

Please note how each resource and link position is displayed. This is a lot more verbose than what the vanilla Puppet spouts, but I believe it is also much more useful. Chasing after links defined as a resource chain gets old very fast.
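For the curious, finding such cycles boils down to looking for strongly connected components in the relationship graph. The sketch below is not the code used by language-puppet; it is a minimal illustration built on Data.Graph from the containers package, with a hypothetical ResId type and findCycles function, and without the source positions shown in the message above.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Hypothetical resource identifier: (type, title), e.g. ("file", "/a").
type ResId = (String, String)

-- | Given each resource with the resources it requires, return the
-- groups of resources that depend on each other in a cycle.
-- Requirements pointing to resources absent from the list are ignored.
findCycles :: [(ResId, [ResId])] -> [[ResId]]
findCycles deps =
    [ members
    | CyclicSCC members <- stronglyConnComp [ (r, r, reqs) | (r, reqs) <- deps ]
    ]
```

An acyclic catalog yields an empty list, and a resource that requires itself is reported as a cycle of one element.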

This is the current error message when a relationship points to (or from) an unknown resource:

puppetresources: Unknown relation ("file","/b") -> ("file","/a") used at "./manifests/site.pp" (line 556, column 9) debug: (False,True,False,False)

This one is terrible, and will need to be reworked. There is still quite a bit of work to do, but I am fairly pleased with how everything seems to fall into place.

I started this project at the end of April, 7 months ago. The project is incredibly useful as it is right now, and I am confident I will be able to provide a robust and vastly more efficient puppetmaster, with a ton of helpful tools, before the first anniversary of this project.

Version 0.2.2 - Better Performances and Testing

A new version is out. Actually, a pair of versions has been released. The most important change is the fix for a serious bug in how default values worked. The other visible improvements will be felt on the performance side, as discussed in the previous entry.

v0.2.2

  • New features
    • A few statistics are exported.

v0.2.1

  • Bugs fixed
    • The defaults system was pretty much broken, it should be better now.
  • New features
    • Basic testing framework started.
    • create_resources now supports the defaults system.
    • defined() function works for resource references.
    • in operator implemented for hashes.
    • Multithreading works.
    • The ruby <> daemon communication is now over ByteStrings.
    • The toRuby function has been optimized, doubling the overall speed for rendering complex catalogs.
    • Various internal changes.

Performance Improvements

One of the worst decisions taken when designing language-puppet was to use the String type. It is well known that this type is horribly slow. I had a few tests to run and they were taking forever: it took about 11 seconds to compute 5 catalogs, which is annoying when you are doing this often. As a comparison, the puppetmaster on a faster computer (it is an X5660, whereas my workstation uses an i7 860), running the latest Puppet, takes 23 seconds to compute those 5 catalogs sequentially.

A quick profiling run revealed the following:

  • The program couldn’t use more than one CPU.
  • Almost all the time was spent computing templates.

I refactored the code a bit during lunch. The Daemon code was supposed to be multithreaded, and has been designed as such. The only problem was that I forgot to start several worker threads. This is now fixed.
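The pattern is the classic one: a shared job queue and a handful of threads reading from it. The following is only a sketch of that pattern using base, not the actual Daemon code; startWorkers, runAll and the Job type are names invented for this example.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forever, replicateM_)

-- | A queued job: an input, and the place where its result must be put.
type Job a b = (a, MVar b)

-- | Start n worker threads that all read jobs from the same queue.
-- Note: if the work function throws, that worker dies and the
-- corresponding result is never delivered; a real daemon must handle this.
startWorkers :: Int -> (a -> IO b) -> IO (Chan (Job a b))
startWorkers n work = do
    queue <- newChan
    replicateM_ n $ forkIO $ forever $ do
        (input, resultVar) <- readChan queue
        work input >>= putMVar resultVar
    return queue

-- | Submit every input, then wait for all the results, in input order.
runAll :: Chan (Job a b) -> [a] -> IO [b]
runAll queue inputs = do
    vars <- mapM submit inputs
    mapM takeMVar vars
  where
    submit x = do
        v <- newEmptyMVar
        writeChan queue (x, v)
        return v
```

Forgetting the replicateM_ and starting a single worker gives a program that is correct but never uses more than one core, which is exactly what happened here. The program must also be built with -threaded and run with +RTS -N to actually spread the workers over several cores.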

The template computing time was reduced by spawning several threads for this task (see previous commit), and by converting the String code to ByteString, using builders. The time is now mostly spent in the renderString function, defined as:

test.hs
renderString :: String -> BB.Builder
renderString x = let !y = BB.stringUtf8 (show x) in y

I went with the definition in here, but it is much slower. If somebody has a better implementation, please let me know.

The ByteString move reduced the time it took to compute the catalogs to about 6 seconds, and parallelisation reduced it to about 2 seconds. It is a 5.5x speedup for a few minutes of work, not too bad. The template generation still takes most of the running time (50% of the time is spent spawning and waiting for the Erb code, 20% preparing the inputs for the Ruby processes). Nice speedups could arise from parsing more complex Ruby expressions from the Haskell code, but it is not a priority now that the performance is acceptable.

You can grab the latest github repo to test it, but you will need a very recent bytestring, and you will need to fix the cabal file for luautils.

Version 2.0.0 - the Lua Integration

Quick note on versioning. The official version number is 0.2.0, but I have been omitting the first zero on this blog. Anyway, this version includes two features, one of them built on the hslua library.

A huge bug was fixed: defaults finishing with a comma were interpreted as a function working on a hash! For example:

File { owner => 'root', }

Was seen as:

file({ 'owner' => 'root' })

Custom functions

Testing custom functions was almost impossible until now, as they had to be compiled with the library to be used. It is now possible to include lua implementations of your functions alongside the ruby versions.

It is a bit harder to write than the ruby versions as it can’t (for now) access anything from the “Puppet” side, such as facts, and you have to put up with the Lua syntax.

The other caveat is that it is stored right next to the ruby function right now, and Puppet tries to interpret it (and find syntax errors in it), so it isn’t very clean. I will move it for the next version, and will rewrite a few functions from Puppetlabs stdlib too.

From the implementation point of view, the difficult part was the fact that there was no instance for Data.Map. I wrote it and it is now part of the luautils package.

Custom types

As all the infrastructure to find files in subfolders was already there, I also added a very weak custom type system. It will detect the file names in the usual places and will know these are valid Puppet types. It doesn’t perform any additional checks for now.

I might add a lua functionality for fine grained verifications.

What’s next

The fabled dependency handling is not coming soon, but there are a few things that are already implemented and will be released next time:

  • Support for the defined function. Just like the real thing, it is parse order dependent.

  • The create_resources function now accepts the third parameter (default values).

  • The in operator now works with hashes. I am not sure why the Puppet stdlib implements has_key.

Types Used in the Interpretation Stage

In order to celebrate the entry of this blog into planet haskell, I will talk a bit about the implementation of the interpretation stage of the program. I also ripped the eggplant color from the puppetlabs.com site.

Module structure

The whole interpretation process could be separated into the following parts (matching the module hierarchy):

  • The parser. This is an ugly part, as it is the first part I wrote, when I had no clue about monads, applicative functors, or that the do notation was just syntactic sugar.

  • The interpreter, which will be discussed here. It is also an ugly part, but for other reasons.

  • The “daemon”, which is a piece of code that glues everything together and provides a simple API for using them. It should scale on multiple cores too, but this hasn’t been tested yet.

  • The PuppetDB communication facilities, that need to be reworked.

  • The native types support, with checks about the consistency of their parameters.

  • The plugin system, which currently handles user defined functions and types.

The catalog monad

Most functions in the catalog monad are of type CatalogMonad, which is just an error/state monad over IO:

type CatalogMonad = ErrorT String (StateT ScopeState IO)

The fact that IO is needed is unfortunate, but cannot really be avoided, as templates are computed and PuppetDB is queried during catalog evaluation. It would have been possible to return partially evaluated catalogs to an outer function that lives in IO and that would be responsible for handling this, but I thought it would just make things complicated.
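As a rough illustration of how functions written against such a stack look, here is a toy version. It makes several assumptions: ExceptT from the transformers package stands in for ErrorT (which has since been deprecated), ScopeState is reduced to a map of variables, and setVariable, getVariable and runCatalog are hypothetical names, not the library's API.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.State.Strict (StateT, get, modify, runStateT)
import qualified Data.Map as Map

-- Hypothetical, much reduced state: just the variables known so far.
newtype ScopeState = ScopeState { variables :: Map.Map String String }

-- Same shape as the type above, with ExceptT standing in for ErrorT.
type CatalogMonad = ExceptT String (StateT ScopeState IO)

-- | Record a variable in the state.
setVariable :: String -> String -> CatalogMonad ()
setVariable name val =
    lift (modify (\s -> s { variables = Map.insert name val (variables s) }))

-- | Look a variable up, failing the whole computation if it is unknown.
getVariable :: String -> CatalogMonad String
getVariable name = do
    s <- lift get
    case Map.lookup name (variables s) of
        Just v  -> return v
        Nothing -> throwE ("Unknown variable: " ++ name)

-- | Run a computation from an empty state; the final state is returned
-- even when the computation failed.
runCatalog :: CatalogMonad a -> IO (Either String a, ScopeState)
runCatalog action = runStateT (runExceptT action) (ScopeState Map.empty)
```

Anything that needs IO, such as spawning a Ruby process for a template, would be lifted twice to reach the bottom of the stack.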

Template computation is another matter. It is theoretically possible to run it as a pure function (if you accept using only a subset of the Ruby language), but it would require a full-fledged Ruby interpreter. This is not an option here, and templates are computed by spawning a Ruby process every time. Now that a Lua interpreter is embedded, it would be possible to write Lua versions of the templates and have them interpreted in the same process.

Achieving position independence with a single pass

The catalog computing function basically goes through the AST (following includes and defines) and tries to resolve everything it processes. The problem is that the Puppet language is supposed to be referentially transparent, especially according to the language guide, which was the only documentation for quite a long time.

Puppet language is a declarative language, which means that its scoping and
assignment rules are somewhat different than a normal imperative language. The
primary difference is that you cannot change the value of a variable within a
single scope, because that would rely on order in the file to determine the
value of the variable. Order does not matter in a declarative language.

Reading this, I presumed that the position of a variable assignment, like everything else, did not matter. It turns out it does, but it is now a bit late to correct this.

In order to satisfy this false assumption, all data and resource types exist in two flavors. For example, with values we have Value and ResolvedValue. It would be a good idea to remove all this cruft right now, but I am not in a hurry to touch it, as it represents quite a bit of code and seems to work alright for now.

It presents several challenges, as values are supposed to be transformable into their resolved counterpart at any time, but many things are context dependent (such as variable scoping, local variable presence, variables in defines, etc.). It also doesn’t work at all (just like with Puppet) with control structures or templates that rely on not-yet-defined values.
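To make the two flavors more concrete, here is a heavily simplified sketch. The constructors below are hypothetical and far poorer than the real Value and ResolvedValue types, and the Scope type and resolve function are invented for the example.

```haskell
import qualified Data.Map as Map

-- Hypothetical, heavily simplified versions of the two flavors.
data Value
    = Literal String        -- already known
    | VariableRef String    -- must be looked up in the scope
    | Interpolated [Value]  -- e.g. "${a}/hosts", concatenated once resolved
    deriving (Show, Eq)

newtype ResolvedValue = ResolvedString String
    deriving (Show, Eq)

type Scope = Map.Map String ResolvedValue

-- | Try to turn a Value into its resolved counterpart in a given scope.
-- Left means "not resolvable yet": the variable may be assigned later.
resolve :: Scope -> Value -> Either String ResolvedValue
resolve _ (Literal s) = Right (ResolvedString s)
resolve scope (VariableRef name) =
    maybe (Left ("Unresolved variable: " ++ name)) Right (Map.lookup name scope)
resolve scope (Interpolated parts) = do
    resolved <- mapM (resolve scope) parts
    return (ResolvedString (concat [ s | ResolvedString s <- resolved ]))
```

In this toy model, the unresolved Value can be kept around and resolve retried once more assignments have been seen, which is the whole point of carrying both types. The context problems mentioned above show up as the question of which Scope to pass at that later time.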

If somebody knows what the “proper” way to do this is, I would be quite interested. I am almost certain this exists as it seems related to writing fast compilers, which is a subject that has certainly been explored by smart people.

Room for extension

Finally, the most important type, besides that of the state, is the type of the main function getCatalog. One can notice there are quite a few functions that must be passed along. The reason is that it should be possible to swap backends easily. An example that might be written shortly would be to give two implementations for the puppetDB function:

  • A generalization of the current version, that queries a real puppetDB.

  • An emulated puppetDB that is populated with other runs.

This would need passing around more functions (a function to query arbitrary values to start with, and a function to store the exported resources), but would be immensely useful. It would make it possible to write test suites that cover the whole node list, including exported resources between hosts.
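One way to picture the swap is a record of functions, with the emulated flavor backed by an in-memory map. This is only a sketch under my own assumptions: PuppetDBBackend, storeExported, queryExported and emulatedBackend are hypothetical names, exported resources are reduced to a (type, title) pair, and none of this reflects the current signature of getCatalog.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)
import qualified Data.Map as Map

-- Hypothetical types: an exported resource is reduced to (type, title).
type NodeName = String
type ExportedResource = (String, String)

-- | What the interpreter needs from a PuppetDB, as a record of functions.
data PuppetDBBackend = PuppetDBBackend
    { storeExported :: NodeName -> [ExportedResource] -> IO ()
    , queryExported :: String -> IO [ExportedResource]  -- by resource type
    }

-- | An emulated PuppetDB, kept in memory and populated by earlier runs.
-- Storing again for the same node replaces its previous resources.
emulatedBackend :: IO PuppetDBBackend
emulatedBackend = do
    ref <- newIORef Map.empty
    return PuppetDBBackend
        { storeExported = \node rs -> modifyIORef ref (Map.insert node rs)
        , queryExported = \t -> do
            db <- readIORef ref
            return [ r | rs <- Map.elems db, r@(t', _) <- rs, t' == t ]
        }
```

A backend talking to a real PuppetDB would fill the same record with functions doing HTTP queries, and the interpreter would not need to know which one it was given.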

Now that would be pretty cool, right?