The language-puppet website.

Work with your manifests!

Types Used in the Interpretation Stage

To celebrate this blog's entry into Planet Haskell, I will talk a bit about the implementation of the interpretation stage of the program. I also ripped the eggplant color from the site.

Module structure

The whole interpretation process can be separated into the following parts (matching the module hierarchy):

  • The parser. This is an ugly part, as it is the first part I wrote when I had no clue about monads, applicative functors, or that the do notation was just syntactic sugar.

  • The interpreter, which will be discussed here. It is also an ugly part, but for other reasons.

  • The “daemon”, which is a piece of code that glues everything together and provides a simple API for using them. It should scale on multiple cores too, but this hasn’t been tested yet.

  • The PuppetDB communication facilities, that need to be reworked.

  • The native types support, with checks about the consistency of their parameters.

  • The plugin system, which currently handles user defined functions and types.

The catalog monad

Most functions in the catalog monad are of type CatalogMonad, which is just an error/state monad over IO:

type CatalogMonad = ErrorT String (StateT ScopeState IO)
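To give an idea of what working in this stack looks like, here is a minimal, hypothetical sketch (the real ScopeState and function names differ; ErrorT has since been deprecated in mtl, so this uses its modern equivalent, ExceptT):

```haskell
import Control.Monad.Except
import Control.Monad.State

-- Hypothetical, much-simplified stand-in for the real ScopeState:
-- just a list of variable bindings.
type ScopeState = [(String, String)]

-- The modern equivalent of ErrorT String (StateT ScopeState IO).
type CatalogMonad = ExceptT String (StateT ScopeState IO)

-- Look up a variable in the current scope, failing with an error
-- message when it is absent.
getVariable :: String -> CatalogMonad String
getVariable name = do
    scope <- get
    case lookup name scope of
        Just v  -> return v
        Nothing -> throwError ("undefined variable: " ++ name)

-- Unwrap the stack: peel the error layer, then run the state over IO.
runCatalog :: CatalogMonad a -> ScopeState -> IO (Either String a, ScopeState)
runCatalog action = runStateT (runExceptT action)
```

The error layer short-circuits the computation on the first failure, while the state layer threads the scope through, which is the behavior a catalog evaluation needs.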

The fact that IO is needed is unfortunate, but cannot really be avoided, as templates are computed and PuppetDB is queried during catalog evaluation. It would have been possible to return partially evaluated catalogs to an outer function that lives in IO and that would be responsible for handling this, but I thought it would just complicate things.

Template computation is another matter. It is theoretically possible to run it as a pure function (if you accept using only a subset of the Ruby language), but that would require a full-fledged Ruby interpreter. This is not an option here, so templates are computed by spawning a Ruby process every time. Now that a Lua interpreter is embedded, it would be possible to write Lua versions of the templates and have them interpreted in the same process.
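The spawn-per-template approach can be sketched with System.Process (the helper names below are hypothetical, not language-puppet's actual API): run an external command, feed it the template on stdin, and read the rendered result from stdout.

```haskell
import System.Process (readProcess)

-- Run an external command as a filter: feed it input on stdin and
-- return its stdout.
spawnFilter :: FilePath -> [String] -> String -> IO String
spawnFilter = readProcess

-- Rendering an ERB template then amounts to one Ruby process per
-- call, which keeps the Haskell side free of Ruby state at the cost
-- of a fork/exec every time.
renderErb :: String -> IO String
renderErb = spawnFilter "ruby" ["-rerb", "-e", "print ERB.new(STDIN.read).result"]
```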

Achieving position independence in a single pass

The catalog computing function basically goes through the AST (following includes and defines) and tries to resolve everything it processes. The problem is that the Puppet language is supposed to be referentially transparent, especially according to the language guide, which was the only documentation for quite a long time.

Puppet language is a declarative language, which means that its scoping and
assignment rules are somewhat different than a normal imperative language. The
primary difference is that you cannot change the value of a variable within a
single scope, because that would rely on order in the file to determine the
value of the variable. Order does not matter in a declarative language.

Reading this, I presumed that the position of a variable assignment, like everything else, did not matter. It turns out it does, but it is now a bit late to correct this.

To accommodate this false assumption, all data and resource types exist in two flavors. For example, values come as Value and ResolvedValue. It would be a good idea to remove all this cruft right now, but I am not in a hurry to touch it, as it represents quite a bit of code and seems to work alright for now.

This presents several challenges, as values are supposed to be transformable into their resolved counterpart at any time, yet many things are context-dependent (such as variable scoping, local variable presence, variables in defines, etc.). It also doesn't work at all (just like Puppet itself) with control structures or templates that rely on not-yet-defined values.
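The two-flavor scheme can be sketched like this (hypothetical, much-simplified versions of the real types): an unresolved Value may still contain variable references, while a ResolvedValue is guaranteed concrete, and resolution fails whenever a variable is not bound yet, which is exactly the context-dependence at issue.

```haskell
-- Hypothetical simplification of the unresolved flavor: it may still
-- contain variable references or interpolations.
data Value
    = Literal String
    | Variable String
    | Interpolable [Value]
    deriving (Eq, Show)

-- The resolved flavor is guaranteed to be concrete.
newtype ResolvedValue = ResolvedString String
    deriving (Eq, Show)

type Scope = [(String, ResolvedValue)]

-- Resolution fails when a variable is not bound in the current scope.
resolve :: Scope -> Value -> Either String ResolvedValue
resolve _     (Literal s)  = Right (ResolvedString s)
resolve scope (Variable v) =
    maybe (Left ("unknown variable " ++ v)) Right (lookup v scope)
resolve scope (Interpolable parts) = do
    rs <- mapM (resolve scope) parts
    Right (ResolvedString (concat [ s | ResolvedString s <- rs ]))
```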

If somebody knows what the “proper” way to do this is, I would be quite interested. I am almost certain one exists, as it seems related to writing fast compilers, a subject that has certainly been explored by smart people.

Room for extension

Finally, the most important type, besides that of the state, is the type of the main function getCatalog. One can notice that quite a few functions must be passed along. The reason is that it should be possible to swap backends easily. An example that might be written shortly would be two implementations of the puppetDB function:

  • A generalization of the current version, which queries a real PuppetDB.

  • An emulated PuppetDB that is populated by other runs.

This would require passing around more functions (a function to query arbitrary values, to start with, and a function to store exported resources), but would be immensely useful: it would make it possible to write test suites that cover the whole node list, including exported resources between hosts.
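The "pass the backends along" style could look roughly like this (a sketch with hypothetical names, not the actual getCatalog signature): the catalog computation receives its PuppetDB operations as plain functions, so a real client and a test emulation are interchangeable.

```haskell
import Data.IORef

-- Hypothetical record of the backend functions that would be passed
-- to the catalog computation.
data PuppetDBApi = PuppetDBApi
    { queryExported :: String -> IO [String]       -- look up resources for a key
    , storeExported :: String -> String -> IO ()   -- record an exported resource
    }

-- A trivial in-memory emulation, the kind a test suite covering
-- exported resources between hosts could use without a real PuppetDB.
emulatedPuppetDB :: IO PuppetDBApi
emulatedPuppetDB = do
    ref <- newIORef []
    return PuppetDBApi
        { queryExported = \k -> map snd . filter ((== k) . fst) <$> readIORef ref
        , storeExported = \k v -> modifyIORef ref ((k, v) :)
        }
```

A real implementation would carry the same record but back it with HTTP calls, leaving the interpreter code untouched.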

Now that would be pretty cool, right?