I am working on a quick and dirty Ruby bridge library that I hope will yield a huge performance gain for template interpolation in the language-puppet library. Right now, it is capable of:
Initializing a Ruby interpreter from libruby
Calling Ruby methods and functions
Registering methods or functions that will be called from Ruby code
Converting data between the two worlds (right now the most complex instance is the JSON one, which means that many complex Ruby types can’t be converted, but it is more than enough for passing data)
Embedding native Haskell values that can be passed around in Ruby to the Haskell-provided external functions (I will use this for passing the Puppet catalog state around)
There are still a few things to do before releasing it:
Making compilation a bit less dependent on the system. This will probably require quite a few flags in the cabal definition …
Hunting for memory leaks. I am not sure how to do this with the GHC Runtime in the middle, and I do hope that ruby_finalize frees everything that is managed by the Ruby runtime. After all, restarting processes seems to be the only working garbage collection method for Ruby daemons …
Writing stubs for the Puppet library methods that might be needed by templates. I would like to be able to support custom types and functions directly written in Ruby instead of Lua, but this will probably turn into a nightmare …
    {-# LANGUAGE OverloadedStrings #-}
    module Main where

    import Foreign.Ruby.Bindings
    import Data.Aeson
    import Data.Attoparsec.Number

    -- this is an external function that will be executed from the Ruby interpreter
    -- the first parameter to the function is probably some reference to some top object
    -- my knowledge of ruby is close to nonexistent, so I can't say for sure ...
    extfunc :: RValue -> RValue -> IO RValue
    extfunc _ v = do
        -- deserialize the Ruby value into some JSON Value
        onv <- fromRuby v :: IO (Maybe Value)
        -- and display it
        print onv
        -- now let's create a JSON object containing all kind of data types
        let nv = object [ ("bigint", Number (I 16518656116889898998656112323135664684684))
                        , ("int", Number (I 12))
                        , ("double", Number (D 0.123))
                        , ("null", "Null")
                        , ("string", String "string")
                        , ("true", Bool True)
                        , ("false", Bool False)
                        , ("array", toJSON ([1,2,3,4,5] :: [Int]))
                        , ("object", object [ ("k", String "v") ])
                        ]
        -- turn it into Ruby values, and return this
        toRuby nv

    -- this is the function that is called if everything was loaded properly
    nextThings :: IO ()
    nextThings = do
        -- turn the extfunc function into something that can be called by the Ruby interpreter
        myfunc <- mkRegistered2 extfunc
        -- and bind it to the global 'hsfunction' function
        rb_define_global_function "hsfunction" myfunc 1
        -- now call a method in the Ruby interpreter
        o <- safeMethodCall "MyClass" "testfunc" []
        case o of
            Right v -> (fromRuby v :: IO (Maybe Value)) >>= print
            Left r -> putStrLn r

    main :: IO ()
    main = do
        -- initialize stuff
        ruby_init
        ruby_init_loadpath
        -- and load "test.rb"
        s <- rb_load_protect "test.rb" 0
        if s == 0
            then nextThings
            else showError >>= putStrLn
And here is the Ruby program that calls our external function:
I just released the latest language-puppet version. For the full list of changes, please take a look at the
changelog. Here are the highlights.
PuppetDB code reworked
The PuppetDB code and API have been completely overhauled. They are now more generic: the resource collection and puppet
query functions now work the same way. Additionally, a PuppetDB stub has been created for testing use.
Better diagnostic facilities
As the main use of this library is to test stuff, the following features were added:
Several error messages have been reworked so that they are more informative.
A dumpvariables built-in function has been added. It just prints all known variables (and facts) to stdout, and can
be quite handy.
The “scope stack” description is stored with the resources. This turned out to be extremely useful when debugging
resource name collisions or finding out where some resource is defined.
Here is an example. Let’s say you do not remember where the collectd package gets installed. Just run this:
You now know exactly where the package resource is declared, and the list of “scopes” that have been traversed in
order to do so. Note that this information is also displayed when resource names collide.
Easier to set up
This library no longer depends on a newish bytestring, and should build with the package provided with a GHC
compiler of the 7.6.x series.
This is not yet done, but I will certainly soon publish a Debian-style repository of the compiled puppetresources binary. I am interested in suggestions for an automated building system.
Better testing
The testing API seems sufficient to write pretty strong tests, but would still benefit from a few more helper functions.
The testing “daemon” has been reworked to use the new PuppetDB stub. It makes it possible to test complex interactions
between hosts using the exported resource or PuppetDB query features.
Work in progress
I will probably lensify the code until I get a decent understanding of it.
I do not intend to work on Hiera emulation just yet, as I am probably the only user of this library for now and I do not
use this feature.
One area of improvement would be to embed the ruby interpreter in the library. I am not sure how to do this, but as
there are quite a few projects of lightweight interpreters sprouting from the earth, it might be possible in the near
future. The only problem would be figuring out how to build a large C project with cabal.
Some other considerations
I recently ported the code from random.c to Haskell
(here). This has been
quite tedious, and is quite hard to read. This is an almost naive port of the code found in the Ruby interpreter,
without the useless loop variables. For some reason, there are many loops like this:
    i=1; j=0;
    k = (N>key_length ? N : key_length);
    for (; k; k--) {
        mt->state[i] = (mt->state[i] ^ ((mt->state[i-1] ^ (mt->state[i-1] >> 30)) * 1664525U))
            + init_key[j] + j; /* non linear */
        mt->state[i] &= 0xffffffffU; /* for WORDSIZE > 32 machines */
        i++; j++;
        if (i>=N) { mt->state[0] = mt->state[N-1]; i=1; }
        if (j>=key_length) j=0;
    }
As you can see, the value of k is never used in the loop body. I am not sure why the author didn’t go for something like:
    for(i=1;i<k;i++) {
Anyway, the Haskell code is pretty bad, and will certainly only work for 64-bit builds. I am not sure how I should have
written it. I suppose staying in the ST monad would have led to nicer code, and I am open to suggestions.
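For what it is worth, here is one possible shape for the loop quoted above when written in the ST monad. This is only a sketch, not the port mentioned in this post: the name mixKey is made up, the state is assumed to be a 0-indexed unboxed array of at least two words, and the key is assumed to be non-empty. Using Word32 makes the arithmetic wrap modulo 2^32, so the explicit mask from the C code is not needed and the result should not depend on the machine word size.

```haskell
import Data.Array.ST (getBounds, readArray, runSTUArray, thaw, writeArray)
import Data.Array.Unboxed (UArray, elems, listArray)
import Data.Bits (shiftR, xor)
import Data.Word (Word32)

-- | Hypothetical ST version of the key-mixing loop.
-- Assumes a 0-indexed state with at least two words and a non-empty key.
mixKey :: [Word32] -> UArray Int Word32 -> UArray Int Word32
mixKey key st0 = runSTUArray $ do
    st <- thaw st0
    (_, hi) <- getBounds st
    let n      = hi + 1
        keyLen = length key
        -- 'remaining' plays the role of k in the C code: a plain countdown.
        go remaining i j
            | remaining <= (0 :: Int) = return ()
            | otherwise = do
                prev <- readArray st (i - 1)
                cur  <- readArray st i
                let mixed =
                        (cur `xor` ((prev `xor` (prev `shiftR` 30)) * 1664525))
                            + key !! j + fromIntegral j
                writeArray st i mixed
                -- wrap the state index, copying the last word to slot 0
                i' <- if i + 1 >= n
                        then do
                            lastWord <- readArray st (n - 1)
                            writeArray st 0 lastWord
                            return 1
                        else return (i + 1)
                -- wrap the key index
                let j' = if j + 1 >= keyLen then 0 else j + 1
                go (remaining - 1) i' j'
    go (max n keyLen) 1 0
    return st
```

In GHCi, a state can be built with listArray and inspected with elems, for instance elems (mixKey [1] (listArray (0, 1) [0, 0])). Indexing the key with !! is linear in the key length; an unboxed array would be the obvious next step for long keys.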
This blog post is not about language-puppet, but might be of interest to my fellow sysadmins with an interest in Haskell. I recently worked with Logstash in a way that might not be typical, as all my messages are emitted by services that are Logstash-aware: they directly write JSON messages to the TCP input of the Logstash server. This means that most of the features (and some would say, the whole point) of Logstash were of no use to me.
I stuck with my grand mission of rewriting the handful of useful Ruby programs, and wrote a new package. I based almost everything around the excellent conduit abstraction. It has the following features:
Haskell types for representing Logstash messages, along with the type-classes necessary for converting them from and to JSON
An ElasticSearch conduit, using the bulk insert API
A Redis source using the pipelining features of the hedis package, and a simple Redis sink
A Logstash listener, based on the TCP listener from network-conduit, able to accept latin1 and UTF-8 messages at the same time
A pair of “retrying” sinks, one using a Socket and the other establishing TCP connections. They are used for guaranteed delivery of a whole ByteString segment, retrying to connect until it is sent (this is obviously useful for JSON messages)
A few functions for handling bulk APIs in Conduits
And finally, the coolest part, a few helper functions that will let you route between conduits!
The last part was made after a little discussion on the Haskell-Cafe mailing list. It is built with stm-conduit, which already has a helper function for merging sources. This package introduces the other useful functionality: the ability to “route” items coming from a source to several sinks. The main function, branchConduits, works by taking a Source, a routing function, and a Sink list. The routing function associates a (possibly empty) list of integers with every item coming from the Source. These integers map directly to the corresponding Sinks, letting you define the routing policy.
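To make the routing idea concrete, here is a tiny pure model of it. It does not use conduit or stm-conduit at all, and it is not the signature of branchConduits: sinks are modelled as plain output lists, the names routeItems and exampleRouter are invented for the illustration, and the sink indices are assumed to start at 0.

```haskell
-- | Pure model of the routing idea: each sink is modelled as an output list.
-- The router returns the indices (0-based, by assumption) of the sinks that
-- should receive an item; an empty list drops the item.
routeItems :: Int -> (a -> [Int]) -> [a] -> [[a]]
routeItems sinkCount router items =
    [ [ x | x <- items, sinkIx `elem` router x ]
    | sinkIx <- [0 .. sinkCount - 1]
    ]

-- | Example policy: even numbers go to sink 0, multiples of three to sink 1.
-- A number can be sent to both sinks, or to none.
exampleRouter :: Int -> [Int]
exampleRouter x = [ 0 | even x ] ++ [ 1 | x `mod` 3 == 0 ]
```

With this policy, routeItems 2 exampleRouter [1..6] sends 2, 4 and 6 to the first sink, 3 and 6 to the second one, and drops 1 and 5.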
The package includes a few examples of common tasks, all of them with acceptable runtime performance, such as:
Moving messages from a TCP server to Redis
Moving messages from Redis to Elasticsearch
Routing messages between conduits
So if you need more control, or much better performance, than what you would get from Logstash, and you are not afraid to write (a lot of) code, please use this package and let me know what is missing and/or buggy!
I always thought that one of the most rewarding effect of Puppet is that the whole system gets configured automatically as nodes are added.
For me, the main, and for a long time sole, manifestation of this property is in the configuration of the Nagios servers. The built-in types lend themselves pretty well to this exercise.
Now, with PuppetDB, we have a simple and powerful way to create new effects, beyond what could be achieved with just exported resources (I believe it used to be possible before PuppetDB, but required black magic in the template files). I will demonstrate a typical use case, along with a sneak peek of the testing features that will appear in the next version of language-puppet.
Let’s say we have an HTTP proxy and several groups of servers acting as backends. You wish to be able to add servers to the pool just by running the agent on them. The site.pp should look like this:
The pdbresourcequery function comes from this excellent module, and has been included natively in language-puppet for a while. Its effect here is to fill the $backends variable with an array containing all resources that are of type Haproxy::Backend on any active node.
But now comes the complicated part: how are you supposed to write, and, more importantly, to test, the config.erb template? As far as I know you can’t pull this off with rspec-puppet (and it is way too slow anyway). With the new testing API, you can write a simple program like this:
Main.hs
    module Main where

    import qualified Data.Map as Map
    import Control.Monad (void)
    import Puppet.Testing
    import Puppet.Interpreter.Types
    import Facter

    main :: IO ()
    main = do
        qfunction <- testingDaemon Nothing "." allFacts
        void $ qfunction "back1"
        void $ qfunction "back2"
        (proxycatalog,_,_) <- qfunction "proxy"
        case Map.lookup ("file","/etc/haproxy/haproxy.cfg") proxycatalog of
            Nothing -> error "could not find config file"
            Just f -> case Map.lookup "content" (rrparams f) of
                Just (ResolvedString s) -> putStrLn s
                _ -> error "could not find content"
Line by line, this program does:
lines 1-10 : various headers
line 11 : the catalog computing function is initialized, using the new testing system
line 12 : the catalog for the node back1 is computed, and stored into the fake PuppetDB
line 13 : same thing for back2
line 14 : same thing for proxy, but we keep the final catalog this time
lines 15-19 : the content of the /etc/haproxy/haproxy.cfg is displayed. This part is terrible and will be replaced by some helper soon.
The template groups the resources by their “backend_type” attribute, creates a backend block for each of them, and populates the blocks with the corresponding backends.
modules/haproxy/templates/config.erb
    <%- backends = scope.lookupvar('haproxy::backends').group_by do |x| x["backend_type"] end -%>
    <%- backends.each do |backendname, backends| -%>
    backend <%= backendname %>
    <%- backends.each do |backend| -%>
     server <%= backend["backend_server"] %> <%= backend["backend_server"] %>:<%= backend["backend_port"] %>
    <%- end -%>
    <%- end -%>
And the output is:
backend web
server back2 back2:80
server back1 back1:80
It works! With this feature, it will soon be possible to test and experiment with the most complex aspects of inter-node interactions.
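For readers more at ease with Haskell than with ERB, the grouping performed by the template can be modelled roughly as follows. This is only an illustration of the logic: the Backend record and the function names are invented here, the real resources carry many more parameters, and the indentation of the server lines is arbitrary.

```haskell
import Data.List (nub)

-- | Hypothetical, simplified view of a Haproxy::Backend resource.
data Backend = Backend
    { backendType   :: String
    , backendServer :: String
    , backendPort   :: Int
    }

-- | Group the backends by type, keeping the order of first appearance.
groupByType :: [Backend] -> [(String, [Backend])]
groupByType bs =
    [ (t, [ b | b <- bs, backendType b == t ])
    | t <- nub (map backendType bs)
    ]

-- | Render one "backend" block per type, with one "server" line per member.
renderConfig :: [Backend] -> String
renderConfig bs = concatMap block (groupByType bs)
  where
    block (t, members) = unlines (("backend " ++ t) : map server members)
    server b =
        "    server " ++ backendServer b ++ " " ++ backendServer b
            ++ ":" ++ show (backendPort b)
```

Feeding it the two web backends from the example, for instance putStr (renderConfig [Backend "web" "back2" 80, Backend "web" "back1" 80]), produces a single backend block listing both servers.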
Grab the alpha version of
Hspuppetmaster (compiled for x64 Linux)! It will become a full-fledged
replacement for the default puppetmaster, but is still not ready for prime-time.
In order to use it, you must:
untar everything on your puppetmaster host
run it with ./hspuppetmaster /etc/puppet +RTS -N (the first argument is the
location of your puppet repository) as the puppet user
modify your web server configuration to redirect requests for
/production/catalog to 127.0.0.1:3000. This can be done in Apache by
disabling the “High Performance mode” (wasted a few hours on this one), and
adding something like this to your vhost configuration:
There are quite a few things it will not do (such as updating the facts in
PuppetDB), but you should experience much better catalog compilation times (I
currently have catalogs that take longer than the default two-minute timeout to
compile with Puppet), and sometimes much clearer error messages. As it is based on
language-puppet, it is generally much stricter than Puppet. For example, it
will fail on any variable that cannot be resolved.
Please have a try and let me know how it worked for you (do use --noop!).
The language-puppet library was created when I started to learn Haskell. As
a consequence, it uses the dreaded String type to store all kinds of textual
values. It also uses the System.IO module for performing I/O. I was aware of
the file descriptor leak problem that happens when you use readFile, so I
chose the following implementation for the Puppet file function:
This should return Just the content of the first readable file in the
parameter list, or Nothing if there are none, and should not leak any file
descriptor. Now that I am finalizing the hspuppetmaster binary, I can use my
library to (try to) compute catalogs on my production systems, using the
standard puppet agent -t --noop. It turned out that the file function was
misbehaving. Testing it in GHCi illustrates the problem:
It seems to work fine, except all file contents are empty. This behavior seems
to be common knowledge among Haskellers, and is due to the fact that the file
descriptor is closed before the output is evaluated. This is pretty horrible
(and surprising), and what is even worse is my solution:
It is a bit longer because of the use of the non-deprecated version of catch,
and because it explicitly forces evaluation of the output of hGetContents.
This behavior was extremely surprising to me, and I would like to thank the
people on #haskell for their help in devising a correct version (mine was
along the lines of !y <- hGetContents, which worked for my simple examples,
but was certain to fail at some point). This is the only IRC channel I know
of where people are at the same time active, always helpful, and knowledgeable.
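For reference, the general technique described here (forcing the output of hGetContents before the handle is closed) can be sketched as follows. This is a minimal illustration using only base, not the code that ended up in language-puppet, and the names readFileStrict and firstReadable are made up for the example.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- | Read a whole file as a String, forcing it while the handle is still
-- open. Without the 'evaluate' call, withFile would close the handle
-- before the lazy string is consumed, and the result would be empty.
readFileStrict :: FilePath -> IO String
readFileStrict path = withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    _ <- evaluate (length contents) -- walks the whole string
    return contents

-- | Content of the first file that can be read, or Nothing if none can.
-- Only I/O exceptions (missing file, permissions, ...) are caught.
firstReadable :: [FilePath] -> IO (Maybe String)
firstReadable [] = return Nothing
firstReadable (p : ps) = do
    r <- try (readFileStrict p) :: IO (Either IOException String)
    case r of
        Right c -> return (Just c)
        Left _  -> firstReadable ps
```

Forcing the length is enough here because a String is fully determined once its spine and characters have been read from the handle. For large files, a strict ByteString or Text read would be the more usual choice.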
The main goal of this project is, for now, to assist sysadmins editing their catalogs. The best illustration is the puppetresources application. It can:
Check a file’s syntax, and print what it thinks it is.
Compute a whole catalog and display it in human readable format or JSON.
Display details about a specific resource in a catalog, including special support for file contents (useful for debugging templates).
And do the two previous items using facts and/or queried data from a real PuppetDB.
It is also fast enough to compute the catalogs of all your nodes in reasonable time, which opens possibilities you would not even dream of in the Ruby Puppet world. One of them is writing “integration tests” that let you check properties related to complex environmental interactions between hosts.
In order to facilitate this, I am in the process of writing a fully fledged testing API (it is still a bit lacking). It is strongly inspired by other testing APIs and should quickly evolve into something that is very easy to use. It is not the current focus (which is to replace an actual Puppet Master with my software), but I already implemented a test that is built into the puppetresources executable: it now checks that the source parameter of each file resource points to an actual file. This is a common error pattern for me (forgetting to create the file, mistyping its name, or placing it in the wrong directory) that has now disappeared.
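The idea behind such a check can be sketched in a few lines. This is not the test shipped with puppetresources: the function names are invented, only sources of the form puppet:///modules/<module>/<file> are handled (they are assumed to live in modules/<module>/files/<file>, which is the usual Puppet layout), and the existence test is passed in as a predicate so that the logic stays pure.

```haskell
import Data.List (stripPrefix)

-- | Translate a "puppet:///modules/<module>/<file>" source into a path
-- relative to the Puppet directory. Other kinds of sources yield Nothing.
sourceToPath :: String -> Maybe FilePath
sourceToPath src = do
    rest <- stripPrefix "puppet:///modules/" src
    case break (== '/') rest of
        (modName@(_ : _), '/' : file@(_ : _)) ->
            Just ("modules/" ++ modName ++ "/files/" ++ file)
        _ -> Nothing

-- | Keep the (resource title, source) pairs whose source is recognised
-- but does not exist according to the given predicate. Sources that are
-- not recognised are not reported.
missingSources :: (FilePath -> Bool) -> [(String, String)] -> [(String, String)]
missingSources exists resources =
    [ r
    | r@(_, src) <- resources
    , maybe False (not . exists) (sourceToPath src)
    ]
```

In a real check the predicate would be replaced by an actual file system lookup, which moves the function into IO but does not change its structure.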
Oh by the way, a new version is out! Version 0.3.2 mainly changes the license, from GPL3 to BSD3. The choice was dictated by the sudden outburst of horribly uninteresting posts about licensing that has plagued Haskell-cafe during the last few hours. I hope this will end soon, or it will not be possible to differentiate this mailing list from that of Debian.
A new version is already out, this time with JSON catalog generation. It is not
properly tested, but Puppet seems to accept them. If someone knows how to get
puppet catalog apply to download files from a Puppet server, I am interested.
I will probably write a sample application on top of WARP and modify the
configuration of my Puppetmaster to redirect catalog requests towards it. This
means that there could be an efficient replacement to the Puppetmaster soon.
This version introduces resource relationships handling. It is also full of
nasty bugs :) An improved version is already in the works, along with great
features.
First of all, you will now get notifications when a resource is missing or when
you have created cycles. There are still some bugs:
The aliases are not taken into account.
The relationship metaparameters on classes are ignored.
With the released version, nothing is actually working. Sorry … I realized
too late how broken it was. You might want to check github, or the
updated binary packages.
This is the kind of error message you will get when cycles are found:
puppetresources: The following cycles have been found:
File[/a]
-> File[/b] ["./manifests/site.pp" (line 557, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 556, column 9))
-> File[/c] ["./manifests/site.pp" (line 558, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 557, column 9))
-> File[/d] ["./manifests/site.pp" (line 559, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 558, column 9))
-> File[/e] ["./manifests/site.pp" (line 560, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 559, column 9))
-> File[/f] ["./manifests/site.pp" (line 561, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 560, column 9))
-> File[/g] ["./manifests/site.pp" (line 562, column 9)] link is Just (RRequire,UNormal,"./manifests/site.pp" (line 561, column 9))
Please note how each resource and link position is displayed. This is a lot more
verbose than what the vanilla Puppet spouts, but I believe it is also much more
useful. Chasing after links defined as a resource chain gets old very fast.
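Finding the cycles themselves is the easy part. As an illustration (this is not the code used in language-puppet, and the Resource type and findCycles name are made up), the strongly connected components function from the containers package is enough to list the groups of resources that depend on each other:

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import Data.List (sort)

-- | A resource is identified by its type and title, e.g. ("file", "/a").
type Resource = (String, String)

-- | Given each resource with the resources it requires, return the groups
-- of resources that form a dependency cycle. Members are sorted so that
-- the result does not depend on the traversal order.
findCycles :: [(Resource, [Resource])] -> [[Resource]]
findCycles deps =
    [ sort members
    | CyclicSCC members <- stronglyConnComp
        [ (r, r, targets) | (r, targets) <- deps ]
    ]
```

For example, if File[/a] requires File[/b], File[/b] requires File[/a], and File[/c] requires File[/a], only the first two resources are reported. The hard part, as the message above shows, is keeping track of where each link was declared so that the report is actually useful.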
This is the current error message when a relationship points to (or from) an
unknown resource:
puppetresources: Unknown relation ("file","/b") -> ("file","/a") used at "./manifests/site.pp" (line 556, column 9) debug: (False,True,False,False)
This one is terrible, and will need to be reworked. There is still quite a bit
of work, but I am fairly pleased at how everything seems to fall into place.
I started this project at the end of April, 7 months ago. The project is
incredibly useful as it is right now, and I am confident I will be able to
provide a robust and vastly more efficient puppetmaster, with a ton of helpful
tools, before the first anniversary of this project.
A new version is out. Actually, a pair of versions has been released. The most
important change is the fix for a serious bug in how default values worked.
The other visible improvements will be felt on the performance side,
as discussed in the previous entry.
v0.2.2
New features
A few statistics are exported.
v0.2.1
Bugs fixed
The defaults system was pretty much broken; it should be better now.
New features
Basic testing framework started.
create_resources now supports the defaults system.
defined() function works for resource references.
in operator implemented for hashes.
Multithreading works.
The ruby <> daemon communication is now over ByteStrings.
The toRuby function has been optimized, doubling the overall speed for
rendering complex catalogs.