The language-puppet website.

Work with your manifests!

A Faster RWS Variant

I recently saw comments stating that the Writer (and thus RWS) monad is slow and leaky. I have several programs and libraries built around the RWS monad, where I use the writer part only for logging (that is, snocing a single element at the end of the log), and already noticed in some cases that the profiler reported up to 80% of the time usage spent doing mappends.

Here is a POC, with a pair of monads I called RSS (for Reader - State - State). They offer an API similar to that of RWS, the only difference being that the writer part is internally riding with the state.

Tests with a (top secret) real program gave a two times speed increase. For language-puppet, it is a 10% speed increase for a single run, but most of the time is dominated by the initial parsing, so I expect the gain to be more substantial for other type of applications (complex tests come in mind). I wrote a benchmark, that compares the lazy and strict versions of the RSS and RWS monads, using several distinct monoids as the writer type. Here are the results on my machine :

RSS.Lazy RSS.Strict RWS.Lazy RWS.Strict
Seq 193 µs 194 µs 4.95 ms 457 µs
[] 4.46 ms 4.47 ms 4.89 ms 538 µs
Vector.Prim 764 µs 784 µs 47.72 ms 47.46 ms
RevList 173 µs 175 µs 10.57 ms 5.60 ms
IntSet 187 µs 184 µs 4.98 ms 472 µs
DList 177 µs 176 µs 4.56 ms 302 µs
  • RevList is the best data structure for my use case (it is just a newtype around a regular list, with reverse applied when the list is extracted), which should not come as a surprise. What was surprising (to me) is how terrible it turns out to me under RWS …
  • The performance of Vector.Prim with the RWS monad is terrible. I suppose this is due to some optimization rule that could not be deduced in that case. I was however surprised at the good performance compared to lists in the RSS case.
  • The RSS version is much faster than the RWS one … except for lists !
  • The strict version of the RSS monad doesn’t have a leaky writer (test here).

I also added a tellElement helper function that would snoc an element to the writer (with suitable monoids) :

tell tellElement
Seq 193 µs 231 µs
List 4.46 ms 4.45 ms
RevList 173 µs 200 µs

It seems that, to my surprise, there is a small price to pay for calling snoc instead of mappend.

I will probably release this as a package so that I can build language-puppet against it. I will copy’n’paste all the haddocks from the mtl and transformers packages, and move the modules so as to mimick their hierarchy. If you have suggestions, especially concerning the package or module names, strictness properties of the new monads (a subject where I am at loss), please let me know.

EDIT : Added benchmark results for DList