## Transducers: Pipes versus Machines

In hindsite1, what I was trying to say in my last post was this:

Folds in the foldl library are Moore machines and what I often need is a Mealy machine.

A user of the Fold type (scan in the pipes library is an example) has access to the computational results via done which is only hooked up to the state. A Moore machine consumes an input tape and provides a state (which Fold then provides access to). A Mealy machine, in contrast, has these components but is also an output tape producer. A Mealy version of Fold might look something like this:

or you could decide not to hook up state at all:

All of which gives me an excuse to take the machines library for a spin, make some comparisons with the pipes ecosystem, and opine on the metaphor of transducers.

## Step (Reducer)

At the heart of my particular toy problem (summarizing an input tape of Ints by adding them up and producing an output tape value whenever they exceed 10) is a step function that I found to be common across libraries and typing decisions. Let’s give it a type that fits in with the Mealy argument ordering in machines.

(Code link 2 and imports3 at the end.)

In some circles this is referred to as a reducer, though I don’t see what is being “reduced”. More natural (if possible) is to see step busted into (a -> x), a morphism to state, and (x -> x -> x), a state Endo.

I can use this to build a machines library Mealy using unfoldMealy :: (s -> a -> (b, s)) -> s -> Mealy a b

or I can roll my own (hiding state to avoid those dang skolems)

## Machines

To see Mealy in action you need an input tape and some machinery to run the tape.

I’m ignoring (loss of) final state to keep things simple.

For a first time user of machines, that was easy!

## Pipes

Pipes and machines are at least first-cousins when it comes to the problem domains they tackle. Here’s my Mealy inserted directly into a Pipe:

## Transducers

I need a collective noun to describe pipes and machines, and I’m going to opt for transducers and then use the term transduction to describe the application of a specific transducer computation.

Inserting an effect into each transduction, rather than cheating by using ghci to show a pure result gives:

The print effects are almost identical and there’s clear correspondence in constructing each transduction. I couldn’t really find a consumer in machines (just one enigmatic note in the docs that consumers are just monoids - huh?), but consumers are over-rated and it’s often easier to avoid Consumers in pipes and just trail the transduction with a forever await discard.

## Transducer transformers

Labeling them first-cousins might be conservative. Could the two libraries be isomorphic? I have no idea, but two examples to get you thinking:

You can turn a Producer into a Source (it came out as a Plan but same diff):

You can “tee” a machines Process, providing a Producer:

How’s that for the power of monad transformation! It’s a pain to work with them, but when you find the right hoist, it’s sheer magic.

## Machines versus Pipes (versus Skolems versus State versus foldl’)

I have identical transductions so it’s time for a race!

First a dump of the results including a few alternatives, then some explaining:

I use a bespoke criterion-based benchmark for most of my performance testing4. The 1.0 release saw an opening up of the library, and access to lower-level benchmarking measures, which is awesome. In particular, you can get time split into mutation (real work as the docs say) and garbage collection. Remember to use +RTS -t if you try this at home.

## machines vs pipes

So the first set of numbers is 1 sample of a 1 million values tape. Not counting GC, Pipes wins with 0.234 secs per mill (234 nanos per tick), versus a very respectable 277 nanos for machines. The second set of numbers are 100 samples of a 10000 value tape. Pipes is rock-steady - no time and space leaks here! Machines has a slight improvement in mutation, suggesting some time leak within the transduction, and a difference in GC suggesting a space leak. The slowdown by a factor of 10 for a reduction of 100 in the tape size is indicative of a quadratic blowup somewhere (probably my bad).

## tail recursion vs state

The translation of Mealy to a pipe could have been achieved by including state in the pipe, rather than using tail recursion:

And this convenience slows things down a touch (289 vs 234 nanos).

Note that I lifted pipes’ish stuff. If you lift the non-pipes elements, bad stuff happens (289 versus 6930 nanos). Don’t lift pipes if you can help it.

## foldl vs pipes

An obvious difference between pipes and machines is that machine integrates reducers (Moore and Mealy) whilst foldl is separated out from pipes.

Squeezing Mealy into a Moore’ish Foldl.Fold clocked in at a blistering 135 nanos mutation, but with a GC problem, most likely in my step function specification.

## skolems

Exposing state slowed things down a touch, 246 nanos versus 234 nanos - right on tekmo’s rule-of-thumb. But this is a simple step function, and given that’s it’s only 5% of the total cost, just maybe the cost is worthwhile to users tying themselves in knots over escaping skolems. Free skolem-kind I say!

## just a fold

Of course, if you can do without the notion of transducers, reducers, machines and tapes, then you can just fold the values at 93 nanos versus 239 nanos with transducers hanging off the computation (and I might have introduced a space problem that shows up in GC).

## just c

I wanted to include the machine-level limit. Low value runs were coming in at 9 nanos but I got a segment fault at higher numbers and reminded myself why I code in haskell.

## And the winner is …

It was an interesting tussle in the battle of the transducers. Pipes/foldl wins the speed race but machines has very respectable numbers, especially given a certain lack of attention. Machines gets points for ease of use, and for having a Mealy machine at all. Pipes gets bonuses for separation of transducer and folder. Both libraries get kudos for incredible flexibility in monad transformation.

Haskell is the winner everywhere except for lacking a stable Step definition. I now know that Folds/Reducers/Steps are also (more or less) Moore machines, but it was a difficult factoid to locate. The pipes ecosystem is missing out on the benefits of the Mealy - that you can stick an output tape into your transduction and increase the expressive power compared with existing tools. But it’s taken a lot of digging to reach that conclusion.

Moore and Mealy and the design patterns around them deserve wide exposure and their own library, called Step or Reducer or whatever.

## Afterword

A while back there was some commentary on clojure reducers (x -> a -> x) and transducers (x -> a -> x) -> (x -> b -> x), and it was pointed out that clojure reducers are Folds in the lens and foldl sense, without the finalization (and state initialization) goodness built in to enable composition.

It may well be that the clojure notion of transducers is, in part, an alternative to passing around a complete Fold with finalization and state; instead allowing reducer receivers to handle that aspect. But they also might be onto something, though haven’t gone far enough. In the world of pipes, it’s becoming idiomatic to pass around Producers with something like Producer a m () -> Producer b m () (and often easier than pull-based pipes), either directly or via lens like Lens' (Producer a m ()) (Producer b m ()).

Regardless, pipes and machines take signals and process them, conforming to the classical metaphor of a transducer in electrical circuits, and transducer is the right name for what they are.

1. What I missed was this post from kmett. I did read it at the time, but failed to heed the scary part warning. Then, when I read that Moore machines were cofree comonads, my brain exploded.

2. full code here

3. import list

4. criterion

January 24, 2015