In this post Gabriel Gonzalez traces a path from free monoids to map reduce, offering up the promise of "programming paradigms waiting to be discovered". Sounds fun right? So have a read of his article first and then let's explore!2
If a monoid is commutative, is the list type still its free object? Applying category theory magic is well above my paygrade, but this article on free magmas offers an alternative method. The author employs a principle of canonical form: that every object should have a unique representation. Applying canonicity to commutation,
b<>a should be represented the same as
a<>b. Just using the free monoid aka the list type, however, results in these two objects being represented differently:
-- Non commutative free monoid representation -- >>>  ++  ==  ++  -- False
If an ordering of the a's and b's exist, the (++) operator can be modified to pre-sort. Wrapping the list in a new type, we get:
newtype FreeCMList a = FreeCMList [a] deriving (Eq, Ord, Show) (+|+) :: (Ord a) => FreeCMList a -> FreeCMList a -> FreeCMList a (+|+) (FreeCMList a) (FreeCMList b)= FreeCMList (op a b) where op  a = a op a  = a op (x:xs) (y:ys) = if x <= y then x : op xs (y:ys) else y : op (x:xs) ys
Our free commutative monoid operator now ensures the same object is represented uniquely:
-- Commutative free monoid representation -- >>> FreeCMList  +|+ FreeCMList  == FreeCMList  +|+ FreeCMList  -- True
Map.Map a Int?
Our free commutative monoid is isomorphic to a map with the keys being the a's and the value being the number of times the a's appear. Again wrapping in a new type, we can get:
newtype FreeCM a = FreeCM (Map a Int) deriving (Eq, Ord, Show) sing :: a -> FreeCM a sing a = FreeCM $ Map.singleton a 1 (+++) :: (Ord a) => FreeCM a -> FreeCM a -> FreeCM a (+++) (FreeCM a) (FreeCM b)= FreeCM (Map.unionWith (+) a b)
and this is isomorphic to our first attempt:
cm :: (Ord a) => Iso' (FreeCMList a) (FreeCM a) cm = iso toMapCM toListCM where toMapCM (FreeCMList l) = FreeCM $ Map.unionsWith (+) ((`Map.singleton` 1) <$> l) toListCM (FreeCM m) = FreeCMList $ concat $ (\(k,n) -> replicate n k) <$> Map.toList m
And our candidate for map reduce is then:
mapReduceCM :: (Monoid m, Semigroup m, Ord m) => (a -> m) -> FreeCM a -> m mapReduceCM k (FreeCM xs) = Map.foldrWithKey (\k a acc -> acc <> fold (replicate a k)) mempty (Map.mapKeys k xs)
multiplication emerges from a commutative monoidal magma (but that's another story ...)
HashMap.HashMap a Int?
That our type had an
Ord constraint was a pretty big extra requirement. If we dropped this, and instead accept a
Hashable, we get yet another canonical representation.
newtype FreeCMHash a = FreeCMHash (HashMap.HashMap a Int) deriving (Eq, Show) (+#+) :: (Eq a, Hashable a) => FreeCMHash a -> FreeCMHash a -> FreeCMHash a (+#+) (FreeCMHash a) (FreeCMHash b)= FreeCMHash (HashMap.unionWith (+) a b) mapReduceCMHash :: (Monoid m, Semigroup m) => (a -> m) -> FreeCMHash a -> m mapReduceCMHash f (FreeCMHash xs) = HashMap.foldrWithKey (\k a acc -> acc <> fold (replicate a (f k))) mempty xs
Hashable would get a free object clean of extra constraints3, but the operator would be very, very bad - having to compare the next element with the entire tree.
There's a simplification in the original "map-reduce is really foldMap" analogy: MapReduce should rightly be called MapSomethingSomethingReduce as there are fundamental steps between map and reduce. Something like4:
Map -> Combiner -> Partitioner -> Sort -> Shuffle -> Sort -> Reduce
And what are the fundamental features of these middle steps? From wiki:
Between the map and reduce stages, the data are shuffled (parallel-sorted / exchanged between nodes) ... Shuffle operation per se is not related to the essence of MapReduce; it's needed to distribute calculations over the cloud. However, if it changes the order of the elements in the list of A, then the monoid must be commutative.
Apart from side effect management, the middle bits start to look somewhat like the manipulations you would use for a commutative monoid knowing it's free object representation.
Haskell is an ideal language to discriminate between the side effects in map-reduce, and working with free objects. Lurking in the current messiness might be a much tighter development loop, one where law discovery in representational algebra and objects leads straight-forwardly to the unique free object, and thus best computational strategy to employ.
Twitter: Tony Day
build recipe for this repo
stack build --test --exec "pandoc -f markdown -i readme.md -t html -o index.html"↩
There's plenty to explore. The code here, together with some tests and laws, and a free commutative, idempotent monoid (it's a Set, I reckon) is available here. Some partially-mapped territory includes tower which is really one big list of free objects waiting to be discovered. There's also words, a haskell word counter, which is the canonical example used when explaining and exploring what's going on in a map-reduce.↩
It is difficult to imagine a commutative magma without an
Eq. There has to be some way to discriminate the