Thinking about ways to formally specify behavior of software it occurred to me that there is a very natural way to express data structures and their operations in the language of monoids without referring to an actual implementation of that data structure. A monoid is a set with an associative binary operator (written as multiplication) and an identity element. Monoids in this post will be given by a generating set and a number of relations between the generators. Multiplication corresponds to adding an element to a data structure or merging two data structures. Functions on data structures are defined recursively using the fact that each member can be expressed as some product of generators.

As a first example I will define a stack datatype (last in first out) with the operation “push”, “pop” and “top”. A stack of type T is simply the free monoid generated by all elements of T without any additional relations:

The empty stack is represented by the unit . Pushing an element onto a stack is defined by

Popping the top element from a non-empty stack is defined recursively as

.

The top element on a non-empty stack is given by the following recursive definition

A queue (first in first out) has a very similar definition. These first two examples are based on free monoids which makes it somewhat easier to define their structure and operations. As a slightly more interesting example I will now define a set (as an unordered sequence in which each element appears at most once). The monoid for a set of elements in is generated by and some additional relations:

These additional relations allow a “compact” representation of sets in the monoid (for example ) but more importantly they capture the requirements of a set. The relation expresses that all elements in a set have multiplicity one (I will come back to this in a moment) and that a set is unordered. Here are the definitions of some sensible set operations:

Insert an element:

Set union:

Test if an element is present:

Remove an element:

All the operations are well defined given the relations on the monoid . On the other hand, the following multiplicity function is *not* well defined even though it might look acceptable at first glance:

since it would result in . The problem here is that is symmetric but not idempotent and does not respect the relations that hold in . This underlines that the relations encode the requirements for a data structure by restricting the possible operations that can be defined on them. Note that dropping the relation would result in a multi set on which the operation is well defined.

As a final example I will define an associative array, without spending too many words. Here are the definitions, where is projection on the first coordinate:

Insert (or update) a pair :

Test if a key is present:

Lookup the value that is associated with key if it is present:

Remove the key :