## Monoids and data structures

Thinking about ways to formally specify behavior of software it occurred to me that there is a very natural way to express data structures and their operations in the language of monoids without referring to an actual implementation of that data structure. A monoid is a set with an associative binary operator (written as multiplication) and an identity element. Monoids in this post will be given by a generating set and a number of relations between the generators. Multiplication corresponds to adding an element to a data structure or merging two data structures. Functions on data structures are defined recursively using the fact that each member can be expressed as some product of generators.

As a first example I will define a stack datatype (last in first out) with the operation “push”, “pop” and “top”. A stack of type T is simply the free monoid generated by all elements of T without any additional relations:

$\text{stack}\langle T \rangle = \langle T \rangle$

The empty stack is represented by the unit $1 \in \text{stack} \langle T \rangle$. Pushing an element $x \in T$ onto a stack $s \in \text{stack} \langle T \rangle$ is defined by

$\text{push}{:{}} \text{stack} \langle T \rangle \times T \rightarrow \text{stack} \langle T \rangle\\ \text{push}(s, x) = s \cdot x$

Popping the top element from a non-empty stack is defined recursively as

$\text{pop}{:{}} \text{stack} \langle T \rangle \rightarrow \text{stack} \langle T \rangle\\ \text{pop}(s) = \begin{cases} 1 & \text{if } s \in T \\ u \cdot \text{pop}(v) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}$.

The top element on a non-empty stack is given by the following recursive definition

$\text{top}{:{}} \text{stack} \langle T \rangle \rightarrow T\\ \text{top}(s) = \begin{cases} s & \text{if } s \in T \\ \text{top}(v) & \text{if }s = u \cdot v \text{ for some } u, v \neq 1\end{cases}$

A queue (first in first out) has a very similar definition. These first two examples are based on free monoids which makes it somewhat easier to define their structure and operations. As a slightly more interesting example I will now define a set (as an unordered sequence in which each element appears at most once). The monoid for a set of elements in $T$ is generated by $T$ and some additional relations:

$\text{set} \langle T \rangle = \langle T \mid x \cdot x = x \text{ for all }x \text{ and } x \cdot y = y \cdot x \text { for all } x \neq y \rangle$

These additional relations allow a “compact” representation of sets in the monoid (for example $x \cdot x \cdot x = x$) but more importantly they capture the requirements of a set. The relation $x \cdot x = x$ expresses that all elements in a set have multiplicity one (I will come back to this in a moment) and $x \cdot y = y \cdot x$ that a set is unordered. Here are the definitions of some sensible set operations:

Insert an element:

$\text{insert}{:{}} \text{set} \langle T \rangle \times T \rightarrow \text{set} \langle T \rangle\\ \text{insert}(s, x) = s \cdot x$

Set union:

$\text{union}{:{}} \text{set} \langle T \rangle \times \text{set} \langle T \rangle \rightarrow \text{set} \langle T \rangle\\ \text{union}(s, t) = s \cdot t$

Test if an element is present:

$\text{contains}{:{}} \text{set} \langle T \rangle \times T \rightarrow \{ \textbf{true}, \textbf{false} \}\\ \text{contains}(s, x) = \begin{cases} \textbf{false} & \text{if } s = 1\\ \textbf{true} & \text{if }s = x\\ \text{contains}(u, x) \vee \text{contains}(v, x) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}$

Remove an element:

$\text{remove}{:{}} \text{set} \langle T \rangle \times T \rightarrow \text{set} \langle T \rangle\\ \text{remove}(s, x) = \begin{cases} 1 & \text{if } s = x\\ s & \text{if }s = 1 \text{ or }s \in T \text{ and } s \neq x\\ \text{remove}(u, x) \cdot \text{remove}(v, x) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}$

All the operations are well defined given the relations on the monoid $\text{set} \langle T \rangle$. On the other hand, the following multiplicity function is not well defined even though it might look acceptable at first glance:

$\text{mult}{:{}} \text{set} \langle T \rangle \times T \rightarrow \mathbb{N}\\ \text{mult}(s, x) = \begin{cases} 1 & \text{if }s = x\\ 0 & \text{if }s = 1 \text{ or } s \in T \text{ and } s \neq x\\ \text{mult}(u) + \text{mult}(v) & \text{if }s = u \cdot v \text{ for some }u, v \neq 1\end{cases}$

since it would result in $1 = \text{mult}(x, x) = \text{mult}(x \cdot x, x) = 1 + 1$. The problem here is that $+$ is symmetric but not idempotent and does not respect the relations $x \cdot x = x$ that hold in $\text{set} \langle T \rangle$. This underlines that the relations encode the requirements for a data structure by restricting the possible operations that can be defined on them. Note that dropping the relation $x \cdot x = x$ would result in a multi set on which the operation $\text{mult}$ is well defined.

As a final example I will define an associative array, without spending too many words. Here are the definitions, where $\pi{:{}} S \times T \rightarrow S$ is projection on the first coordinate:

$\text{map}\langle S, T \rangle = \langle S \times T \mid x \cdot y = y \text { if } \pi(x) = \pi(y) \text{ and } x \cdot y = y \cdot x \text{ if } \pi(x) \neq \pi(y) \rangle$

Insert (or update) a pair $x \in S \times T$:

$\text{insert}{:{}} \text{map} \langle S, T \rangle \times (S \times T) \rightarrow \text{map} \langle S, T \rangle\\ \text{insert}(m, x) = m \cdot x$

Test if a key $x \in S$ is present:

$\text{has}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow \{ \textbf{true}, \textbf{false} \}\\ \text{has}(m, x) = \begin{cases}\textbf{true} & \text {if }m \in S \times T \text{ and } \pi(m) = x\\ \textbf{false} & \text{if } m=1 \text{ or } m \in S \times T \text{ and } \pi(m) \neq x\\ \text{has}(u, x) \vee \text{has}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1\end{cases}$

Lookup the value that is associated with key $x \in S$ if it is present:

$\text{index}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow T\\ \text{index}(m, x) = \begin{cases}y & \text{if }m = (x, y) \in S \times T\\ \text{index}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1 \text{ and } \text{has}(v, x) = \textbf{true}\\ \text{index}(u, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1 \text{ and } \text{has}(v, x) = \textbf{false}\end{cases}$

Remove the key $x \in S$:

$\text{del}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow \text{map} \langle S, T \rangle\\ \text{del}(m, x) = \begin{cases}1 & \text{if } m \in S \times T \text{ and } \pi(m) = x\\ m & \text{if }m = 1 \text{ or }m \in S \times T \text{ and } \pi(m) \neq x\\ \text{del}(u, x) \cdot \text{del}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1\end{cases}$