Monoids and data structures

Thinking about ways to formally specify behavior of software it occurred to me that there is a very natural way to express data structures and their operations in the language of monoids without referring to an actual implementation of that data structure. A monoid is a set with an associative binary operator (written as multiplication) and an identity element. Monoids in this post will be given by a generating set and a number of relations between the generators. Multiplication corresponds to adding an element to a data structure or merging two data structures. Functions on data structures are defined recursively using the fact that each member can be expressed as some product of generators.

As a first example I will define a stack datatype (last in first out) with the operation “push”, “pop” and “top”. A stack of type T is simply the free monoid generated by all elements of T without any additional relations:

\text{stack}\langle T \rangle = \langle T \rangle

The empty stack is represented by the unit 1 \in \text{stack} \langle T \rangle. Pushing an element x \in T onto a stack s \in \text{stack} \langle T \rangle is defined by

\text{push}{:{}} \text{stack} \langle T \rangle \times T \rightarrow \text{stack} \langle T \rangle\\ \text{push}(s, x) = s \cdot x

Popping the top element from a non-empty stack is defined recursively as

\text{pop}{:{}} \text{stack} \langle T \rangle \rightarrow \text{stack} \langle T \rangle\\ \text{pop}(s) = \begin{cases} 1 & \text{if } s \in T \\ u \cdot \text{pop}(v) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}.

The top element on a non-empty stack is given by the following recursive definition

\text{top}{:{}} \text{stack} \langle T \rangle \rightarrow T\\ \text{top}(s) = \begin{cases} s & \text{if } s \in T \\ \text{top}(v) & \text{if }s = u \cdot v \text{ for some } u, v \neq 1\end{cases}

A queue (first in first out) has a very similar definition. These first two examples are based on free monoids which makes it somewhat easier to define their structure and operations. As a slightly more interesting example I will now define a set (as an unordered sequence in which each element appears at most once). The monoid for a set of elements in T is generated by T and some additional relations:

\text{set} \langle T \rangle = \langle T \mid x \cdot x = x \text{ for all }x \text{ and } x \cdot y = y \cdot x \text { for all } x \neq y \rangle

These additional relations allow a “compact” representation of sets in the monoid (for example x \cdot x \cdot x = x) but more importantly they capture the requirements of a set. The relation x \cdot x = x expresses that all elements in a set have multiplicity one (I will come back to this in a moment) and x \cdot y = y \cdot x that a set is unordered. Here are the definitions of some sensible set operations:

Insert an element:

\text{insert}{:{}} \text{set} \langle T \rangle \times T \rightarrow \text{set} \langle T \rangle\\ \text{insert}(s, x) = s \cdot x

Set union:

\text{union}{:{}} \text{set} \langle T \rangle \times \text{set} \langle T \rangle \rightarrow \text{set} \langle T \rangle\\ \text{union}(s, t) = s \cdot t

Test if an element is present:

\text{contains}{:{}} \text{set} \langle T \rangle \times T \rightarrow \{ \textbf{true}, \textbf{false} \}\\ \text{contains}(s, x) = \begin{cases} \textbf{false} & \text{if } s = 1\\ \textbf{true} & \text{if }s = x\\ \text{contains}(u, x) \vee \text{contains}(v, x) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}

Remove an element:

\text{remove}{:{}} \text{set} \langle T \rangle \times T \rightarrow \text{set} \langle T \rangle\\ \text{remove}(s, x) = \begin{cases} 1 & \text{if } s = x\\ s & \text{if }s = 1 \text{ or }s \in T \text{ and } s \neq x\\ \text{remove}(u, x) \cdot \text{remove}(v, x) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}

All the operations are well defined given the relations on the monoid \text{set} \langle T \rangle. On the other hand, the following multiplicity function is not well defined even though it might look acceptable at first glance:

\text{mult}{:{}} \text{set} \langle T \rangle \times T \rightarrow \mathbb{N}\\ \text{mult}(s, x) = \begin{cases} 1 & \text{if }s = x\\ 0 & \text{if }s = 1 \text{ or } s \in T \text{ and } s \neq x\\ \text{mult}(u) + \text{mult}(v) & \text{if }s = u \cdot v \text{ for some }u, v \neq 1\end{cases}

since it would result in 1 = \text{mult}(x, x) = \text{mult}(x \cdot x, x) = 1 + 1. The problem here is that + is symmetric but not idempotent and does not respect the relations x \cdot x = x that hold in \text{set} \langle T \rangle. This underlines that the relations encode the requirements for a data structure by restricting the possible operations that can be defined on them. Note that dropping the relation x \cdot x = x would result in a multi set on which the operation \text{mult} is well defined.

As a final example I will define an associative array, without spending too many words. Here are the definitions, where \pi{:{}} S \times T \rightarrow S is projection on the first coordinate:

\text{map}\langle S, T \rangle = \langle S \times T \mid x \cdot y = y \text { if } \pi(x) = \pi(y) \text{ and } x \cdot y = y \cdot x \text{ if } \pi(x) \neq \pi(y) \rangle

Insert (or update) a pair x \in S \times T:

\text{insert}{:{}} \text{map} \langle S, T \rangle \times (S \times T) \rightarrow \text{map} \langle S, T \rangle\\ \text{insert}(m, x) = m \cdot x

Test if a key x \in S is present:

\text{has}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow \{ \textbf{true}, \textbf{false} \}\\ \text{has}(m, x) = \begin{cases}\textbf{true} & \text {if }m \in S \times T \text{ and } \pi(m) = x\\ \textbf{false} & \text{if } m=1 \text{ or } m \in S \times T \text{ and } \pi(m) \neq x\\ \text{has}(u, x) \vee \text{has}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1\end{cases}

Lookup the value that is associated with key x \in S if it is present:

\text{index}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow T\\ \text{index}(m, x) = \begin{cases}y & \text{if }m = (x, y) \in S \times T\\ \text{index}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1 \text{ and } \text{has}(v, x) = \textbf{true}\\ \text{index}(u, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1 \text{ and } \text{has}(v, x) = \textbf{false}\end{cases}

Remove the key x \in S:

\text{del}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow \text{map} \langle S, T \rangle\\ \text{del}(m, x) = \begin{cases}1 & \text{if } m \in S \times T \text{ and } \pi(m) = x\\ m & \text{if }m = 1 \text{ or }m \in S \times T \text{ and } \pi(m) \neq x\\ \text{del}(u, x) \cdot \text{del}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1\end{cases}

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s