Game trees under a partial order part 3: Alpha Beta pruning

In this last post of the series I will discuss alpha beta pruning, in the context of evaluation functions on game trees that take values in a distributive lattice. A prime example is of course the linear lattice \mathscr{L}(X, \leq) over a partially ordered set (X, \leq), but this post does not depend on specific features of that lattice. The result is a description of alpha beta pruning for distributive lattices that slightly improves the version given by Ginsberg et al in their 2001 paper “Alpha-Beta pruning under partial orders”.

Alpha Beta pruning is a way to compute the value of an evaluation function at the root node of a game tree while visiting as few child nodes as possible. See the previous post for a more extensive introduction to evaluation functions. Here I will only briefly recall the recursive definition of an evaluation function f on a game tree that takes values in a lattice L. I will assume that L has a minimal and a maximal element, denoted by \mathbf{0} and \mathbf{1}. The normal forms \textrm{CNF}(X, \leq) and \textrm{DNF}(X, \leq) of the linear lattice are examples of this.

  1. The value of f for a leaf node is given by some heuristic game evaluation function (e.g. chess piece values).
  2. At an inner min node (where the minimizing player is to make a move) the value of f is defined recursively as the greatest lower bound in L of all values of f on direct child nodes.
  3. At an inner max node the value of f is defined as the least upper bound of all values of f on direct child nodes.

This definition leads to the following algorithm for the computation of f. This algorithm is also the basis of further considerations on ways to prune parts of the game tree that are irrelevant for the computation of f on the root node.

function f(node)
  local value
  if node is a leaf node then
    value = heuristic position value of node
  elseif node is a min node then
    value = 1                      -- start at the maximal element
    for child in all children of node do
      value = value ∧ f(child)     -- greatest lower bound
    end
  else                             -- max node
    value = 0                      -- start at the minimal element
    for child in all children of node do
      value = value ∨ f(child)     -- least upper bound
    end
  end
  return value
end
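The pseudocode above translates directly into runnable form. Below is a minimal Python sketch over a toy distributive lattice — subsets of \{0, 1, 2\} under inclusion, with \wedge as intersection and \vee as union; the tree encoding and all names are my own choices, not from the papers.

```python
from functools import reduce

# Toy distributive lattice: subsets of {0, 1, 2} ordered by inclusion.
# Meet is intersection, join is union; bottom is the empty set, top the full set.
BOT, TOP = frozenset(), frozenset({0, 1, 2})

def evaluate(node):
    """Naive evaluation; a node is ('leaf', value) or ('min'|'max', children)."""
    kind, payload = node
    if kind == 'leaf':
        return payload
    values = [evaluate(child) for child in payload]
    if kind == 'min':
        return reduce(frozenset.intersection, values, TOP)  # greatest lower bound
    return reduce(frozenset.union, values, BOT)             # least upper bound

leaf = lambda *xs: ('leaf', frozenset(xs))
tree = ('min', [('max', [leaf(0), leaf(1)]),       # joins to {0, 1}
                ('max', [leaf(0, 2), leaf(1)])])   # joins to {0, 1, 2}
print(sorted(evaluate(tree)))  # -> [0, 1]
```

The min root meets \{0, 1\} with \{0, 1, 2\}, giving \{0, 1\}.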

The structure of this naive algorithm is simple. To compute the value at a min node start with the maximal value \mathbf{1} then iterate over all child nodes and adjust the node’s value by taking the greatest lower bound with the value of a child node. At a max node the procedure is similar but starts off with the minimal value \mathbf{0} and repeatedly takes the least upper bound. The computation recurses depth first left to right over all child nodes, where “left to right” is according to the enumeration order of child nodes. Move ordering does not play a role in this naive algorithm but it can play a big role in alpha beta pruning in terms of the number of pruned nodes. I will not address move ordering in this post. During execution of the algorithm the state of the computation is schematically depicted below. As in the example in the previous post I will assume that the top node is a min node.

Intermediate result in the recursive computation of an evaluation function

Intermediate values in the depth first left to right computation of an evaluation function

Here x_1, x_2, \dotsc and y_1, y_2, \dotsc are the intermediate values computed for the inner min and max nodes respectively. The \star symbol indicates all subtrees located to the left of the current path of computation in the tree; these are the subtrees for which a value has already been (recursively) computed. As the computation progresses each intermediate value x_j descends from \mathbf{1} and each y_j ascends from \mathbf{0}. If we were to cut the computation short at the point depicted above, the algorithm would back up and yield the final outcome

x_1 \wedge (y_1 \vee (x_2 \wedge (y_2 \vee \dotsb \vee (x_n \wedge y_n))))

at the root node. The essential point of alpha beta pruning is that it identifies when this intermediate value remains constant even when y_n gets larger in the remainder of the computation. Because in that case any further children of the node marked y_n in the diagram will not contribute to the final result and can be pruned (skipped). In the picture above the search path ends with a max node but that is irrelevant for alpha beta pruning as we will see.

A lattice L has a canonical partial order where x \leq y is equivalent to x \wedge y = x (or alternatively x \vee y = y). The lattice operators \wedge and \vee indeed result in the greatest lower bound and least upper bound with respect to this ordering. The following lattice property is the key point in alpha beta pruning.

Lemma: When x, y, z \in L and y \leq z then x \wedge y \leq x \wedge z and x \vee y \leq x \vee z.

Since y \leq z the greatest lower bound x \wedge y is also a lower bound for x and z and is therefore less than or equal to the greatest lower bound x \wedge z. The other inequality is derived similarly. For sequences x_1, x_2, \dotsc and y_1, y_2, \dotsc in L define sequences \alpha and \beta recursively by

  1. \alpha_1 = x_1 \wedge y_1 and \alpha_{j+1} is obtained from \alpha_j by the substitution y_j \leftarrow y_j \vee (x_{j+1} \wedge y_{j+1}).
  2. \beta_1 = x_1 and \beta_{j+1} is obtained from \beta_j by the substitution x_j \leftarrow x_j \wedge (y_j \vee x_{j+1}).

So the sequence \alpha starts with

\alpha_1 = x_1 \wedge y_1
\alpha_2 = x_1 \wedge (y_1 \vee (x_2 \wedge y_2))
\alpha_3 = x_1 \wedge (y_1 \vee (x_2 \wedge (y_2 \vee (x_3 \wedge y_3))))
\vdots

and the sequence \beta with

\beta_1 = x_1
\beta_2 = x_1 \wedge (y_1 \vee x_2)
\beta_3 = x_1 \wedge (y_1 \vee (x_2 \wedge (y_2 \vee x_3)))
\vdots

The following inequalities are a direct consequence of our lemma above. The sequences \alpha and \beta satisfy

\alpha_j \leq \alpha_{j+1} \leq \beta_{j+1} \leq \beta_j

for all indices j. In particular \alpha is an ascending sequence, \beta is a descending sequence, each \alpha_j is a lower bound for \beta and each \beta_j is an upper bound for \alpha. If \alpha_j = \beta_j or \alpha_j = \beta_{j+1} for some index j then both sequences are constant from that point onwards. The values in the sequences \alpha and \beta are precisely those obtained by aborting the computation of the evaluation function for the root node at max node j (sequence \alpha) or at min node j (sequence \beta). Hence further nodes can be pruned from the computation as soon as both sequences yield equal values.
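These inequalities can be checked numerically. The sketch below — my own encoding, using the subset lattice of \{0, 1, 2, 3\} as a stand-in distributive lattice — builds \alpha_j and \beta_j from the inside out for random sequences and verifies the chain \alpha_j \leq \alpha_{j+1} \leq \beta_{j+1} \leq \beta_j.

```python
import random

random.seed(1)
U = frozenset(range(4))  # subset lattice of {0, 1, 2, 3}
rand = lambda: frozenset(u for u in U if random.random() < 0.5)

def nested(xs, ys, j, with_y):
    """alpha_j (with_y=True) or beta_j (with_y=False): the nested expression
    x_1 ∧ (y_1 ∨ (x_2 ∧ ...)), built from the innermost term outwards."""
    acc = xs[j - 1] & ys[j - 1] if with_y else xs[j - 1]
    for i in range(j - 2, -1, -1):
        acc = xs[i] & (ys[i] | acc)
    return acc

for _ in range(200):
    xs = [rand() for _ in range(5)]
    ys = [rand() for _ in range(5)]
    for j in range(1, 5):
        # alpha ascends, beta descends, and alpha stays below beta
        assert (nested(xs, ys, j, True) <= nested(xs, ys, j + 1, True)
                <= nested(xs, ys, j + 1, False) <= nested(xs, ys, j, False))
```

Here <= on frozensets is the subset relation, i.e. the canonical lattice order.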

Alpha Beta pruning (strong version): At the first index j where either \alpha_j = \beta_j or \alpha_j = \beta_{j+1} further subnodes of max node j or min node j+1 can be pruned.

Note that this formulation of alpha beta pruning resembles “classical” alpha beta pruning for totally ordered sets in that we have an ascending sequence of lower bounds and a descending sequence of upper bounds, and nodes can be pruned as soon as these coincide. There is also an unfortunate difference between the two. In classical alpha beta pruning the bounds are easy to compute progressively, as a simple running maximum or minimum. In contrast, the bounds in the pruning algorithm above are not so simple to compute: since they are built deepest node first, updating them requires constant backtracking. Moreover, computing \wedge and \vee can be much more expensive than simply taking the maximum or minimum of two numbers.

Also note that the distributive property of the lattice L is not used at all in this version of alpha beta pruning! This does not contradict the results of Ginsberg et al since they formulate alpha beta pruning (in particular “deep pruning”) differently. I will briefly come back to this point at the end of this post.

There is a weaker version of alpha beta pruning based on alternative sequences \alpha' and \beta' that are easier to compute. The trade-off is that it may take longer before the ascending and descending bounds allow a cut off, resulting in less aggressive pruning. The weaker bounds follow from the identities below, where the distributive property of L is now essential. Let y_0 = \mathbf{0} and define two new sequences \alpha' and \beta' by

\alpha'_j = y_0 \vee \dotsc \vee y_{j-1} and
\beta'_j = x_1 \wedge \dotsc \wedge x_j.

So \alpha'_j is the least upper bound of the values of all max nodes above min node j and \beta'_j is the greatest lower bound of the values of all min nodes above max node j. If we take \alpha_0 = \mathbf{0} and \beta_0 = \mathbf{1} then for all indices j \geq 1 the following equalities hold:

\alpha_j = \alpha_{j-1} \vee (\beta'_j \wedge y_j) and
\beta_j = \beta_{j-1} \wedge (\alpha'_j \vee x_j).

These can be proved by induction and the distributive property of the lattice L. Taking y_j = x_j in the first equation and x_j = y_{j-1} in the second results in

\beta_j = \alpha_{j-1} \vee \beta'_j and
\alpha_{j-1} = \beta_{j-1} \wedge \alpha'_j.

Comparing these equalities it follows that x_j \leq \alpha'_j implies \beta_j = \alpha_{j-1} and \beta'_j \leq y_j implies \alpha_j = \beta_j. This leads to the following version of alpha beta pruning.

Alpha Beta pruning: If the value x_j of min node j becomes less than or equal to \alpha'_j, or the value y_j of max node j becomes greater than or equal to \beta'_j, then further child nodes can be pruned.

This is a slightly different formulation of alpha beta pruning than Ginsberg et al use in their paper. What they call (shallow or deep) pruning comes down to either x_j \leq y_k for some k < j at min node j or x_k \leq y_j for some k \leq j at max node j. These are stronger conditions than those we found above and will therefore prune fewer nodes in general.
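The identities used above (with \alpha_0 = \mathbf{0} and \beta_0 = \mathbf{1}) can also be verified numerically in a small distributive lattice; the encoding below is my own.

```python
import random
from functools import reduce

random.seed(2)
U = frozenset(range(4))  # subset lattice of {0, 1, 2, 3}
rand = lambda: frozenset(u for u in U if random.random() < 0.5)

def nested(xs, ys, j, with_y):
    """alpha_j (with_y=True) or beta_j (with_y=False), built inside out."""
    acc = xs[j - 1] & ys[j - 1] if with_y else xs[j - 1]
    for i in range(j - 2, -1, -1):
        acc = xs[i] & (ys[i] | acc)
    return acc

for _ in range(200):
    xs = [rand() for _ in range(5)]
    ys = [rand() for _ in range(5)]
    for j in range(1, 6):
        alpha_p = reduce(frozenset.union, ys[:j - 1], frozenset())  # alpha'_j (y_0 = 0)
        beta_p = reduce(frozenset.intersection, xs[:j], U)          # beta'_j
        a_prev = nested(xs, ys, j - 1, True) if j > 1 else frozenset()  # alpha_{j-1}
        b_prev = nested(xs, ys, j - 1, False) if j > 1 else U           # beta_{j-1}
        assert nested(xs, ys, j, True) == a_prev | (beta_p & ys[j - 1])
        assert nested(xs, ys, j, False) == b_prev & (alpha_p | xs[j - 1])
```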

Here is an algorithmic formulation of alpha beta pruning. The value of the root node is computed by f(root, 0, 1).

function f(node, α, β)
  local value
  if node is a leaf node then
    value = heuristic position value of node
  elseif node is a min node then
    value = 1
    for child in all children of node do
      value = value ∧ f(child, α, β ∧ value)
      if value ≤ α then break end   -- prune: remaining children cannot change the root value
    end
  else
    value = 0
    for child in all children of node do
      value = value ∨ f(child, α ∨ value, β)
      if value ≥ β then break end   -- prune
    end
  end
  return value
end
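For completeness, here is a runnable transcription over the same subset lattice as before (tree encoding and helper names are my own). The final assertions check that pruning leaves the root value unchanged while visiting at most as many leaves as the naive algorithm.

```python
import random
from functools import reduce

random.seed(7)
U = frozenset(range(4))
BOT, TOP = frozenset(), U

def random_tree(depth, kind='min'):
    """A random game tree with alternating min/max levels and subset-valued leaves."""
    if depth == 0:
        return ('leaf', frozenset(u for u in U if random.random() < 0.5))
    child_kind = 'max' if kind == 'min' else 'min'
    return (kind, [random_tree(depth - 1, child_kind) for _ in range(3)])

def naive(node, visits):
    kind, payload = node
    if kind == 'leaf':
        visits[0] += 1
        return payload
    values = [naive(child, visits) for child in payload]
    op = frozenset.intersection if kind == 'min' else frozenset.union
    return reduce(op, values)

def alphabeta(node, a, b, visits):
    kind, payload = node
    if kind == 'leaf':
        visits[0] += 1
        return payload
    if kind == 'min':
        value = TOP
        for child in payload:
            value = value & alphabeta(child, a, b & value, visits)
            if value <= a:        # x_j ≤ alpha'_j: prune remaining children
                break
    else:
        value = BOT
        for child in payload:
            value = value | alphabeta(child, a | value, b, visits)
            if b <= value:        # y_j ≥ beta'_j: prune remaining children
                break
    return value

tree = random_tree(4)
full, pruned = [0], [0]
assert naive(tree, full) == alphabeta(tree, BOT, TOP, pruned)
assert pruned[0] <= full[0]  # never visits more leaves than the naive search
```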

To sum up the results of this post: I described two versions of alpha beta pruning that allow nodes to be pruned from the computation, at the root node, of an evaluation function with values in a distributive lattice. The first, strong, version has the disadvantage that it is computationally more expensive than the second. Both versions are stronger (potentially prune more nodes) than the method described in the paper by Ginsberg et al.


Game trees under a partial order part 2: Evaluation functions

The first part of this series introduced the maximizing and minimizing orders on finite antichains in a partially ordered set (X, \leq). In this second part I will show how these orders and their properties are used to define evaluation functions on game trees. The previous post was mainly based on the paper “Searching game trees under a partial order” by P. Dasgupta et al. In this second post I will show how it is related to another paper, “Alpha-Beta pruning under partial orders” by M. Ginsberg et al from 2001. This second paper takes an interestingly different approach. Instead of starting with a partial order from the beginning, it investigates what structure an evaluation function on a game tree must have to satisfy some reasonable constraints. The answer they find is that such an evaluation function must be defined on a lattice, that is, a set with operators “greatest lower bound” \wedge and “least upper bound” \vee. An evaluation function is then defined recursively, starting at the leaves. The minimizing player combines the values of subnodes with \wedge while the maximizing player uses \vee. Moreover, Ginsberg et al show that such an evaluation function allows alpha beta pruning (which I will define in detail in the next post) exactly if the lattice is distributive. While the paper by Ginsberg et al references that of Dasgupta et al, the relation between the two is not made very explicit. This post will give a precise and (I think) satisfying relation between the two papers, ultimately improving the results of both: the work of Dasgupta et al is formalized in terms of lattice theory and can now be applied to any partially ordered set, not just \mathbb{R}^n, and the result of Ginsberg et al on alpha beta pruning is sharpened in the next post, potentially leading to more aggressive pruning in game tree searches.

Consider a game tree in which a min and max player make moves in turn. Suppose that leaf nodes in this tree get a value in the partially ordered space X assigned to them. Is there a way, given the values at the leaves, to define an evaluation function on the entire tree? Let’s first assume that both players in fact use an (unspecified) linear extension \prec of the partial order \leq to select their optimal move. (Dasgupta et al call such a linear extension a strategy. See the intermezzo below for why it is fair to assume that both players use the same strategy.) Here is an example of how this linear extension can be used to define an evaluation function on a tree of positions. The symbols “\bullet” at the leaf nodes indicate some (not necessarily equal) values in X. The greatest lower bound and least upper bound \wedge and \vee are taken with respect to the linear extension \prec.

Defining an evaluation function with linear extensions

An evaluation function based on a linear extension (or strategy)

In this example the minimizing player is to move and the tree represents possible move developments that are considered. The value at inner nodes is defined recursively based on the linear extension: The minimizing player takes the minimum (the greatest lower bound) of the subnodes while the maximizing player takes the maximum (the least upper bound). This leads to a value for the top node in X and the minimizing player will select a move that leads to this minimum.

Intermezzo. Assuming that the min player in this example uses an unspecified linear extension (or “strategy”, as Dasgupta et al call it) to select moves is central to our considerations. But why is it fair to assume that the max player uses the same strategy? The reason is simple. Suppose the max player uses a different strategy or in fact any system at all (even random decisions) to select moves. In that case the altered evaluation function would assign to each node a value that is less than or equal to the value obtained from equal strategies (when compared with the min player’s linear extension). This follows from the recursive definition of the evaluation function. Since the min player’s goal is to reach a minimal value, the worst case scenario is to assume that the opponent uses the same strategy. End of intermezzo.

Any evaluation function defined as above still depends on the choice of linear extension, so different linear extensions give different evaluation functions. What we are looking for instead is a way to define a single evaluation function based only on the given partial order on X. There is a nice trick to turn the example above into exactly such a function, but it requires altering the target space. To define the new target space start with \mathscr{F}(X, \leq), the space of all functions that map linear extensions of \leq to X. The evaluation defined above at any node can be regarded as an example of such a function by treating the linear extension \prec in its definition not as fixed but as a variable. Let me make this more precise based on the following observations:

  1. There is a canonical embedding \iota: X \hookrightarrow \mathscr{F}(X, \leq) as constant functions: \iota_x(\prec) = x for all linear extensions \prec.
  2. If f, g \in \mathscr{F}(X, \leq) then define an element f \wedge g \in \mathscr{F}(X, \leq) by (f \wedge g)(\prec) = f(\prec) \wedge g(\prec) where \wedge in the right hand side refers to the minimum with respect to \prec.
  3. If f, g \in \mathscr{F}(X, \leq) then define an element f \vee g \in \mathscr{F}(X, \leq) by (f \vee g)(\prec) = f(\prec) \vee g(\prec) where \vee in the right hand side refers to the maximum with respect to \prec.
  4. Since any totally ordered set is a distributive lattice with respect to its minimum and maximum operators \wedge and \vee it follows that the definitions above turn \mathscr{F}(X, \leq) into a distributive lattice and therefore also into a partially ordered set.
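These observations can be made concrete on the smallest interesting poset, a two-element antichain. In the sketch below (all names are my own) an element of \mathscr{F}(X, \leq) is represented as a tuple listing its value under each linear extension.

```python
from itertools import permutations

# Tiny poset: X = {'a', 'b'} with no relations (a two-element antichain).
X = ('a', 'b')
relations = set()  # pairs (x, y) meaning x <= y; empty here

def linear_extensions(X, relations):
    for perm in permutations(X):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[x] <= pos[y] for x, y in relations):
            yield perm

exts = list(linear_extensions(X, relations))  # [('a', 'b'), ('b', 'a')]

embed = lambda x: tuple(x for _ in exts)  # the embedding iota as constant functions

def meet(f, g):  # pointwise minimum with respect to each linear extension
    return tuple(min((f[i], g[i]), key=exts[i].index) for i in range(len(exts)))

def join(f, g):  # pointwise maximum
    return tuple(max((f[i], g[i]), key=exts[i].index) for i in range(len(exts)))

fa, fb = embed('a'), embed('b')
print(meet(fa, fb), join(fa, fb))  # -> ('a', 'b') ('b', 'a')
```

Neither ('a', 'b') nor ('b', 'a') is a constant function, so the linear lattice over a two-element antichain has four elements, two of which lie outside the image of X.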

The entire space \mathscr{F}(X, \leq) is too big for our purpose, so I define \mathscr{L}(X, \leq), the linear lattice over (X, \leq), as the smallest sublattice of \mathscr{F}(X, \leq) that contains X. Then \mathscr{L}(X, \leq) is still a distributive lattice. The linear lattice \mathscr{L}(X, \leq) will serve as the target space for evaluation functions on game trees under the partial order (X, \leq).

Remark. The set X embeds into \mathscr{L}(X, \leq) as a partially ordered set but not always as a lattice. For example if x, y \in X have a greatest lower bound x \wedge y \in X then \iota_{x \wedge y} is a lower bound for \iota_x and \iota_y but not necessarily the greatest lower bound. End of remark.

Example. If (X, \leq) is a totally ordered space then \mathscr{L}(X, \leq) = (X, \leq). End of example.

An evaluation function on a game tree is now defined recursively, as in the picture above, as follows.

  1. Assign values in X \subseteq \mathscr{L}(X, \leq) to the leaf nodes based on some heuristics.
  2. Define the value of an inner min node as the greatest lower bound in \mathscr{L}(X, \leq) of the values of all its direct successors.
  3. Define the value of an inner max node as the least upper bound in \mathscr{L}(X, \leq) of the values of all its direct successors.

Note that this construction is completely in line with the results of Ginsberg et al. The fact that \mathscr{L}(X, \leq) is a distributive lattice even implies that alpha beta pruning should be applicable in the recursive computation of the evaluation function. I will address that part in the next post.

The linear lattice may seem rather abstract but the fact that it is distributive and generated by X means that each element can be put in both a disjunctive and conjunctive normal form built from elements of X only. These normal forms can be described directly in terms of minimizing and maximizing orders as introduced in the first post. This will identify the linear lattice with what Dasgupta et al call “dominance algebra”. Define two partially ordered spaces, suggestively called \textrm{CNF}(X, \leq) and \textrm{DNF}(X, \leq), by applying the star order extension twice as in the picture below:

MINF and MAXF partial orders

Two step construction of DNF (top) and CNF (bottom)

So \textrm{CNF}(X, \leq) is the set of finite antichains in (X^{\ast}, \leqslant) ordered by their maximizing order and \textrm{DNF}(X, \leq) the set of finite antichains in (X^{\ast}, \eqslantless) ordered by their minimizing order. If \prec is a linear extension of \leq I denote its minimum and maximum operators by \curlywedge and \curlyvee respectively to distinguish them from the lattice operators in CNF and DNF. The following statements identify the normal forms of \mathscr{L}(X, \leq).

CNF: For any partially ordered space (X, \leq) the space \textrm{CNF}(X, \leq) is a lattice with minimal and maximal elements \mathbf{0} = \{\varnothing\} and \mathbf{1} = \varnothing respectively. The sublattice \textrm{CNF}(X, \leq) \setminus \{\mathbf{0}, \mathbf{1}\} is isomorphic to \mathscr{L}(X, \leq) via the rule x(\prec) = \curlywedge \{ \curlyvee s \mid s \in x\}.

DNF: For any partially ordered space (X, \leq) the space \textrm{DNF}(X, \leq) is a lattice with minimal and maximal elements \mathbf{0} = \varnothing and \mathbf{1} = \{\varnothing\} respectively. The sublattice \textrm{DNF}(X, \leq) \setminus \{\mathbf{0}, \mathbf{1}\} is isomorphic to \mathscr{L}(X, \leq) via the rule x(\prec) = \curlyvee \{ \curlywedge s \mid s \in x\}.

I will only show the CNF case. In the previous post we saw that any maximizing order always has a greatest lower bound operator \wedge and it also has a least upper bound operator \vee if its base space has one. Since in particular (X^{\ast}, \leqslant) always has a least upper bound operator, this shows that (X^{\ast \ast}, \leqslant) is a lattice. The empty set (which is a finite antichain) is maximal for any maximizing order and minimal for any minimizing order. Therefore the elements \mathbf{0} = \{\varnothing\} and \mathbf{1} = \varnothing are minimal and maximal elements for this lattice respectively. A simple verification shows that the given evaluation rule at least defines a surjective homomorphism onto \mathscr{L}(X, \leq) so it remains to prove that this homomorphism is also injective (meaning that the conjunctive normal form is unique). Suppose A, B \in \textrm{CNF}(X, \leq) are such that A(\prec) = B(\prec) for all linear extensions \prec. I will show that this implies A \subseteq B (which suffices since by symmetry then also B \subseteq A and A = B).

Let a \in A. For any x \in a there exists a linear extension \prec such that

  1. x is maximal in the lower set {\downarrow} a and
  2. if y \in X and y \prec x then y \in {\downarrow} a.

This implies that A(\prec) = x since, A being an antichain, no element of A other than a itself is entirely contained in {\downarrow} a, and any such element must therefore have a maximum greater than x. Then by assumption also B(\prec) = x. In particular all b \in B with b \leqslant a must contain x, and at least one such element must exist in B. Since this holds for any x \in a it follows that a \subseteq b for all b \in B with b \leqslant a. This implies that a = b and hence a \in B.

Summing up the results of this post: The linear lattice \mathscr{L}(X, \leq) is the natural target space for evaluation functions on game trees based on heuristic evaluation of positions with values in the partially ordered set (X, \leq). Both lattices \textrm{CNF}(X, \leq) and \textrm{DNF}(X, \leq) (minus their maximal and minimal elements) are isomorphic to the linear lattice \mathscr{L}(X, \leq) and explicitly represent elements in their unique conjunctive and disjunctive normal form respectively. In particular the CNF and DNF lattices are themselves distributive and isomorphic. The isomorphisms both ways correspond to what in the paper by Dasgupta et al is called “MAX to MIN” and “MIN to MAX”.

In the next and final post in this series I will discuss alpha beta pruning for game trees under a partial order.


Game trees under a partial order part 1: Partial orders

I’ve been interested in the subject of alpha-beta pruning of game trees for quite some time now. It has always struck me as unsatisfactory, however, that it relies on a total order of game positions (in the form of an evaluation function). It would be much more natural to evaluate game positions based on several distinct features and select best moves on this richer set of information. This means essentially that the total order of game positions is replaced by a partial order. There are a number of articles that explore this idea. The theory is, somewhat surprisingly maybe, quite a bit more involved than it is for total orders. The problem is that with a partial order there is often not a single best move to choose from, and the bookkeeping to handle all these choices is rather intricate. This first post in a series of three is based on a foundational paper “Searching game trees under a partial order” by P. Dasgupta et al, published in 1996. The motivation for these posts is firstly that it is a fascinating subject, but also that I found the paper rather difficult to read and understand. I think that a more mathematical style would help the overall presentation of the subject, and that is what I will try to accomplish in this series of blog posts. This first post concentrates on the theory of partial orders that underpins the game tree evaluations introduced later on. That said, this post is also interesting in itself as a result in the theory of partial orders, even without the context of its application in game trees.

Let (X, \leq) be a partially ordered set. It suffices to consider only finite sets X for our application but the presented theory applies to any set and we will not assume finiteness. I will write a < b as shorthand for a \leq b and a \neq b. Our goal in this post is to introduce a sensible way to compare finite (possibly empty) antichains in X. An antichain is a subset of X in which any two distinct elements are unordered. Recall that a linear extension \prec of \leq is a total order on X that is consistent with \leq in the sense that  x < y implies x \prec y. Throughout this series of posts the symbol \prec always indicates a linear extension and \curlywedge and \curlyvee denote the minimum (greatest lower bound) and maximum (least upper bound) for this linear extension respectively.

A subset S \subseteq X is called a lower set if for all x \in S and y \in X the inequality y \leq x implies that y \in S. Any subset S \subseteq X can be extended to a smallest lower set containing S by the definition

{\downarrow}S = \{ y \in X \mid y \leq x for some x \in S\}.

Note that if S, T \subseteq X are antichains such that {\downarrow} S = {\downarrow} T then S = T. Similarly one can define upper sets and the smallest upper set {\uparrow} S containing S.

Lemma: Let S_1 \subseteq \dotsc \subseteq S_n = X be nested lower sets. Then there is a linear extension \prec of \leq such that for all indices j < k and all elements x \in S_j and y \in S_k \setminus S_j the strict inequality x \prec y holds.

If we take n = 1 then this lemma simply states that a linear extension exists, which is Szpilrajn’s extension theorem. For general n one applies this theorem to the relation obtained from \leq by additionally declaring x below y for all x \in S_j and y \in S_k \setminus S_j with j < k; because the S_j are nested lower sets this enlarged relation is still a partial order. A more important consequence is the following.

Corollary: If S \subseteq X is an antichain then there exists a linear extension \prec such that for any x \in S and y \in X the inequality y \prec x implies that y \in {\downarrow} S. Moreover, for any x \in S this extension can be chosen such that x is maximal in {\downarrow} S.

The first part follows by taking n=2 and the lower set S_1 = {\downarrow} S in the lemma. For the second part take n = 3 and the two lower sets S_1 = {\downarrow}(S \setminus \{x\}) and S_2 = {\downarrow} S. Since x \in S_2 \setminus S_1 it is greater than all other elements in S for the resulting linear extension.
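For finite posets the linear extension in the lemma can be constructed greedily: repeatedly pick a minimal remaining element, preferring the smallest stratum. Below is a sketch under my own naming; because each S_i is a lower set, a minimal element of lowest rank always exists, so S_1 is emptied first, then S_2 minus S_1, and so on.

```python
def linear_extension(X, leq, strata):
    """strata = nested lower sets S_1 ⊆ ... ⊆ S_n = X; leq(x, y) means x <= y.
    Returns a linear extension listing S_1 first, then S_2 minus S_1, etc."""
    rank = {x: min(i for i, S in enumerate(strata) if x in S) for x in X}
    order, remaining = [], set(X)
    while remaining:
        # minimal remaining elements; pick one of lowest rank
        minimal = [x for x in remaining
                   if not any(y != x and leq(y, x) for y in remaining)]
        pick = min(minimal, key=lambda x: rank[x])
        order.append(pick)
        remaining.remove(pick)
    return order

# Toy poset: a <= c and b <= d, nothing else; S_1 = {a, b} is a lower set.
X = ['a', 'b', 'c', 'd']
pairs = {('a', 'c'), ('b', 'd')}
leq = lambda x, y: x == y or (x, y) in pairs
ext = linear_extension(X, leq, [{'a', 'b'}, set(X)])
assert set(ext[:2]) == {'a', 'b'}  # the lower set S_1 comes first
```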

Linear extensions are in a sense the “hidden variables” in the theory of game trees under a partial order. If game positions are only partially ordered then searching for moves will most likely not result in a single best move but in a finite set of best moves. Nevertheless a player must still choose a single move from such a set. I will make the fundamental assumption that both players use an otherwise unspecified linear extension \prec to make their choice. (Dasgupta et al call the linear extension they use a “strategy”. Moreover they consider situations where both players use a different strategy. This turns out to be unnecessary as we will see in the next post.) Then the “minimizing player” MIN always selects a minimum move according to \prec while the “maximizing player” MAX always selects a maximum move. Clearly different linear extensions lead to different gameplay for both players but nevertheless this idea will lead to a definition of gameplay that only depends on the underlying partial order and not on a particular choice of strategy. In what follows the central idea is therefore to single out properties that hold for all possible linear extensions and are “universal” in that sense.

Based on this idea we will construct minimizing and maximizing orders on finite antichains of X. Suppose that player MIN is in a situation where only two moves are possible. The first move will lead to a finite set A \subseteq X of best moves for player MAX to choose from while the second move leads to a finite set B. Then player MAX will select the maximum \curlyvee A when MIN plays the first move and \curlyvee B when MIN plays the second. For player MIN this means that the first move is preferred over the second precisely when \curlyvee A \preceq \curlyvee B. If this relation holds for all possible linear extensions then the first move for MIN is universally better than the second. We can take this as the definition for a minimizing order \leqslant on finite antichains in X and express it as A \leqslant B. This definition is potentially rather awkward, quantifying over all linear extensions, but it can be drastically simplified as I will now show.

If A \subseteq {\downarrow} B then for each x \in A there is a y \in B such that x \leq y. In particular, for any linear extension the inequality x \preceq y also holds and therefore \curlyvee A \preceq \curlyvee B. So A \leqslant B holds in this case. Now assume instead that there is an x \in A such that x \not \in {\downarrow} B. Then according to our lemma there is a linear extension such that y \prec x for all y \in B and so \curlyvee B \prec \curlyvee A for this particular linear extension. In this case the relation A \leqslant B does not hold. This justifies the definition below. An alternative formulation that works for all finite subsets is included as a small bonus. See the intermezzo for some more background.

Minimizing order: For finite antichains A, B \subseteq X define the relation A \leqslant B as A \subseteq {\downarrow} B or alternatively for all finite subsets as (A \setminus B) \subseteq {\downarrow}(B \setminus A).

Similar considerations (with the roles of the MIN and MAX player reversed) lead to the following definition of a maximizing order, where {\uparrow} A now indicates the smallest upper set containing A:

Maximizing order: For finite antichains A, B \subseteq X define the relation A \eqslantless B as B \subseteq {\uparrow} A or alternatively for all finite subsets as (B \setminus A) \subseteq {\uparrow}(A \setminus B).

The minimizing order expresses that the minimizing player MIN prefers the set of game positions A over B while the maximizing order expresses that player MAX prefers B over A. It is not difficult to check that the maximizing and minimizing orders are in fact partial orders on finite antichains in X. Note that when A, B \subseteq X are finite antichains and A \subseteq B then B \eqslantless A and A \leqslant B. In particular for any finite antichain A \subseteq X the inequalities \varnothing \leqslant A and A \eqslantless \varnothing hold. This also shows that these are different partial orders in general.
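Both definitions translate directly into code. Here is a quick sanity check on a toy poset with relations a \leq c and b \leq d (the encoding is mine):

```python
# Toy poset: a <= c and b <= d, nothing else.
X = {'a', 'b', 'c', 'd'}
pairs = {('a', 'c'), ('b', 'd')}
leq = lambda x, y: x == y or (x, y) in pairs

down = lambda S: {y for y in X if any(leq(y, x) for x in S)}  # smallest lower set
up   = lambda S: {y for y in X if any(leq(x, y) for x in S)}  # smallest upper set

minimizing = lambda A, B: A <= down(B)  # A ⩽ B: player MIN prefers A over B
maximizing = lambda A, B: B <= up(A)    # A ⪕ B: player MAX prefers B over A

A, B = {'a', 'b'}, {'c', 'd'}
assert minimizing(A, B) and not minimizing(B, A)
assert maximizing(A, B) and not maximizing(B, A)
assert minimizing(set(), A) and maximizing(A, set())  # empty antichain is extremal
```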

Intermezzo. The alternative formulation of the maximizing and minimizing orders results in partial orders on all finite subsets. These partial orders are then almost but not quite identical to what Dasgupta et al call “procedure compare.” Where Dasgupta et al prefer a non-empty set over an empty one in that procedure, the maximizing and minimizing orders always prefer the empty set over a non-empty one as already stated before. The transitivity property requires some attention and can be proved by the nice picture below, where it is assumed that A \eqslantless B and B \eqslantless C.

Transitivity diagram

Descending chains in the transitivity proof

The nodes in the diagram correspond to subsets of A \cup B \cup C where the letters in a node select all elements that are exactly in these subsets and in no others. So “A” means A \setminus (B \cup C) and “A, B” means (A \cap B) \setminus C and so on. The top two nodes together form the set C \setminus A and the bottom two nodes A \setminus C. Directed edges leaving a node indicate that for each a in the source set there exists b in one of the target sets such that b < a. The edges together encode the information obtained from the relations A \eqslantless B and B \eqslantless C. Since descending chains must be finite each chain starting in C \setminus A (top row) must eventually reach A \setminus C (bottom row) and transitivity follows. End of intermezzo.

Until now I only considered the conditions under which a single antichain A is preferred over an antichain B for player MIN, which is expressed by A \leqslant B. However, player MIN will often have to choose from more than two possible moves so we also have to consider conditions under which a set \{A_1, \dotsc, A_n\} of more than one antichain is preferred over B. That would mean that for any linear extension \prec there is some index j such that \curlyvee A_j \preceq \curlyvee B, where the index might depend on the choice of linear extension. However, it turns out that in this case there must be an index j such that already A_j \leqslant B. In other words, player MIN prefers a set of antichains over B if and only if some member of that set is already preferred over B. This again follows from the lemma. Let \prec be a linear extension such that x \prec y for all x \in B and y \not \in {\downarrow} B. Suppose that A_j \not \leqslant B for all indices j. Then for each index j there is an x_j \in A_j such that x_j \not \in {\downarrow} B. Hence \curlyvee B \prec x_j \preceq \curlyvee A_j for all indices j and therefore the set \{A_1, \dotsc, A_n\} is not preferred over B.

Let X^{\ast} denote the set of finite (possibly empty) antichains in the partially ordered set X.  Then both (X^{\ast}, \eqslantless) and (X^{\ast}, \leqslant) are partially ordered sets with the maximizing and minimizing order respectively. Later on the lattice structure of such partial orders will play an important role. A partially ordered set (X, \leq) is a lattice when all non-empty finite subsets S \subseteq X have a greatest lower bound \wedge S and a least upper bound \vee S. In that case it is easy to see that \wedge S and \vee S are uniquely defined. The following theorems give sufficient conditions for (X^{\ast}, \eqslantless) and (X^{\ast}, \leqslant) to be lattices.

Maximizing lattice: In the partially ordered set (X^{\ast}, \eqslantless) every non-empty finite subset S \subseteq X^{\ast} has a greatest lower bound \wedge S \in X^{\ast}. If every non-empty finite subset T \subseteq X has a least upper bound \vee T \in X then the same holds for X^{\ast} and so (X^{\ast}, \eqslantless) is a lattice.

Minimizing lattice: In the partially ordered set (X^{\ast}, \leqslant) every non-empty finite subset S \subseteq X^{\ast} has a least upper bound \vee S \in X^{\ast}. If every non-empty finite subset T \subseteq X has a greatest lower bound \wedge T \in X then the same holds for X^{\ast} and so (X^{\ast}, \leqslant) is a lattice.

I will prove only the maximizing lattice property, so the partial order on X^{\ast} is the maximizing order in what follows. Let S_1, \dotsc, S_n \in X^{\ast} be finite antichains. The union S = S_1 \cup \dotsc \cup S_n is a finite subset of X but not necessarily an antichain. Let \min(S) \subseteq S be the set of minimal elements in S, that is, those elements which have no smaller elements in S. Then \min(S) is an antichain and, because S is finite, \min(S) \eqslantless S_j for all indices j. So \min(S) is a lower bound for \{S_1, \dotsc, S_n\} \subseteq X^{\ast}. If S' \in X^{\ast} is any lower bound then for each x \in S there is a y \in S' such that y \leq x. This means that S' \eqslantless \min(S) since \min(S) \subseteq S and so \min(S) is the greatest lower bound.

Now suppose every non-empty finite subset T \subseteq X has a least upper bound \vee T \in X. If U' \in X^{\ast} is an upper bound for \{S_1, \dotsc, S_n\} \subseteq X^{\ast} then for all y \in U' there exist x_1 \in S_1, \dotsc, x_n \in S_n such that x_j \leq y for all indices j and so \vee\{x_1, \dotsc, x_n\} \leq y. Let U \subseteq X be the set of minimal elements

U = \min \left\{ \vee \{x_1, \dotsc, x_n\} \mid (x_1, \dotsc, x_n) \in S_1 \times \dotsc \times S_n \right\}.

Then U is an antichain and U \eqslantless U' by the above. Now U is itself also an upper bound for \{S_1, \dotsc, S_n\} and so it is the least upper bound. This shows that under the given condition (X^{\ast}, \eqslantless) is a lattice.
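The two constructions in this proof translate directly into code: \wedge S is the set of minimal elements of the union, and \vee S is the set of minimal elements of the pointwise joins over all choices. The sketch below uses divisibility as the partial order and lcm as the join; both are my own choices for illustration.

```python
from itertools import product
from math import lcm

def minimal(elems, leq):
    """Minimal elements of a finite subset: keep x if nothing is strictly below it."""
    return sorted(x for x in elems
                  if not any(y != x and leq(y, x) for y in elems))

def glb(antichains, leq):
    """Greatest lower bound in the maximizing order: minimal elements of the union."""
    return minimal({x for A in antichains for x in A}, leq)

def lub(antichains, leq, join):
    """Least upper bound in the maximizing order, assuming every finite
    subset of X has a least upper bound, computed here by `join`."""
    return minimal({join(choice) for choice in product(*antichains)}, leq)

# Illustration: divisibility as the partial order, lcm as the join.
leq = lambda x, y: y % x == 0
print(glb([[4, 6], [10]], leq))                     # [4, 6, 10]
print(lub([[4, 6], [10]], leq, lambda c: lcm(*c)))  # [20, 30]
```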

To sum up the results of this post: For any partially ordered set (X, \leq) we constructed a minimizing and a maximizing partial order on the set X^{\ast} of finite antichains in X. These indicate when a set of outcomes is universally preferred (under all linear extensions) over another set by a minimizing and maximizing player respectively. For these “star extended” partially ordered sets we defined the lattice operations of taking the greatest lower bound and least upper bound (when these exist). In the next post in this series we will start to explore actual game trees under a partial order, starting with the definition of evaluation functions.


Monoids and data structures

Thinking about ways to formally specify the behavior of software, it occurred to me that there is a very natural way to express data structures and their operations in the language of monoids, without referring to an actual implementation of that data structure. A monoid is a set with an associative binary operator (written as multiplication) and an identity element. Monoids in this post will be given by a generating set and a number of relations between the generators. Multiplication corresponds to adding an element to a data structure or merging two data structures. Functions on data structures are defined recursively using the fact that each member can be expressed as some product of generators.

As a first example I will define a stack datatype (last in first out) with the operations “push”, “pop” and “top”. A stack of type T is simply the free monoid generated by all elements of T without any additional relations:

\text{stack}\langle T \rangle = \langle T \rangle

The empty stack is represented by the unit 1 \in \text{stack} \langle T \rangle. Pushing an element x \in T onto a stack s \in \text{stack} \langle T \rangle is defined by

\text{push}{:{}} \text{stack} \langle T \rangle \times T \rightarrow \text{stack} \langle T \rangle\\ \text{push}(s, x) = s \cdot x

Popping the top element from a non-empty stack is defined recursively as

\text{pop}{:{}} \text{stack} \langle T \rangle \rightarrow \text{stack} \langle T \rangle\\ \text{pop}(s) = \begin{cases} 1 & \text{if } s \in T \\ u \cdot \text{pop}(v) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}.

The top element on a non-empty stack is given by the following recursive definition

\text{top}{:{}} \text{stack} \langle T \rangle \rightarrow T\\ \text{top}(s) = \begin{cases} s & \text{if } s \in T \\ \text{top}(v) & \text{if }s = u \cdot v \text{ for some } u, v \neq 1\end{cases}
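To illustrate, here is a Python sketch that represents words in the free monoid as tuples, with () playing the role of the unit 1, and implements the three operations exactly as defined above. The particular factorization s = u \cdot v chosen in the recursion does not matter; in the free monoid every choice gives the same result.

```python
# Words in the free monoid stack<T> are tuples; () is the unit 1.
def push(s, x):
    return s + (x,)

def pop(s):
    """pop of a single generator is the unit; pop(u * v) = u * pop(v)."""
    if len(s) == 1:
        return ()
    u, v = s[:1], s[1:]   # any factorization with u, v != 1 gives the same result
    return u + pop(v)

def top(s):
    """top of a single generator is that generator; top(u * v) = top(v)."""
    if len(s) == 1:
        return s[0]
    return top(s[1:])

s = push(push((), 'a'), 'b')
assert top(s) == 'b'
assert pop(s) == ('a',)
```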

A queue (first in first out) has a very similar definition. These first two examples are based on free monoids which makes it somewhat easier to define their structure and operations. As a slightly more interesting example I will now define a set (an unordered collection in which each element appears at most once). The monoid for a set of elements in T is generated by T subject to some additional relations:

\text{set} \langle T \rangle = \langle T \mid x \cdot x = x \text{ for all }x \text{ and } x \cdot y = y \cdot x \text { for all } x \neq y \rangle

These additional relations allow a “compact” representation of sets in the monoid (for example x \cdot x \cdot x = x) but more importantly they capture the requirements of a set. The relation x \cdot x = x expresses that all elements in a set have multiplicity one (I will come back to this in a moment) and x \cdot y = y \cdot x that a set is unordered. Here are the definitions of some sensible set operations:

Insert an element:

\text{insert}{:{}} \text{set} \langle T \rangle \times T \rightarrow \text{set} \langle T \rangle\\ \text{insert}(s, x) = s \cdot x

Set union:

\text{union}{:{}} \text{set} \langle T \rangle \times \text{set} \langle T \rangle \rightarrow \text{set} \langle T \rangle\\ \text{union}(s, t) = s \cdot t

Test if an element is present:

\text{contains}{:{}} \text{set} \langle T \rangle \times T \rightarrow \{ \textbf{true}, \textbf{false} \}\\ \text{contains}(s, x) = \begin{cases} \textbf{false} & \text{if } s = 1\\ \textbf{true} & \text{if }s = x\\ \text{contains}(u, x) \vee \text{contains}(v, x) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}

Remove an element:

\text{remove}{:{}} \text{set} \langle T \rangle \times T \rightarrow \text{set} \langle T \rangle\\ \text{remove}(s, x) = \begin{cases} 1 & \text{if } s = x\\ s & \text{if }s = 1 \text{ or }s \in T \text{ and } s \neq x\\ \text{remove}(u, x) \cdot \text{remove}(v, x) & \text{if } s = u \cdot v \text{ for some }u, v \neq 1\end{cases}

All the operations are well defined given the relations on the monoid \text{set} \langle T \rangle. On the other hand, the following multiplicity function is not well defined even though it might look acceptable at first glance:

\text{mult}{:{}} \text{set} \langle T \rangle \times T \rightarrow \mathbb{N}\\ \text{mult}(s, x) = \begin{cases} 1 & \text{if }s = x\\ 0 & \text{if }s = 1 \text{ or } s \in T \text{ and } s \neq x\\ \text{mult}(u, x) + \text{mult}(v, x) & \text{if }s = u \cdot v \text{ for some }u, v \neq 1\end{cases}

since it would result in 1 = \text{mult}(x, x) = \text{mult}(x \cdot x, x) = 1 + 1. The problem here is that + is symmetric but not idempotent, so it does not respect the relation x \cdot x = x that holds in \text{set} \langle T \rangle. This underlines that the relations encode the requirements for a data structure by restricting the possible operations that can be defined on them. Note that dropping the relation x \cdot x = x would result in a multiset, on which the operation \text{mult} is well defined.
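The contrast can be made concrete by computing on word representatives. In the sketch below (my own illustration) words are tuples; the words ('x',) and ('x', 'x') represent the same element of \text{set} \langle T \rangle because of the relation x \cdot x = x, and \text{contains} agrees on them while \text{mult} does not.

```python
def contains(s, x):
    """Recursive definition from the text, evaluated on a word representative s."""
    if len(s) == 0:
        return False
    if len(s) == 1:
        return s[0] == x
    u, v = s[:1], s[1:]
    return contains(u, x) or contains(v, x)

def mult(s, x):
    """The ill-defined multiplicity function, for comparison."""
    if len(s) == 0:
        return 0
    if len(s) == 1:
        return 1 if s[0] == x else 0
    u, v = s[:1], s[1:]
    return mult(u, x) + mult(v, x)

# ('x',) and ('x', 'x') represent the same element of set<T> since x * x = x.
assert contains(('x',), 'x') == contains(('x', 'x'), 'x')   # respects the relation
assert mult(('x',), 'x') != mult(('x', 'x'), 'x')           # 1 versus 2: ill defined
```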

As a final example I will define an associative array, without spending too many words. Here are the definitions, where \pi{:{}} S \times T \rightarrow S is projection on the first coordinate:

\text{map}\langle S, T \rangle = \langle S \times T \mid x \cdot y = y \text { if } \pi(x) = \pi(y) \text{ and } x \cdot y = y \cdot x \text{ if } \pi(x) \neq \pi(y) \rangle

Insert (or update) a pair x \in S \times T:

\text{insert}{:{}} \text{map} \langle S, T \rangle \times (S \times T) \rightarrow \text{map} \langle S, T \rangle\\ \text{insert}(m, x) = m \cdot x

Test if a key x \in S is present:

\text{has}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow \{ \textbf{true}, \textbf{false} \}\\ \text{has}(m, x) = \begin{cases}\textbf{true} & \text {if }m \in S \times T \text{ and } \pi(m) = x\\ \textbf{false} & \text{if } m=1 \text{ or } m \in S \times T \text{ and } \pi(m) \neq x\\ \text{has}(u, x) \vee \text{has}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1\end{cases}

Lookup the value that is associated with key x \in S if it is present:

\text{index}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow T\\ \text{index}(m, x) = \begin{cases}y & \text{if }m = (x, y) \in S \times T\\ \text{index}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1 \text{ and } \text{has}(v, x) = \textbf{true}\\ \text{index}(u, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1 \text{ and } \text{has}(v, x) = \textbf{false}\end{cases}

Remove the key x \in S:

\text{del}{:{}} \text{map} \langle S, T \rangle \times S \rightarrow \text{map} \langle S, T \rangle\\ \text{del}(m, x) = \begin{cases}1 & \text{if } m \in S \times T \text{ and } \pi(m) = x\\ m & \text{if }m = 1 \text{ or }m \in S \times T \text{ and } \pi(m) \neq x\\ \text{del}(u, x) \cdot \text{del}(v, x) & \text{if }m = u \cdot v \text{ for some }u, v \neq 1\end{cases}
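As with the earlier examples, these definitions can be tried out on word representatives. The sketch below (my own illustration; words are tuples of (key, value) pairs) implements \text{has} and \text{index}; preferring the rightmost factor that contains the key matches the relation x \cdot y = y for equal keys, so a later insert overwrites an earlier one.

```python
# Word representatives of map<S, T>: tuples of (key, value) pairs.
def has(m, x):
    if len(m) == 0:
        return False
    if len(m) == 1:
        return m[0][0] == x
    u, v = m[:1], m[1:]
    return has(u, x) or has(v, x)

def index(m, x):
    """Lookup by key, preferring the rightmost factor that contains it,
    which matches the relation x * y = y for equal keys."""
    if len(m) == 1:
        return m[0][1]
    u, v = m[:1], m[1:]
    return index(v, x) if has(v, x) else index(u, x)

m = (('a', 1), ('b', 2), ('a', 3))  # the later insert for key 'a' wins
assert index(m, 'a') == 3
assert index(m, 'b') == 2
```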


Functions with positive real part

Let f be a holomorphic function on the unit disc \Delta such that f(0)=1 and \Re f(z) > 0 for all z \in \Delta. If the Taylor expansion of f is

f(z) = 1 + 2a_1z+2a_2z^2+2a_3z^3+\dotsc

then a classical result states that |a_n| \leq 1 for all integers n \geq 1. Moreover, such functions are in one-to-one correspondence with probability measures on the unit circle \Gamma as follows. Let \mu be a probability measure on \Gamma and let \hat{\mu}_k \in \mathbb{C} for k \in \mathbb{Z} be its Fourier coefficients

\hat{\mu}_k = \displaystyle \int_{\Gamma} z^{-k} d\mu(z).

Then \hat{\mu}_0 = 1, |\hat{\mu}_k| \leq 1 and \hat{\mu}_{-k} is the complex conjugate of \hat{\mu}_k. If \mu has a density function \delta with respect to the Lebesgue probability measure that is holomorphic on some neighborhood of \Gamma then \delta is real-valued on \Gamma and has the Laurent expansion

\delta(z) = \displaystyle \sum_{k \in \mathbb{Z}} \hat{\mu}_k z^k.

Now define a function f_{\mu} on a neighborhood of the closed disc \overline{\Delta} as follows:

f_{\mu}(z) = 1 + 2\displaystyle \sum_{k = 1}^{\infty} \hat{\mu}_k z^k.

Then, using the fact that \overline{z} = z^{-1} for all z \in \Gamma, we have \Re f_{\mu}(z) = \delta(z) \geq 0 on \Gamma and therefore \Re f_{\mu}(z) > 0 for all z \in \Delta. It turns out that the correspondence \mu \leftrightarrow f_{\mu} between probability measures and functions with positive real part also holds for \mu that do not have such a nice density function. (In general the density is a hyperfunction on \Gamma related to f_{\mu}.) The correspondence that I sketched here is usually presented as a convexity result in functional analysis, but I think that the direct link to density functions is quite instructive.
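The correspondence can be checked numerically for a concrete density. The sketch below (my own example) takes \delta(\theta) = 1 + \cos \theta, a non-negative density with respect to the normalized Lebesgue measure, computes its Fourier coefficients by a Riemann sum (which is exact for trigonometric polynomials of low degree), and assembles f_{\mu}; for this density f_{\mu}(z) = 1 + z, which indeed has positive real part on the disc.

```python
import cmath, math

N = 256  # grid points; Riemann sums are exact for low-degree trig polynomials

def mu_hat(k, density):
    """k-th Fourier coefficient of the measure density(theta) dtheta / 2 pi."""
    return sum(cmath.exp(-1j * k * t) * density(t)
               for t in (2 * math.pi * j / N for j in range(N))) / N

density = lambda t: 1 + math.cos(t)   # a non-negative density on the circle
coeffs = [mu_hat(k, density) for k in range(1, 8)]

def f_mu(z):
    return 1 + 2 * sum(c * z**k for k, c in enumerate(coeffs, start=1))

# The only non-zero coefficient is mu_hat_1 = 1/2, so f_mu(z) = 1 + z.
assert abs(coeffs[0] - 0.5) < 1e-12
assert all(abs(c) < 1e-12 for c in coeffs[1:])
assert f_mu(0.5j).real > 0
```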

There is, however, also a different way to parameterize functions with a positive real part, but it is much less well known (I’d be interested in any reference, in fact). Let g be any holomorphic function on a neighborhood of the closed disc \overline{\Delta} with Taylor expansion

g(z) = \displaystyle \sum_{k=0}^{\infty} a_kz^k.

Let \overline{g} be the holomorphic function such that \overline{g}( \overline{z} ) = \overline{g(z)}, so the coefficients of \overline{g} are the complex conjugates of those of g. Then for all z \in \Gamma

g(z) \overline{g}(z^{-1}) = g(z) \overline{g(z)} = |g(z)|^2 \geq 0.

Now the function g(z) \overline{g}(z^{-1}) has the Laurent expansion

(\displaystyle \sum_{k=0}^{\infty} a_kz^k)(\sum_{k=0}^{\infty} \overline{a}_kz^{-k}) = \sum_{k \in \mathbb{Z}} \sum_{m = 0}^{\infty} a_{m+k} \overline{a}_m z^k

where I used the convention that a_k=0 for all k < 0. Now the sequence a is square summable and so it is an element of the Hilbert space \ell^2. Let T: \ell^2 \rightarrow \ell^2 be the left shift operator, so (T^ka)_m = a_{m+k} for all k \geq 0 and indices m. Then the Laurent series above can be rewritten as

\displaystyle (\sum_{k=1}^{\infty} \langle a, T^ka \rangle z^{-k}) + ||a||^2 + (\sum_{k=1}^{\infty} \langle T^ka, a \rangle z^k)

where \langle \cdot, \cdot \rangle denotes the sesquilinear form on \ell^2. Note that by the Cauchy-Schwarz inequality |\langle T^ka, a \rangle| \leq ||T^ka|| \cdot ||a|| \leq ||a||^2. Now define a holomorphic function f_g on a neighborhood of the closed disc \overline{\Delta} by

f_g(z) = ||a||^2 + 2 \displaystyle \sum_{k=1}^{\infty} \langle T^ka, a \rangle z^k.

Then for all z \in \Gamma we have \Re f_g(z) = |g(z)|^2 \geq 0 and therefore, by the minimum principle for harmonic functions, \Re f_g(z) > 0 on the disc \Delta (provided g is not identically zero). The association g \mapsto f_g is not injective however, since g_1 and g_2 map to the same function if |g_1(z)|^2 = |g_2(z)|^2 on the unit circle \Gamma. In this case, both g_1 and g_2 are equal to some function g without zeroes on the disc \Delta multiplied by appropriate finite Blaschke products. This function g is then unique up to multiplication by a constant of modulus one.
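For a finite coefficient sequence the construction above is a finite computation, so the identity \Re f_g(z) = |g(z)|^2 on the circle can be verified directly. In the sketch below the sequence a is an arbitrary example of my own choosing, and \langle x, y \rangle = \sum_m x_m \overline{y}_m as in the text.

```python
import cmath

def g(a, z):
    """Polynomial with coefficient sequence a."""
    return sum(c * z**k for k, c in enumerate(a))

def f_g(a, z):
    """||a||^2 + 2 * sum_{k>=1} <T^k a, a> z^k with <x, y> = sum x_m conj(y_m)."""
    val = sum(abs(c)**2 for c in a)
    for k in range(1, len(a)):
        inner = sum(a[m + k] * a[m].conjugate() for m in range(len(a) - k))
        val += 2 * inner * z**k
    return val

a = [1, 2 - 1j, 0.5]  # an arbitrary finite coefficient sequence
for t in (0.1, 1.0, 2.5):
    z = cmath.exp(1j * t)  # a point on the unit circle
    assert abs(f_g(a, z).real - abs(g(a, z))**2) < 1e-9
```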


Polynomial residues

If p is a prime then m^p \equiv m (mod p) for all integers m. So the polynomial f = X^p-X \in \mathbb{Z}[X] has the property that f(m) \equiv 0 (mod p) for all integers m. Since \mathbb{Z}_p is an integral domain, any polynomial in \mathbb{Z}[X] of degree d whose coefficients are not all divisible by p has at most d roots in \mathbb{Z}_p. The degree of such a polynomial that vanishes identically on \mathbb{Z}_p must therefore be at least p, so X^p-X is also minimal in that sense. What can we say if we replace the prime p by just any positive integer n? In other words, what can we say about f \in \mathbb{Z}[X] if f(m) \equiv 0 (mod n) for all integers m? This question was studied by Aubrey Kempner in the 1920s. Here are some of his results:

“Suppose f \in \mathbb{Z}[X] has degree d and vanishes identically on \mathbb{Z}_n. Let \dfrac{n}{d!} = \dfrac{m}{q} for coprime integers m and q. Then f is divisible by m.”

In particular if the coefficients of f have no common factor then n must divide d!, the factorial of its degree. For prime n this implies that f is divisible by n unless d is at least n, in accordance with the first paragraph above. To see that this bound is sharp for any n, define the binomial polynomials \binom{x}{d} by

\displaystyle \binom{x}{d} = \dfrac{x (x-1) \dotsc (x-d+1)}{d!}.

The binomial polynomials take integer values at integer arguments (the number of ways to choose d elements from a set of x elements). Therefore, if n divides d! then d! \binom{x}{d} is a monic polynomial of degree d in \mathbb{Z}[X] that vanishes identically on \mathbb{Z}_n. The other way around, suppose f \in \mathbb{Z}[X] has degree d and f(m) \equiv 0 (mod n) for all integers m. Let \dfrac{m}{q} = \dfrac{n}{d!} as before. Then there exist coefficients c_0, \dotsc, c_d \in \mathbb{Z} such that

f(x) = n \displaystyle \sum_{k=0}^d c_k \binom{x}{k} = \frac{m}{q} \sum_{k=0}^d c_k d! \binom{x}{k}.

For each k the polynomial c_k d! \binom{x}{k} has integer coefficients and since m and q are coprime the equation above implies that m divides f.
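Both directions are easy to check numerically. The sketch below (my own illustration; the choice n = 12, d = 4 is arbitrary) verifies that the monic polynomial d! \binom{x}{d} = x(x-1)\dotsc(x-d+1) vanishes identically mod n whenever n divides d!, and the prime case from the opening paragraph.

```python
from math import factorial

def falling(x, d):
    """d! * binom(x, d) = x (x - 1) ... (x - d + 1), monic of degree d."""
    out = 1
    for j in range(d):
        out *= x - j
    return out

# If n divides d! then this monic degree-d polynomial vanishes identically mod n.
n, d = 12, 4                     # 12 divides 4! = 24
assert factorial(d) % n == 0
assert all(falling(x, d) % n == 0 for x in range(n))

# The prime case from the opening paragraph: x^p - x vanishes mod p.
p = 7
assert all((x**p - x) % p == 0 for x in range(p))
```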


A short alternative to Gordan

The nicely named “theorem of the alternative” of Paul Gordan states the following

“If V \subset \mathbb{R}^n is a linear subspace then either V contains a vector with positive coordinates or the orthogonal complement V^{\perp} contains a non-zero vector with non-negative coordinates.”

These options are mutually exclusive, hence the name of the theorem. Gordan gave a proof of this in 1873 which involved a very clever inductive argument. I just thought of a self-contained, very short proof of this theorem. In fact I will prove an equivalent statement:

“If v_1, \dotsc, v_n \in \mathbb{R}^k then either there is a vector v \in \mathbb{R}^k such that the inner product \langle v, v_j \rangle is positive for all j \in \{1, \dotsc, n\} or the convex hull C of v_1, \dotsc, v_n contains the origin.”

The equivalence can be seen as follows. Let M be the n \times k matrix with v_j as its rows. Then the columns of M span a subspace V \subset \mathbb{R}^n. A vector Mv \in V has positive coordinates precisely when \langle v, v_j \rangle > 0 for all j, while a non-zero non-negative vector in V^{\perp} = \ker(M^T) is, after scaling, a convex combination of the v_j equal to the origin. This transforms the second statement into the first and vice versa.

Now let v \in C have minimal norm; such a v exists because C is compact. If |v| = 0 then C contains the origin. If |v| > 0 and w is any point in the convex hull then the line segment from v to w lies in C and contains no point with smaller norm than v. So for all t \in (0,1] we have

\dfrac{|(1-t) v + t w|^2 - |v|^2}{2t} = \langle v, w \rangle - |v|^2+ \dfrac{t}{2}|w-v|^2 \geq 0.

This implies that \langle v, w \rangle \geq |v|^2 > 0 and in particular this holds for all w \in \{v_1, \dotsc, v_n \}. Done!
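The dichotomy in the proof can be illustrated numerically. The sketch below (my own example: two vectors in \mathbb{R}^2, with the minimum-norm point found by a brute-force search over the segment) shows both branches: when the hull misses the origin, the minimizer v satisfies \langle v, v_j \rangle > 0 for all j; when the hull contains the origin, the minimum norm is zero.

```python
def min_norm_point(v1, v2, steps=200):
    """Brute-force minimizer of the norm over the segment from v1 to v2
    (the convex hull of two points), for illustration only."""
    points = (tuple((1 - t) * a + t * b for a, b in zip(v1, v2))
              for t in (j / steps for j in range(steps + 1)))
    return min(points, key=lambda p: sum(x * x for x in p))

dot = lambda u, w: sum(x * y for x, y in zip(u, w))

# Hull misses the origin: the minimizer has positive inner products with all v_j.
v = min_norm_point((1.0, 0.0), (0.0, 1.0))
assert all(dot(v, w) > 0 for w in [(1.0, 0.0), (0.0, 1.0)])

# Hull contains the origin: no such direction can exist.
v = min_norm_point((1.0, 0.0), (-1.0, 0.0))
assert sum(x * x for x in v) < 1e-9
```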
