In this post we will see that mutual information between functions (e.g. two images) can be expressed in terms of their gradient fields. First some definitions and background. Let $f: [0, 1]^n \to [0, 1]^m$ be a differentiable function for some $n \geq m$. In this post $f$ typically represents an image or a pair of images, with $n=2$ and $m \in \{1, 2\}$. Let $D$ be the Jacobian of $f$: the $m \times n$ matrix with

$\displaystyle{D_{ij}=\frac{\partial f_i}{\partial x_j}}$ for $1 \leq i \leq m$ and $1 \leq j \leq n$.

The entropy $H(f)$ of $f$ is defined by the integral

$H(f) = \displaystyle{\int_{[0,1]^n} \tfrac12 \log \lvert D D^{\mathrm{t}}\rvert \, \mathrm{d}V}$

where $\lvert \cdot \rvert$ denotes the determinant and $\mathrm{d}V$ the standard volume element. (In this post I will disregard any questions about the well-definedness of this integral.) To motivate this definition: if $n=m$ and $f$ is injective, then $H(f)$ is the usual differential entropy of the push-forward $f_{\ast}(\mathrm{d}V)$ of the standard volume form.

If $m=1$ then the Jacobian $D = \nabla f$ is the gradient of $f$ and $\lvert D D^{\mathrm{t}} \rvert = \lVert \nabla f \rVert^2$, so the entropy becomes

$\displaystyle{H(f) = \int_{[0,1]^n} \log \lVert \nabla f \rVert \, \mathrm{d}V}$.
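
To make this concrete, here is a minimal numerical sketch of the $m=1$ entropy for an image sampled on a grid, approximating the integral by the mean over pixels. (The `eps` regularization for flat regions is my own choice, not part of the definition.)

```python
import numpy as np

def entropy_scalar(f, spacing=1.0, eps=1e-12):
    """Discretized H(f) = mean of log ||grad f|| for a scalar image f.

    `spacing` is the grid step of the samples on [0,1]^n; `eps` guards
    against log(0) in perfectly flat regions.
    """
    grads = np.gradient(f, spacing)                # one gradient array per axis
    norm = np.sqrt(sum(g * g for g in grads))      # pixel-wise ||grad f||
    return float(np.mean(np.log(norm + eps)))      # mean over the grid ≈ integral
```

For the ramp $f(x, y) = x$ the gradient has unit length everywhere, so the discretized entropy is (up to `eps`) equal to $\log 1 = 0$.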

Let $f = (f_1, \ldots, f_m)$. Then the mutual information $I(f)$ of $f$ is defined by

$\displaystyle{I(f) = \sum_{k=1}^m H(f_k) - H(f)}$.

Mutual information is always non-negative. It expresses how much information is gained by knowing the joint value distribution of $f$ compared to knowing only the value distributions of the separate coordinates $f_1, \ldots, f_m$. In other words, mutual information is a measure of dependence between the coordinates: the higher the dependence, the higher the mutual information, while for independent coordinates the mutual information is $0$ (there is no information to be gained from their joint value distribution).

The nice thing about mutual information is that it is invariant under any injective coordinate-wise distortion. In imaging terms it is, for example, invariant under changes of gamma, gain and offset of the image. This is hugely beneficial in practical imaging applications, where lighting conditions are never the same. Different images (the coordinates) may even have been produced with completely different sensing equipment.
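
As a quick sanity check of this invariance, the sketch below computes the discretized entropies and mutual information of two smooth test images, then applies a gamma-style distortion to one of them. The test images, the `eps` regularization and the tolerances are my own choices for illustration:

```python
import numpy as np

def entropy_1(f, spacing, eps=1e-12):
    # H(f_k): mean of log ||grad f_k|| over the grid.
    gy, gx = np.gradient(f, spacing)
    return float(np.mean(0.5 * np.log(gx**2 + gy**2 + eps)))

def entropy_joint(f1, f2, spacing, eps=1e-12):
    # H(f) for f = (f1, f2): mean of (1/2) log |D D^t|, using the Gram
    # determinant |D D^t| = ||g1||^2 ||g2||^2 - (g1 . g2)^2.
    g1y, g1x = np.gradient(f1, spacing)
    g2y, g2x = np.gradient(f2, spacing)
    gram = ((g1x**2 + g1y**2) * (g2x**2 + g2y**2)
            - (g1x * g2x + g1y * g2y)**2)
    return float(np.mean(0.5 * np.log(gram + eps)))

def mutual_info(f1, f2, spacing):
    # I(f) = H(f1) + H(f2) - H(f)
    return (entropy_1(f1, spacing) + entropy_1(f2, spacing)
            - entropy_joint(f1, f2, spacing))

# Smooth test images on [0,1]^2 with nowhere-vanishing gradients.
x = np.linspace(0.0, 1.0, 128)
X, Y = np.meshgrid(x, x)
f1 = 0.5 + 0.4 * X + 0.05 * np.sin(2 * np.pi * Y)
f2 = 0.5 + 0.4 * Y + 0.05 * np.sin(2 * np.pi * X)

h = x[1] - x[0]
gamma = lambda t: t * t      # injective on (0, 1): a "gamma change" of f1
i_before = mutual_info(f1, f2, h)
i_after = mutual_info(gamma(f1), f2, h)
# i_before and i_after agree up to discretization error, even though the
# individual entropies of f1 and gamma(f1) differ noticeably.
```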

A key observation about mutual information is the following:

$\displaystyle{\lvert D D^{\mathrm{t}} \rvert = v \cdot \prod_{k=1}^m \lVert \nabla f_k \rVert^2}$

for some function $v$ with values in $[0, 1]$ that depends only on the directions of the gradients, not on their lengths. Moreover, $v=0$ if and only if the gradients are linearly dependent, and $v=1$ if and only if they are mutually orthogonal. Using this decomposition, mutual information can be expressed as

$\displaystyle{I(f) = \int_{[0,1]^n} -\tfrac12 \log(v) \, \mathrm{d}V}$.
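
In the $n=m=2$ case this expression is straightforward to discretize: compute $v$ pixel-wise from the two gradient fields and average $-\tfrac12 \log v$. A sketch (again with an `eps` regularization of my own choosing):

```python
import numpy as np

def mutual_info_v(f1, f2, spacing=1.0, eps=1e-12):
    """Discretized I(f) = mean of -1/2 log v for a pair of 2-D images.

    v = |D D^t| / (||grad f1||^2 ||grad f2||^2), computed pixel-wise.
    """
    g1y, g1x = np.gradient(f1, spacing)
    g2y, g2x = np.gradient(f2, spacing)
    n1 = g1x**2 + g1y**2
    n2 = g2x**2 + g2y**2
    dot = g1x * g2x + g1y * g2y
    v = (n1 * n2 - dot**2) / (n1 * n2 + eps)   # Gram determinant over norms
    return float(np.mean(-0.5 * np.log(v + eps)))
```

For $f_1(x,y)=x$ and $f_2(x,y)=y$ the gradients are orthogonal everywhere, so $v=1$ and the mutual information is $0$.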

This confirms that mutual information is non-negative, since $v\in[0,1]$ and therefore $\log(v) \leq 0$. I will conclude this post by looking at the specific case of a pair of 2-dimensional images, i.e. the case $n=m=2$. Then the function $v$ has a simple explicit form: let $\alpha$ be the angle between the gradients $\nabla f_1$ and $\nabla f_2$. Then

$v = \sin^2(\alpha) = \frac{1-\cos(2\alpha)}{2}$.
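
For a single pair of gradient vectors this identity is easy to verify numerically. Both computations below use the example gradients $(1,2)$ and $(-3,1)$, which are arbitrary choices for illustration:

```python
import numpy as np

# Two example gradient vectors (arbitrary choices for illustration).
g1 = np.array([1.0, 2.0])
g2 = np.array([-3.0, 1.0])

# v via the angle: sin^2(alpha) = 1 - cos^2(alpha).
cos_a = (g1 @ g2) / (np.linalg.norm(g1) * np.linalg.norm(g2))
v_angle = 1.0 - cos_a**2

# v via the Gram determinant: |D D^t| / (||g1||^2 ||g2||^2).
D = np.stack([g1, g2])
v_gram = np.linalg.det(D @ D.T) / ((g1 @ g1) * (g2 @ g2))

# Both equal 49/50 = 0.98. Flipping the sign of either gradient flips
# the sign of cos(alpha) but leaves v unchanged, as the double-angle
# formula predicts.
```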

There are two pleasant observations to make:

1. Mutual information of a pair of images depends only on the double angle between their gradients. In particular it does not depend on the length or a sign change of either gradient.
2. The expression $\cos(2\alpha)$ is easy to compute as an inner product. The double angle can be accounted for by a simple rational transformation of the gradient. This will be explained in more detail in the next post.

A follow-up post will discuss the application of mutual information to image registration. The result is a method that is very efficient (FFT-based), robust against image distortions, and also applicable to registering (locating) a partial template image of any shape within a bigger scene.