In this post we will see that mutual information between functions (e.g. two images) can be expressed in terms of their gradient fields. First some definitions and background. Let $f\colon \Omega \to \mathbb{R}^n$ be a differentiable function for some domain $\Omega \subseteq \mathbb{R}^d$. In this post $f$ typically represents an image or a pair of images, with $d = 2$ and $n = 1$ or $n = 2$. Let $Df$ be the Jacobian of $f$. This is the $n \times d$ matrix with

$$(Df)_{ij} = \frac{\partial f_i}{\partial x_j}$$

for $1 \leq i \leq n$ and $1 \leq j \leq d$.

The *entropy* of $f$ is defined by the integral

$$H(f) = \frac{1}{2} \int_\Omega \log \det\left( Df \, (Df)^T \right) dV$$

where $\det$ is the determinant and $dV$ the standard volume element. (In this post I will disregard any question of well-definedness of this integral.) To motivate this definition: if $n = d$ and $f$ is injective then $H(f)$ is the usual differential entropy of the push forward along $f$ of the standard volume form.

If $n = 1$ then the Jacobian equals the gradient $\nabla f$ and so the entropy becomes

$$H(f) = \int_\Omega \log \left\| \nabla f \right\| dV.$$
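As a rough numerical illustration (not from the post itself): on a pixel grid this integral can be approximated by summing $\log \|\nabla f\|$ over the pixels, with finite-difference gradients standing in for the continuous ones. A minimal NumPy sketch; the function name `entropy` and the test image are my own choices:

```python
import numpy as np

def entropy(f):
    """Discrete approximation of H(f) = integral of log ||grad f|| dV
    for a single image f, using finite-difference gradients."""
    gy, gx = np.gradient(f.astype(float))  # per-pixel gradient components
    norm = np.hypot(gx, gy)                # ||grad f|| at each pixel
    norm = norm[norm > 0]                  # skip flat pixels where log(0) is undefined
    return float(np.log(norm).sum())       # sum over pixels approximates the integral

# A smooth test image: f(x, y) = sin(x) cos(y) on a 64x64 grid.
t = np.linspace(0.0, 2.0 * np.pi, 64)
f = np.sin(t)[None, :] * np.cos(t)[:, None]
print(entropy(f))
```

The sum should be multiplied by the pixel area to approximate the integral properly; when comparing images on the same grid this constant factor is irrelevant.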

Let $f = (f_1, \ldots, f_n)$, then the *mutual information* of $f$ is defined by

$$I(f) = \sum_{i=1}^n H(f_i) - H(f).$$

Mutual information is always non-negative. It expresses how much information is gained by knowing the joint value distribution of $f$ compared to knowing only the value distributions of the separate coordinates $f_i$. In other words, mutual information is a measure of dependence between the coordinates: the higher the dependence the higher the mutual information, while for independent coordinates the mutual information is $0$ (there is no information to be gained from their joint value distribution).

The nice thing about mutual information is that it is invariant under any injective coordinate-wise distortion. In imaging terms it is, for example, invariant under changes of gamma, gain and offset of the image. This is hugely beneficial in practical imaging applications where lighting conditions are never the same. Different images (the coordinates) may even have been produced with completely different sensing equipment.
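A quick sketch of why this invariance holds (my own derivation, assuming a smooth distortion $\phi$ with $\phi' \neq 0$ applied to one coordinate $f_1$):

```latex
% Chain rule: \nabla(\phi \circ f_1) = \phi'(f_1)\,\nabla f_1, so
H(\phi \circ f_1)
  = \int_\Omega \log\bigl( \lvert\phi'(f_1)\rvert \, \lVert\nabla f_1\rVert \bigr)\, dV
  = H(f_1) + \int_\Omega \log \lvert\phi'(f_1)\rvert \, dV .
% Scaling one row of Df by \phi'(f_1) scales \det(Df\,(Df)^T) by \phi'(f_1)^2,
% so the joint entropy H(f) picks up exactly the same extra term, and the two
% contributions cancel in I(f) = \sum_i H(f_i) - H(f).
```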

A key observation about mutual information is the following: the determinant in the entropy integral factors as

$$\det\left( Df \, (Df)^T \right) = Q \, \prod_{i=1}^n \left\| \nabla f_i \right\|^2$$

for some function $Q$ with values in $[0, 1]$ that depends only on the direction of the gradients but not their length. (Concretely, $Q$ is the Gram determinant of the normalized gradients $\nabla f_i / \|\nabla f_i\|$.) Moreover $Q = 0$ if and only if the gradients are linearly dependent and $Q = 1$ if and only if they are mutually orthogonal. Using this decomposition mutual information can be expressed as

$$I(f) = -\frac{1}{2} \int_\Omega \log Q \, dV.$$
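Filling in the step between the decomposition and this integral formula (a short derivation using the definitions above):

```latex
H(f) = \frac{1}{2} \int_\Omega \log \det\left( Df\,(Df)^T \right) dV
     = \frac{1}{2} \int_\Omega \log Q \, dV
       + \sum_{i=1}^n \int_\Omega \log \lVert\nabla f_i\rVert \, dV .
% The sum on the right is \sum_i H(f_i), so
I(f) = \sum_{i=1}^n H(f_i) - H(f) = -\frac{1}{2} \int_\Omega \log Q \, dV .
```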

This confirms that mutual information is non-negative since $Q \leq 1$ and therefore $-\log Q \geq 0$. I will conclude this post by looking at the specific case of a pair of 2-dimensional images, so the case that $n = d = 2$. Then the function $Q$ has a simple explicit form. Let $\alpha$ be the angle between the gradients $\nabla f_1$ and $\nabla f_2$. Then $Q = \sin^2 \alpha$ and

$$I(f) = -\frac{1}{2} \int_\Omega \log \sin^2 \alpha \, dV = -\frac{1}{2} \int_\Omega \log \frac{1 - \cos 2\alpha}{2} \, dV.$$
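To make the 2-dimensional formula concrete, here is a rough NumPy sketch (the function name and discretization choices are mine, not from the post): it obtains $\sin^2 \alpha$ per pixel from the cross and dot products of the two gradients and sums $-\tfrac{1}{2} \log \sin^2 \alpha$ over the pixels.

```python
import numpy as np

def mutual_information_2d(f1, f2, eps=1e-12):
    """Approximate I(f1, f2) = -1/2 * integral of log(sin^2 alpha) dV,
    where alpha is the per-pixel angle between grad f1 and grad f2."""
    g1y, g1x = np.gradient(f1.astype(float))
    g2y, g2x = np.gradient(f2.astype(float))
    cross = g1x * g2y - g1y * g2x        # ||g1|| ||g2|| sin(alpha)
    dot = g1x * g2x + g1y * g2y          # ||g1|| ||g2|| cos(alpha)
    r2 = cross**2 + dot**2               # (||g1|| ||g2||)^2
    mask = r2 > eps                      # skip pixels where a gradient vanishes
    sin2 = cross[mask]**2 / r2[mask]     # sin^2(alpha), independent of gradient lengths
    return float(-0.5 * np.log(np.clip(sin2, eps, None)).sum())

# Two images with everywhere-orthogonal gradients: f1 varies only in x,
# f2 only in y, so alpha is 90 degrees and the mutual information is 0.
t = np.linspace(0.0, 1.0, 32)
f1 = np.tile(t, (32, 1))  # f1(x, y) = x
f2 = f1.T                 # f2(x, y) = y
print(mutual_information_2d(f1, f2))
```

Conversely, for identical images the gradients are everywhere parallel, $\sin^2 \alpha = 0$, and the clipped logarithm makes the sum large, matching the linear-dependence case $Q = 0$.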

There are two pleasant observations to make:

- Mutual information of a pair of images depends only on the double angle between their gradients. In particular it does not depend on the length or a sign change of either gradient.
- The expression $\cos 2\alpha$ is easy to compute as an inner product. The double angle can be accounted for by a simple rational transformation of the gradient. This will be explained in more detail in a follow-up post.

A follow-up post will discuss the application of mutual information to image registration. It results in a method that is very efficient (based on the FFT), is robust against image distortions, and can also be applied to register (locate) a partial template image of any shape within a bigger scene.