## Mutual information and fast image registration

In the previous post the mutual information for a pair of images

$\displaystyle{f = (f_1, f_2) {:[0,1]^2} \to [0,1]^2}$

was expressed in terms of the angle $\alpha$ between the gradients of $f_1$ and $f_2$:

$\displaystyle{I(f) = \int_{[0,1]^2} -\tfrac12 \log \left(\sin^2(\alpha) \right) \mathrm{d}V = \int_{[0,1]^2} -\tfrac12 \log\left(\frac{1-\cos(2\alpha)}2\right) \mathrm{d}V}$.

Note that if both images are equal then $\alpha=0$ everywhere and mutual information is not well defined in the form above. This can be fixed by adding a Gaussian error term to the Jacobian of $f$, as we will explore at the end of this post. Here we take a different approach that circumvents this problem altogether.

The goal of image registration is to position the reference image $f_1$ by shifting it relative to $f_2$ such that mutual information is maximized. At that position the reference image $f_1$ provides the most information about $f_2$, which is exactly what we want from a registration. Note however that the formula above is rather expensive to use directly for this purpose, since it has to be evaluated for every possible offset of the reference image $f_1$. One way around this is to reduce the number of positions to inspect with an algorithm such as steepest ascent. This is not what we will do here. Instead we use an approximation of mutual information that is efficient to compute for all positions at once.

Observing that the integrand $p(\alpha)$ in the mutual information expression is a function of $\cos(2\alpha)$ we will approximate it by trigonometric polynomials of the form

$\displaystyle{p_d(\alpha) = \log(2) + \sum_{k=1}^dc_k \cos(2k \alpha)}$

for some finite degree $d \geq 1$. The constant term is chosen such that

$\displaystyle{\int_{-\pi/2}^{\pi/2}p(\alpha) \mathrm{d}\alpha = \int_{-\pi/2}^{\pi/2} p_d(\alpha) \mathrm{d}\alpha = \pi \log(2)}$

for each trigonometric polynomial, so all these functions enclose the same area on the interval $[-\pi/2, \pi/2]$. The approximation method treats these functions as (scaled) probability densities. In particular each approximation $p_d$ is chosen such that

1. $p_d(\alpha) \geq 0$ for all $\alpha$
2. $p_d(-\pi/2) = p_d(\pi/2) = 0$ and
3. $p_d$ minimizes the Kullback-Leibler divergence $\mathrm{KL}(p, p_d)$ among all such trigonometric polynomials of degree $d$.

The first three approximations are:

1. $p_1(\alpha) \approx 0.693 + 0.693\cos(2\alpha)$
2. $p_2(\alpha) \approx 0.693+0.912 \cos(2 \alpha)+0.219 \cos(4 \alpha)$
3. $p_3(\alpha) \approx 0.693+0.969 \cos(2 \alpha)+0.425 \cos(4 \alpha)+0.149 \cos(6 \alpha)$
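
As a quick sanity check (a minimal sketch using the rounded coefficients from the list above), we can verify numerically that each $p_d$ is non-negative, vanishes at $\pm\pi/2$, and integrates to $\pi \log(2)$:

```python
import numpy as np

# Rounded coefficients [c_1, ..., c_d] from the list above; the
# constant term is always log(2).
approximations = {
    1: [0.693],
    2: [0.912, 0.219],
    3: [0.969, 0.425, 0.149],
}

def p_d(alpha, coeffs):
    """Evaluate log(2) + sum_k c_k * cos(2 k alpha)."""
    return np.log(2) + sum(
        c * np.cos(2 * k * alpha) for k, c in enumerate(coeffs, start=1)
    )

alpha = np.linspace(-np.pi / 2, np.pi / 2, 20001)
for d, coeffs in approximations.items():
    values = p_d(alpha, coeffs)
    # 1. non-negative (up to coefficient rounding)
    assert values.min() > -1e-2
    # 2. zero at the endpoints alpha = +/- pi/2
    assert abs(values[0]) < 1e-2 and abs(values[-1]) < 1e-2
    # normalization: each cos(2 k alpha) integrates to zero on the
    # interval, so the integral is pi * log(2) for every degree d
    assert abs(values.mean() * np.pi - np.pi * np.log(2)) < 1e-2
```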

Their graphs are shown below. Note that the peak around $\alpha=0$ gets more pronounced with increasing degree. Also $p_2$ turns out to be very close (up to a scalar multiple) to $\cos^4(\alpha)$, which is easier to remember and could be used as an alternative approximation.

Here is a plot of $p_3$ together with the actual distribution $p$.
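
As for the closeness of $p_2$ to a multiple of $\cos^4(\alpha)$: expanding $\cos^4(\alpha)$ into cosines of multiple angles and scaling so that the constant term equals $\log(2)$ shows why. We have

$\displaystyle{\cos^4(\alpha) = \frac{3 + 4\cos(2\alpha) + \cos(4\alpha)}{8}}$

and therefore

$\displaystyle{\frac{8\log(2)}{3}\cos^4(\alpha) \approx 0.693 + 0.924\cos(2\alpha) + 0.231\cos(4\alpha)}$,

whose coefficients are indeed close to those of $p_2$.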

Instead of the actual distribution $p$ for mutual information we can use any of the approximations $p_d$ to compute the approximate mutual information

$\displaystyle{I_d(f) = \int_{[0,1]^2} p_d(\alpha) \, \mathrm{d}V}$.

Moreover this value can be computed efficiently for every offset of the reference image $f_1$ as a cross-correlation of vector fields, as we will see now. Identify $\mathbb{R}^2$ with the complex plane $\mathbb{C}$. The inner product on $\mathbb{R}^2$ translates to complex numbers as

$\displaystyle{\langle z, w \rangle = \mathrm{Re}(z \overline{w})}$.

Write the gradients of $f_1$ and $f_2$ normalized to length one as complex fields $z_1$ and $z_2$ (so $\lvert z_1 \rvert = \lvert z_2 \rvert = 1$). Then for the angle $\alpha$ between these gradients and any integer $k$ we have

$\displaystyle{\cos(k\alpha) = \langle z_1^k, z_2^k \rangle}$.
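
This identity is easy to verify numerically; here is a minimal sketch with arbitrary gradient directions:

```python
import cmath
import math

theta1, theta2 = 0.7, 0.2      # arbitrary gradient directions
alpha = theta1 - theta2        # angle between the two unit gradients
z1 = cmath.exp(1j * theta1)    # unit-length gradients as complex numbers
z2 = cmath.exp(1j * theta2)

for k in range(1, 5):
    # <z1^k, z2^k> = Re(z1^k * conj(z2^k))
    inner = (z1 ** k * (z2 ** k).conjugate()).real
    assert abs(inner - math.cos(k * alpha)) < 1e-12
```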

This shows that every term in $I_d(f)$ is a cross-correlation between two fields ($z_1^{2k}$ and $z_2^{2k}$ for the term $\cos(2k\alpha)$) and hence can be computed efficiently using the FFT. Since everything involving the reference image can be pre-computed, the approximate mutual information $I_d(f)$ takes $2d+1$ real FFT computations: two per term $\cos(2k \, \alpha)$ for $k=1 \ldots d$ to compute its spectrum, and one inverse transform to compute the cross-correlation from a linear combination of all these spectra.
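
The scheme can be sketched in a few lines of NumPy. This is a minimal illustration, not the original implementation: the function names are made up, it uses complex FFTs rather than the real-FFT count given above, and it evaluates cyclic (wrap-around) offsets.

```python
import numpy as np

def unit_gradient_field(img, eps=1e-12):
    """Gradient of img as a complex field z = df/dx + i df/dy,
    normalized to unit length (zero where the gradient vanishes)."""
    gy, gx = np.gradient(img.astype(float))
    z = gx + 1j * gy
    mag = np.abs(z)
    return np.where(mag > eps, z / np.maximum(mag, eps), 0.0)

def approximate_mi_map(f1, f2, coeffs):
    """Approximate mutual information I_d for every cyclic offset of f1
    relative to f2.  coeffs = [c_1, ..., c_d]; the constant log(2) term
    only shifts the whole map and is dropped.  Each term
    cos(2k alpha) = <z1^(2k), z2^(2k)> becomes a cross-correlation,
    so all offsets are covered by the forward FFTs and one inverse FFT."""
    z1 = unit_gradient_field(f1)
    z2 = unit_gradient_field(f2)
    spectrum = np.zeros(f1.shape, dtype=complex)
    for k, c in enumerate(coeffs, start=1):
        spectrum += c * np.conj(np.fft.fft2(z1 ** (2 * k))) * np.fft.fft2(z2 ** (2 * k))
    # one inverse transform turns the combined spectrum into the
    # cross-correlation map over all offsets
    return np.fft.ifft2(spectrum).real
```

The peak of the returned map, `np.unravel_index(np.argmax(mi_map), mi_map.shape)`, is the cyclic offset at which $f_1$ best matches $f_2$; masking for partial matching amounts to zeroing parts of $z_1$ before the loop.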

As a bonus this method can be used without any modification to locate only a part of the reference image $f_1$ in $f_2$: simply mask out the part of $f_1$ to be ignored by setting its normalized gradient field $z_1$ to zero there. This works because the fields are normalized and therefore not affected by local variations in gain and offset when the masked reference image is matched with $f_2$. (In contrast, such variations make this simple masking scheme useless for an ordinary cross-correlation approach.)

A final remark on how, as mentioned at the beginning of this post, an error term can be introduced into mutual information to get rid of the $\log$ singularity at $\alpha = 0$. If the Jacobian appearing in the definition of mutual information is distorted by adding a Gaussian error term, then mutual information changes to

$\displaystyle{I(t; f) = \int_{[0,1]^2} -\tfrac12 \log(\sin^2(\alpha) + t^2 \cos^2(\alpha)) \, \mathrm{d}V}$

where $t \in [0, 1]$ is an error ratio. For $t=0$ we regain the original mutual information, while at the other extreme $t=1$ we have $I(1; f) = 0$. For any $t > 0$ the singularity at $\alpha=0$ is resolved. For those who want to explore further in this direction, I leave you with the remark that

$\displaystyle{\int_{-\pi/2}^{\pi/2} -\tfrac12 \log(\sin^2(\alpha) + t^2 \cos^2(\alpha)) \, \mathrm{d}\alpha = -\pi \log \left( \frac{t+1}2 \right)}$

which helps in treating the integrand as a probability density in the angle $\alpha$.
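
As a final sketch (a numerical sanity check via the midpoint rule, not a proof), the identity above is easy to confirm:

```python
import numpy as np

def lhs(t, n=200000):
    """Midpoint-rule approximation of the integral on (-pi/2, pi/2)."""
    alpha = (np.arange(n) + 0.5) / n * np.pi - np.pi / 2
    integrand = -0.5 * np.log(np.sin(alpha) ** 2 + t ** 2 * np.cos(alpha) ** 2)
    return integrand.mean() * np.pi

def rhs(t):
    return -np.pi * np.log((t + 1) / 2)

# for t > 0 the integrand is smooth, so the midpoint rule converges fast
for t in [0.1, 0.5, 0.9, 1.0]:
    assert abs(lhs(t) - rhs(t)) < 1e-3
```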