Let be joint random variables (typically not independent). This post describes which linear map
relates these variables best in the sense that the expected square error
is minimised. Related applications such as principle component analysis and auto encoding under random distortion (such as drop out) are recovered as special cases.
Vectors will be interpreted as column vectors and transposition is denoted by a superscript asterisk. It is assumed that is non-singular (and therefore positive definite). Define
. Then
is positive semi-definite on
. Let
be orthogonal eigenvectors in order of decreasing eigenvalue and let
be the orthogonal projection of
onto the span of
for
. (Note that this definition leaves some choice in case not all eigenvalues are distinct since the projections are not unique in that case.) Now the main result states:
For each
the linear map
minimises the expected square error
among all linear maps of rank
.
Let’s apply this result to two special cases. For the first case we simply assume that ,
and
. In this case
and
. So the projections
project onto the eigenspaces of the covariance matrix
and
. This result coincides with principle component analysis for the variable
.
For the second case assume that and
for some random diagonal matrix where each diagonal entry
is independent (also of
) and Bernoulli distributed with probability
. The matrix
models dropout in the coefficients of
. Let
be the covariance matrix of
,
the diagonal of
, and
. In this case
and
.
Now are projections onto eigenspaces of
,
which is the semi-positive definite matrix that appeared in the previous post about linear encoders with dropout. Finally in this case .