Let $X \in \mathbb{R}^n$ and $Y \in \mathbb{R}^m$ be joint random variables (typically not independent). This post describes which linear map $A \colon \mathbb{R}^n \to \mathbb{R}^m$ relates these variables best, in the sense that the expected square error $\mathbb{E}\|Y - AX\|^2$ is minimised. Related applications such as principal component analysis and autoencoding under random distortion (such as dropout) are recovered as special cases.
Vectors are interpreted as column vectors, and transposition is denoted by a superscript asterisk. It is assumed that $\Sigma_{XX} = \mathbb{E}[XX^*]$ is non-singular (and therefore positive definite). Define $\Sigma_{YX} = \mathbb{E}[YX^*] = \Sigma_{XY}^*$ and $M = \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}$. Then $M$ is positive semi-definite on $\mathbb{R}^m$. Let $u_1, \dots, u_m$ be orthogonal eigenvectors of $M$ in order of decreasing eigenvalue, and let $P_k$ be the orthogonal projection of $\mathbb{R}^m$ onto the span of $u_1, \dots, u_k$ for $k = 1, \dots, m$. (This definition leaves some freedom when not all eigenvalues are distinct, since the projections are then not unique.) Now the main result states:
For each $k$, the linear map $A_k = P_k\Sigma_{YX}\Sigma_{XX}^{-1}$ minimises the expected square error $\mathbb{E}\|Y - AX\|^2$ among all linear maps $A$ of rank at most $k$.
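The recipe in the main result can be checked numerically. The following is a minimal NumPy sketch, not code from the post: the helper names `optimal_rank_k_map` and `expected_sq_error` are hypothetical, the second moments come from a randomly generated joint second-moment matrix, and $\Sigma_{YY} = \mathbb{E}[YY^*]$ is extra notation introduced here. It uses the identity $\mathbb{E}\|Y - AX\|^2 = \operatorname{tr}\Sigma_{YY} - 2\operatorname{tr}(A\Sigma_{XY}) + \operatorname{tr}(A\Sigma_{XX}A^*)$ and compares $A_k$ with randomly drawn maps of rank at most $k$.

```python
import numpy as np


def optimal_rank_k_map(S_xx, S_yx, k):
    """A_k = P_k S_yx S_xx^{-1}, with P_k the orthogonal projection onto the
    span of the top-k eigenvectors of M = S_yx S_xx^{-1} S_xy."""
    B = np.linalg.solve(S_xx, S_yx.T).T      # B = S_yx S_xx^{-1} (S_xx symmetric)
    M = B @ S_yx.T                           # M = S_yx S_xx^{-1} S_xy, symmetric PSD
    _, eigvecs = np.linalg.eigh(M)           # eigh sorts eigenvalues ascending
    U = eigvecs[:, ::-1][:, :k]              # orthonormal top-k eigenvectors
    return U @ U.T @ B                       # P_k B


def expected_sq_error(A, S_xx, S_yx, S_yy):
    """E||Y - AX||^2 expressed through second moments."""
    return np.trace(S_yy) - 2 * np.trace(A @ S_yx.T) + np.trace(A @ S_xx @ A.T)


# Hypothetical example: random joint second-moment matrix of (X, Y).
rng = np.random.default_rng(0)
n, m, k = 5, 4, 2
L = rng.standard_normal((n + m, n + m))
C = L @ L.T + 0.1 * np.eye(n + m)
S_xx, S_yx, S_yy = C[:n, :n], C[n:, :n], C[n:, n:]

A_k = optimal_rank_k_map(S_xx, S_yx, k)
best = expected_sq_error(A_k, S_xx, S_yx, S_yy)

# No randomly drawn map of rank <= k should achieve a smaller error.
for _ in range(1000):
    A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    assert expected_sq_error(A, S_xx, S_yx, S_yy) >= best - 1e-8

print("rank:", np.linalg.matrix_rank(A_k), "error:", best)
```

Splitting the error around the unconstrained minimiser $\Sigma_{YX}\Sigma_{XX}^{-1}$ and applying the Eckart–Young theorem suggests a further check: the minimal error should equal $\operatorname{tr}\Sigma_{YY} - (\lambda_1 + \dots + \lambda_k)$, where $\lambda_1 \ge \lambda_2 \ge \dots$ are the eigenvalues of $M$.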
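Let’s apply this result to two special cases. For the first case, simply assume that $m = n$ and $Y = X$. In this case $\Sigma_{YX} = \Sigma_{XX}$ and $M = \Sigma_{XX}$, so the projections $P_k$ project onto eigenspaces of the covariance matrix $\Sigma_{XX}$, and $A_k = P_k$. This result coincides with principal component analysis for the variable $X$ (taking $X$ centred, so that $\Sigma_{XX}$ is its covariance matrix).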
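For the special case $Y = X$, a sketch along the following lines checks that the general recipe reduces to $A_k = P_k$, and that $P_k$ coincides with the projection onto the first $k$ principal axes obtained from a singular value decomposition of the data matrix. It assumes NumPy and hypothetical, randomly generated centred data, with the sample covariance standing in for $\Sigma_{XX}$; none of it comes from the post.

```python
import numpy as np

# Hypothetical centred data: N samples of X in R^n, stored as rows.
rng = np.random.default_rng(1)
n, k, N = 6, 3, 2000
data = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
data -= data.mean(axis=0)
S = data.T @ data / N                    # sample covariance; here S_xx = S_yx = S

# General recipe with Y = X.
B = np.linalg.solve(S, S).T              # S_yx S_xx^{-1} = I (up to rounding)
M = B @ S                                # M = S_yx S_xx^{-1} S_xy = S
_, eigvecs = np.linalg.eigh(M)
U = eigvecs[:, ::-1][:, :k]              # top-k eigenvectors of the covariance
A_k = U @ U.T @ B                        # equals P_k

# PCA via SVD: the rows of Vt are the principal axes, largest variance first.
_, _, Vt = np.linalg.svd(data, full_matrices=False)
P_pca = Vt[:k].T @ Vt[:k]                # projection onto the first k principal axes

print(np.allclose(A_k, P_pca))
```

Comparing projections rather than eigenvectors sidesteps the sign ambiguity of individual eigenvectors and singular vectors.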
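For the second case, assume that $m = n$ and $X = DY$ for some random diagonal matrix $D$ whose diagonal entries are independent (also of $Y$) and Bernoulli distributed with probability $p$, i.e. $\mathbb{P}(D_{ii} = 1) = p$. The matrix $D$ models dropout in the coefficients of $Y$. Let $\Sigma = \mathbb{E}[YY^*]$ be the covariance matrix of $Y$ and $\Delta$ the diagonal of $\Sigma$ (as a diagonal matrix). Since $\mathbb{E}[D_{ii}D_{jj}] = p^2$ for $i \neq j$ and $\mathbb{E}[D_{ii}^2] = p$, in this case

$$\Sigma_{YX} = p\,\Sigma, \qquad \Sigma_{XX} = p\bigl(p\,\Sigma + (1-p)\,\Delta\bigr).$$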
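Hence $M = p\,\Sigma\bigl(p\,\Sigma + (1-p)\,\Delta\bigr)^{-1}\Sigma$, so the $P_k$ are projections onto eigenspaces of

$$\Sigma\bigl(p\,\Sigma + (1-p)\,\Delta\bigr)^{-1}\Sigma,$$

which is the positive semi-definite matrix that appeared in the previous post about linear encoders with dropout. Finally, in this case $A_k = P_k\,\Sigma\bigl(p\,\Sigma + (1-p)\,\Delta\bigr)^{-1}$.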
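For small $n$ the dropout moment formulas can be verified exactly by averaging over all $2^n$ dropout masks, which avoids Monte Carlo noise. The sketch below assumes NumPy, a hypothetical randomly generated $\Sigma$, and the convention $\mathbb{P}(D_{ii} = 1) = p$ (entry kept); the helpers `dropout_moments_exact` and `top_k_projection` are introduced here for illustration only. It then compares the closed form $A_k = P_k\Sigma(p\Sigma + (1-p)\Delta)^{-1}$ with the general recipe $P_k\Sigma_{YX}\Sigma_{XX}^{-1}$.

```python
import itertools

import numpy as np


def dropout_moments_exact(S, p):
    """Exact E[XX^*] and E[YX^*] for X = D Y, where D = diag(d_1, ..., d_n) has
    iid Bernoulli(p) entries (1 = keep), independent of Y, and S = E[YY^*].
    Enumerates all 2^n masks, so it is only meant for small n."""
    n = S.shape[0]
    S_xx, S_yx = np.zeros_like(S), np.zeros_like(S)
    for mask in itertools.product([0.0, 1.0], repeat=n):
        d = np.array(mask)
        prob = np.prod(np.where(d == 1.0, p, 1.0 - p))
        D = np.diag(d)
        S_xx += prob * (D @ S @ D)       # E[D Y Y^* D | D] = D S D
        S_yx += prob * (S @ D)           # E[Y Y^* D | D] = S D
    return S_xx, S_yx


def top_k_projection(M, k):
    """Orthogonal projection onto the span of the top-k eigenvectors of symmetric M."""
    _, eigvecs = np.linalg.eigh(M)
    U = eigvecs[:, ::-1][:, :k]
    return U @ U.T


rng = np.random.default_rng(2)
n, k, p = 4, 2, 0.7
L = rng.standard_normal((n, n))
Sigma = L @ L.T + 0.1 * np.eye(n)        # hypothetical covariance of Y
Delta = np.diag(np.diag(Sigma))          # diagonal part of Sigma

S_xx, S_yx = dropout_moments_exact(Sigma, p)
print(np.allclose(S_yx, p * Sigma))
print(np.allclose(S_xx, p * (p * Sigma + (1 - p) * Delta)))

# Closed form from the text versus the general recipe P_k S_yx S_xx^{-1}.
K = np.linalg.inv(p * Sigma + (1 - p) * Delta)
A_closed = top_k_projection(Sigma @ K @ Sigma, k) @ Sigma @ K
B = S_yx @ np.linalg.inv(S_xx)
A_general = top_k_projection(B @ S_yx.T, k) @ B
print(np.allclose(A_closed, A_general))
```

The general recipe works with $M = p\,\Sigma K \Sigma$, a positive multiple of $\Sigma K \Sigma$, so both routes select the same eigenvectors and hence the same $P_k$.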