Least squares fitting of vector-valued random variables

Let (x, y) \in \mathbb{R}^n \times \mathbb{R}^m be jointly distributed random variables (typically not independent). This post describes which linear map \sigma \colon \mathbb{R}^n \to \mathbb{R}^m relates these variables best, in the sense that the expected square error \mathbb{E} \lVert \sigma(x) - y \rVert^2 is minimised. Related applications such as principal component analysis and auto-encoding under random distortion (such as dropout) are recovered as special cases.

Vectors are interpreted as column vectors and transposition is denoted by a superscript asterisk. It is assumed that X = \mathbb{E}(x x^{\ast}) is non-singular (and therefore positive definite). Define Y = \mathbb{E}(y x^{\ast}). Then Y X^{-1} Y^{\ast} is symmetric and positive semi-definite on \mathbb{R}^m. Let v_1, \ldots, v_m \in \mathbb{R}^m be an orthogonal basis of eigenvectors of Y X^{-1} Y^{\ast}, ordered by decreasing eigenvalue, and let \pi_k be the orthogonal projection of \mathbb{R}^m onto the span of v_1, \ldots, v_k for k \in \{1, \ldots, m\}. (Note that this definition leaves some choice when the eigenvalues are not all distinct, since the projections are not unique in that case.) The main result states:

For each k \in \{1, \ldots, m\} the linear map \sigma_k = \pi_k Y X^{-1} minimises the expected square error \mathbb{E} \lVert \sigma(x) - y \rVert^2 among all linear maps \sigma of rank at most k.
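The result can be checked numerically. The following is a minimal sketch in NumPy, not a reference implementation: the helper names `best_rank_k_map` and `expected_square_error` are made up for this illustration, and the second moments X, Y and S = \mathbb{E}(y y^{\ast}) are taken from an arbitrary random positive definite matrix rather than from real data. It uses the identity \mathbb{E} \lVert \sigma(x) - y \rVert^2 = \mathrm{tr}(\sigma X \sigma^{\ast}) - 2\, \mathrm{tr}(\sigma Y^{\ast}) + \mathrm{tr}(S) and compares \sigma_k with randomly drawn maps of rank k.

```python
import numpy as np

def best_rank_k_map(X, Y, k):
    """Return sigma_k = pi_k Y X^{-1} for X = E(x x*) and Y = E(y x*)."""
    YXinv = np.linalg.solve(X, Y.T).T       # Y X^{-1}, valid because X is symmetric
    W = YXinv @ Y.T                         # Y X^{-1} Y*, an m x m matrix
    W = (W + W.T) / 2                       # remove rounding asymmetry
    _, eigvecs = np.linalg.eigh(W)          # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]             # eigenvectors of the k largest eigenvalues
    return V @ V.T @ YXinv                  # pi_k Y X^{-1}

def expected_square_error(sigma, X, Y, S):
    """E||sigma x - y||^2 expressed through second moments, S = E(y y*)."""
    return np.trace(sigma @ X @ sigma.T) - 2 * np.trace(sigma @ Y.T) + np.trace(S)

# Illustrative joint second-moment matrix of (x, y); sizes are arbitrary.
rng = np.random.default_rng(0)
n, m, k = 5, 4, 2
G = rng.standard_normal((n + m, 50))
M = G @ G.T / 50
X, Y, S = M[:n, :n], M[n:, :n], M[n:, n:]

sigma_k = best_rank_k_map(X, Y, k)
best = expected_square_error(sigma_k, X, Y, S)

# Errors of randomly chosen rank-k maps, for comparison.
others = [
    expected_square_error(
        rng.standard_normal((m, k)) @ rng.standard_normal((k, n)), X, Y, S
    )
    for _ in range(200)
]
print(best, min(others))
```

Substituting \sigma_k into the identity above gives a minimal error of \mathrm{tr}(S) minus the sum of the k largest eigenvalues of Y X^{-1} Y^{\ast}, which is a convenient sanity check for the sketch.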

Let’s apply this result to two special cases. For the first case we simply assume that n = m, x = y and \mathbb{E}(x) = 0. In this case Y = X and Y X^{-1} Y^{\ast} = X. So the projections \pi_k project onto the spans of the leading eigenvectors of the covariance matrix X, and \sigma_k = \pi_k Y X^{-1} = \pi_k. This result coincides with principal component analysis for the variable x.
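As a small illustration of this first case, the sketch below builds \sigma_k from the general formula with Y = X and compares it with the projection onto the top k principal directions obtained from a singular value decomposition of the centred data. The sample data and sizes are arbitrary choices for the example, and the expectation is replaced by an empirical average.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2

# Hypothetical centred samples of x (rows are observations).
samples = rng.standard_normal((1000, 4)) @ rng.standard_normal((4, 4))
samples -= samples.mean(axis=0)

X = samples.T @ samples / len(samples)   # empirical E(x x*), the covariance matrix
Y = X                                    # since y = x

YXinv = np.linalg.solve(X, Y.T).T        # Y X^{-1}, the identity up to rounding
W = YXinv @ Y.T                          # Y X^{-1} Y*, equal to X here
W = (W + W.T) / 2
_, eigvecs = np.linalg.eigh(W)           # ascending eigenvalues
V = eigvecs[:, -k:]                      # top-k eigenvectors
sigma_k = V @ V.T @ YXinv                # pi_k Y X^{-1}

# Principal component analysis via SVD of the centred data matrix.
_, _, Vt = np.linalg.svd(samples, full_matrices=False)
pca_projection = Vt[:k].T @ Vt[:k]

print(np.abs(sigma_k - pca_projection).max())
```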

For the second case assume that n = m and x = A y for some random diagonal matrix A whose diagonal entries A_{ii} are independent (also of y) and Bernoulli distributed with success probability p > 0. The matrix A models dropout in the coefficients of y. Let \Sigma = \mathbb{E}(y y^{\ast}) be the second-moment matrix of y (its covariance matrix when \mathbb{E}(y) = 0), D the diagonal of \Sigma, and q = 1-p. In this case

X = p \left( p \Sigma + q D \right) and Y = p \Sigma.
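These expressions follow entry by entry from the independence assumptions together with A_{ii}^2 = A_{ii}, so that \mathbb{E}(A_{ii}^2) = p and \mathbb{E}(A_{ii} A_{jj}) = p^2 for i \neq j. A short derivation sketch:

```latex
\begin{aligned}
X_{ij} &= \mathbb{E}(A_{ii} A_{jj})\, \mathbb{E}(y_i y_j)
        = \begin{cases} p\, \Sigma_{ii} & \text{if } i = j, \\ p^2\, \Sigma_{ij} & \text{if } i \neq j, \end{cases} \\
X &= p^2 \Sigma + (p - p^2) D = p \left( p \Sigma + q D \right), \\
Y_{ij} &= \mathbb{E}(y_i y_j)\, \mathbb{E}(A_{jj}) = p\, \Sigma_{ij}.
\end{aligned}
```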

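Here Y X^{-1} Y^{\ast} = p \Sigma \left(p \Sigma + q D \right)^{-1} \Sigma, and the positive factor p affects neither the eigenvectors nor their ordering. So the \pi_k are projections onto the spans of the leading eigenvectors of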

\Sigma \left(p \Sigma + q D \right)^{-1} \Sigma,

which is the positive semi-definite matrix that appeared in the previous post about linear encoders with dropout. Finally, in this case \sigma_k = \pi_k \Sigma \left(p \Sigma + q D \right)^{-1}.
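The dropout case can also be verified exactly for small n by summing over all 2^n dropout masks instead of sampling. The sketch below does this for an arbitrary, randomly generated \Sigma; the values of n, p and k are illustrative assumptions. It checks the formulas for X and Y and that \pi_k \Sigma (p \Sigma + q D)^{-1} agrees with the general expression \pi_k Y X^{-1}.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 4, 0.7, 2
q = 1 - p

B = rng.standard_normal((n, n))
Sigma = B @ B.T                      # hypothetical second-moment matrix of y
D = np.diag(np.diag(Sigma))          # diagonal part of Sigma

# Exact expectations over the dropout mask A (independent of y).
X = np.zeros((n, n))
Y = np.zeros((n, n))
for mask in itertools.product([0, 1], repeat=n):
    a = np.array(mask, dtype=float)
    prob = np.prod(np.where(a == 1, p, q))   # probability of this mask
    A = np.diag(a)
    X += prob * (A @ Sigma @ A)              # contribution to E(x x*) = E(A y y* A)
    Y += prob * (Sigma @ A)                  # contribution to E(y x*) = E(y y* A)

# Projection onto the top-k eigenvectors of Sigma (p Sigma + q D)^{-1} Sigma.
W = Sigma @ np.linalg.solve(p * Sigma + q * D, Sigma)
W = (W + W.T) / 2
_, eigvecs = np.linalg.eigh(W)               # ascending eigenvalues
V = eigvecs[:, -k:]
pi_k = V @ V.T

sigma_k = pi_k @ Sigma @ np.linalg.inv(p * Sigma + q * D)
sigma_general = pi_k @ Y @ np.linalg.inv(X)

print(np.abs(sigma_k - sigma_general).max())
```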
