We begin with a brief review of least squares fitting formulated in autoencoder language. Let $x$ be a random variable in $\mathbb{R}^n$ such that $E[x] = 0$ and let $\Sigma = E[xx^*]$ be its covariance matrix. Then $\Sigma$ is a self-adjoint (symmetric) operator. Let $m$ be a positive integer not greater than $n$. A linear autoencoder for $x$ is a pair of linear operators $A \colon \mathbb{R}^n \to \mathbb{R}^m$ and $B \colon \mathbb{R}^m \to \mathbb{R}^n$ such that

- The operator $B$ is unitary: $B^*B = 1$ on $\mathbb{R}^m$.
- The expected square error $E\|x - BAx\|^2$ is minimal among pairs $(A, B)$.

The second requirement is interpreted as “$A$ reduces the dimension of the variable $x$ from $n$ to $m$ with a minimal loss of information”. The function

$$F(A, B) = E\|x - BAx\|^2$$

has a saddle point exactly when

- The image of $B$ in $\mathbb{R}^n$ is invariant under $\Sigma$.
- $A = B^*$.

Note that $BA = BB^*$ is an orthogonal projection in this case. The expected error among such saddle points is minimal if the image of $B$ is a direct sum of eigenspaces of $\Sigma$ with the largest possible eigenvalues.
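This construction can be sketched numerically. The snippet below is an illustration, not part of the derivation: it assumes a concrete random covariance $\Sigma$ and illustrative dimensions, builds the decoder $B$ from the $m$ eigenvectors of $\Sigma$ with the largest eigenvalues, sets $A = B^*$, and checks that $BA$ is an orthogonal projection whose expected error is the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2                        # ambient and reduced dimensions (illustrative)

M = rng.normal(size=(n, n))
Sigma = M @ M.T                    # a symmetric positive definite covariance

# Decoder B: columns span the eigenspaces of Sigma with the m largest
# eigenvalues; encoder A = B^*.  (np.linalg.eigh sorts eigenvalues ascending.)
w, V = np.linalg.eigh(Sigma)
B = V[:, -m:]
A = B.T

assert np.allclose(A @ B, np.eye(m))   # B is unitary: B^* B = 1 on R^m
P = B @ A                              # BA = B B^* is an orthogonal projection
assert np.allclose(P @ P, P) and np.allclose(P, P.T)

# Expected square error E||x - BAx||^2 = tr((1 - P) Sigma), which equals
# the sum of the n - m discarded eigenvalues of Sigma.
err = np.trace((np.eye(n) - P) @ Sigma)
assert np.allclose(err, w[:-m].sum())
```

The last identity uses $E\|x - BAx\|^2 = \operatorname{tr}((1-P)\Sigma(1-P)^*) = \operatorname{tr}((1-P)\Sigma)$, since $1-P$ is a symmetric idempotent.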

The variable $x$ is in practical situations a finite sampled set of observations $x_1, \dots, x_N$, each occurring with equal probability. Doing a least square error fit on $x$ as above risks being oversensitive to features that are only apparent in this specific sample. In other words it may be sensitive to outliers. One way to reduce this sensitivity is to introduce *dropout*.

A standard form of dropout is the following. Let $\delta$ be a Bernoulli random variable with expectation $p$ and let $\Delta$ be a random $m \times n$ operator with mutually independent coefficients distributed as $\delta$ (so $E[\Delta_{ij}] = p$ for all indices $i, j$). This matrix is also taken to be independent of $x$. Now in dropout the operator $A$ is replaced by the Hadamard (entrywise) product $\Delta \circ A$. This means that each coefficient of $A$ can “drop out” independently with probability $1 - p$. A linear autoencoder *with dropout* is a pair of operators $(A, B)$ similar to the above, but it now minimises the altered expected error

$$E\|x - B(\Delta \circ A)x\|^2.$$

Here the expectation is over the joint distribution of the independent pair $(\Delta, x)$. The idea is that $A$ must now be robust against random dropout and that this prevents it from being oversensitive to accidental features in $x$. Also in this dropout case the saddle points of the function

$$F(A, B) = E\|x - B(\Delta \circ A)x\|^2$$

can be described explicitly. Let $D$ denote the diagonal operator with the same diagonal entries as $\Sigma$. The pair $(A, B)$ is a saddle point if

- The image of $B$ is invariant under $\Sigma\,(p\Sigma + (1 - p)D)^{-1}\,\Sigma$.
- $A = B^*\Sigma\,(p\Sigma + (1 - p)D)^{-1}$.

Indeed for $p = 1$ (the probability of dropout is zero) this reduces to the criterion above for a linear autoencoder without dropout.
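The dropout case admits a small numerical sanity check. The sketch below assumes the setup above with illustrative concrete values (a sample covariance $\Sigma$, its diagonal part $D$, Bernoulli mean $p$, and a fixed unitary decoder $B$): it first compares a Monte Carlo estimate of the dropout error against its exact mean-plus-variance decomposition over the masks, then checks that the candidate encoder $A = B^*\Sigma(p\Sigma + (1-p)D)^{-1}$ minimises the dropout error over $A$ for this fixed $B$ and reduces to $A = B^*$ when $p = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 4, 2, 0.7                           # illustrative dimensions and Bernoulli mean

X = rng.normal(size=(200, n))                 # finite sample of observations
Sigma = X.T @ X / len(X)                      # sample covariance (mean taken to be 0)
D = np.diag(np.diag(Sigma))                   # diagonal part of Sigma
B, _ = np.linalg.qr(rng.normal(size=(n, m)))  # decoder with B^T B = 1 on R^m

def dropout_error(A):
    # E||x - B (Delta o A) x||^2, averaged over the sample X; the expectation
    # over the Bernoulli masks is computed exactly: a mean term plus a variance
    # term coming from the mutually independent coefficients of Delta.
    mean = np.mean(np.sum((X - p * X @ (B @ A).T) ** 2, axis=1))
    var = p * (1 - p) * np.mean((X ** 2) @ (A ** 2).T @ np.sum(B ** 2, axis=0))
    return mean + var

# Monte Carlo check of the exact mask expectation for one input x
A = rng.normal(size=(m, n))
x = X[0]
masks = rng.random((100_000, m, n)) < p       # Bernoulli(p) coefficients of Delta
z = np.einsum("kij,j->ki", masks * A, x)      # (Delta o A) x for each mask
mc = np.mean(np.sum((x - z @ B.T) ** 2, axis=1))
exact = np.sum((x - p * B @ (A @ x)) ** 2) \
    + p * (1 - p) * np.sum((A ** 2) @ (x ** 2) * np.sum(B ** 2, axis=0))
assert np.allclose(mc, exact, rtol=0.05)

# The candidate saddle-point encoder minimises the error for this fixed B ...
A_star = B.T @ Sigma @ np.linalg.inv(p * Sigma + (1 - p) * D)
for _ in range(5):
    assert dropout_error(A_star) < dropout_error(A_star + 0.1 * rng.normal(size=(m, n)))

# ... and for p = 1 (no dropout) the formula reduces to A = B^T
assert np.allclose(B.T @ Sigma @ np.linalg.inv(Sigma), B.T)
```

For fixed unitary $B$ the dropout error is a strictly convex quadratic in $A$ (its quadratic part is $p\operatorname{tr}(A(p\Sigma + (1-p)D)A^*)$), which is why the perturbation test above must hold.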
