Autoencoder

PCA:

  • N-dimensional data: find the M orthogonal directions with the most variance.
  • Reconstruct by using the mean value over all the data for the N-M directions that are not represented (see the sketch below).

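A minimal NumPy sketch of this reconstruction (a rough illustration; names such as pca_reconstruct are made up here, not taken from the notes):

<code python>
import numpy as np

def pca_reconstruct(X, M):
    """Keep the M highest-variance orthogonal directions; the remaining
    N-M directions are effectively filled in by the data mean."""
    mu = X.mean(axis=0)                      # mean value over all the data
    Xc = X - mu                              # centered data
    cov = np.cov(Xc, rowvar=False)           # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = eigvecs[:, -M:]                      # the M directions with most variance
    code = Xc @ W                            # M-dimensional code per data point
    return code @ W.T + mu                   # adding mu supplies the other N-M directions

X = np.random.randn(200, 10)                 # toy data: 200 points, N = 10
X_hat = pca_reconstruct(X, M=3)
print(np.mean((X - X_hat) ** 2))             # squared reconstruction error
</code>
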
Use backprop to implement PCA inefficiently:

  • M hidden units as bottleneck

INPUT vector ⇒ Code ⇒ OUTPUT vector

Activities in the hidden units form an efficient code.

If the hidden and output layers are linear, it will learn hidden units that are a linear function of the data and minimize the squared reconstruction error (just like PCA). The M hidden units will span the same space as the first M components of PCA, but the weight vectors may not be orthogonal and they will tend to have equal variances.
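
A hedged PyTorch sketch of this linear-bottleneck autoencoder (PyTorch, the sizes and the training loop are assumptions for illustration):

<code python>
import torch
import torch.nn as nn

N, M = 10, 3                                  # input dimension, bottleneck size
model = nn.Sequential(
    nn.Linear(N, M),                          # linear hidden units: the code
    nn.Linear(M, N),                          # linear output layer
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()                        # squared reconstruction error

X = torch.randn(200, N)                       # toy data
for _ in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(X), X)               # the target is the input itself
    loss.backward()
    opt.step()
# The M hidden units end up spanning the same subspace as the first M
# principal components, but their weight vectors need not be orthogonal.
</code>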

Backprop also allows a generalization of PCA:

With non-linear layers before and after the code, it should be possible to efficiently represent data that lies on or near a non-linear manifold.

Input vector ⇒ encoding weights ⇒ code ⇒ decoding weights ⇒ output vector
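
A sketch of that structure (the layer sizes and the sigmoid non-linearity are assumptions, not from the notes):

<code python>
import torch.nn as nn

encoder = nn.Sequential(                      # input vector -> encoding weights -> code
    nn.Linear(784, 256), nn.Sigmoid(),        # non-linear layer before the code
    nn.Linear(256, 30),                       # 30-dimensional code
)
decoder = nn.Sequential(                      # code -> decoding weights -> output vector
    nn.Linear(30, 256), nn.Sigmoid(),         # non-linear layer after the code
    nn.Linear(256, 784),
)
autoencoder = nn.Sequential(encoder, decoder) # trained to reconstruct its input
</code>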

Looked like a nice way to do non-linear dimensionality reduction:

  • Encoding model compact and fast;
  • learning time is linear in the number of training cases.

But deep autoencoders are very difficult to optimize using backprop: small initial weights ⇒ the backpropagated gradient dies.

Optimize with unsupervised layer-by-layer pre-training, or initialize the weights carefully as in Echo State Networks.
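
As a loose illustration of careful initialization, a fan-in-scaled random init keeps activations (and hence backpropagated gradients) at a sensible scale; this is a generic recipe assumed for illustration, not the exact Echo-State scheme referred to above:

<code python>
import torch

def init_weight(n_in, n_out, gain=1.0):
    """Fan-in-scaled random weights: variance ~ gain^2 / n_in, so the signal
    neither shrinks to nothing nor blows up as it passes through the layer."""
    return torch.randn(n_out, n_in) * gain / n_in ** 0.5

W1 = init_weight(784, 1000)                   # compare with tiny initial weights,
W2 = init_weight(1000, 500)                   # which shrink the signal layer by layer
</code>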

Stack of 4 RBMs, then unroll them. Fine-tune with gentle backprop.

Layer sizes: 784 → 1000 → 500 → 250 → 30 linear units → 250 → 500 → 1000 → 784.
Weights: W_1 → W_2 → W_3 → W_4 → (30 linear code units) → W^T_4 → W^T_3 → W^T_2 → W^T_1.
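
A hedged PyTorch sketch of the unrolled net with tied weights (biases, the RBM pre-training itself and the fine-tuning loop are omitted; the random initialization merely stands in for the pretrained RBM weights):

<code python>
import torch

sizes = [784, 1000, 500, 250, 30]             # encoder layer sizes; 30 linear code units

# W_1 .. W_4, here random placeholders for the weights learned by the RBM stack
Ws = [torch.randn(n_out, n_in) * 0.01
      for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def encode(x):
    for i, W in enumerate(Ws):                # W_1 -> W_2 -> W_3 -> W_4
        x = x @ W.t()
        if i < len(Ws) - 1:                   # hidden layers are logistic,
            x = torch.sigmoid(x)              # the 30 code units stay linear
    return x

def decode(code):
    h = code
    for W in reversed(Ws):                    # W_4^T -> W_3^T -> W_2^T -> W_1^T (tied weights)
        h = torch.sigmoid(h @ W)              # logistic decoder layers, outputs in [0, 1]
    return h

x = torch.rand(64, 784)                       # e.g. a batch of flattened MNIST images
print(decode(encode(x)).shape)                # torch.Size([64, 784])
</code>

During fine-tuning the encoder and decoder copies of each W_i are typically untied and adjusted separately by the gentle backprop pass.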
