====== Autoencoder ======

  * Unsupervised learning: feature extraction, generative models, compression, data reduction
  * Loss as evaluation metric
  * Difference to RBM: deterministic approach (not stochastic)
  * Encoder compresses to few dimensions, decoder maps back to full dimensionality (see the sketch below)
  * Building block for deep belief networks
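A minimal encoder/decoder sketch of these points. The single code layer, sigmoid non-linearity, squared-error loss, and layer sizes are illustrative assumptions, not from these notes:

<code python>
import numpy as np

rng = np.random.default_rng(0)
D, H = 784, 30                              # assumed input and code sizes

W_enc = rng.normal(0, 0.01, size=(D, H))    # encoder weights
b_enc = np.zeros(H)
W_dec = rng.normal(0, 0.01, size=(H, D))    # decoder weights
b_dec = np.zeros(D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # Deterministic mapping (unlike an RBM's stochastic hidden units):
    # compress the input down to a few dimensions (the code).
    return sigmoid(x @ W_enc + b_enc)

def decode(code):
    # Map the code back to the full input dimensionality.
    return sigmoid(code @ W_dec + b_dec)

x = rng.random(D)                           # one fake input vector
reconstruction = decode(encode(x))
loss = np.mean((x - reconstruction) ** 2)   # loss is the evaluation metric
print(loss)
</code>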
===== Comparison with PCA =====
  
PCA:
  
===== Deep autoencoders =====

Looked like a nice way to do non-linear dimensionality reduction:
  * Encoding model compact and fast;
W_1 -> W_2 -> W_3 -> W_4 -> 30 linear units -> W^T_4 -> W^T_3 -> W^T_2 -> W^T_1
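
A forward-pass sketch of this mirrored architecture, reusing the transposed encoder weights in the decoder. Only the 30 linear code units are stated above; the other layer sizes are assumptions:

<code python>
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed layer sizes, e.g. 784 -> 1000 -> 500 -> 250 -> 30 code units.
sizes = [784, 1000, 500, 250, 30]
Ws = [rng.normal(0, 0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def deep_autoencode(x):
    # Encoder: W_1 -> W_2 -> W_3 -> W_4 -> 30 linear code units.
    h = x
    for W in Ws[:-1]:
        h = sigmoid(h @ W)
    code = h @ Ws[-1]              # linear units in the code layer
    # Decoder: transposed weights W^T_4 -> W^T_3 -> W^T_2 -> W^T_1.
    h = code
    for W in reversed(Ws):
        h = sigmoid(h @ W.T)
    return code, h

x = rng.random(784)
code, reconstruction = deep_autoencode(x)
print(code.shape, reconstruction.shape)   # (30,) (784,)
</code>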
  
===== Deep autoencoders for doc retrieval =====

Convert each doc into a "bag of words" (ignore stop words).
Reduce each query vector using a deep autoencoder.

Input vector of 2000 word counts => output vector of 2000 reconstructed counts.

Divide the counts in a bag-of-words vector by N, where N is the total number of non-stop words in the document. The output of the autoencoder is a 2000-way softmax.

When training the first RBM in the stack: treat word counts as probabilities, but make the visible-to-hidden weights N times bigger than the hidden-to-visible weights, because we have N observations from the probability distribution.
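
A sketch of the count normalization described above and of the softmax output. The tiny vocabulary, stop-word list, and toy document are made up for illustration (the notes use a 2000-word vocabulary):

<code python>
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

vocab = ["network", "learning", "market", "price", "neuron", "stock"]
stop_words = {"the", "a", "of", "and", "is"}

def bag_of_words(doc, vocab, stop_words):
    words = [w for w in doc.lower().split() if w not in stop_words]
    counts = Counter(words)
    vec = np.array([counts[w] for w in vocab], dtype=float)
    n = len(words)                 # N = number of non-stop words
    return vec / max(n, 1)         # divide counts by N => word probabilities

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

doc = "the network is learning and the neuron network is learning"
p = bag_of_words(doc, vocab, stop_words)
print(p)                               # normalized input to the autoencoder

logits = rng.normal(size=len(vocab))   # stand-in for the decoder's output
print(softmax(logits))                 # reconstructed word distribution (sums to 1)
</code>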

===== Semantic hashing =====

Convert each doc into a memory address. Find similar docs at nearby addresses.

Autoencoder with 30 logistic units in the code layer.
During fine-tuning, add noise to the inputs of the code units.
  * The noise forces the activities to become bimodal in order to resist its effects.
  * Simply threshold the activities of the 30 code units to get a binary code.

Learn binary features for representation.

Deep autoencoder as hash function.

Query ("supermarket search"): hash the query to get an address, then look at nearby addresses for semantically similar documents.
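
A sketch of the hashing step: threshold the 30 code activities into a binary address and enumerate nearby addresses by flipping single bits. The code activities here are random stand-ins for a trained network's outputs:

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 30 logistic code-unit activities of a query document.
code_activities = rng.random(30)

# Threshold at 0.5 to get a 30-bit binary code (the memory address).
bits = (code_activities > 0.5).astype(int)
address = int("".join(map(str, bits)), 2)

def nearby_addresses(address, n_bits=30):
    # Addresses at Hamming distance 1: flip each of the 30 bits once.
    return [address ^ (1 << i) for i in range(n_bits)]

print(address)
print(nearby_addresses(address)[:5])   # buckets to check for similar docs
</code>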

===== Learn binary codes for image retrieval =====

Matching real-valued vectors is slow => short binary codes are faster.

Use semantic hashing with a 28-bit binary code to get a long shortlist of promising images. Then use a 256-bit binary code to do a serial search for good matches.

Krizhevsky's deep autoencoder: 8192 => 4096 => ... => 256-bit binary code (the architecture is just a guess).

Reconstructing 32x32 color images from 256-bit codes.
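
A sketch of the serial-search step: compare 256-bit codes by Hamming distance (XOR plus bit count). The database codes are random stand-ins for the shortlist returned by semantic hashing:

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the 256-bit codes of a shortlist of database images.
db_codes = [int.from_bytes(rng.bytes(32), "big") for _ in range(1000)]
query_code = int.from_bytes(rng.bytes(32), "big")

def hamming(a, b):
    # Number of differing bits between two 256-bit codes.
    return bin(a ^ b).count("1")

# Serial search: rank the shortlist by Hamming distance to the query.
ranked = sorted(db_codes, key=lambda c: hamming(query_code, c))
print([hamming(query_code, c) for c in ranked[:5]])   # best matches first
</code>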

===== Shallow autoencoders for pre-training =====

Just one layer. RBMs can be seen as shallow autoencoders.

Train the RBM with one-step contrastive divergence: this makes the reconstruction look like the data.
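
A one-step contrastive divergence (CD-1) sketch for a binary RBM. The layer sizes, learning rate, and random data are assumptions, and biases are omitted to keep it short:

<code python>
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 784, 500, 0.1     # assumed sizes and learning rate
W = rng.normal(0, 0.01, size=(n_visible, n_hidden))

def cd1_update(v0, W):
    # Positive phase: hidden probabilities and a sampled hidden state.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One reconstruction step: back down to the visibles and up again.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # CD-1 update: push the reconstruction towards the data.
    return W + lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))

v0 = (rng.random(n_visible) < 0.5).astype(float)   # fake binary data vector
W = cd1_update(v0, W)
</code>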

===== Conclusion about pre-training =====

For data sets without a huge number of labeled cases: pre-training helps subsequent discriminative learning, especially if extra unlabeled data is available.

For very large, labeled datasets: not necessary, but if nets get much larger, pre-training becomes necessary again.