====== Autoencoder ======

  * Unsupervised learning: feature extraction, generative models, compression, data reduction
  * Loss as evaluation metric
  * Difference to RBM: deterministic approach (not stochastic)
  * Encoder compresses the input to a few dimensions, decoder maps it back to the full dimensionality (see the sketch below)
  * Building block for deep belief networks
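As a rough illustration of the encoder/decoder idea above, here is a minimal sketch of an autoencoder with one bottleneck layer trained on squared reconstruction error. The layer sizes, learning rate and toy data are assumptions for illustration, not values from these notes.

<code python>
# Minimal autoencoder sketch: the encoder compresses to a small code,
# the decoder maps back to full dimensionality; training minimizes
# reconstruction loss. All sizes and data here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_code = 64, 8                          # input size and bottleneck size

W_enc = rng.normal(0, 0.1, (n_in, n_code))    # encoder weights
W_dec = rng.normal(0, 0.1, (n_code, n_in))    # decoder weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.normal(size=(500, n_in))              # unlabeled toy data

lr = 0.05
for epoch in range(200):
    code = sigmoid(X @ W_enc)                 # encode: compress to n_code dims
    recon = code @ W_dec                      # decode: map back to n_in dims
    err = recon - X                           # reconstruction error signal
    # Backpropagate through decoder and encoder (deterministic, unlike an RBM).
    grad_dec = code.T @ err / len(X)
    grad_code = (err @ W_dec.T) * code * (1 - code)
    grad_enc = X.T @ grad_code / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
</code>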
===== Comparison with PCA =====

  
PCA:

===== Deep autoencoders for doc retrieval =====
  
Convert a document into a memory address. Find similar documents at nearby addresses.

Autoencoder with 30 logistic units in the code layer.

During fine-tuning, add noise to the inputs of the code units.
  * The noise forces the activities to become bimodal in order to resist its effects.
  * Simply threshold the activities of the 30 code units to get a binary code.

Learn binary features for the representation.

Deep autoencoder as hash function.

Query ("supermarket search"): hash the query, get its address, then fetch nearby addresses (semantically similar documents); see the sketch below.
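A minimal sketch of this semantic-hashing query, assuming a trained encoder whose 30 code-unit activities are thresholded into a binary address. The `encode` stub, the toy document vectors and the code size are hypothetical placeholders, not the notes' actual model.

<code python>
# Semantic hashing sketch: threshold the 30 code-unit activities to get a
# binary address, store documents at their addresses, and answer a query by
# looking at the query's address and nearby addresses (Hamming distance 1).
# The encoder stub and the toy document vectors are hypothetical.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
CODE_BITS = 30

def encode(doc_vec):
    # Stand-in for the trained deep autoencoder's 30 logistic code units.
    return 1.0 / (1.0 + np.exp(-doc_vec[:CODE_BITS]))

def to_address(doc_vec):
    # Threshold the code activities to obtain the binary memory address.
    return tuple(int(a > 0.5) for a in encode(doc_vec))

def nearby(address):
    # Addresses within Hamming distance 1: flip one bit at a time.
    for i in range(len(address)):
        flipped = list(address)
        flipped[i] ^= 1
        yield tuple(flipped)

# Store each document at its hashed address.
docs = rng.normal(size=(1000, 100))           # toy document vectors
memory = defaultdict(list)
for idx, d in enumerate(docs):
    memory[to_address(d)].append(idx)

# "Supermarket search": hash the query, then collect the documents stored at
# the query's own address and at the semantically nearby addresses.
query = docs[0]
addr = to_address(query)
candidates = list(memory[addr])
for a in nearby(addr):
    candidates.extend(memory[a])
</code>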

===== Learn binary codes for image retrieval =====

Matching real-valued vectors is slow => short binary codes are faster.

Use semantic hashing with a 28-bit binary code to get a long shortlist of promising images. Then use a 256-bit binary code to do a serial search for good matches on that shortlist (sketched below).

Krizhevsky's deep autoencoder: 8192 => 4096 => ... => 256-bit binary code (the architecture is just a guess).

It reconstructs 32x32 color images from the 256-bit codes.
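A sketch of the two-stage retrieval idea: the 28-bit codes yield a cheap shortlist, and the 256-bit codes are then compared serially by Hamming distance to rank it. The random codes below are placeholders standing in for the autoencoder outputs, and the shortlist stage is approximated by a Hamming-radius scan rather than a real semantic-hashing address lookup.

<code python>
# Two-stage binary-code retrieval sketch: 28-bit codes for a cheap shortlist,
# 256-bit codes for a serial Hamming-distance search over that shortlist.
# The random codes are placeholders for the autoencoder-produced codes.
import numpy as np

rng = np.random.default_rng(0)
n_images = 10_000

short_codes = rng.integers(0, 2, size=(n_images, 28), dtype=np.uint8)   # 28-bit
long_codes = rng.integers(0, 2, size=(n_images, 256), dtype=np.uint8)   # 256-bit

def hamming(a, b):
    return int(np.count_nonzero(a != b))

query_short, query_long = short_codes[0], long_codes[0]

# Stage 1: get a long shortlist of promising images from the short codes.
shortlist = [i for i in range(n_images)
             if hamming(short_codes[i], query_short) <= 6]

# Stage 2: serial search over the shortlist with the 256-bit codes.
ranked = sorted(shortlist, key=lambda i: hamming(long_codes[i], query_long))
best_matches = ranked[:10]
</code>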

===== Shallow autoencoders for pre-training =====

Just one layer. RBMs can be seen as shallow autoencoders.

Train the RBM with one-step contrastive divergence: this makes the reconstruction look like the data (see the sketch below).
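A minimal sketch of one-step contrastive divergence (CD-1) for a binary RBM, which pushes the one-step reconstruction towards the data. Biases are omitted for brevity, and the sizes, learning rate and toy data are illustrative assumptions.

<code python>
# One-step contrastive divergence (CD-1) sketch for a binary RBM.
# Biases are omitted; sizes, learning rate and data are toy values.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 64, 16
W = rng.normal(0, 0.01, (n_vis, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, lr=0.05):
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One reconstruction step: back down to the visibles, then up again.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # CD-1 update: data statistics minus reconstruction statistics, which
    # makes the reconstruction look more like the data.
    return W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)

data = (rng.random((500, n_vis)) < 0.3).astype(float)   # toy binary data
for epoch in range(50):
    W = cd1_step(data, W)
</code>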

===== Conclusion about pre-training =====

For data sets without a huge number of labeled cases: pre-training helps subsequent discriminative learning, especially if extra unlabeled data is available.

For very large, labeled data sets: not necessary, but if nets get much larger, pre-training becomes necessary again.