data_mining:neural_network:belief_nets

  * Aggregated posterior p(1,1,0,0) = 0.215
    * Factorial would be p = 0.5^4

==== Why does learning work? ====

The weights of the bottom-level RBM define $p(v|h)$, $p(h|v)$, $p(v,h)$, $p(v)$ and $p(h)$.

The RBM model can be expressed as $p(v) = \sum_h p(h) p(v|h)$.

If we leave $p(v|h)$ alone but improve $p(h)$, we improve $p(v)$. To improve $p(h)$, we need it to be a **better model than $p(h;W)$** of the **aggregated** posterior distribution over hidden vectors produced by applying $W^T$ to the data.
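
A hedged sketch of the underlying argument (the standard variational bound, not spelled out in these notes): for any distribution $q(h|v)$, $\log p(v) \geq \sum_h q(h|v) [\log p(h) + \log p(v|h)] - \sum_h q(h|v) \log q(h|v)$. With $q(h|v)$ fixed to the posterior given by the first RBM's weights ($W^T$ applied to the data) and $p(v|h)$ frozen, any improvement of $p(h)$ on samples from the aggregated $q$ increases this bound on $\log p(v)$.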

==== Contrastive version of wake-sleep algorithm ====

==== Discriminative fine-tuning for DBNs ====

  * First learn one layer at a time by stacking RBMs.
  * Use the weights found by this pre-training as the initialization; they can then be fine-tuned by a local search procedure.

  * Previously: contrastive wake-sleep fine-tuned the model to be better at generation.
  * Now: use backprop to fine-tune the model to be better at discrimination.

Backprop works better with greedy pre-training:
  * Works well and scales to big networks, especially when we have locality in each layer.
  * We do not start backpropagation until we have sensible feature detectors.
    * Initial gradients are sensible, so backprop only needs to perform a local search from a sensible starting point.

Fine-tuning only modifies features slightly to get category boundaries right (it does not need to discover new features).

Objection: many features are learned that are useless for a particular discrimination.

Example model (MNIST): add a 10-way softmax at the top and do backprop.
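
A minimal numpy sketch of this recipe (added for illustration; the layer sizes, learning rates, epoch counts and the random stand-in data are assumptions, not values from these notes): greedily pre-train two stacked RBMs with CD-1, put a 10-way softmax on top, and fine-tune the whole stack with backprop.

<code python>
# Sketch: greedy layer-wise RBM pre-training + discriminative fine-tuning.
# Random binary data stands in for MNIST; all hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    """Train a binary RBM with CD-1; returns (weights, visible bias, hidden bias)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in np.array_split(data, 100):             # mini-batches
            h0 = sigmoid(v0 @ W + b_h)                   # positive phase
            h0_sample = (rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h0_sample @ W.T + b_v)          # reconstruction
            h1 = sigmoid(v1 @ W + b_h)                   # negative phase
            W   += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
            b_v += lr * (v0 - v1).mean(axis=0)
            b_h += lr * (h0 - h1).mean(axis=0)
    return W, b_v, b_h

# Stand-in data: 1000 "images" with 784 binary pixels and 10 class labels.
X = (rng.random((1000, 784)) < 0.1).astype(float)
y = rng.integers(0, 10, size=1000)

# Greedy pre-training: each RBM is trained on the hidden activities
# (aggregated posterior) of the one below it.
W1, _, b1 = train_rbm(X, 500)            # b1 = hidden bias of RBM 1
H1 = sigmoid(X @ W1 + b1)
W2, _, b2 = train_rbm(H1, 250)

# Discriminative fine-tuning: 10-way softmax on top, backprop through the stack.
W3, b3 = 0.01 * rng.standard_normal((250, 10)), np.zeros(10)
lr = 0.1
for _ in range(10):
    h1 = sigmoid(X @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    logits = h2 @ W3 + b3
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    d3 = p.copy()                                   # dL/dlogits for cross-entropy
    d3[np.arange(len(y)), y] -= 1.0
    d3 /= len(y)
    d2 = (d3 @ W3.T) * h2 * (1 - h2)
    d1 = (d2 @ W2.T) * h1 * (1 - h1)
    W3 -= lr * h2.T @ d3;  b3 -= lr * d3.sum(axis=0)
    W2 -= lr * h1.T @ d2;  b2 -= lr * d2.sum(axis=0)
    W1 -= lr * X.T  @ d1;  b1 -= lr * d1.sum(axis=0)
</code>

The fine-tuning loop updates all three weight matrices, but with pre-trained W1 and W2 the backprop phase starts from sensible feature detectors and mostly nudges them to place the class boundaries.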

More layers => lower error with pre-training.

Solutions are qualitatively different.

==== Model real-valued data with RBMs ====

Mean-field logistic units cannot represent precise intermediate values (e.g. pixel intensities in an image).

Model pixels as Gaussian variables; use alternating Gibbs sampling with a lower learning rate.

Parabolic containment function keeps each visible unit close to its bias $b_i$; the energy gradient contributed by the hidden units shifts the unit's mean away from $b_i$.
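
For reference, a common form of the corresponding energy function (added here for context; $\sigma_i$ is the standard deviation of visible unit $i$): $E(v,h) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}$. The first (parabolic) term is the containment; the last term provides the energy gradient that the hidden units exert on the visible units.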

Stepped sigmoid units: many copies of a stochastic binary unit. All copies have the same weights and bias $b$, but different fixed offsets to the bias ($b-0.5$, $b-1.5$, ...).
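
The total activity of the copies approximates a softplus (added note, a standard approximation): $\sum_{n=1}^{\infty} \sigma(x - n + 0.5) \approx \log(1 + e^x)$, which behaves like a rectified linear unit for large inputs, so these units can represent graded activity levels.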

==== Structure ====

Autoencoder first, then a feed-forward NN on top.
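
A short numpy sketch of this structure (added; sizes, data and learning rate are illustrative assumptions): pre-train a one-hidden-layer autoencoder on the inputs, then reuse its encoder as the first layer of a feed-forward network that is fine-tuned with backprop as in the MNIST sketch above.

<code python>
# Sketch: unsupervised autoencoder pre-training, then a feed-forward NN on top.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = rng.random((500, 784))                         # stand-in real-valued inputs
W_enc = 0.01 * rng.standard_normal((784, 128)); b_enc = np.zeros(128)
W_dec = 0.01 * rng.standard_normal((128, 784)); b_dec = np.zeros(784)

# 1) Autoencoder training: minimize squared reconstruction error.
lr = 0.1
for _ in range(50):
    h = sigmoid(X @ W_enc + b_enc)                 # encoder
    X_hat = h @ W_dec + b_dec                      # linear decoder
    d_dec = (X_hat - X) / len(X)                   # dL/dX_hat
    d_enc = (d_dec @ W_dec.T) * h * (1 - h)
    W_dec -= lr * h.T @ d_dec;  b_dec -= lr * d_dec.sum(axis=0)
    W_enc -= lr * X.T @ d_enc;  b_enc -= lr * d_enc.sum(axis=0)

# 2) Feed-forward NN: the trained encoder becomes the first layer; new layers
#    on top are then trained (and the encoder fine-tuned) with backprop.
features = sigmoid(X @ W_enc + b_enc)
</code>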