data_mining:neural_network:model_combination

Differences

This shows you the differences between two versions of the page.

data_mining:neural_network:model_combination [2017/08/19 21:56] – [Inverted dropout] phreazer
data_mining:neural_network:model_combination [2017/08/19 22:12] (current) – [Approximating full Bayesian learning in a NN] phreazer
Line 85:
 More complicated and more effective methods than the MCMC method: they don't need to wander the space for long.
  
-If we compute the gradient of the cost function on a **random mini-batch**, we will get an ubiased estimate with sampling noise.
+If we compute the gradient of the cost function on a **random mini-batch**, we will get an unbiased estimate with sampling noise.
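A small check of the "unbiased estimate" claim, assuming a toy linear-regression loss $\frac{1}{2}\,\mathrm{mean}((Xw - y)^2)$ (the model, data and the helper `mse_grad` are hypothetical). Under uniform sampling without replacement every mini-batch of a given size is equally likely, so averaging the mini-batch gradient over all possible batches gives its expected value, which equals the full-batch gradient; any single batch is only a noisy estimate.

```python
import itertools
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w (hypothetical toy loss)."""
    residual = X @ w - y
    return X.T @ residual / len(y)

rng = np.random.default_rng(0)
n, d, batch_size = 8, 3, 3          # toy sizes so all batches can be enumerated
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

full_grad = mse_grad(w, X, y)

# Expected mini-batch gradient: plain average over every possible batch of this size.
batch_grads = [mse_grad(w, X[list(idx)], y[list(idx)])
               for idx in itertools.combinations(range(n), batch_size)]
expected_batch_grad = np.mean(batch_grads, axis=0)

# One random mini-batch: unbiased, but carries sampling noise.
batch = rng.choice(n, size=batch_size, replace=False)
noisy_grad = mse_grad(w, X[batch], y[batch])

print("full gradient:           ", full_grad)
print("mean mini-batch gradient:", expected_batch_grad)
print("one mini-batch gradient: ", noisy_grad)
```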
  
 ====== Dropout ======
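-Ways to combine the outputs of multiple models:
+See [[data_mining:neural_network:regularization|Regularization]]
-  * MIXTURE: Combine models by averaging their output probabilities.
-  * PRODUCT: Combine models by taking the geometric mean of their output probabilities. The geometric means typically sum to less than one, so renormalize: $p_i = \sqrt{x_i y_i} / \sum_j \sqrt{x_j y_j}$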
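A minimal numeric sketch of the two combination rules, assuming two made-up 3-class output distributions `x` and `y` (the values are illustrative only). It shows that the raw geometric means sum to less than one, which is why the PRODUCT rule renormalizes.

```python
import numpy as np

def mixture(x, y):
    """MIXTURE: arithmetic mean of the two models' output probabilities."""
    return (x + y) / 2

def product(x, y):
    """PRODUCT: geometric mean of the output probabilities, renormalized to sum to 1."""
    g = np.sqrt(x * y)
    return g / g.sum()

x = np.array([0.4, 0.5, 0.1])   # hypothetical model 1 output
y = np.array([0.1, 0.8, 0.1])   # hypothetical model 2 output

print(mixture(x, y))             # [0.25, 0.65, 0.1]
print(np.sqrt(x * y).sum())      # about 0.932, less than one before renormalizing
print(product(x, y))             # sums to 1 after renormalizing
```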
-
-NN with one hidden layer:
-Randomly omit each hidden unit with probability 0.5, for each training example.
-This randomly samples from $2^H$ architectures, where $H$ is the number of hidden units.
-
-We sample from $2^H$ models, and each sampled model gets only one training example (extreme bagging).
-Sharing the weights means that every model is very strongly regularized.
-
-What to do at test time?
-
-Use all hidden units, but halve their outgoing weights. With a softmax output layer, this exactly computes the renormalized geometric mean of the predictions of all $2^H$ models.
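A numerical check of the "exactly computes the geometric mean" statement, assuming a single hidden layer feeding a softmax output and toy random weights (`h`, `W`, `b` are hypothetical). The reason it is exact: $\log \mathrm{softmax}(z) = z - \mathrm{logsumexp}(z)$, so averaging log-probabilities over all masks averages the logits up to a constant, and the average logit is $W(0.5\,h) + b$ because each mask bit is 1 in exactly half of the $2^H$ masks.

```python
import itertools
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
H, K = 4, 3                          # hidden units, output classes (small so 2^H is enumerable)
h = rng.random(H)                    # hypothetical hidden activations for one input
W = rng.normal(size=(K, H))          # hypothetical hidden-to-output weights
b = rng.normal(size=K)               # hypothetical output biases

# Renormalized geometric mean of the predictions of all 2^H dropped-out models.
log_probs = [np.log(softmax(W @ (np.array(mask) * h) + b))
             for mask in itertools.product([0.0, 1.0], repeat=H)]
geo = np.exp(np.mean(log_probs, axis=0))
geo_mean = geo / geo.sum()

# Test-time net: keep all hidden units, halve their outgoing weights.
mean_net = softmax((0.5 * W) @ h + b)

print(geo_mean)
print(mean_net)
```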
-
-What if we have more hidden layers?
-
-  * Use dropout of 0.5 in every layer.
-  * At test time, use the "mean net", which has all outgoing weights halved. This is not the same as averaging all the separate dropped-out models, but it is an approximation.
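A sketch comparing the mean net with the true renormalized geometric mean once there are two hidden layers, assuming tanh hidden units and random toy weights (all names and sizes are hypothetical). Because the nonlinearity sits between the dropped-out layers, the exactness argument for one hidden layer no longer applies, so the two outputs generally differ slightly; the script prints the gap rather than asserting a particular value.

```python
import itertools
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def forward(x, params, m1, m2):
    """Two tanh hidden layers; m1, m2 multiply the hidden activations (0/1 masks, or 0.5)."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(W1 @ x + b1) * m1
    h2 = np.tanh(W2 @ h1 + b2) * m2
    return softmax(W3 @ h2 + b3)

rng = np.random.default_rng(2)
D, H1, H2, K = 4, 3, 3, 3            # input size, hidden sizes, classes (toy sizes)
params = (rng.normal(size=(H1, D)), rng.normal(size=H1),
          rng.normal(size=(H2, H1)), rng.normal(size=H2),
          rng.normal(size=(K, H2)), rng.normal(size=K))
x = rng.normal(size=D)

# Renormalized geometric mean over all 2^(H1+H2) dropped-out networks.
log_probs = []
for mask in itertools.product([0.0, 1.0], repeat=H1 + H2):
    m = np.array(mask)
    log_probs.append(np.log(forward(x, params, m[:H1], m[H1:])))
geo = np.exp(np.mean(log_probs, axis=0))
geo_mean = geo / geo.sum()

# Mean net: scaling a layer's activations by 0.5 equals halving its outgoing weights.
mean_net = forward(x, params, 0.5, 0.5)

print("geometric mean of all dropout nets:", geo_mean)
print("mean net:                          ", mean_net)
print("max abs difference:                ", np.abs(geo_mean - mean_net).max())
```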
-
-Dropout helps prevent overfitting.
-
-For each training example: for each node, toss a biased coin (e.g. dropping with probability 0.5) and eliminate the nodes that are dropped.
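-
-===== Inverted dropout =====
-
-Example for layer $l=3$:
-
-keep_prob = 0.8
-
-d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob   # dropout mask: e.g. of 50 units, about 10 are shut off
-
-a3 = np.multiply(a3, d3)   # zero out the dropped units
-
-a3 /= keep_prob   # scale up the remaining units
-
-$Z = Wa+b$: the input $a$ to the next layer is reduced by 20% in expectation; dividing by 0.8 restores it, so the expected value of $Z$ stays the same.
-
-Making predictions at test time: no dropout (and no extra scaling, since the division by keep_prob already happened during training).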
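The inverted-dropout steps above as a runnable sketch, assuming activations are stored as (units x examples) like `a3`; the helper name `inverted_dropout` and the all-ones activations are hypothetical, and it uses NumPy's `default_rng` generator in place of `np.random.rand` (either draws uniform samples in [0, 1)). It checks the two properties from the notes: roughly 20% of units are dropped, and the mean activation is unchanged.

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Training-time inverted dropout on activations a (units x examples)."""
    d = rng.random(a.shape) < keep_prob   # keep each unit with probability keep_prob
    a = np.multiply(a, d)                 # zero out the dropped units (new array)
    a /= keep_prob                        # scale survivors up so E[a] is unchanged
    return a, d

rng = np.random.default_rng(0)
keep_prob = 0.8
a3 = np.ones((50, 10000))                 # hypothetical layer-3 activations: 50 units

a3_train, d3 = inverted_dropout(a3, keep_prob, rng)
print("fraction dropped:", 1 - d3.mean())            # close to 0.2, about 10 of 50 units
print("mean activation: ", a3_train.mean())          # close to 1.0, the pre-dropout mean

# Test time: use the activations as they are, no mask and no rescaling.
a3_test = a3
```

Dividing during training rather than multiplying at test time keeps the prediction code identical to a network trained without dropout.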
  
  • data_mining/neural_network/model_combination.1503172601.txt.gz
  • Last modified: 2017/08/19 21:56
  • by phreazer