data_mining:neural_network:tuning

==== Algo ====

  * For minibatch t:
    * Compute forward prop for $X^{\{t\}}$
      * In each hidden layer use BN to replace $Z^{l}$ with $\tilde{Z}^{l}$
  * Use backprop to compute $dW^{l}, d\beta^{l}, d\gamma^{l}$
  * Update parameters ...
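A minimal NumPy sketch of one such mini-batch step for a single hidden layer (the function names, array shapes, $\epsilon$ and the plain gradient-descent update are illustrative assumptions, not from the notes):

<code python>
import numpy as np

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    """Replace Z with Z_tilde for one hidden layer, using mini-batch statistics."""
    mu = Z.mean(axis=1, keepdims=True)        # per-unit mean over the mini-batch
    var = Z.var(axis=1, keepdims=True)        # per-unit variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)    # normalized pre-activations
    Z_tilde = gamma * Z_norm + beta           # learnable scale and shift
    cache = (Z_norm, gamma, var, eps)
    return Z_tilde, cache

def batchnorm_backward(dZ_tilde, cache):
    """Backprop through BN: gradients w.r.t. gamma, beta and the original Z."""
    Z_norm, gamma, var, eps = cache
    m = dZ_tilde.shape[1]
    dgamma = np.sum(dZ_tilde * Z_norm, axis=1, keepdims=True)
    dbeta = np.sum(dZ_tilde, axis=1, keepdims=True)
    dZ_norm = dZ_tilde * gamma
    dZ = (m * dZ_norm
          - np.sum(dZ_norm, axis=1, keepdims=True)
          - Z_norm * np.sum(dZ_norm * Z_norm, axis=1, keepdims=True)
         ) / (m * np.sqrt(var + eps))
    return dZ, dgamma, dbeta                  # dZ flows back to compute dW of the layer

# One mini-batch t: 4 hidden units, batch size 32
Z = np.random.randn(4, 32)
gamma, beta = np.ones((4, 1)), np.zeros((4, 1))
Z_tilde, cache = batchnorm_forward(Z, gamma, beta)
dZ, dgamma, dbeta = batchnorm_backward(np.random.randn(4, 32), cache)
alpha = 0.01                                  # update step (plain gradient descent here)
gamma -= alpha * dgamma
beta -= alpha * dbeta
</code>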
  
==== Why does it work ====

Covariate shift (a shifting input distribution):

  * Batch norm reduces the amount by which the hidden unit values shift around, so they become more stable as inputs to later layers
  * Slight regularization effect: it adds some noise, because the normalization uses the statistics of each mini-batch
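A small illustration of the noise argument (the distribution and batch size below are made up): each mini-batch's mean and variance are only noisy estimates of the full-training-set statistics, so the normalized values jitter slightly from batch to batch.

<code python>
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical pre-activations of one hidden unit over the whole training set
Z_all = rng.normal(loc=2.0, scale=3.0, size=10_000)
print("full set   mean=%.2f var=%.2f" % (Z_all.mean(), Z_all.var()))

for t in range(3):
    batch = rng.choice(Z_all, size=64, replace=False)  # one mini-batch
    # BN normalizes with these noisy per-batch statistics -> slight regularization
    print("mini-batch mean=%.2f var=%.2f" % (batch.mean(), batch.var()))
</code>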

==== Batch norm at test time ====

At test time there is no mini-batch; predictions are made one sample at a time.

Estimate $\mu, \sigma^2$ with an exponentially weighted average of the mini-batch statistics computed during training.
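A sketch of how those running estimates could be kept during training and used on a single test example (the momentum value, shapes and function names are assumptions):

<code python>
import numpy as np

def update_running_stats(Z, running_mu, running_var, momentum=0.9):
    """During training: exponentially weighted average of the mini-batch statistics."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return running_mu, running_var

def batchnorm_test(z, running_mu, running_var, gamma, beta, eps=1e-8):
    """At test time: normalize one example z of shape (units, 1) with the running estimates."""
    z_norm = (z - running_mu) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta

units = 4
running_mu, running_var = np.zeros((units, 1)), np.ones((units, 1))
for t in range(100):                      # training: loop over mini-batches
    Z = np.random.randn(units, 32)
    running_mu, running_var = update_running_stats(Z, running_mu, running_var)

gamma, beta = np.ones((units, 1)), np.zeros((units, 1))
z_test = np.random.randn(units, 1)        # one sample at test time
z_tilde = batchnorm_test(z_test, running_mu, running_var, gamma, beta)
</code>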