data_mining:neural_network:tuning

  * Use backprop to compute $dW^{l}, d\beta^{l}, d\gamma^{l}$
  * Update parameters ... (see the sketch after this list)
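
A minimal NumPy sketch of this step, assuming a single layer, plain gradient descent, and a mini-batch of ''m'' examples; ''dz_tilde'' and ''dW'' are stand-ins for the gradients that backprop would actually deliver (all names and values here are illustrative, not from these notes):

<code python>
import numpy as np

np.random.seed(0)
m, n = 8, 4                       # mini-batch size and number of hidden units (assumed)
eps, alpha = 1e-8, 0.01           # numerical-stability constant and learning rate (assumed)

W = np.random.randn(n, n)         # layer weights W^l
gamma = np.ones(n)                # batch norm scale gamma^l
beta = np.zeros(n)                # batch norm shift beta^l

z = np.random.randn(m, n)         # pre-activations of the layer for one mini-batch
z_norm = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
z_tilde = gamma * z_norm + beta   # value fed on to the activation function

# dz_tilde would come out of backprop; it is random here purely for illustration.
dz_tilde = np.random.randn(m, n)
dgamma = (dz_tilde * z_norm).sum(axis=0)   # d gamma^l
dbeta = dz_tilde.sum(axis=0)               # d beta^l
dW = np.random.randn(n, n)                 # placeholder for dW^l from backprop

# Plain gradient descent step; momentum, RMSprop or Adam would plug in the same way.
W -= alpha * dW
gamma -= alpha * dgamma
beta -= alpha * dbeta
</code>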

==== Why does it work ====

Covariate shift (a shifting input distribution)

  * Batch norm reduces the amount by which the hidden unit values shift around, so the inputs to later layers become more stable
  * Slight regularization effect: it adds some noise, because each example is normalized with the mean and variance of its mini-batch (see the toy example after this list)
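
A toy illustration (not from these notes) of that noise: the same activation value ''x'' is normalized with the statistics of whichever mini-batch it happens to land in, so its normalized value jitters slightly from batch to batch.

<code python>
import numpy as np

np.random.seed(1)
eps = 1e-8
x = 0.5                                        # one fixed hidden-unit activation

for trial in range(3):
    batch = np.append(np.random.randn(63), x)  # a mini-batch of 64 values containing x
    x_norm = (x - batch.mean()) / np.sqrt(batch.var() + eps)
    print(f"mini-batch {trial}: normalized x = {x_norm:.4f}")

# The printed values differ slightly from batch to batch; that per-batch noise is what
# gives the mild, dropout-like regularization effect.
</code>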

==== Batch norm at test time ====

At test time there is no mini-batch; predictions are made one sample at a time, so $\mu$ and $\sigma^2$ are estimated with an exponentially weighted average over the mini-batches seen during training.
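
A minimal sketch of this, assuming running averages of $\mu$ and $\sigma^2$ are kept during training (the ''momentum'' value 0.9 is an assumption, not from these notes):

<code python>
import numpy as np

np.random.seed(2)
eps = 1e-8
momentum = 0.9                  # decay rate of the exponentially weighted averages (assumed)
running_mu, running_var = 0.0, 1.0

# Training: after each mini-batch, fold its statistics into the running averages.
for step in range(200):
    z = np.random.randn(64) * 2.0 + 3.0        # pre-activations for one mini-batch
    running_mu = momentum * running_mu + (1 - momentum) * z.mean()
    running_var = momentum * running_var + (1 - momentum) * z.var()

# Test time: a single example, normalized with the stored running statistics.
z_single = 3.5
z_norm = (z_single - running_mu) / np.sqrt(running_var + eps)
print(z_norm)
</code>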