data_mining:neural_network:tuning

  * Use backprop to compute $dW^{l}, d\beta^{l}, d\gamma^{l}$
  * Update parameters ... (see the sketch after this list)
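
A minimal NumPy sketch of this step, assuming a single layer, plain gradient descent, and a mini-batch of ''m'' examples; ''dz_tilde'' and ''dW'' are stand-ins for the gradients that backprop would actually deliver (all names and values here are illustrative, not from these notes):

<code python>
import numpy as np

np.random.seed(0)
m, n = 8, 4                       # mini-batch size and number of hidden units (assumed)
eps, alpha = 1e-8, 0.01           # numerical-stability constant and learning rate (assumed)

W = np.random.randn(n, n)         # layer weights W^l
gamma = np.ones(n)                # batch norm scale gamma^l
beta = np.zeros(n)                # batch norm shift beta^l

z = np.random.randn(m, n)         # pre-activations of the layer for one mini-batch
z_norm = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
z_tilde = gamma * z_norm + beta   # value fed on to the activation function

# dz_tilde would come out of backprop; it is random here purely for illustration.
dz_tilde = np.random.randn(m, n)
dgamma = (dz_tilde * z_norm).sum(axis=0)   # d gamma^l
dbeta = dz_tilde.sum(axis=0)               # d beta^l
dW = np.random.randn(n, n)                 # placeholder for dW^l from backprop

# Plain gradient descent step; momentum, RMSprop or Adam would plug in the same way.
W -= alpha * dW
gamma -= alpha * dgamma
beta -= alpha * dbeta
</code>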

==== Why does it work ====

Covariate shift (a shifting input distribution)

  * Batch norm reduces the amount by which the hidden unit values shift around, so the inputs to later layers become more stable
  * Slight regularization effect: it adds some noise, because each example is normalized with the mean and variance of its mini-batch (see the toy example after this list)
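
A toy illustration (not from these notes) of that noise: the same activation value ''x'' is normalized with the statistics of whichever mini-batch it happens to land in, so its normalized value jitters slightly from batch to batch.

<code python>
import numpy as np

np.random.seed(1)
eps = 1e-8
x = 0.5                                        # one fixed hidden-unit activation

for trial in range(3):
    batch = np.append(np.random.randn(63), x)  # a mini-batch of 64 values containing x
    x_norm = (x - batch.mean()) / np.sqrt(batch.var() + eps)
    print(f"mini-batch {trial}: normalized x = {x_norm:.4f}")

# The printed values differ slightly from batch to batch; that per-batch noise is what
# gives the mild, dropout-like regularization effect.
</code>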

==== Batch norm at test time ====

At test time there is no mini-batch; predictions are made one sample at a time, so $\mu$ and $\sigma^2$ are estimated with an exponentially weighted average over the mini-batches seen during training.
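
A minimal sketch of this, assuming running averages of $\mu$ and $\sigma^2$ are kept during training (the ''momentum'' value 0.9 is an assumption, not from these notes):

<code python>
import numpy as np

np.random.seed(2)
eps = 1e-8
momentum = 0.9                  # decay rate of the exponentially weighted averages (assumed)
running_mu, running_var = 0.0, 1.0

# Training: after each mini-batch, fold its statistics into the running averages.
for step in range(200):
    z = np.random.randn(64) * 2.0 + 3.0        # pre-activations for one mini-batch
    running_mu = momentum * running_mu + (1 - momentum) * z.mean()
    running_var = momentum * running_var + (1 - momentum) * z.var()

# Test time: a single example, normalized with the stored running statistics.
z_single = 3.5
z_norm = (z_single - running_mu) / np.sqrt(running_var + eps)
print(z_norm)
</code>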