====== NN initialization ======
===== Random initialization =====
Weights need to be randomly initialized; biases can safely be initialized to zero.

If all weights start at zero, then in backprop the hidden units of a layer receive identical gradients ($dz_1 = dz_2$). The units compute the same function (they are symmetric), and gradient descent can never break that symmetry.
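A minimal sketch of the symmetry problem (the toy data, layer sizes, and tanh/sigmoid activations are assumptions for illustration, not from the source): with a constant initialization, of which zero is the special case, both hidden units always produce the same activation and the same gradient, so no number of gradient steps can tell them apart.

<code python>
import numpy as np

# Toy setup (assumed): 2 inputs, 2 hidden units, 1 sigmoid output.
X = np.array([[1.0, -2.0, 0.5],
              [3.0,  0.0, -1.0]])        # 2 features x 3 examples
Y = np.array([[1.0, 0.0, 1.0]])          # 1 x 3 labels
m = X.shape[1]

W1 = np.full((2, 2), 0.5)   # constant init: both hidden units start identical
b1 = np.zeros((2, 1))       # zero is fine for the biases
W2 = np.full((1, 2), 0.5)
b2 = np.zeros((1, 1))

for _ in range(100):                                  # plain gradient descent
    A1 = np.tanh(W1 @ X + b1)                         # hidden activations
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))        # sigmoid output
    dZ2 = A2 - Y                                      # logistic-loss gradient
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)              # rows identical: dz_1 = dz_2
    W1 -= 0.1 * (dZ1 @ X.T) / m
    b1 -= 0.1 * dZ1.sum(axis=1, keepdims=True) / m
    W2 -= 0.1 * (dZ2 @ A1.T) / m
    b2 -= 0.1 * dZ2.sum(axis=1, keepdims=True) / m

print(W1)   # rows are still identical: the two hidden units never differentiated
</code>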
Solution: initialize with small random values, e.g. $W^{[1]} = np.random.randn(2, 2) * 0.01$ for a layer with 2 hidden units and 2 inputs.

The factor $0.01$ keeps the initial weights small: with large weights, the pre-activations $z$ would land in the saturated tails of the activation function (e.g. tanh or sigmoid), where the slope is close to zero and gradient descent becomes very slow.
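A quick numerical check of this claim (the layer sizes, input distribution, and tanh activation are assumed for illustration): a scale of $0.01$ keeps the pre-activations $z$ near zero, where tanh is steep, while a scale of $1.0$ pushes $z$ into the flat tails, where the slope $1 - \tanh^2(z)$ collapses.

<code python>
import numpy as np

# Compare tanh slopes for small vs. large initial weight scales.
rng = np.random.default_rng(0)
n_in, n_hidden = 500, 100                        # layer sizes assumed
X = rng.standard_normal((n_in, 1000))            # standardized inputs

for scale in (0.01, 1.0):
    W = rng.standard_normal((n_hidden, n_in)) * scale   # W = randn(...) * scale
    Z = W @ X
    slope = 1.0 - np.tanh(Z) ** 2                # tanh'(z); near zero when |z| is large
    print(f"scale={scale}: mean |z| = {np.abs(Z).mean():.2f}, "
          f"mean slope = {slope.mean():.4f}")

# scale=0.01 keeps z near zero -> slope ~1 (healthy gradients);
# scale=1.0  saturates tanh    -> slope ~0 (learning stalls)
</code>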