====== NN initialization ======

===== Random initialization =====

Weights must be initialized randomly; initializing the biases to zero is fine. If the weights are zero, backprop gives every hidden unit the same gradient ($dz_1$ and $dz_2$ are identical), so all hidden units compute the same function — they are symmetric and stay that way through training.

Solution: $W^{[l]} = np.random.randn(2, 2) * 0.01$

The factor $0.01$ keeps the weights small: with large weights, the pre-activations land in the flat tails of activation functions like tanh or sigmoid, where the slopes are small and learning is slow.
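A minimal sketch of both points above, using a toy 2-layer network (tanh hidden layer, sigmoid output); the shapes, data, and constant-init values are illustrative assumptions, not from the source:

```python
import numpy as np

def dW1_of(W1, b1, W2, b2, X, Y):
    """One forward/backward pass; returns the gradient of the
    cross-entropy loss w.r.t. the first-layer weights W1."""
    m = X.shape[1]
    A1 = np.tanh(W1 @ X + b1)                 # hidden activations
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))  # sigmoid output
    dZ2 = A2 - Y                              # dL/dZ2 for cross-entropy
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)      # backprop through tanh
    return dZ1 @ X.T / m

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))               # toy data: 3 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
b1, b2 = np.zeros((2, 1)), np.zeros((1, 1))   # zero biases are fine

# Symmetric init: every hidden unit has identical incoming weights,
# so both rows of dW1 come out identical — the units never diverge.
dW1_sym = dW1_of(np.full((2, 3), 0.5), b1, np.full((1, 2), 0.5), b2, X, Y)
print(np.allclose(dW1_sym[0], dW1_sym[1]))    # True

# Small random init breaks the symmetry: the rows differ.
dW1_rnd = dW1_of(rng.standard_normal((2, 3)) * 0.01, b1,
                 rng.standard_normal((1, 2)) * 0.01, b2, X, Y)
print(np.allclose(dW1_rnd[0], dW1_rnd[1]))    # False

# Why the 0.01 scale: for a large pre-activation the tanh slope
# (1 - tanh^2) nearly vanishes, so gradients through it are tiny.
print(1 - np.tanh(5.0) ** 2)                  # ~2e-4
```

With identical rows in $W^{[1]}$ and identical entries in $W^{[2]}$, the hidden activations and their gradients are identical row for row, which is exactly the symmetry the random initialization breaks.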