NN initialization
Random initialization
Weights must be initialized randomly; initializing the biases to zero is fine.
If all weights are zero, backprop produces identical gradients for the hidden units ($dz^{[1]}_1 = dz^{[1]}_2$), so every hidden unit computes the same function and stays symmetric; see the sketch below.
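A minimal sketch of the symmetry problem, assuming a hypothetical 2-2-1 network (tanh hidden layer, sigmoid output). With weights exactly zero, all weight gradients vanish trivially, so a nonzero constant is used here to show the gradients are still identical across hidden units:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(2, 5)               # 2 features, 5 examples (made up)
Y = (np.random.rand(1, 5) > 0.5) * 1.0  # binary labels

# Identical (constant) initialization: same symmetry effect as zeros
W1 = np.full((2, 2), 0.5); b1 = np.zeros((2, 1))
W2 = np.full((1, 2), 0.5); b2 = np.zeros((1, 1))

# Forward pass
A1 = np.tanh(W1 @ X + b1)
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

# Backprop (cross-entropy loss)
m = X.shape[1]
dZ2 = A2 - Y
dZ1 = W2.T @ dZ2 * (1 - A1**2)
dW1 = dZ1 @ X.T / m

print(np.allclose(A1[0], A1[1]))    # True: both hidden units compute the same function
print(np.allclose(dW1[0], dW1[1]))  # True: both rows get the same update, so symmetry never breaks
```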
Solution: $W^{[l]} = $ `np.random.randn(2, 2) * 0.01` (note that `np.random.randn` takes the dimensions as separate arguments, not a tuple).
The factor $0.01$ keeps the weights small: with large weights, $z$ lands in the flat tails of saturating activations like tanh or sigmoid, where the slope is near zero and learning is slow.
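A correct initialization might look like this (hypothetical layer sizes `n_x = 2`, `n_h = 2`, `n_y = 1`):

```python
import numpy as np

n_x, n_h, n_y = 2, 2, 1  # assumed layer sizes

W1 = np.random.randn(n_h, n_x) * 0.01  # small random values break symmetry
b1 = np.zeros((n_h, 1))                # zero is fine for biases
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
```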