====== NN initialization ======

===== Random initialization =====

Weights must be initialized randomly; initializing the biases to zero is fine. If the weights are zero, backprop gives every hidden unit the same gradient ($dz_1$ and $dz_2$ are identical), so all hidden units compute the same function — they are symmetric and stay that way through training.

Solution: $W^{[l]} = np.random.randn(2, 2) * 0.01$

The factor $0.01$ keeps the weights small: with large weights, the pre-activations land in the flat tails of activation functions like tanh or sigmoid, where the slopes are small and learning is slow.
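A minimal sketch of both points above, using a toy 2-layer network (tanh hidden layer, sigmoid output); the shapes, data, and constant-init values are illustrative assumptions, not from the source:

```python
import numpy as np

def dW1_of(W1, b1, W2, b2, X, Y):
    """One forward/backward pass; returns the gradient of the
    cross-entropy loss w.r.t. the first-layer weights W1."""
    m = X.shape[1]
    A1 = np.tanh(W1 @ X + b1)                 # hidden activations
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))  # sigmoid output
    dZ2 = A2 - Y                              # dL/dZ2 for cross-entropy
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)      # backprop through tanh
    return dZ1 @ X.T / m

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))               # toy data: 3 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
b1, b2 = np.zeros((2, 1)), np.zeros((1, 1))   # zero biases are fine

# Symmetric init: every hidden unit has identical incoming weights,
# so both rows of dW1 come out identical — the units never diverge.
dW1_sym = dW1_of(np.full((2, 3), 0.5), b1, np.full((1, 2), 0.5), b2, X, Y)
print(np.allclose(dW1_sym[0], dW1_sym[1]))    # True

# Small random init breaks the symmetry: the rows differ.
dW1_rnd = dW1_of(rng.standard_normal((2, 3)) * 0.01, b1,
                 rng.standard_normal((1, 2)) * 0.01, b2, X, Y)
print(np.allclose(dW1_rnd[0], dW1_rnd[1]))    # False

# Why the 0.01 scale: for a large pre-activation the tanh slope
# (1 - tanh^2) nearly vanishes, so gradients through it are tiny.
print(1 - np.tanh(5.0) ** 2)                  # ~2e-4
```

With identical rows in $W^{[1]}$ and identical entries in $W^{[2]}$, the hidden activations and their gradients are identical row for row, which is exactly the symmetry the random initialization breaks.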