====== Deep neural networks ======

===== Notation =====
$L = 4$ layers
$n^{[l]} = \text{number of units in layer } l$
Example: $n^{[0]} = 3$, $n^{[1]} = 5$, $n^{[2]} = 5$, $n^{[3]} = 3$, $n^{[4]} = n^{[L]} = 1$

===== Forward prop =====
Input: $A^{[l-1]}$
Output: $A^{[l]}$; cache $Z^{[l]}$ (plus $W^{[l]}$ and $b^{[l]}$)

$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
$A^{[l]} = g^{[l]}(Z^{[l]})$

$A^{[0]} = X$ is the input set (one column per example).

===== Backward prop for layer l =====
Input: $dA^{[l]}$
Output: $dA^{[l-1]}$, $dW^{[l]}$, $db^{[l]}$

$dZ^{[l]} = dA^{[l]} * g'^{[l]}(Z^{[l]})$ (element-wise product)
$dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m} \, \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True})$
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$

===== Flow =====
Forward: $X \rightarrow$ ReLU $\rightarrow$ ReLU $\rightarrow$ Sigmoid $\rightarrow \hat{y} \rightarrow L(\hat{y}, y)$

Backprop is initialized with the derivative of $L$; for the logistic loss this is $dA^{[L]} = -\frac{Y}{A^{[L]}} + \frac{1-Y}{1-A^{[L]}}$.

===== Matrix dimensions =====
Example: $L = 5$, layer sizes 2-3-5-4-2-1 ($n^{[0]} = 2$ inputs through $n^{[5]} = 1$ output).

$z^{[1]} = W^{[1]} x + b^{[1]}$
$z^{[1]} : (3, 1)$
$x : (2, 1)$
$W^{[1]} : (n^{[1]}, n^{[0]}) = (3, 2)$, likewise $W^{[2]} : (5, 3)$
In general $W^{[l]} : (n^{[l]}, n^{[l-1]})$
$b^{[1]} : (3, 1)$; in general $b^{[l]} : (n^{[l]}, 1)$
$dW^{[l]}$ and $db^{[l]}$ have the same dimensions as $W^{[l]}$ and $b^{[l]}$.

==== Vectorized ====
$Z^{[1]} : (n^{[1]}, m)$
$W^{[1]} : (n^{[1]}, n^{[0]})$
$X : (n^{[0]}, m)$
$b^{[1]} : (n^{[1]}, 1)$, broadcast across the $m$ columns to $(n^{[1]}, m)$
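===== Numpy sketches =====
A minimal numpy sketch of the forward pass above, assuming the parameters live in a dict keyed ''W1''/''b1'' through ''WL''/''bL'' and that hidden layers use ReLU with a sigmoid output (the dict layout and function names are illustrative, not fixed by the notes):

<code python>
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward(X, parameters, L):
    """Full forward pass; caches (A_prev, W, b, Z) per layer for backprop."""
    caches = []
    A = X                                       # A^[0] is the input matrix (n^[0], m)
    for l in range(1, L + 1):
        A_prev = A
        W = parameters["W" + str(l)]
        b = parameters["b" + str(l)]            # (n^[l], 1), broadcast over the m columns
        Z = W @ A_prev + b                      # Z^[l] = W^[l] A^[l-1] + b^[l]
        A = sigmoid(Z) if l == L else relu(Z)   # A^[l] = g^[l](Z^[l])
        caches.append((A_prev, W, b, Z))
    return A, caches                            # A is y-hat, shape (n^[L], m)
</code>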
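The matching backward pass, under the same assumptions; $dA^{[L]}$ is initialized from the logistic-loss derivative given under Flow:

<code python>
def relu_grad(Z):
    return (Z > 0).astype(float)

def backward(AL, Y, caches):
    """Walks layers L..1 and returns dW^[l], db^[l] for each layer."""
    grads = {}
    L = len(caches)
    m = Y.shape[1]
    dA = -(Y / AL) + (1 - Y) / (1 - AL)                      # dA^[L] for the logistic loss
    for l in reversed(range(1, L + 1)):
        A_prev, W, b, Z = caches[l - 1]
        g_prime = AL * (1 - AL) if l == L else relu_grad(Z)  # sigmoid' at the output, relu' inside
        dZ = dA * g_prime                                    # element-wise product
        grads["dW" + str(l)] = (1 / m) * dZ @ A_prev.T
        grads["db" + str(l)] = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
        dA = W.T @ dZ                                        # dA^[l-1]
    return grads
</code>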
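A quick shape check for the 2-3-5-4-2-1 example, using the two sketches above; the layer sizes come from the notes, the random data is made up:

<code python>
layer_dims = [2, 3, 5, 4, 2, 1]             # n^[0] .. n^[5] from the notes
rng = np.random.default_rng(0)
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = 0.01 * rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

m = 7                                       # any batch size
X = rng.standard_normal((layer_dims[0], m))
Y = rng.integers(0, 2, (1, m)).astype(float)

AL, caches = forward(X, parameters, L=5)
assert AL.shape == (1, m)                   # (n^[L], m)
grads = backward(AL, Y, caches)
for l in range(1, 6):
    assert grads["dW" + str(l)].shape == parameters["W" + str(l)].shape  # (n^[l], n^[l-1])
    assert grads["db" + str(l)].shape == parameters["b" + str(l)].shape  # (n^[l], 1)
</code>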