Deep neural networks
Notation
$L = 4$ layers, $n^{[l]} = \text{# of units in layer } l$
$n^{[0]} = 3$, $n^{[1]} = 5$, $n^{[2]} = 5$, $n^{[3]} = 3$, $n^{[4]} = n^{[L]} = 1$
Forward prop
Input: $a^{[l - 1]}$
Output: $a^{[l]}$; cache $z^{[l]}$ (plus $W^{[l]}$, $b^{[l]}$) for the backward pass
$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
$A^{[l]} = g^{[l]}(Z^{[l]})$
$A^{[0]} = X$ is the input matrix.
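A single forward step matching the two equations above can be sketched in NumPy (layer sizes, the ReLU choice, and variable names here are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward_step(A_prev, W, b, g=relu):
    """One forward step: Z = W @ A_prev + b, then A = g(Z)."""
    Z = W @ A_prev + b          # (n^[l], m)
    A = g(Z)
    cache = (A_prev, W, b, Z)   # kept for the backward pass
    return A, cache

# example: layer of 5 units fed by 3 inputs, batch of m = 4 examples
rng = np.random.default_rng(0)
A0 = rng.standard_normal((3, 4))        # A^[0] = X
W1 = rng.standard_normal((5, 3)) * 0.01
b1 = np.zeros((5, 1))
A1, cache1 = forward_step(A0, W1, b1)
print(A1.shape)  # (5, 4)
```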
Backward prop for layer l
Input: $da^{[l]}$
Output: $da^{[l-1]}, dW^{[l]}, db^{[l]}$
$dZ^{[l]} = dA^{[l]} * g'^{[l]}(Z^{[l]})$ # element-wise product
$dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m} \, \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True})$
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$ # matrix product
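The four backward equations above translate directly to NumPy; this sketch assumes the cache layout `(A_prev, W, b, Z)` from a matching forward step and a ReLU activation (both illustrative choices):

```python
import numpy as np

def relu_grad(z):
    # derivative of ReLU: 1 where z > 0, else 0
    return (z > 0).astype(z.dtype)

def backward_step(dA, cache, g_prime=relu_grad):
    """One backward step for layer l, following the four equations above."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                              # element-wise product
    dW = (1 / m) * (dZ @ A_prev.T)                    # (n^[l], n^[l-1])
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # (n^[l], 1)
    dA_prev = W.T @ dZ                                # (n^[l-1], m)
    return dA_prev, dW, db

# shape check: layer with 5 units fed by 3 inputs, m = 4
rng = np.random.default_rng(1)
A_prev = rng.standard_normal((3, 4))
W = rng.standard_normal((5, 3))
b = np.zeros((5, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((5, 4))
dA_prev, dW, db = backward_step(dA, (A_prev, W, b, Z))
print(dA_prev.shape, dW.shape, db.shape)  # (3, 4) (5, 3) (5, 1)
```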
Flow
Forward:
X → ReLU → ReLU → Sigmoid → $\hat{y}$ → $L(\hat{y}, y)$
Initialize backprop with $dA^{[L]} = \partial L / \partial \hat{y}$.
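Assuming the binary cross-entropy loss (which matches the sigmoid output unit above), the seed works out to:

```latex
L(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big)
\quad\Rightarrow\quad
dA^{[L]} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}
```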
Matrix dimensions
$L = 5$, layer sizes 2-3-5-4-2-1
$z^{[1]} = W^{[1]} x + b^{[1]}$
$z^{[1]} : (3,1)$
$x : (2,1)$
$W^{[1]} : (n^{[1]}, n^{[0]}) \Rightarrow W^{[1]} : (3,2),\; W^{[2]} : (5,3)$
$W^{[l]} : (n^{[l]}, n^{[l-1]})$
$b^{[1]} : (3,1)$
$b^{[l]} : (n^{[l]}, 1)$
$dW^{[l]}$ and $db^{[l]}$ have the same dimensions as $W^{[l]}$ and $b^{[l]}$.
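A quick way to sanity-check these dimensions is to build the parameters for the 2-3-5-4-2-1 example and print their shapes (the dictionary layout is an illustrative choice):

```python
import numpy as np

layer_dims = [2, 3, 5, 4, 2, 1]   # n^[0] .. n^[5] from the example above

params = {}
for l in range(1, len(layer_dims)):
    # W^[l] : (n^[l], n^[l-1]),  b^[l] : (n^[l], 1)
    params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))

for l in range(1, len(layer_dims)):
    print(f"W{l}: {params[f'W{l}'].shape}  b{l}: {params[f'b{l}'].shape}")
# W1: (3, 2)  b1: (3, 1)
# W2: (5, 3)  b2: (5, 1)
# ... down to W5: (1, 2)  b5: (1, 1)
```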
Vectorized
$Z^{[1]} : (n^{[1]}, m)$
$W^{[1]} : (n^{[1]}, n^{[0]})$
$X : (n^{[0]}, m)$
$b^{[1]}$ is stored as $(n^{[1]}, 1)$ and broadcast across the $m$ columns.
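The broadcasting can be seen directly in NumPy: $b^{[1]}$ stays a single column, and addition expands it over the batch dimension (sizes here are illustrative):

```python
import numpy as np

m = 4                          # batch size
X = np.random.randn(2, m)      # (n^[0], m)
W1 = np.random.randn(3, 2)     # (n^[1], n^[0])
b1 = np.zeros((3, 1))          # stored as (n^[1], 1)

Z1 = W1 @ X + b1               # broadcasting expands b1 to (3, m)
print(Z1.shape)                # (3, 4)
```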