Table of Contents

Deep neural networks

Notation

$L = 4$ layers; $n^{[l]} = \text{# of units in layer } l$

$n^{[0]} = 3$, $n^{[1]} = 5$, $n^{[2]} = 5$, $n^{[3]} = 3$, $n^{[4]} = n^{[L]} = 1$

Forward prop

Input: $a^{[l - 1]}$

Output: $a^{[l]}$; cache $z^{[l]}$ (and $W^{[l]}$, $b^{[l]}$) for use in backprop

$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$

$A^{[l]} = g^{[l]}(Z^{[l]})$

$A^{[0]} = X$, the input matrix of training examples.
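The forward step above can be sketched in NumPy. The function name and cache layout here are illustrative, not fixed by the notes; it assumes ReLU for hidden layers and sigmoid for the output, as in the flow diagram below.

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation="relu"):
    """One forward step: Z = W @ A_prev + b, then A = g(Z).

    Caches (Z, W, b) because backprop for this layer needs them."""
    Z = W @ A_prev + b
    if activation == "relu":
        A = np.maximum(0, Z)
    else:  # sigmoid
        A = 1 / (1 + np.exp(-Z))
    cache = (Z, W, b)
    return A, cache
```

With $n^{[0]} = 3$, $n^{[1]} = 5$ and $m = 4$ examples, an input `A0` of shape `(3, 4)` and `W1` of shape `(5, 3)` yield `A1` of shape `(5, 4)`.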

Backward prop for layer l

Input: $da^{[l]}$

Output: $da^{[l-1]}, dW^{[l]}, db^{[l]}$

$dZ^{[l]} = dA^{[l]} * g'^{[l]}(Z^{[l]})$ (element-wise product)

$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}$

$db^{[l]} = \frac{1}{m}\,\texttt{np.sum}(dZ^{[l]}, \texttt{axis=1}, \texttt{keepdims=True})$

$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$ (matrix product)

Flow

Forward:

X → ReLU → ReLU → Sigmoid → $\hat{y}$ → $L(\hat{y}, y)$

Initialize backprop with the derivative of $L$ with respect to $\hat{y}$.
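Assuming the binary cross-entropy loss $L = -(y \log \hat{y} + (1-y)\log(1-\hat{y}))$ (which fits the sigmoid output above; the notes don't name the loss explicitly), the initial gradient is:

```python
import numpy as np

def init_backprop(AL, Y):
    """dL/dAL for binary cross-entropy, where AL = y_hat from the sigmoid layer."""
    return -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
```

For example, with $\hat{y} = 0.5$ and $y = 1$ this gives $-2$; this `dAL` is then fed into the backward step for the final layer.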

Matrix dimensions

$L = 5$; layer sizes $n^{[0]} \dots n^{[5]}$: 2-3-5-4-2-1

$z^{[1]} = W^{[1]} x + b^{[1]}$

$z^{[1]} : (3,1)$

$x : (2,1)$

$W^{[1]} : (n^{[1]}, n^{[0]}) \Rightarrow W^{[1]} : (3,2),\ W^{[2]} : (5,3)$

$W^{[l]} : (n^{[l]}, n^{[l-1]})$

$b^{[1]} : (3,1)$

$b^{[l]} : (n^{[l]}, 1)$

The same shapes hold for $dW^{[l]}$ and $db^{[l]}$ (a gradient has the shape of its parameter).
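These shape rules can be checked mechanically for the 2-3-5-4-2-1 example (variable names here are illustrative):

```python
import numpy as np

# Layer sizes n[0] .. n[5] for the L = 5 example above.
layer_dims = [2, 3, 5, 4, 2, 1]

params = {}
for l in range(1, len(layer_dims)):
    # W[l] : (n[l], n[l-1]),  b[l] : (n[l], 1)
    params[f"W{l}"] = np.zeros((layer_dims[l], layer_dims[l - 1]))
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))
```

So `W1` is `(3, 2)`, `W2` is `(5, 3)`, and each `b{l}` is a column vector.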

Vectorized

$Z^{[1]} : (n^{[1]}, m)$

$W^{[1]} : (n^{[1]}, n^{[0]})$

$X : (n^{[0]}, m)$

$b^{[1]}$ is stored as $(n^{[1]}, 1)$ and broadcast to $(n^{[1]}, m)$ when added.
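The bias broadcast can be seen directly in NumPy: a `(n1, 1)` column added to a `(n1, m)` matrix is copied across all $m$ columns.

```python
import numpy as np

m = 10
Z = np.zeros((3, m))             # (n1, m) linear outputs for m examples
b = np.arange(3).reshape(3, 1)   # bias stored as (n1, 1)
out = Z + b                      # NumPy broadcasts b across the m columns
```

Every column of `out` equals `b`, so no explicit tiling of the bias is needed.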