Differences
This shows you the differences between two versions of the page.
data_mining:neural_network:deep_neural_nets [2017/08/20 17:13] – phreazer | data_mining:neural_network:deep_neural_nets [2017/08/20 18:04] (current) – [Vectorized] phreazer | ||
---|---|---|---|
Line 11: | Line 11: | ||
$n^{[3]} = 3$ | $n^{[3]} = 3$ | ||
$n^{[4]} = n^{[L]} = 1$ | $n^{[4]} = n^{[L]} = 1$ | ||
+ | |||
+ | ===== Forward prop ===== | ||
+ | |||
+ | Input: $a^{[l - 1]}$ | ||
+ | |||
+ | Output: $a^{[l]}$; cache $z^{[l]}$ together with $W^{[l]}$ and $b^{[l]}$ | ||
+ | |||
+ | $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ | ||
+ | |||
+ | $A^{[l]} = g^{[l]}(Z^{[l]})$ | ||
+ | |||
+ | $A^{[0]} = X$ is the input set. | ||
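The forward step for a single layer can be sketched in NumPy as follows. This is a minimal sketch, not a definitive implementation: the function names `layer_forward`, `relu`, and `sigmoid` are hypothetical, and the cache also stores $A^{[l-1]}$ (an assumption beyond the cache listed above, made because the later $dW^{[l]}$ computation needs it).

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0, z)

def sigmoid(z):
    # Element-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(A_prev, W, b, g):
    """One forward step for layer l: Z = W A_prev + b, A = g(Z)."""
    Z = W @ A_prev + b            # b has shape (n_l, 1) and broadcasts over the m columns
    A = g(Z)
    cache = (A_prev, Z, W, b)     # assumption: A_prev is cached as well
    return A, cache

rng = np.random.default_rng(0)
A0 = rng.standard_normal((2, 4))          # n^[0] = 2 features, m = 4 examples
W1 = rng.standard_normal((3, 2)) * 0.01   # n^[1] = 3 units
b1 = np.zeros((3, 1))

A1, cache1 = layer_forward(A0, W1, b1, relu)
print(A1.shape)
```

Passing the activation `g` as an argument keeps the same function usable for the ReLU layers and the final sigmoid layer.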
+ | |||
+ | ===== Backward prop for layer l ===== | ||
+ | |||
+ | Input: $da^{[l]}$ | ||
+ | |||
+ | Output: $da^{[l-1]}$, $dW^{[l]}$, $db^{[l]}$ | ||
+ | |||
+ | |||
+ | $dZ^{[l]} = dA^{[l]} * g'^{[l]}(Z^{[l]})$ (element-wise product) | ||
+ | |||
+ | $dW^{[l]} = 1/m * dZ^{[l]} * A^{[l-1]^T}$ | ||
+ | |||
+ | $db^{[l]} = 1/m * np.sum(dZ^{[l]}, axis=1, keepdims=True)$ | ||
+ | |||
+ | $dA^{[l-1]} = W^{[l]^T} * dZ^{[l]}$ | ||
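The backward step for one layer can be sketched the same way. This is a hedged sketch: `layer_backward` and `relu_grad` are hypothetical names, the cache layout `(A_prev, Z, W, b)` is an assumption, and the bias gradient sums over the example axis with `axis=1, keepdims=True` so that it keeps the shape $(n^{[l]}, 1)$.

```python
import numpy as np

def relu_grad(Z):
    # Derivative of ReLU: 1 where Z > 0, else 0
    return (Z > 0).astype(float)

def layer_backward(dA, cache, g_prime):
    """One backward step for layer l, given dA^[l] and the forward cache."""
    A_prev, Z, W, b = cache                      # assumed cache layout
    m = A_prev.shape[1]                          # number of examples
    dZ = dA * g_prime(Z)                         # element-wise product
    dW = (dZ @ A_prev.T) / m                     # shape (n_l, n_{l-1})
    db = np.sum(dZ, axis=1, keepdims=True) / m   # shape (n_l, 1)
    dA_prev = W.T @ dZ                           # shape (n_{l-1}, m)
    return dA_prev, dW, db

rng = np.random.default_rng(1)
A_prev = rng.standard_normal((2, 4))
W = rng.standard_normal((3, 2))
b = np.zeros((3, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((3, 4))

dA_prev, dW, db = layer_backward(dA, (A_prev, Z, W, b), relu_grad)
print(dA_prev.shape, dW.shape, db.shape)
```

Note that the matrix products use `@`, while only the product with $g'$ is element-wise.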
+ | |||
+ | ===== Flow ===== | ||
+ | |||
+ | Forward: | ||
+ | |||
+ | X -> ReLU -> ReLU -> Sigmoid -> $\hat{y}$ -> $L(\hat{y}, y)$ | ||
+ | |||
+ | Initialize backprop with the derivative of $L$ with respect to $\hat{y}$. | ||
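As an illustration of that initialization, the sketch below assumes the binary cross-entropy loss $L(\hat{y}, y) = -(y \log \hat{y} + (1 - y) \log(1 - \hat{y}))$, which the notes do not state explicitly; `init_backprop` is a hypothetical helper name. With a sigmoid output layer, the resulting $dZ$ simplifies to $\hat{y} - y$, which the last line checks.

```python
import numpy as np

def init_backprop(y_hat, y):
    """dL/dy_hat for binary cross-entropy (assumed loss, not stated in the notes)."""
    return -(y / y_hat) + (1 - y) / (1 - y_hat)

y_hat = np.array([[0.8, 0.1, 0.6]])   # example predictions, shape (1, m)
y = np.array([[1.0, 0.0, 1.0]])       # example labels, shape (1, m)

dAL = init_backprop(y_hat, y)

# For a sigmoid output, g'(Z) = y_hat * (1 - y_hat), so dZ reduces to y_hat - y
dZL = dAL * y_hat * (1 - y_hat)
print(np.allclose(dZL, y_hat - y))
```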
+ | |||
+ | |||
+ | ===== Matrix dimensions ===== | ||
+ | $L=5$ | ||
+ | Layer sizes $n^0$ to $n^5$: 2-3-5-4-2-1 | ||
+ | |||
+ | $Z^1 = W^1 * x + b^1 $ | ||
+ | |||
+ | $Z^1 :(3,1)$ | ||
+ | |||
+ | $x : (2,1)$ | ||
+ | |||
+ | $W^1 : (n^1, n^0) \Rightarrow W^1 : (3,2),\ W^2 : (5,3)$ | ||
+ | |||
+ | $W^l :(n^l, n^{l-1})$ | ||
+ | |||
+ | $b^1 : (3,1)$ | ||
+ | |||
+ | $b^l : (n^l, 1)$ | ||
+ | |||
+ | Analogous for $dW^l$ and $db^l$: they have the same shapes as $W^l$ and $b^l$. | ||
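The shape rules above can be checked mechanically for the 2-3-5-4-2-1 network. This is a small sketch; the `params` dictionary, its key names, and the `0.01` scaling are illustrative choices, not something the notes prescribe.

```python
import numpy as np

layer_sizes = [2, 3, 5, 4, 2, 1]   # n^0 .. n^5, so there are 5 weight layers

rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layer_sizes)):
    # W^l : (n^l, n^{l-1}),  b^l : (n^l, 1)
    params[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_sizes[l], 1))

shapes = {name: p.shape for name, p in params.items()}
for name, shape in shapes.items():
    print(name, shape)
```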
+ | |||
+ | ==== Vectorized ==== | ||
+ | |||
+ | $Z^1 : (n^1,m)$ | ||
+ | |||
+ | $W^1 :(n^1, n^0)$ | ||
+ | |||
+ | $X : (n^0, m)$ | ||
+ | |||
+ | $b^1 : (n^1, 1)$, broadcast to $(n^1, m)$ when added | ||
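In NumPy the bias is usually stored with shape $(n^1, 1)$ and stretched to $(n^1, m)$ by broadcasting during the addition. The sketch below uses made-up sizes ($n^0 = 2$, $n^1 = 3$, $m = 4$) and constant values purely to make the shapes visible.

```python
import numpy as np

n0, n1, m = 2, 3, 4                               # illustrative sizes
X = np.ones((n0, m))                              # X  : (n^0, m)
W1 = np.ones((n1, n0))                            # W^1: (n^1, n^0)
b1 = np.arange(n1, dtype=float).reshape(n1, 1)    # b^1 stored as (n^1, 1)

Z1 = W1 @ X + b1                                  # broadcasting adds b1 to every column
b1_broadcast = np.broadcast_to(b1, (n1, m))       # explicit (n^1, m) view of the bias

print(Z1.shape)
print(np.array_equal(Z1, W1 @ X + b1_broadcast))
```

Both forms give the same $Z^1$ of shape $(n^1, m)$; the broadcast version avoids materializing $m$ copies of the bias.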