Differences
This shows you the differences between two versions of the page.
data_mining:neural_network:debugging [2017/08/19 22:02] – created phreazer | data_mining:neural_network:debugging [2017/08/19 22:12] (current) – phreazer | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Gradient checking ====== | ====== Gradient checking ====== | ||
- | $\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon}$ | + | Approximating derivatives: |
+ | |||
+ | Use the two-sided (central) difference, which spans the larger triangle from $\Theta - \epsilon$ to $\Theta + \epsilon$: | ||
+ | |||
+ | $\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon} \approx g(\Theta)$ | ||
+ | |||
+ | $f'(\Theta) = \lim_{\epsilon \to 0} \frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon}$ | ||
+ | |||
+ | The approximation error of the two-sided difference is $O(\epsilon^2)$, compared with $O(\epsilon)$ for a one-sided difference. | ||
+ | |||
+ | Take $W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}$ and reshape them into one big vector $\Theta$. | ||
+ | |||
+ | Take $dW^{[1]}, db^{[1]}, \dots, dW^{[L]}, db^{[L]}$ and reshape them into one big vector $d\Theta$. | ||
+ | |||
+ | The cost $J$ is now a function of the single vector $\Theta$: $J(\Theta) = J(\Theta_1, \Theta_2, \dots)$ | ||
+ | |||
+ | For each i: | ||
+ | |||
+ | $d\Theta_{approx}[i] = \frac{J(\dots, \Theta_i + \epsilon, \dots) - J(\dots, \Theta_i - \epsilon, \dots)}{2 \epsilon}$ | ||
+ | |||
+ | $\epsilon = 10^{-7}$ |
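
The per-component loop in the new revision can be sketched in plain Python. This is a minimal illustration, not code from the wiki page: the names `gradient_check`, `example_cost` and `example_grad` are hypothetical, the cost function is a toy stand-in for a network's $J(\Theta)$, and the relative-difference summary at the end is an assumed (though common) way to compare $d\Theta_{approx}$ with $d\Theta$ that the notes themselves do not specify.

```python
import math


def gradient_check(J, theta, dtheta, eps=1e-7):
    """Compare an analytic gradient with a two-sided numerical estimate.

    J      : callable mapping a flat list of floats to a scalar cost
    theta  : flat parameter vector (all W and b reshaped into one list)
    dtheta : flat analytic gradient, same length as theta
    Returns (dtheta_approx, relative_difference).
    """
    dtheta_approx = []
    for i in range(len(theta)):
        # Perturb only component i, in both directions.
        theta_plus = list(theta)
        theta_plus[i] += eps
        theta_minus = list(theta)
        theta_minus[i] -= eps
        dtheta_approx.append((J(theta_plus) - J(theta_minus)) / (2 * eps))

    # Assumption (not in the notes): summarise the mismatch as a
    # relative Euclidean distance between the two gradient vectors.
    diff = math.sqrt(sum((a - d) ** 2 for a, d in zip(dtheta_approx, dtheta)))
    scale = (math.sqrt(sum(a * a for a in dtheta_approx))
             + math.sqrt(sum(d * d for d in dtheta)))
    relative_difference = diff / scale if scale > 0 else 0.0
    return dtheta_approx, relative_difference


def example_cost(theta):
    # Hypothetical toy cost: J = t0^2 + 3*t0*t1 + sin(t2)
    return theta[0] ** 2 + 3 * theta[0] * theta[1] + math.sin(theta[2])


def example_grad(theta):
    # Analytic gradient of example_cost, derived by hand.
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0], math.cos(theta[2])]


theta0 = [1.0, 2.0, 0.5]
approx, rel_diff = gradient_check(example_cost, theta0, example_grad(theta0))
```

With a correct analytic gradient the relative difference should be tiny; a sign error or a missing term in `example_grad` would make it orders of magnitude larger, which is the signal gradient checking is meant to surface. Each component costs two evaluations of $J$, so a check like this is usually run only occasionally while debugging, not on every training step.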