Differences

This shows you the differences between two versions of the page.

--- data_mining:neural_network:debugging [2017/08/20 00:05] – phreazer
+++ data_mining:neural_network:debugging [2017/08/20 00:12] (current) – phreazer
@@ Line 3: / Line 3: @@
 Approximating derivatives:
-(Large triangle (+/- triangle))
+(Use the large triangle spanning $\Theta - \epsilon$ to $\Theta + \epsilon$, i.e. the two-sided difference.)
-$\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon} ~ g(\Theta)$
+$\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon} \approx g(\Theta)$
 $f'(\Theta) = \lim_{\epsilon \to 0} \frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon}$
 The approximation error of the two-sided difference is $O(\epsilon^2)$, compared with $O(\epsilon)$ for a one-sided difference.
+Reshape $W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}$ and concatenate them into one big vector $\Theta$.
+Reshape $dW^{[1]}, db^{[1]}, \dots, dW^{[L]}, db^{[L]}$ and concatenate them into one big vector $d\Theta$.
+The cost is now a function of that vector: $J(\Theta) = J(\Theta_1, \Theta_2, \dots)$
+For each $i$:
+$d\Theta_{approx}[i] = \frac{J(\dots, \Theta_i+\epsilon,\dots) - J(\dots, \Theta_i-\epsilon,\dots)}{2\epsilon}$
+A typical choice is $\epsilon = 10^{-7}$.
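
The per-component two-sided difference described in the added lines can be sketched in NumPy as follows. This is a minimal illustration, not the page author's code: the toy cost `J`, its analytic gradient `dJ`, and the helper names `numerical_gradient` and `relative_difference` are all hypothetical, and the parameters are assumed to be already flattened into a 1-D vector. The relative-difference metric used to compare $d\Theta$ with $d\Theta_{approx}$ is a common choice but is an assumption here, since the page does not specify one.

```python
import numpy as np


def numerical_gradient(J, theta, eps=1e-7):
    """Approximate dJ/dtheta with a two-sided difference, one component at a time.

    Assumes theta is a 1-D float array (all parameters already flattened).
    """
    grad_approx = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        theta_plus = theta.astype(float).copy()
        theta_minus = theta.astype(float).copy()
        theta_plus[i] += eps    # nudge only component i upwards
        theta_minus[i] -= eps   # nudge only component i downwards
        grad_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    return grad_approx


def relative_difference(grad, grad_approx):
    """Scale-independent distance between two gradient vectors (assumed metric)."""
    numerator = np.linalg.norm(grad - grad_approx)
    denominator = np.linalg.norm(grad) + np.linalg.norm(grad_approx)
    return numerator / denominator


# Hypothetical toy cost with a known analytic gradient, standing in for a network.
def J(theta):
    return np.sum(theta ** 3)


def dJ(theta):
    return 3 * theta ** 2


theta = np.array([1.0, -2.0, 0.5])
d_theta = dJ(theta)                             # analytic gradient (backprop stand-in)
d_theta_approx = numerical_gradient(J, theta)   # two-sided numerical estimate
diff = relative_difference(d_theta, d_theta_approx)

# A deliberately wrong gradient, to show how a bug would surface in the metric.
d_theta_buggy = 2 * theta ** 2
diff_buggy = relative_difference(d_theta_buggy, d_theta_approx)

print("relative difference (correct gradient):", diff)
print("relative difference (buggy gradient):  ", diff_buggy)
```

A very small relative difference suggests the analytic gradient matches the numerical one, while a clearly larger value points to a bug in the derivative code. Because the loop evaluates `J` twice per parameter, a check like this is normally run only while debugging, not during training.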