Show pageOld revisionsBacklinksBack to top This page is read only. You can view the source, but not change it. Ask your administrator if you think this is wrong. ====== Gradient checking ====== Approximating derivatives: (Large triangle (+/- triangle, two-sided difference)) $\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon} \approx g(\Theta)$ $f'(\Theta) = \lim_{\epsilon->0} \frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon}$ Approx error is in $O(\epsilon^2)$ Take $W^{[1]}, b^{[1]}, \dots, W^{[L]},b^{[L]}$ and put it in a big vector $\theta$. Take $dW^{[1]}, db^{[1]}, \dots, dW^{[L]},db^{[L]}$ and put it in a big vector $d\theta$. J is now $J(\Theta) = J(\Theta_1, ...)$ For each i: $d\Theta_{approx}[i] = \frac{J(\dots, \Theta_i+\epsilon,\dots) - J(\dots, \Theta_i-\epsilon,\dots)}{2\epsilon}$ $\epsilon = 10^{-7}$ data_mining/neural_network/debugging.txt Last modified: 2017/08/19 22:12by phreazer