Gradient checking
Approximating derivatives:
Use the two-sided difference (the large triangle spanning $\Theta - \epsilon$ to $\Theta + \epsilon$) rather than a one-sided difference:
$\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon} \approx g(\Theta)$, where $g(\Theta) = f'(\Theta)$ is the exact derivative:
$f'(\Theta) = \lim_{\epsilon \to 0} \frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon}$
The approximation error of the two-sided difference is $O(\epsilon^2)$, compared with $O(\epsilon)$ for a one-sided difference.
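A small numeric sketch of the error orders, assuming the toy function $f(x) = x^3$ at $x = 1$ with $\epsilon = 0.01$ (function, point, and helper names are chosen here for illustration and are not from the lecture):

```python
def one_sided(f, x, eps):
    # Forward difference: error shrinks like O(eps)
    return (f(x + eps) - f(x)) / eps

def two_sided(f, x, eps):
    # Central difference: error shrinks like O(eps^2)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

f = lambda x: x ** 3      # toy function (assumption)
x, eps = 1.0, 0.01
exact = 3 * x ** 2        # analytic derivative f'(x) = 3x^2

err_one = abs(one_sided(f, x, eps) - exact)  # about 0.0301
err_two = abs(two_sided(f, x, eps) - exact)  # about 0.0001 = eps^2
print(err_one, err_two)
```

For this function the two-sided error is exactly $\epsilon^2$ up to floating-point rounding, while the one-sided error is dominated by the $3x\epsilon$ term.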
Take $W^{[1]}, b^{[1]}, \dots, W^{[L]},b^{[L]}$, reshape each into a vector, and concatenate them into one big vector $\Theta$.
Take $dW^{[1]}, db^{[1]}, \dots, dW^{[L]},db^{[L]}$, in the same order, and concatenate them into one big vector $d\Theta$.
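The flattening step might look like the following NumPy sketch. The dictionary layout, the key names (`W1`, `b1`, ...), and the helpers `params_to_vector` and `vector_to_params` are assumptions made for illustration; any fixed ordering works as long as the parameters and their gradients use the same one.

```python
import numpy as np

def params_to_vector(params, keys):
    # Flatten each array and concatenate them in the order given by keys
    return np.concatenate([params[k].ravel() for k in keys])

def vector_to_params(theta, template, keys):
    # Inverse operation: cut theta into pieces shaped like the arrays in template
    params, start = {}, 0
    for k in keys:
        size = template[k].size
        params[k] = theta[start:start + size].reshape(template[k].shape)
        start += size
    return params

# Toy two-layer network (shapes are an assumption)
rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((3, 2)), "b1": np.zeros((3, 1)),
    "W2": rng.standard_normal((1, 3)), "b2": np.zeros((1, 1)),
}
keys = ["W1", "b1", "W2", "b2"]

theta = params_to_vector(params, keys)           # shape (13,)
restored = vector_to_params(theta, params, keys)
```

The gradients $dW^{[l]}, db^{[l]}$ can be passed through the same `params_to_vector` call with matching keys, so that entry $i$ of $d\Theta$ lines up with entry $i$ of $\Theta$.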
The cost $J$ is now a function of this single vector: $J(\Theta) = J(\Theta_1, \dots)$
For each $i$:
$d\Theta_{approx}[i] = \frac{J(\dots, \Theta_i+\epsilon,\dots) - J(\dots, \Theta_i-\epsilon,\dots)}{2\epsilon}$
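The per-component loop can be sketched in NumPy as below. The toy cost $J(\Theta) = \sum_i \Theta_i^2$, the default $\epsilon = 10^{-7}$, and the relative-difference measure used to compare $d\Theta_{approx}$ with $d\Theta$ are assumptions added for illustration; the notes above only define $d\Theta_{approx}$.

```python
import numpy as np

def grad_approx(J, theta, eps=1e-7):
    # Two-sided difference applied to one component of theta at a time
    approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus = theta.copy()
        plus[i] += eps
        minus = theta.copy()
        minus[i] -= eps
        approx[i] = (J(plus) - J(minus)) / (2 * eps)
    return approx

def relative_difference(a, b):
    # Scale-free distance between two gradient vectors (a common choice, assumed here)
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

J = lambda t: np.sum(t ** 2)          # toy cost (assumption)
theta = np.array([1.0, -2.0, 3.0])
dtheta = 2 * theta                    # analytic gradient of the toy cost

dtheta_approx = grad_approx(J, theta)
diff = relative_difference(dtheta_approx, dtheta)
print(diff)                           # tiny when the analytic gradient is correct

# A deliberately wrong gradient gives a large relative difference
bad_diff = relative_difference(dtheta_approx, 3 * theta)
```

Each component needs two evaluations of $J$, so the check costs roughly $2n$ cost evaluations for $n$ parameters. That makes it practical as a debugging tool but too slow to run on every training step.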