====== Gradient checking ======

Approximating derivatives:

Use the two-sided (symmetric) difference: the slope of the secant over $[\Theta - \epsilon, \Theta + \epsilon]$ (the "large triangle" spanning $\pm\epsilon$), rather than a one-sided difference.

$\frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon} \approx g(\Theta)$

$f'(\Theta) = \lim_{\epsilon \to 0} \frac{f(\Theta + \epsilon) - f(\Theta - \epsilon)}{2 \epsilon}$

The approximation error is $O(\epsilon^2)$, compared to $O(\epsilon)$ for the one-sided difference $\frac{f(\Theta + \epsilon) - f(\Theta)}{\epsilon}$.
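
A quick numeric sketch (illustrative, not from the original notes) of the error orders: shrinking $\epsilon$ by 10x shrinks the two-sided error by roughly 100x, but the one-sided error only by roughly 10x.

<code python>
import numpy as np

# Illustrative example function: f = sin, so the true derivative is cos.
f = np.sin
f_prime = np.cos
theta = 1.0

for eps in (1e-1, 1e-2, 1e-3):
    two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)   # error ~ O(eps^2)
    one_sided = (f(theta + eps) - f(theta)) / eps               # error ~ O(eps)
    print(eps, abs(two_sided - f_prime(theta)), abs(one_sided - f_prime(theta)))
</code>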

Take $W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}$ and reshape them into one big vector $\Theta$.

Take $dW^{[1]}, db^{[1]}, \dots, dW^{[L]}, db^{[L]}$ and reshape them into one big vector $d\Theta$.

The cost is now a function of that vector: $J(\Theta) = J(\Theta_1, \Theta_2, \dots)$.
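
A minimal sketch of the reshaping step, assuming the parameters are NumPy arrays (the helper names here are placeholders for illustration):

<code python>
import numpy as np

def params_to_vector(params):
    """Flatten a list of parameter arrays [W1, b1, ..., WL, bL] into one big vector."""
    return np.concatenate([p.ravel() for p in params])

def vector_to_params(theta, shapes):
    """Inverse operation: cut the big vector back into arrays of the given shapes."""
    params, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        params.append(theta[start:start + size].reshape(shape))
        start += size
    return params

# Toy example with one layer's weights and bias (illustrative values)
W1, b1 = np.ones((3, 2)), np.zeros(3)
theta = params_to_vector([W1, b1])                      # shape (9,)
W1_back, b1_back = vector_to_params(theta, [W1.shape, b1.shape])
</code>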

For each $i$:

$d\Theta_{approx}[i] = \frac{J(\Theta_1, \dots, \Theta_i+\epsilon, \dots) - J(\Theta_1, \dots, \Theta_i-\epsilon, \dots)}{2\epsilon}$

Use a small $\epsilon$, e.g. $\epsilon = 10^{-7}$, then check that $d\Theta_{approx} \approx d\Theta$ via the relative difference $\frac{\|d\Theta_{approx} - d\Theta\|_2}{\|d\Theta_{approx}\|_2 + \|d\Theta\|_2}$ (around $10^{-7}$ is fine, around $10^{-3}$ or larger points to a bug); see the sketch below.
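
A minimal gradient-check sketch, assuming the cost ''J'' is available as a function of the flattened vector $\Theta$ and that ''backprop_grad'' returns the analytic gradient $d\Theta$ (both names are placeholders, not a specific library API):

<code python>
import numpy as np

def gradient_check(J, backprop_grad, theta, epsilon=1e-7):
    """Compare the two-sided numerical gradient of J with the backprop gradient."""
    d_theta = backprop_grad(theta)              # analytic gradient from backprop
    d_theta_approx = np.zeros_like(theta)

    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus[i] -= epsilon
        d_theta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)

    # Relative difference between the two gradients
    return (np.linalg.norm(d_theta_approx - d_theta)
            / (np.linalg.norm(d_theta_approx) + np.linalg.norm(d_theta)))

# Toy example: J(theta) = sum(theta^2), so dJ/dtheta = 2*theta
J = lambda t: np.sum(t ** 2)
grad = lambda t: 2 * t
print(gradient_check(J, grad, np.array([1.0, -2.0, 3.0])))   # tiny value: gradients match
</code>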