data_mining:neural_network:gradient_descent

  * $\beta_1$: 0.9
  * $\beta_2$: 0.999
  * $\sigma$: $10^{-8}$
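
A minimal NumPy sketch of a single Adam update using these defaults (the function name ''adam_step'', the learning rate ''alpha'' and the gradient ''dw'' are illustrative assumptions, not part of these notes; the small constant $\sigma$ appears as ''sigma''):

<code python>
import numpy as np

def adam_step(w, dw, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, sigma=1e-8):
    """One Adam update; m, v are running moment estimates, t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * dw       # biased first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * dw**2    # biased second moment (uncentered variance)
    m_hat = m / (1 - beta1**t)             # bias correction for the first moment
    v_hat = v / (1 - beta2**t)             # bias correction for the second moment
    w = w - alpha * m_hat / (np.sqrt(v_hat) + sigma)  # sigma avoids division by zero
    return w, m, v
</code>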

===== Learning rate decay =====

$\alpha = \frac{1}{1 + \text{decay\_rate} \cdot \text{epoch\_num}} \alpha_0$

or

$\alpha = 0.95^{\text{epoch\_num}} \alpha_0$
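
A small sketch of both schedules (the function names and the ''decay_rate'' default are illustrative assumptions):

<code python>
def inverse_decay(alpha0, epoch_num, decay_rate=1.0):
    # alpha = alpha0 / (1 + decay_rate * epoch_num)
    return alpha0 / (1 + decay_rate * epoch_num)

def exponential_decay(alpha0, epoch_num, base=0.95):
    # alpha = base**epoch_num * alpha0
    return base**epoch_num * alpha0

# Example with alpha0 = 0.2 and decay_rate = 1.0:
# epoch 1 -> 0.1, epoch 2 -> 0.0667, epoch 3 -> 0.05, ...
</code>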

===== Saddle points =====

In high-dimensional spaces it is much more likely to end up at a saddle point than in a local optimum: with e.g. 20,000 parameters, a stationary point would have to curve upward along every one of those dimensions to be a local minimum, which is highly unlikely. Plateaus make learning slow.