data_mining:neural_network:gradient_descent

===== Learning rate decay =====
  
$\alpha = \frac{1}{1 + \text{decay\_rate} \cdot \text{epoch\_num}} \alpha_0$
  
or
  
$\alpha = 0.95^{\text{epoch\_num}} \alpha_0$
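
A minimal sketch of the two schedules above, assuming illustrative values for $\alpha_0$ and the decay rate (the function names are ours, not from any library):

<code python>
# Learning rate decay schedules; alpha0 and decay_rate are illustrative values.

def inverse_decay(alpha0, decay_rate, epoch_num):
    # alpha = 1 / (1 + decay_rate * epoch_num) * alpha0
    return alpha0 / (1.0 + decay_rate * epoch_num)

def exponential_decay(alpha0, epoch_num, base=0.95):
    # alpha = base^epoch_num * alpha0
    return (base ** epoch_num) * alpha0

# Example: learning rate over the first few epochs
alpha0, decay_rate = 0.1, 1.0
for epoch in range(5):
    print(epoch,
          inverse_decay(alpha0, decay_rate, epoch),
          exponential_decay(alpha0, epoch))
</code>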
  
  
===== Saddle points =====
  
In high-dimensional spaces it is far more likely to end up at a saddle point than at a local optimum: with e.g. 20,000 parameters, it is highly unlikely that a critical point is a minimum in every direction, so getting stuck in a local minimum is rare. Plateaus, however, make learning slow.
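
A standard two-dimensional illustration: for $f(w_1, w_2) = w_1^2 - w_2^2$ the gradient $\nabla f = (2w_1, -2w_2)$ vanishes at the origin, yet the origin is a saddle point, because the curvature is positive along $w_1$ and negative along $w_2$. A critical point is a local minimum only if the curvature is positive along every parameter direction at once, which becomes increasingly unlikely as the number of parameters grows.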