--- data_mining:neural_network:overfitting [2018/05/10 17:50] – [Weight penalites] phreazer
+++ data_mining:neural_network:overfitting [2018/05/10 17:52] – [Weight penalites] phreazer
@@ Line 86: / Line 86: @@
 Cost function
-$J(\dots) = 1/m \sum_{i=1}^n L(\hat{y}^{i}, y^{i}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
+$J(W^{[l]},b^{[l]})= \frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L || W^{[l]} ||_F^2$
 Squared Frobenius norm: $||W^{[l]}||_F^2 = \sum_i \sum_j (W^{[l]}_{ij})^2$, i.e. the sum of the squares of all entries of $W^{[l]}$.
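The regularized cost can be sketched in plain Python as below. This is a minimal illustration, not the page's own code: it assumes binary cross-entropy as the per-example loss $L$ (the page does not fix a loss), weights stored as nested lists, and the function names `cross_entropy`, `frobenius_sq` and `regularized_cost` are hypothetical.

```python
import math

def cross_entropy(y_hat, y):
    """Assumed per-example loss L(y_hat, y): binary cross-entropy."""
    eps = 1e-12  # clamp predictions to avoid log(0)
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def frobenius_sq(W):
    """Squared Frobenius norm: sum of the squares of all entries of W."""
    return sum(w * w for row in W for w in row)

def regularized_cost(y_hats, ys, weights, lam):
    """J = 1/m * sum_i L(y_hat_i, y_i) + lam/(2m) * sum_l ||W^[l]||_F^2."""
    m = len(ys)
    data_term = sum(cross_entropy(p, y) for p, y in zip(y_hats, ys)) / m
    penalty = lam / (2 * m) * sum(frobenius_sq(W) for W in weights)
    return data_term + penalty

# Usage: one layer with W = [[1, 2], [3, 4]], so ||W||_F^2 = 30.
W1 = [[1.0, 2.0], [3.0, 4.0]]
print(regularized_cost([0.5, 0.5], [1, 0], [W1], lam=2.0))
```

Only the weight matrices enter the penalty; the biases $b^{[l]}$ are not regularized, matching the formula above.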
@@ Line 97: / Line 97: @@
 For large $\lambda$, $W^{[l]} \to 0$.
-$J(W^{[l]},b^{[l]})= \frac{1}{m} \sum_{i=1}^m J(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L || W^{[l]} ||^2$
 This results in a **simpler** network: each hidden unit has a **smaller effect**.
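Why a large $\lambda$ drives $W^{[l]}$ toward 0 can be seen from the gradient step: differentiating $\frac{\lambda}{2m} ||W^{[l]}||_F^2$ adds $\frac{\lambda}{m} W^{[l]}$ to the backprop gradient, so each update first scales $W^{[l]}$ by $(1 - \frac{\alpha\lambda}{m})$. The sketch below is an illustration under assumptions: plain gradient descent with learning rate $\alpha$ (`alpha`), nested-list weights, and hypothetical helper names `l2_update` and `shrink`; the data gradient is set to zero to isolate the penalty's effect.

```python
def l2_update(W, dW_data, lam, m, alpha):
    """One gradient-descent step with the L2 term (lam/m) * W added to the data gradient."""
    return [[w - alpha * (g + lam / m * w) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, dW_data)]

def shrink(W, lam, steps, m=10, alpha=0.1):
    """Apply `steps` updates with a zero data gradient, so only the penalty acts."""
    zero = [[0.0] * len(row) for row in W]
    for _ in range(steps):
        W = l2_update(W, zero, lam, m, alpha)
    return W

# Usage: with m=10, alpha=0.1 the per-step factor is 0.99 for lam=1 but 0.5 for lam=50.
print(shrink([[1.0, -2.0]], lam=1.0, steps=10))
print(shrink([[1.0, -2.0]], lam=50.0, steps=10))
```

The multiplicative shrink by $(1 - \frac{\alpha\lambda}{m})$ is why L2 regularization is often called weight decay; this factor should stay between 0 and 1, otherwise the update overshoots zero.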