data_mining:neural_network:overfitting

Weight penalties
Cost function
  
$J(W^{[l]},b^{[l]}) = \frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
  
Frobenius norm: $||W^{[l]}||_F^2 = \sum_{i} \sum_{j} (w_{ij}^{[l]})^2$, i.e. the sum of the squared entries of $W^{[l]}$.
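
A minimal sketch of this cost in NumPy (the function name and arguments are illustrative, not from this page):

<code python>
import numpy as np

def l2_regularized_cost(unregularized_cost, weights, lambd, m):
    """Add the Frobenius-norm weight penalty to an already-computed cost.

    unregularized_cost: (1/m) * sum of per-example losses L(y_hat, y)
    weights:            list of weight matrices [W1, ..., WL]
    lambd:              regularization strength lambda
    m:                  number of training examples
    """
    # sum_l ||W[l]||_F^2 : sum of squared entries of every weight matrix
    frobenius_penalty = sum(np.sum(np.square(W)) for W in weights)
    return unregularized_cost + (lambd / (2 * m)) * frobenius_penalty
</code>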
  
For large $\lambda$, $W^{[l]} \approx 0$.
  
This results in a **simpler** network: each hidden unit has a **smaller effect**.
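
To see why the penalty shrinks the weights (standard gradient-descent algebra, not spelled out on this page): the penalty term contributes $\frac{\lambda}{m} W^{[l]}$ to the gradient, so the update becomes

$W^{[l]} := W^{[l]} - \alpha \left( dW^{[l]}_{\text{backprop}} + \frac{\lambda}{m} W^{[l]} \right) = \left( 1 - \frac{\alpha \lambda}{m} \right) W^{[l]} - \alpha \, dW^{[l]}_{\text{backprop}}$

Each step multiplies $W^{[l]}$ by a factor slightly below 1 ("weight decay"); a larger $\lambda$ shrinks the weights more strongly.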