data_mining:neural_network:overfitting

$w$ will be sparse.
  
Use a **hold-out** test set to set the hyperparameter.
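
A minimal sketch of this selection loop, assuming the sparsity of $w$ comes from an L1 penalty and using scikit-learn's ''Lasso''; the candidate grid, data, and helper names are illustrative, not from the original notes:

<code python>
import numpy as np
from sklearn.linear_model import Lasso

def select_lambda(X_train, y_train, X_hold, y_hold, candidates):
    """Fit once per candidate penalty strength and keep the one with the
    lowest hold-out error (never tune on the training error itself)."""
    best_lam, best_err = None, np.inf
    for lam in candidates:
        model = Lasso(alpha=lam).fit(X_train, y_train)
        err = np.mean((model.predict(X_hold) - y_hold) ** 2)  # hold-out MSE
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Illustrative data: only the first 3 of 20 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
best = select_lambda(X[:150], y[:150], X[150:], y[150:], [0.01, 0.1, 1.0])
</code>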
  
== Neural Network ==
Line 86: Line 86:
Cost function:
  
$J(W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
  
Frobenius norm: $||W^{[l]}||_F^2 = \sum_{i} \sum_{j} \left(w_{ij}^{[l]}\right)^2$
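
A short sketch of this cost, assuming a binary cross-entropy loss for $L$ and taking the weight matrices of all layers as a list; only the $\frac{\lambda}{2m}\sum_l ||W^{[l]}||_F^2$ term comes directly from the formula above:

<code python>
import numpy as np

def l2_regularized_cost(AL, Y, weights, lam):
    """AL: predictions of shape (1, m); Y: labels of shape (1, m);
    weights: list of the matrices W^[1], ..., W^[L]."""
    m = Y.shape[1]
    # Assumed loss: binary cross-entropy averaged over the m examples.
    cross_entropy = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    # Frobenius penalty: sum over layers of the squared entries of W^[l].
    frobenius = sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + lam / (2 * m) * frobenius
</code>
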
Called **weight decay**: the gradient-descent update multiplies $W^{[l]}$ by an additional factor $\left(1 - \frac{\alpha \lambda}{m}\right)$ on every step.
  
For large $\lambda$, $W^{[l]} \rightarrow 0$.
  
This results in a **simpler** network: each hidden unit has a **smaller effect**.
 +
When $W$ is small, $z$ stays in a smaller range, so an activation such as tanh operates in its nearly linear region.
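
A sketch of the corresponding gradient-descent step, which is where the name weight decay comes from; ''dW_backprop'' stands in for the unregularized gradient and is not from the original notes:

<code python>
import numpy as np

def update_with_weight_decay(W, dW_backprop, alpha, lam, m):
    """The L2 term adds (lam/m) * W to the gradient, which is equivalent to
    shrinking W by the factor (1 - alpha*lam/m) before the usual step."""
    dW = dW_backprop + (lam / m) * W
    return W - alpha * dW

# With a large lambda the repeated shrinking drives W^[l] toward 0.
W = np.random.default_rng(0).normal(size=(4, 3))
for _ in range(100):
    W = update_with_weight_decay(W, np.zeros_like(W), alpha=0.1, lam=50.0, m=10)
print(np.abs(W).max())  # ~0
</code>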
  
==== Weight constraints ====