data_mining:neural_network:overfitting

**Weight penalties**

Cost function
  
$J(W^{[l]},b^{[l]}) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
  
Frobenius Norm: $||W^{[l]}||_F^2$
  
For large $\lambda$, $W^{[l]} \rightarrow 0$.
  
This results in a **simpler** network / each hidden unit has a **smaller effect**.
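
A minimal NumPy sketch of this cost; the function name ''regularized_cost'' and the arguments ''losses'' (per-example losses), ''W'' (list of weight matrices) and ''lambd'' are illustrative assumptions, not from the original page:

<code python>
import numpy as np

def regularized_cost(losses, W, lambd):
    """Cost with an L2 (Frobenius) weight penalty, as in the formula above.

    losses : per-example losses L(y_hat_i, y_i), shape (m,)
    W      : list of weight matrices W[l], one entry per layer
    lambd  : regularization strength lambda
    """
    m = losses.shape[0]
    data_term = np.sum(losses) / m
    # Sum of squared Frobenius norms ||W^[l]||_F^2 over all layers
    penalty = sum(np.sum(np.square(Wl)) for Wl in W)
    return data_term + lambd / (2 * m) * penalty
</code>

Note that only the weights $W^{[l]}$ are penalized, not the biases $b^{[l]}$.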

**Inverted dropout**

Layer $l=3$.
  
$keep\_prob = 0.8$ // probability that a unit is kept
  
$d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep\_prob$
  
$a3 = np.multiply(a3, d3)$ // zero out dropped activations in layer 3 (same as $a3 *= d3$)
  
$a3 /= keep\_prob$ // e.g. with 50 units, ~10 units are shut off; dividing keeps the expected value of $a3$ unchanged
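
Putting the lines above together, a minimal sketch of one training-time forward step with inverted dropout for layer 3; the input ''a2'', the parameters ''W3'', ''b3'' and the ReLU activation are illustrative assumptions, not taken from the original page:

<code python>
import numpy as np

def layer3_forward_inverted_dropout(a2, W3, b3, keep_prob=0.8):
    """Forward step for layer 3 with inverted dropout (training time only)."""
    z3 = np.dot(W3, a2) + b3
    a3 = np.maximum(0, z3)                     # ReLU activation (assumed)
    # Boolean mask: each unit is kept with probability keep_prob
    d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
    a3 = np.multiply(a3, d3)                   # shut off the dropped units
    a3 /= keep_prob                            # scale up to keep E[a3] unchanged
    return a3, d3                              # d3 is typically reused to mask dA3 in backprop
</code>

At test time no mask and no scaling are applied; the division by ''keep_prob'' during training already keeps the expected activations the same.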