data_mining:neural_network:overfitting

$w$ will be sparse.
  
Use a **hold-out** set to tune the hyperparameter.
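
A minimal, self-contained sketch of how a hold-out split can be used to pick $\lambda$ (ridge regression only stands in for "a model with a regularization hyperparameter"; all names and values here are illustrative, not from these notes):

<code python>
import numpy as np

def split_holdout(X, y, val_frac=0.2, seed=0):
    """Shuffle the data and split off a hold-out (validation) part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_frac)
    return X[idx[n_val:]], y[idx[n_val:]], X[idx[:n_val]], y[idx[:n_val]]

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized linear regression (stand-in model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_loss(w, X, y):
    return np.mean((X @ w - y) ** 2)

# Toy data; in practice X, y come from your training set.
X = np.random.randn(200, 10)
y = X @ np.random.randn(10) + 0.1 * np.random.randn(200)

X_tr, y_tr, X_val, y_val = split_holdout(X, y)
best_lam = min([0.0, 0.01, 0.1, 1.0],
               key=lambda lam: val_loss(fit_ridge(X_tr, y_tr, lam), X_val, y_val))
print("lambda chosen on the hold-out set:", best_lam)
</code>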
  
== Neural Network ==
Cost function
  
$J(W^{[l]},b^{[l]}) = \frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
  
Frobenius Norm: $||W^{[l]}||_F^2$
Called **weight decay** (each gradient descent step additionally multiplies the weights by a factor slightly less than 1).
  
For large $\lambda$, $W^{[l]} \rightarrow 0$.
  
This results in a **simpler** network: each hidden unit has a **smaller effect**.

When $W$ is small, $z$ stays in a smaller range, so the resulting activation (e.g. for tanh) is closer to its linear regime.
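
As a sketch (assuming ''W'' is a list of per-layer weight matrices and ''losses'' holds the per-example losses $L(\hat{y}^{(i)}, y^{(i)})$), the regularized cost and the gradient step that gives the name weight decay look like this:

<code python>
import numpy as np

def regularized_cost(losses, W, lam):
    """J = (1/m) * sum_i L_i  +  (lambda / 2m) * sum_l ||W[l]||_F^2"""
    m = len(losses)
    frobenius = sum(np.sum(Wl ** 2) for Wl in W)   # squared Frobenius norms
    return np.mean(losses) + (lam / (2 * m)) * frobenius

def gradient_step(Wl, dWl_data, lam, m, alpha):
    """Regularization adds (lam/m) * Wl to the gradient, so each step
    first shrinks Wl by the factor (1 - alpha * lam / m): weight decay."""
    dWl = dWl_data + (lam / m) * Wl
    return Wl - alpha * dWl
</code>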
  
==== Weight constraints ====
Dropout prevents overfitting.
  
For each iteration: for each node, toss a coin (e.g. keep with probability 0.5) and eliminate the dropped nodes.
  
==== Inverted dropout ====
Layer $l=3$.
  
$keep\_prob = 0.8$ // probability that a unit will be kept
  
$d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep\_prob$
  
$a3 = np.multiply(a3, d3)$ // zero out dropped activations in layer 3 (same as $a3 *= d3$)
  
$a3 /= keep\_prob$ // scale up so the expected value of $a3$ stays the same (e.g. with 50 units, about 10 units are shut off)
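
Putting the steps above together, a self-contained sketch of inverted dropout at training time (the array shapes and the ''keep_prob'' value are only examples):

<code python>
import numpy as np

keep_prob = 0.8                       # probability that a unit is kept
a3 = np.random.randn(50, 32)          # activations of layer 3: 50 units, batch of 32

d3 = np.random.rand(*a3.shape) < keep_prob   # Boolean mask, one coin toss per unit
a3 = np.multiply(a3, d3)                     # shut off ~20% of the units (~10 of 50)
a3 = a3 / keep_prob                          # scale up so E[a3] is unchanged

# At test time no units are dropped and no scaling is needed,
# because dividing by keep_prob already kept the expected activations the same.
</code>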