data_mining:neural_network:overfitting

  * Dropout (randomly omit hidden units)
  * Generative pre-training

Solves variance problems (see [[data_mining:error_analysis|Error Analysis]]).
  
====== Capacity control ======
$w$ will be sparse.
  
Use a **hold-out** test set to set the hyperparameter.
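A minimal sketch of such a hold-out search, assuming an L1-penalized linear model (scikit-learn's Lasso here) on synthetic data; the data, model choice and the candidate $\lambda$ values are illustrative, not part of the original notes:

<code python>
# Illustrative sketch: choose the regularization strength lambda on a hold-out set.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]                      # only 3 informative features
y = X @ w_true + 0.1 * rng.normal(size=200)

X_train, y_train = X[:150], y[:150]                # training split
X_hold,  y_hold  = X[150:], y[150:]                # hold-out split

best_lam, best_err = None, np.inf
for lam in [0.001, 0.01, 0.1, 1.0]:                # candidate hyperparameters
    model = Lasso(alpha=lam).fit(X_train, y_train)
    err = np.mean((model.predict(X_hold) - y_hold) ** 2)   # hold-out error
    if err < best_err:
        best_lam, best_err = lam, err

print("chosen lambda:", best_lam)                  # L1 penalty -> many weights exactly 0
</code>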
  
== Neural Network ==

Cost function:
  
$J(W^{[l]}, b^{[l]}) = \frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
  
Frobenius norm: $||W^{[l]}||_F^2$
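A small numpy sketch of this cost; the per-example losses and the weight matrices are passed in as plain arrays, and all names are illustrative:

<code python>
# Illustrative sketch: L2-regularized cost with the Frobenius-norm penalty.
import numpy as np

def regularized_cost(losses, weights, lam):
    """losses: per-example losses L(y_hat, y); weights: list of matrices W^[l]."""
    m = len(losses)
    data_term = np.mean(losses)                          # (1/m) * sum_i L(y_hat_i, y_i)
    frob_term = sum(np.sum(W ** 2) for W in weights)     # sum_l ||W^[l]||_F^2
    return data_term + lam / (2 * m) * frob_term
</code>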
Called **weight decay** (the update multiplies the weights by an additional shrinking factor).
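The name follows from the gradient descent update for the cost above (learning rate $\alpha$): the regularization term adds $\frac{\lambda}{m} W^{[l]}$ to the gradient, so

$W^{[l]} := W^{[l]} - \alpha \left( (\text{backprop term}) + \frac{\lambda}{m} W^{[l]} \right) = \left(1 - \frac{\alpha \lambda}{m}\right) W^{[l]} - \alpha (\text{backprop term})$

i.e. every update shrinks (decays) the weights by the factor $(1 - \frac{\alpha \lambda}{m})$.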
  
For large $\lambda$, $W^{[l]} \rightarrow 0$.

This results in a **simpler** network: each hidden unit has a **smaller** effect.

When $W$ is small, $z$ takes on a smaller range of values, so the resulting activation (e.g. for tanh, which is roughly linear near 0) is more linear.
  
==== Weight constraints ====
Dropout prevents overfitting.
  
For each iteration: for each node, toss a coin (e.g. keep with probability 0.5) and temporarily eliminate the nodes that are not kept.
  
==== Inverted dropout ====

Layer $l=3$.
  
$keep.prob = 0.8$ // probability that a unit will be kept
  
$d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep.prob$
  
$a3 = np.multiply(a3, d3)$ // zero out the dropped activations in layer 3 (equivalent to $a3 *= d3$)
  
$a3 /= keep.prob$ // e.g. with 50 units, ~10 are shut off; dividing by keep.prob keeps the expected value of $a3$ unchanged
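Putting these steps together, a runnable numpy sketch of the inverted dropout forward step for layer 3; the activation shape and keep probability are illustrative assumptions:

<code python>
# Illustrative sketch: inverted dropout applied to the activations of layer 3.
import numpy as np

keep_prob = 0.8                                    # probability that a unit is kept
a3 = np.random.rand(50, 32)                        # activations: 50 units x 32 examples

d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob   # boolean dropout mask
a3 = np.multiply(a3, d3)                           # shut off ~20% of the units
a3 /= keep_prob                                    # scale up so E[a3] stays unchanged

# At test time no units are dropped and no scaling is applied,
# because the expected activations were already preserved during training.
</code>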