====== Backpropagation ======
  
We can compute //how fast the error changes//.
  
Using error derivatives w.r.t. the hidden activities, then converting these into error derivatives w.r.t. the weights.
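
A minimal NumPy sketch of these two steps for a one-hidden-layer network (the variable names, layer sizes, and the squared-error loss are illustrative assumptions, not taken from these notes):

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative network: 3 inputs -> 4 hidden units -> 1 output, squared-error loss.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # input column vector
y = np.array([[1.0]])                # target
W1 = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights
W2 = rng.normal(size=(1, 4)) * 0.1   # hidden-to-output weights

# Forward pass.
h = sigmoid(W1 @ x)                  # hidden activities
y_hat = sigmoid(W2 @ h)              # output activity
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: error derivatives w.r.t. the activities first ...
dy_hat = y_hat - y                   # dE/d(output activity)
dz2 = dy_hat * y_hat * (1 - y_hat)   # dE/d(output pre-activation), sigmoid'
dh = W2.T @ dz2                      # dE/d(hidden activities)
dz1 = dh * h * (1 - h)               # dE/d(hidden pre-activations)

# ... then convert them into error derivatives w.r.t. the weights.
dW2 = dz2 @ h.T                      # dE/dW2
dW1 = dz1 @ x.T                      # dE/dW1

# Gradient-descent step (learning rate is an arbitrary choice here).
lr = 0.5
W2 -= lr * dW2
W1 -= lr * dW1
</code>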
  
==== Overfitting (How well does the network generalize?) ====
See [[data_mining:neural_network:overfitting|Overfitting & Parameter tuning]]

Possible causes:
  * Target values unreliable?
  * Sampling errors (accidental regularities of particular training cases)

Regularization methods:
  * Weight decay (small weights, simpler model)
  * Weight-sharing (same weights)
  * Early stopping (hold out a validation set; stop training when its performance gets worse)
  * Model averaging
  * Bayes fitting (like model averaging)
  * Dropout (randomly omit hidden units; see the sketch after this list)
  * Generative pre-training

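Sketch of the dropout idea from the list above, in the common inverted-dropout variant (the shapes and keep probability are illustrative assumptions):

<code python>
import numpy as np

def dropout(h, keep_prob, rng):
    """Randomly omit hidden units: zero each activity with probability 1 - keep_prob.

    Inverted-dropout convention: the surviving units are rescaled by 1 / keep_prob
    so the expected activity stays the same and no rescaling is needed at test time.
    """
    mask = rng.random(h.shape) < keep_prob   # True for units that are kept
    return (h * mask) / keep_prob

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 1))                  # hidden activities (illustrative)
h_train = dropout(h, keep_prob=0.8, rng=rng)
</code>
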
=== $L_1$ and $L_2$ regularization ===

== Example for logistic regression ==

$L_2$ regularization:

E.g. for logistic regression, add to the cost function $J$: $\dots + \frac{\lambda}{2m} ||w||^2_2 = \dots + \frac{\lambda}{2m} \sum_{j=1}^{n_x} w_j^2 = \dots + \frac{\lambda}{2m} w^T w$

$L_1$ regularization:

$\frac{\lambda}{2m} ||w||_1$

$w$ will be sparse.

Use a hold-out set to set the hyperparameter $\lambda$.

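A small NumPy sketch of adding the $L_2$ (or $L_1$) penalty to a logistic-regression cost (the variable names and data are illustrative assumptions):

<code python>
import numpy as np

def logistic_cost(w, b, X, y, lam, penalty="l2"):
    """Cross-entropy cost J plus an L2 or L1 penalty on w (bias b is not penalized)."""
    m = X.shape[1]                                  # number of training cases
    y_hat = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))    # predictions, shape (1, m)
    cross_entropy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    if penalty == "l2":
        reg = (lam / (2 * m)) * np.sum(w ** 2)      # (lambda / 2m) * ||w||_2^2
    else:
        reg = (lam / (2 * m)) * np.sum(np.abs(w))   # (lambda / 2m) * ||w||_1
    return cross_entropy + reg

# Illustrative data: n_x = 3 features, m = 5 cases.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
y = rng.integers(0, 2, size=(1, 5))
w = rng.normal(size=(3, 1))
print(logistic_cost(w, 0.0, X, y, lam=0.1, penalty="l2"))
</code>
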
== Neural Network ==

Cost function:

$J(\dots) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^{L} ||W^{[l]}||_F^2$

Frobenius norm: $||W^{[l]}||_F^2 = \sum_i \sum_j (w_{ij}^{[l]})^2$

For gradient descent, the regularization term adds to each layer's gradient:

$dW^{[l]} = \dots + \frac{\lambda}{m} W^{[l]}$

Called **weight decay**: the update multiplies each weight by an extra factor $(1 - \frac{\alpha \lambda}{m})$ (with learning rate $\alpha$) on top of the usual gradient step.
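
Sketch of the Frobenius-norm penalty and the extra weight-decay term in the gradient (function names and shapes are illustrative assumptions):

<code python>
import numpy as np

def frobenius_penalty(weights, lam, m):
    """(lambda / 2m) * sum over layers l of ||W^[l]||_F^2, added to the cost J."""
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def add_weight_decay(dW, W, lam, m):
    """dW^[l] gets the extra (lambda / m) * W^[l] term ('weight decay')."""
    return dW + (lam / m) * W

# Illustrative two-layer weights and a backprop gradient for layer 1.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
m, lam = 100, 0.7
dW1_from_backprop = rng.normal(size=(4, 3))

penalty = frobenius_penalty(weights, lam, m)
dW1 = add_weight_decay(dW1_from_backprop, weights[0], lam, m)
</code>
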
==== History of backpropagation ====
  