==== Overfitting (How well does the network generalize?) ====

See [[data_mining:neural_network:overfitting|Overfitting & Parameter tuning]]

Possible causes:
  * Target values may be unreliable
  * Sampling errors (accidental regularities of particular training cases)

Regularization methods:
  * Weight decay (small weights, simpler model)
  * Weight sharing (same weights)
  * Early stopping (hold out a validation set and stop training when performance on it starts to get worse)
  * Model averaging
  * Bayesian fitting (similar to model averaging)
  * Dropout (randomly omit hidden units; see the sketch after this list)
  * Generative pre-training
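
A minimal Python/NumPy sketch of inverted dropout, to make the idea concrete. The function name ''dropout'' and the ''keep_prob'' parameter are illustrative assumptions, not code from this wiki.

<code python>
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, keep_prob=0.8, train=True):
    # Inverted dropout (illustrative): during training, zero each hidden
    # unit with probability 1 - keep_prob and rescale the survivors by
    # 1/keep_prob, so the expected activation is unchanged at test time.
    if not train:
        return a                            # test time: use the full network
    mask = rng.random(a.shape) < keep_prob  # keep each unit with prob keep_prob
    return a * mask / keep_prob

# Example: apply dropout to one layer's activations
a_hidden = np.array([[0.5, 1.2, -0.3, 0.8]])
print(dropout(a_hidden))  # some units zeroed, the rest scaled by 1/0.8
</code>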

=== Weight decay ===

$L_2$ regularization: penalize large weights so the network prefers small weights and hence a simpler model.

E.g. for logistic regression, add the term $\frac{\lambda}{2m} ||w||^2_2$ to the cost function $J$.
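
A short Python/NumPy sketch of that cost: the standard logistic-regression cross-entropy plus the $\frac{\lambda}{2m} ||w||^2_2$ weight-decay penalty. The names (''cost_l2'', ''lam'') and the toy data are made up for the example.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_l2(w, b, X, y, lam):
    # Cross-entropy cost of logistic regression, plus the weight decay
    # penalty (lambda / 2m) * ||w||_2^2; the bias b is not penalized.
    m = X.shape[0]
    a = sigmoid(X @ w + b)
    cross_entropy = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    penalty = (lam / (2 * m)) * np.sum(w ** 2)
    return cross_entropy + penalty

# Toy example: 3 samples, 2 features (values are illustrative)
X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
print(cost_l2(w=np.zeros(2), b=0.0, X=X, y=y, lam=0.1))  # ~0.693 (= ln 2)
</code>
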
==== History of backpropagation ====