====== Backpropagation ======

We can compute how fast the error changes as a hidden activity is changed.
We use the error derivatives w.r.t. the hidden activities and then convert them into error derivatives w.r.t. the weights.
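
A minimal sketch of this step for a single sigmoid layer (the function, shapes, and variable names below are illustrative assumptions, not taken from this page): error derivatives w.r.t. the layer's activities come in, and derivatives w.r.t. the weights and the previous layer's activities come out.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_layer(a_prev, W, b, dE_da):
    """a_prev: (m, n_in) inputs to the layer, W: (n_in, n_out), b: (n_out,),
    dE_da: (m, n_out) error derivatives w.r.t. this layer's activities."""
    z = a_prev @ W + b                 # forward pass for this layer
    a = sigmoid(z)
    dE_dz = dE_da * a * (1.0 - a)      # through the sigmoid: da/dz = a(1-a)
    dE_dW = a_prev.T @ dE_dz           # error derivatives w.r.t. the weights
    dE_db = dE_dz.sum(axis=0)
    dE_da_prev = dE_dz @ W.T           # error derivatives for the layer below
    return dE_dW, dE_db, dE_da_prev
</code>
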
==== Overfitting (How well does the network generalize?) ====

See also [[data_mining:neural_network:overfitting|Overfitting & Parameter tuning]].

Possible causes of overfitting:
  * Target values may be unreliable.
  * Sampling errors (accidental regularities of particular training cases).

Regularization methods:
  * Weight decay (small weights, simpler model)
  * Weight sharing (identical weights for several connections)
  * Early stopping (hold out a validation set; stop training when performance on it gets worse)
  * Model averaging
  * Bayesian fitting (similar effect to model averaging)
  * Dropout (randomly omit hidden units; see the sketch after this list)
  * Generative pre-training
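
A minimal sketch of dropout (inverted dropout; the keep probability, function name, and shapes are illustrative assumptions, not taken from this page):

<code python>
import numpy as np

def dropout(a, keep_prob=0.8, training=True):
    """Randomly omit hidden activities a during training (inverted dropout)."""
    if not training:
        return a                                  # keep every unit at test time
    mask = np.random.rand(*a.shape) < keep_prob   # which units survive this pass
    return a * mask / keep_prob                   # rescale so the expected activity is unchanged
</code>
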

=== $L_1$ and $L_2$ regularization ===

== Example for logistic regression ==

$L_2$ regularization: add a penalty term to the cost function $J$, e.g. for logistic regression:

$\dots + \frac{\lambda}{2m} ||w||_2^2$

$L_1$ regularization:

$\dots + \frac{\lambda}{2m} ||w||_1$

With $L_1$ regularization, $w$ will be sparse.

Use a hold-out (validation) set to set the hyperparameter $\lambda$.
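
A minimal sketch of these two penalties added to a logistic-regression cost (all function and variable names are illustrative assumptions, not code from this page):

<code python>
import numpy as np

def regularized_cost(w, b, X, y, lam, penalty="l2"):
    """Cross-entropy cost of logistic regression plus an L1 or L2 penalty on w.

    X: (m, n) features, y: (m,) labels in {0, 1}, lam: regularization strength lambda."""
    m = X.shape[0]
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))                     # predictions
    cross_entropy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    if penalty == "l2":
        reg = lam / (2 * m) * np.sum(w ** 2)                       # (lambda/2m) * ||w||_2^2
    else:
        reg = lam / (2 * m) * np.sum(np.abs(w))                    # (lambda/2m) * ||w||_1
    return cross_entropy + reg
</code>
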

== Neural Network ==

Cost function with a Frobenius-norm penalty on the weight matrices:

$J(\dots) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^{L} ||W^{[l]}||_F^2$

Frobenius norm: $||W^{[l]}||_F^2 = \sum_i \sum_j (w_{ij}^{[l]})^2$

For gradient descent, the penalty adds a term to each gradient:

$dW^{[l]} = \dots + \frac{\lambda}{m} W^{[l]}$

This is called **weight decay**: in the update $W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}$ (learning rate $\alpha$), the penalty term amounts to multiplying $W^{[l]}$ by the factor $(1 - \frac{\alpha \lambda}{m})$ at every step.

Large $\lambda$: the weights stay small, so $z$ covers only a small range of values and (with a tanh activation function) every layer behaves approximately linearly.
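
A minimal sketch of this update for one weight matrix (the learning rate, variable names, and placeholder values are illustrative assumptions):

<code python>
import numpy as np

def weight_decay_step(W, dW_backprop, lam, m, lr):
    """One gradient-descent step with the Frobenius-norm penalty included."""
    dW = dW_backprop + (lam / m) * W   # penalty adds (lambda/m) * W to the gradient
    return W - lr * dW                 # equals (1 - lr*lam/m) * W - lr * dW_backprop

# Example usage with placeholder values:
W = np.random.randn(4, 3) * 0.01
dW = np.zeros_like(W)                  # stand-in for the gradient from backprop
W = weight_decay_step(W, dW, lam=0.7, m=1000, lr=0.1)
</code>
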
==== History of backpropagation ====