Differences
This shows you the differences between two versions of the page.
data_mining:neural_network:overfitting [2018/05/10 17:50] – [Weight penalites] phreazer | data_mining:neural_network:overfitting [2018/05/10 17:52] – [Weight penalites] phreazer | ||
---|---|---|---|
Line 86: | Line 86: | ||
Cost function | Cost function | ||
- | $J(\dots) = 1/m \sum_{i=1}^n L(\hat{y}^{i}, | + | $J(W^{[l]}, |
Frobenius Norm: $||W^{[l]}||_F^2$ | Frobenius Norm: $||W^{[l]}||_F^2$ | ||
Line 97: | Line 97: | ||
For large $\lambda$, $W^{[l]} \to 0$ | For large $\lambda$, $W^{[l]} \to 0$ | ||
- | |||
- | $J(W^{[l]}, | ||
This results in a **simpler** network: each hidden unit has a **smaller effect**. | This results in a **simpler** network: each hidden unit has a **smaller effect**. | ||
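
The weight penalty discussed in this revision can be sketched in a few lines of plain Python. The cost formula is cut off in the diff, so this sketch assumes the common L2 form, where the penalty added to the data loss is $\frac{\lambda}{2m} \sum_l ||W^{[l]}||_F^2$. The function names (`frobenius_sq`, `l2_penalty`, `decay_step`) and the learning rate `lr` are hypothetical and do not come from the page.

```python
def frobenius_sq(W):
    """Squared Frobenius norm: the sum of the squared entries of matrix W."""
    return sum(w * w for row in W for w in row)


def l2_penalty(weights, lam, m):
    """Assumed penalty term: lam / (2 * m) * sum over layers of ||W||_F^2.

    `weights` is a list of per-layer matrices (lists of rows),
    `lam` is the regularization strength, `m` the number of examples.
    """
    return lam / (2 * m) * sum(frobenius_sq(W) for W in weights)


def decay_step(W, grad, lam, m, lr):
    """One gradient step on (data loss + assumed L2 penalty).

    Under the assumed form, the penalty adds (lam / m) * W to the
    gradient, so each step pulls every weight toward zero.
    `grad` is the gradient of the data loss alone.
    """
    return [
        [w - lr * (g + (lam / m) * w) for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(W, grad)
    ]


if __name__ == "__main__":
    W = [[3.0, 4.0]]
    zero_grad = [[0.0, 0.0]]
    print("squared Frobenius norm:", frobenius_sq(W))
    print("penalty:", l2_penalty([W], lam=2.0, m=1))
    # With a zero data-loss gradient, only the penalty acts on W.
    print("after one step:", decay_step(W, zero_grad, lam=2.0, m=1, lr=0.1))
```

With a zero data-loss gradient, each step multiplies the weights by $(1 - \text{lr} \cdot \lambda / m)$, which is one way to see why a large $\lambda$ drives $W^{[l]}$ toward zero.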