data_mining:neural_network:overfitting [2018/05/10 17:55] – phreazer
Cost function with L2 weight penalty:

$J(W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^{L} ||W^{[l]}||_F^2$

Frobenius norm: $||W^{[l]}||_F^2 = \sum_i \sum_j (w_{ij}^{[l]})^2$
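A minimal sketch of the penalized cost in numpy (the helper name `l2_cost` and the toy weight values are illustrative, not from these notes):

```python
import numpy as np

def l2_cost(unregularized_cost, weights, lam, m):
    """Add the Frobenius-norm penalty (lambda / 2m) * sum_l ||W^[l]||_F^2."""
    frob_sum = sum(np.sum(np.square(W)) for W in weights)
    return unregularized_cost + (lam / (2 * m)) * frob_sum

# Toy example: two layers of all-ones weights.
W1 = np.ones((3, 2))   # ||W1||_F^2 = 6
W2 = np.ones((1, 3))   # ||W2||_F^2 = 3
cost = l2_cost(0.5, [W1, W2], lam=0.1, m=10)
# penalty = 0.1 / (2 * 10) * 9 = 0.045, so cost = 0.545
```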
For large $\lambda$, $W^{[l]} \to 0$.
- | |||
- | $J(W^{[l]}, | ||
This results in a **simpler** network / each hidden unit has a **smaller effect**.
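One way to see why large $\lambda$ shrinks the weights: with the Frobenius penalty, the gradient-descent update becomes weight decay (a standard derivation, stated here for context):

```latex
W^{[l]} := W^{[l]} - \alpha \left( dW^{[l]}_{\text{backprop}} + \frac{\lambda}{m} W^{[l]} \right)
         = \left(1 - \frac{\alpha \lambda}{m}\right) W^{[l]} - \alpha \, dW^{[l]}_{\text{backprop}}
```

Each step multiplies $W^{[l]}$ by a factor smaller than 1, so larger $\lambda$ pushes the weights toward 0.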
Layer $l=3$.

$keep.prob = 0.8$ // probability that a unit is kept

$d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep.prob$

$a3 = np.multiply(a3, d3)$ // shut off units with probability $1 - keep.prob$

$a3 /= keep.prob$ // e.g. 50 units => ~10 units shut off; scaling keeps the expected value of $a3$ unchanged
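The steps above can be sketched as a runnable function (the `seed` parameter is added here for reproducibility; it is not part of the notes):

```python
import numpy as np

def inverted_dropout(a3, keep_prob=0.8, seed=None):
    """Apply inverted dropout to one layer's activations."""
    if seed is not None:
        np.random.seed(seed)
    # Boolean mask: True means the unit is kept.
    d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob
    a3 = np.multiply(a3, d3)   # shut off dropped units
    a3 /= keep_prob            # rescale so the expected activation is unchanged
    return a3

a3 = np.ones((50, 1))
out = inverted_dropout(a3, keep_prob=0.8, seed=0)
# roughly 20% of the 50 units are zeroed; kept units become 1 / 0.8 = 1.25
```

Dividing by `keep_prob` at training time means no extra scaling is needed at test time, which is the point of the "inverted" variant.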