Cost function with L2 weight penalty:

$J(W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^{L} ||W^{[l]}||_F^2$

Frobenius norm: $||W^{[l]}||_F^2 = \sum_i \sum_j (w_{ij}^{[l]})^2$
For large $\lambda$, $W^{[l]} \to 0$.
This results in a **simpler** network / each hidden unit has a **smaller effect**.
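
A minimal sketch of this penalty in code, assuming numpy weight matrices W per layer, an already computed unregularized cost and gradient dW, and hypothetical names lam and alpha for $\lambda$ and the learning rate:

<code python>
import numpy as np

def l2_cost(unregularized_cost, weights, lam, m):
    # Frobenius-norm penalty: (lambda / (2 m)) * sum over layers of ||W[l]||_F^2
    penalty = (lam / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return unregularized_cost + penalty

def l2_gradient_step(W, dW, lam, m, alpha):
    # The penalty adds (lambda / m) * W to the gradient, shrinking W every step
    dW_reg = dW + (lam / m) * W
    return W - alpha * dW_reg   # large lambda pushes W toward 0 ("weight decay")
</code>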
==== Inverted dropout ====
<code python>
import numpy as np

# Inverted dropout applied to layer l = 3
keep_prob = 0.8                                             # probability of keeping a unit
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob   # boolean dropout mask
a3 = np.multiply(a3, d3)                                    # shut off ~20% of the units (e.g. 50 units => ~10 shut off)
a3 /= keep_prob                                             # scale up so the expected value of a3 stays the same
Z = np.dot(W, a3) + b                                       # a3 was reduced by ~20% => dividing by 0.8 keeps E[Z] unchanged
</code>
Making predictions at test time: no dropout is used (no masking and no rescaling, since the inverted scaling already happened during training).
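
A minimal sketch of this train/test difference (the function name and the toy check below are illustrative, not from the page): at training time the mask is applied and the activations are rescaled by keep_prob; at test time the activations pass through unchanged.

<code python>
import numpy as np

def dropout_forward(a, keep_prob=0.8, train=True):
    # Inverted dropout: mask + rescale at training time, identity at test time
    if not train:
        return a                          # test time: no mask, no scaling
    mask = np.random.rand(*a.shape) < keep_prob
    return (a * mask) / keep_prob         # rescale so E[output] == E[a]

a3 = np.ones((50, 1000))                  # 50 hidden units, 1000 examples
print(dropout_forward(a3, train=True).mean())    # ~1.0 (expected value preserved)
print(dropout_forward(a3, train=False).mean())   # exactly 1.0
</code>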