data_mining:logistic_regression [2018/05/10 15:48] (current) – phreazer
$\min \dots + \lambda \sum_{j=1}^n \theta_j^2$
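As a quick sketch of how this penalty enters the cost, here is a minimal NumPy version of an L2-regularized logistic regression cost. The function name ''regularized_cost'', the test data, and the convention of excluding the bias $\theta_0$ from the penalty are illustrative assumptions, not taken from this page.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Illustrative L2-regularized logistic regression cost.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    theta: (n,) parameters, lam: regularization strength lambda.
    """
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # sigmoid hypothesis
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam * np.sum(theta[1:] ** 2)  # lambda * sum_j theta_j^2, bias excluded
    return cross_entropy + penalty
```

Increasing ''lam'' only changes the penalty term, so for fixed nonzero $\theta$ the cost grows linearly in $\lambda$.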
For large $\lambda$, $W^{[l]} \to 0$.
$J(W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
This results in a **simpler** network: each hidden unit has a **smaller effect**.
Another effect: when $W$ is small, $z$ has a smaller range, so activations such as $\tanh$ stay in their approximately linear regime.
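A minimal numeric check of this effect (the specific ranges below are hypothetical, not from this page): for small $z$, $\tanh(z)$ is nearly the identity, while for larger $z$ it saturates and deviates strongly.

```python
import numpy as np

# Small weights keep pre-activations z near 0, where tanh(z) ~ z (linear).
z_small = np.linspace(-0.1, 0.1, 5)   # range as produced by small W
z_large = np.linspace(-3.0, 3.0, 5)   # range as produced by large W

# Maximum deviation of tanh from the identity over each range.
dev_small = float(np.max(np.abs(np.tanh(z_small) - z_small)))
dev_large = float(np.max(np.abs(np.tanh(z_large) - z_large)))
```

On the small range the deviation is on the order of $10^{-4}$; on the large range it exceeds 1, which is why heavily regularized (small-weight) networks behave more linearly.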
=== Gradient descent (Linear Regression) ===