$min \dots + \lambda \sum_{j=1}^n \theta_j^2$
=== L2 Regularization ===
For large $\lambda$, the penalty drives $W^{[l]} \to 0$.
Another effect: when $W$ is small, $z$ has a smaller range, so the resulting activation (e.g. tanh) behaves almost linearly.
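A quick numerical check of this claim (a hypothetical illustration, not from the page): for $z$ near zero, $\tanh(z) \approx z$, while for larger $|z|$ the activation saturates.

```python
import numpy as np

# With small weights W, pre-activations z stay close to zero,
# where tanh(z) ~ z (near-linear regime).
z_small = np.linspace(-0.1, 0.1, 5)
z_large = np.linspace(-3.0, 3.0, 5)

# Maximum deviation of tanh from the identity on each range.
dev_small = np.max(np.abs(np.tanh(z_small) - z_small))
dev_large = np.max(np.abs(np.tanh(z_large) - z_large))

print(dev_small)  # tiny: tanh is essentially linear here
print(dev_large)  # large: saturation dominates
```

With many near-linear activations the network collapses toward a linear model, which is why strong L2 regularization reduces its effective capacity.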
+ | |||
=== Dropout ===
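The page gives no details here; as a placeholder, a minimal sketch of the standard inverted-dropout forward pass (function name and `keep_prob` value are assumptions):

```python
import numpy as np

def dropout_forward(a, keep_prob=0.8, rng=np.random.default_rng(0)):
    """Inverted dropout: randomly zero units, rescale the survivors
    by 1/keep_prob so the expected activation is unchanged."""
    mask = rng.random(a.shape) < keep_prob  # True = keep, False = drop
    return a * mask / keep_prob

a = np.ones((3, 4))
out = dropout_forward(a)  # entries are either 0 or 1/keep_prob
```

At test time no mask is applied; the rescaling during training is what keeps the expected pre-activations consistent between the two phases.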
+ | |||
+ | |||
=== Gradient descent (Linear Regression) ===