data_mining:logistic_regression [2018/05/10 17:45] – [Regularization] phreazer
$J(W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^{L} ||W^{[l]}||_F^2$
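The L2-regularized cost can be sketched numerically. A minimal NumPy sketch, assuming the unregularized cross-entropy cost has already been computed; `l2_regularized_cost` is a hypothetical helper name, not part of any library:

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lam, m):
    # Add the Frobenius-norm penalty (lambda / 2m) * sum_l ||W^[l]||_F^2
    # to an already-computed cross-entropy cost.
    penalty = sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + (lam / (2 * m)) * penalty

# Toy check: two weight matrices of ones, lambda = 0.7, m = 5 examples
W1 = np.ones((3, 2))   # ||W1||_F^2 = 6
W2 = np.ones((1, 3))   # ||W2||_F^2 = 3
cost = l2_regularized_cost(0.5, [W1, W2], lam=0.7, m=5)
print(round(cost, 3))  # 0.5 + 0.7/(2*5) * 9 = 1.13
```

Because the penalty grows with the squared weights, gradient descent on this cost shrinks each $W^{[l]}$ every update (hence the name weight decay).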

This results in a **simpler** network: each hidden unit has a **smaller effect**.

Another effect: when $W$ is small, $z$ has a smaller range, so the resulting activation (e.g. for tanh) stays in the roughly linear region around zero.
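This near-linearity of tanh for small $z$ can be checked directly: close to zero, $\tanh(z) \approx z$, while for large $|z|$ the activation saturates. A small NumPy check:

```python
import numpy as np

# Small weights keep z = W·a + b near 0, where tanh(z) ≈ z
z_small = np.linspace(-0.1, 0.1, 5)
z_large = np.linspace(-3.0, 3.0, 5)

# Maximum deviation of tanh from the identity on each range
dev_small = np.max(np.abs(np.tanh(z_small) - z_small))  # tiny: nearly linear
dev_large = np.max(np.abs(np.tanh(z_large) - z_large))  # large: saturation
print(dev_small, dev_large)
```

With near-linear activations in every layer, the whole network behaves close to a linear model, which is one intuition for why regularization reduces overfitting.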
=== Gradient descent (Linear Regression) ===