data_mining:logistic_regression

data_mining:logistic_regression [2018/05/10 17:42] – [Regularization] phreazer
data_mining:logistic_regression [2018/05/10 17:45] – [Regularization] phreazer
Line 78:
  
 $J(W^{[l]},b^{[l]})= \frac{1}{m} \sum_{i=1}^m J(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L || W^{[l]} ||^2$
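
As a rough illustration of the regularized cost above, the following sketch adds the $\frac{\lambda}{2m}$ penalty on the squared weight norms to the mean per-example loss. The function name ''l2_regularized_cost'', the nested-list weight layout, and the sample numbers are assumptions made for this example only; they do not come from the page.

```python
def l2_regularized_cost(losses, weights, lam):
    """Mean per-example loss plus (lam / 2m) * sum of squared weight norms.

    losses:  list of per-example loss values (hypothetical input)
    weights: list of weight matrices, each a list of rows (hypothetical layout)
    lam:     regularization strength lambda
    """
    m = len(losses)
    data_term = sum(losses) / m
    # Squared norm of every layer's weight matrix, summed over all layers.
    squared_norms = sum(w * w for W in weights for row in W for w in row)
    return data_term + lam / (2 * m) * squared_norms


# Made-up values, only to show the effect of the penalty term.
losses = [0.2, 0.4, 0.6, 0.8]
weights = [[[1.0, -2.0], [0.5, 0.0]], [[3.0, 1.0]]]

cost_plain = l2_regularized_cost(losses, weights, lam=0.0)
cost_reg = l2_regularized_cost(losses, weights, lam=0.8)
```

With $\lambda = 0$ only the data term remains; a larger $\lambda$ makes large weights more expensive, which is what pushes gradient descent toward smaller $W$.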
 +
 +Penalizing large weights results in a **simpler** network: each hidden unit has a **smaller effect**.
 +
 +Another effect: when $W$ is small, $z$ has a smaller range, so the resulting activation (e.g. for tanh) is closer to linear.
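
The tanh remark above can be checked numerically. This sketch measures how far tanh deviates from the identity line over a small and a large range of $z$. The helper name ''max_tanh_linear_gap'' and the chosen ranges are assumptions for illustration.

```python
import math


def max_tanh_linear_gap(z_max, steps=100):
    """Largest |tanh(z) - z| over evenly spaced z in [-z_max, z_max]."""
    zs = [-z_max + 2 * z_max * k / steps for k in range(steps + 1)]
    return max(abs(math.tanh(z) - z) for z in zs)


# Small weights keep z near 0, where tanh(z) is almost equal to z.
gap_small = max_tanh_linear_gap(0.1)
# Large weights let z reach the saturated, clearly nonlinear part of tanh.
gap_large = max_tanh_linear_gap(3.0)
```

If every layer stays in the near-linear regime, the whole network behaves close to a linear model, which is one way to read the "simpler network" effect of regularization.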
  
 === Gradient descent (Linear Regression) ===
  • data_mining/logistic_regression.txt
  • Last modified: 2018/05/10 17:48
  • by phreazer