data_mining:logistic_regression [2018/05/10 17:46]
phreazer [Regularization]
data_mining:logistic_regression [2018/05/10 17:48] (current)
phreazer
  
$\min \dots + \lambda \sum_{j=1}^n \theta_j^2$
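As a minimal sketch, the regularized logistic regression cost can be computed as below. The function and variable names are illustrative (not from this page), the $\frac{\lambda}{2m}$ scaling is one common convention, and the bias term $\theta_0$ is conventionally left unpenalized:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Cross-entropy cost plus an L2 penalty on theta[1:] (bias excluded)."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))               # sigmoid hypothesis
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * m)   # lambda/(2m) * sum theta_j^2
    return cross_entropy + penalty
```

With $\lambda = 0$ this reduces to the plain cross-entropy cost; larger $\lambda$ adds a growing penalty on the weights.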

=== L2 Regularization ===

For large $\lambda$, $W^{[l]} \rightarrow 0$

$J(W^{[1]},b^{[1]},\dots,W^{[L]},b^{[L]}) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
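As a sketch, the layer-wise penalty term of this cost could be computed as follows; the function name and the list-of-matrices representation are assumptions for illustration, not from this page:

```python
import numpy as np

def l2_penalty(weights, lam, m):
    """lambda/(2m) times the sum of squared Frobenius norms over all layers.

    `weights` is a list of the per-layer matrices W[l].
    """
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)
```

During gradient descent this term contributes an extra $\frac{\lambda}{m} W^{[l]}$ to each layer's gradient, which is why L2 regularization is also called weight decay.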

This results in a **simpler** network: each hidden unit has a **smaller effect**.

Another effect: when $W$ is small, $z$ has a smaller range, so the resulting activation (e.g. for tanh) is more nearly linear.
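This near-linearity of tanh around zero is easy to check numerically; a small sketch (the sample ranges are chosen for illustration):

```python
import numpy as np

# tanh(z) ~= z for small z (Taylor expansion: z - z^3/3 + ...),
# so a layer with small pre-activations behaves almost linearly.
z_small = np.linspace(-0.1, 0.1, 101)
z_large = np.linspace(-3.0, 3.0, 101)

err_small = np.max(np.abs(np.tanh(z_small) - z_small))  # tiny deviation
err_large = np.max(np.abs(np.tanh(z_large) - z_large))  # large deviation
```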

=== Dropout ===

  
=== Gradient descent (Linear Regression) ===