data_mining:logistic_regression

E.g., create 3 binary problems from a 3-class problem: $h_\theta^{(i)}(x) = P(y=i \mid x;\theta);\ i=1,2,3$
  
Then choose the class $i$ with $\max_i h_\theta^{(i)}(x)$.
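A minimal sketch of this one-vs-all prediction step (NumPy, the sigmoid hypothesis, and the `Theta`/`predict_one_vs_all` names are illustrative assumptions, not from the original notes):

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """Theta: (n_classes, n_features), one trained binary classifier
    per class. X: (m, n_features). Returns for each row the class i
    whose classifier reports the highest P(y=i | x; theta)."""
    probs = sigmoid(X @ Theta.T)      # (m, n_classes)
    return np.argmax(probs, axis=1)   # pick max_i h^(i)(x)
</code>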
  
===== Addressing Overfitting =====
  
$\min \dots + \lambda \sum_{j=1}^n \theta_j^2$
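Concretely, the regularized cost for logistic regression could be computed as below (a sketch: NumPy, the cross-entropy loss, and the common $\frac{\lambda}{2m}$ scaling are assumptions; the bias $\theta_0$ is conventionally left unpenalized):

<code python>
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Cross-entropy cost plus the L2 penalty lambda * sum_j theta_j^2,
    scaled by 1/(2m) and skipping the bias theta[0]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # sigmoid hypothesis
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return cost + lam / (2 * m) * np.sum(theta[1:] ** 2)
</code>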

=== L2 Regularization ===

For large $\lambda$, $W^{[l]} \rightarrow 0$.
 +
$J(W^{[1]},b^{[1]},\dots,W^{[L]},b^{[L]}) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
 +
This results in a **simpler** network: each hidden unit has a **smaller effect**.
 +
Another effect: when $W$ is small, $z$ has a smaller range, so an activation such as tanh operates in its more linear regime.
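A short sketch of the penalty term itself (the `weights` list of layer matrices is an assumption for illustration):

<code python>
import numpy as np

def l2_penalty(weights, lam, m):
    """(lambda / 2m) * sum_l ||W^[l]||_F^2 over the layer weight
    matrices; added to the unregularized cost J."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)
</code>

During backprop each gradient gains the extra term $\frac{\lambda}{m} W^{[l]}$, which is why L2 regularization is also called weight decay.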

=== Dropout ===
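Dropout randomly disables each hidden unit with some keep probability during training, so no unit can rely on any single input feature; like L2, it pushes the network toward simpler, more redundant representations. A minimal sketch of **inverted dropout** for one layer's activations (the `keep_prob` name and the scaling convention are illustrative assumptions, not from the original notes):

<code python>
import numpy as np

def dropout_forward(a, keep_prob=0.8):
    """Inverted dropout: zero each unit with probability 1 - keep_prob,
    then scale by 1/keep_prob so the expected activation is unchanged
    and no rescaling is needed at test time."""
    mask = np.random.rand(*a.shape) < keep_prob
    return (a * mask) / keep_prob
</code>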
=== Gradient descent (Linear Regression) ===