data_mining:logistic_regression

E.g., create 3 binary problems from a 3-class problem: $h_\theta^{(i)}(x) = P(y=i \mid x;\theta);\ i=1,2,3$
  
Then choose the class $i$ with $\max_i h_\theta^{(i)}(x)$.
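A minimal sketch of this one-vs-all prediction step (NumPy, the sigmoid hypothesis, and the `Theta`/`predict_one_vs_all` names are illustrative assumptions, not from the original notes):

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """Theta: (n_classes, n_features), one trained binary classifier
    per class. X: (m, n_features). Returns for each row the class i
    whose classifier reports the highest P(y=i | x; theta)."""
    probs = sigmoid(X @ Theta.T)      # (m, n_classes)
    return np.argmax(probs, axis=1)   # pick max_i h^(i)(x)
</code>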
  
===== Addressing Overfitting =====
  
$\min \dots + \lambda \sum_{j=1}^n \theta_j^2$
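Concretely, the regularized cost for logistic regression could be computed as below (a sketch: NumPy, the cross-entropy loss, and the common $\frac{\lambda}{2m}$ scaling are assumptions; the bias $\theta_0$ is conventionally left unpenalized):

<code python>
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Cross-entropy cost plus the L2 penalty lambda * sum_j theta_j^2,
    scaled by 1/(2m) and skipping the bias theta[0]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # sigmoid hypothesis
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return cost + lam / (2 * m) * np.sum(theta[1:] ** 2)
</code>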

=== L2 Regularization ===

For large $\lambda$, $W^{[l]} \rightarrow 0$.
 +
$J(W^{[1]},b^{[1]},\dots,W^{[L]},b^{[L]}) = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^L ||W^{[l]}||_F^2$
 +
This results in a **simpler** network: each hidden unit has a **smaller effect**.
 +
Another effect: when $W$ is small, $z$ has a smaller range, so an activation such as tanh operates in its more linear regime.
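A short sketch of the penalty term itself (the `weights` list of layer matrices is an assumption for illustration):

<code python>
import numpy as np

def l2_penalty(weights, lam, m):
    """(lambda / 2m) * sum_l ||W^[l]||_F^2 over the layer weight
    matrices; added to the unregularized cost J."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)
</code>

During backprop each gradient gains the extra term $\frac{\lambda}{m} W^{[l]}$, which is why L2 regularization is also called weight decay.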

=== Dropout ===
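Dropout randomly disables each hidden unit with some keep probability during training, so no unit can rely on any single input feature; like L2, it pushes the network toward simpler, more redundant representations. A minimal sketch of **inverted dropout** for one layer's activations (the `keep_prob` name and the scaling convention are illustrative assumptions, not from the original notes):

<code python>
import numpy as np

def dropout_forward(a, keep_prob=0.8):
    """Inverted dropout: zero each unit with probability 1 - keep_prob,
    then scale by 1/keep_prob so the expected activation is unchanged
    and no rescaling is needed at test time."""
    mask = np.random.rand(*a.shape) < keep_prob
    return (a * mask) / keep_prob
</code>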
=== Gradient descent (Linear Regression) ===