data_mining:regression (revision 2014/07/13 03:37 by phreazer)
==== Normal Equations ====
* Feature/design matrix X (Dim: m x (n+1))
* Vector y (Dim: m)
$\theta = (X^TX)^{-1}X^Ty$
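The closed-form solution above can be sketched as follows, assuming NumPy; the toy dataset is made up for illustration:

```python
import numpy as np

# Toy data: m = 4 examples, n = 1 feature, plus an intercept column of ones.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # y = 2 * x, so theta should be [0, 2]

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # close to [0., 2.]
```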
+ | |||
+ | * Feature scaling nicht notwendig. | ||
+ | |||
+ | Was wenn $X^TX$ singulär (nicht invertierbar)? | ||
+ | |||
+ | (pinv in Octave) | ||
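NumPy offers the same pseudoinverse via `np.linalg.pinv`. A sketch with a made-up design matrix whose redundant column makes $X^TX$ exactly singular:

```python
import numpy as np

# Redundant feature: column 3 duplicates column 2, so X^T X is singular.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])

# np.linalg.inv(X.T @ X) would raise LinAlgError here;
# the pseudoinverse still yields a (minimum-norm) solution.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(X @ theta)  # reproduces y despite the singular matrix
```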
+ | |||
+ | **Gründe für Singularität: | ||
+ | * Redundante Features (lineare Abhängigkeit) | ||
+ | * Zu viele Features (z.B. $m <= n$) | ||
+ | * Lösung: Features weglassen oder regularisieren | ||
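Regularization fixes singularity directly: a common convention adds $\lambda$ times an identity matrix whose top-left entry is zeroed so the intercept is not penalized, and the resulting matrix is invertible for $\lambda > 0$. A NumPy sketch with an assumed toy dataset and an illustrative $\lambda$:

```python
import numpy as np

X = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0]])  # redundant third column -> X^T X singular
y = np.array([2.0, 4.0, 6.0])

lam = 1.0               # regularization strength (illustrative value)
L = np.eye(X.shape[1])
L[0, 0] = 0.0           # by convention the intercept is not regularized

# Regularized normal equation: (X^T X + lambda * L) is invertible for lambda > 0
theta = np.linalg.inv(X.T @ X + lam * L) @ X.T @ y
print(theta)
```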
+ | |||
+ | **Wann was benutzten? | ||
+ | |||
+ | * m training tupel, n features | ||
+ | * GD funktioniert bei großem n (> 1000) gut, Normalengleichung muss (n x n) Matrix invertieren, | ||
+ | |||
===== Gradient Descent Improvements =====
==== Feature Scaling ====