Differences
This shows you the differences between two versions of the page.
data_mining:entropie [2013/09/15 17:20] – [Mutual information] phreazer | data_mining:entropie [2013/09/15 18:24] – [Mutual information] phreazer | ||
---|---|---|---|
Line 28: | Line 28: | ||
H(F) + H(B) - H(F,B) | H(F) + H(B) - H(F,B) | ||
- | Smoothing | + | Feature selection => pick the features with the highest MI, although this is too computationally expensive |
+ | |||
+ | Proxies: IDF; iteratively, AdaBoost | ||
+ | |||
+ | More features -> | ||
+ | NBC improves at first, then degrades. | ||
+ | |||
+ | Redundant features, the (independence) assumption of Naive Bayes | ||
+ | |||
+ | ====== Example ====== | ||
+ | p(+) = 10,000/ | ||
+ | p(-) = 5,000/ | ||
+ | p(hate) = 3,000/ | ||
+ | p(~hate) = 0.8\\ | ||
+ | p(hate,+) = 1/15,000 \text{(does not occur in any positive comment; 1 instead of zero => smoothing)}\\ | ||
+ | p(~hate,+) = 10,000/ | ||
+ | p(hate,-) = 3,000/ | ||
+ | p(~hate,-) = 2,000/ | ||
$$ | $$ | ||
- | p(+)=0.75\\ | + | I(H,S) = p(hate, |
- | p(-)=0.25\\ | + | |
- | p(hate)=800/ | + | |
- | p(~hate)=7200/ | + | |
- | p(hate,+)=1/ | + | |
- | p(~hate,+)=6000/ | + | |
- | p(hate,-)=1200/ | + | |
- | p(~hate,-)=0.1 | + | |
$$ | $$ | ||
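As a rough sketch of the example above, the mutual information between the word indicator (hate / ~hate) and the sentiment label (+ / -) can be computed from the four joint counts. The counts are the numerators listed in the example. Normalising by their sum (15,001 rather than 15,000) is an assumption made here so that the joint distribution sums to exactly 1, and the names `counts`, `marginal`, and `entropy` are illustrative only.

```python
import math

# Joint counts (word, label) from the example.
# The (hate, +) cell is 1 instead of 0, i.e. smoothed.
counts = {
    ("hate", "+"): 1,
    ("~hate", "+"): 10_000,
    ("hate", "-"): 3_000,
    ("~hate", "-"): 2_000,
}

# Assumption: normalise by the summed counts so the probabilities sum to 1.
total = sum(counts.values())
joint = {key: n / total for key, n in counts.items()}


def marginal(joint, axis):
    """Sum the joint distribution over the other variable."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


p_word = marginal(joint, 0)   # p(hate), p(~hate)
p_label = marginal(joint, 1)  # p(+), p(-)

# Route 1: direct definition, sum of p(w,s) * log2(p(w,s) / (p(w) p(s))).
mi_direct = sum(
    p * math.log2(p / (p_word[w] * p_label[s]))
    for (w, s), p in joint.items()
    if p > 0
)

# Route 2: entropy identity H(F) + H(B) - H(F,B).
mi_from_entropies = entropy(p_word) + entropy(p_label) - entropy(joint)

print(f"I(H;S) direct:         {mi_direct:.4f} bits")
print(f"I(H;S) from entropies: {mi_from_entropies:.4f} bits")
```

Both routes should agree up to floating-point error. The second one is the form H(F) + H(B) - H(F,B) given at the top of this section.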
+ | |||
+ | |||
+ | ====== Capacity of a channel ====== | ||
+ | |||
+ | Maximum mutual information, | ||
+ | |||
+ | Equivalent in ML: how much training data is needed -> depends on the concept | ||
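A minimal sketch of capacity as maximum mutual information, assuming the standard definition C = max over p(x) of I(X;Y). Purely for illustration it uses a binary symmetric channel with a flip probability of 0.1; neither the channel nor the value comes from these notes. The maximum is found by a simple grid search over the input distribution.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_mutual_information(q, flip):
    """I(X;Y) = H(Y) - H(Y|X) for a binary symmetric channel, P(X=1) = q."""
    p_y1 = q * (1 - flip) + (1 - q) * flip  # P(Y=1)
    return binary_entropy(p_y1) - binary_entropy(flip)


FLIP = 0.1  # illustrative assumption, not taken from the notes

# Grid search over input distributions P(X=1) = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
best_q = max(grid, key=lambda q: bsc_mutual_information(q, FLIP))
capacity = bsc_mutual_information(best_q, FLIP)

print(f"maximising input distribution: P(X=1) = {best_q:.2f}")
print(f"capacity: {capacity:.4f} bits per channel use")
```

For this channel the search should land on the uniform input distribution, which matches the closed form 1 - H(flip) for the binary symmetric channel.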
+ | |||