data_mining:naive_bayes

====== Bayes rule ======

**Data set:**

^ R ^ B |
| y | n |

  * n: number of entries
  * R=y in r cases
  * B=y in k cases
  * R=y and B=y in i cases

  * $p(B|R) = i/r$
  * $p(R) = r/n$
  * $p(R \text{ and } B) = i/n = (i/r) \cdot (r/n)$
  * $p(B,R) = p(B|R) \, p(R)$

Bayes rule:
$P(B,R) = P(B|R) P(R) = P(R|B) P(B)$
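
As a quick check of these identities, a minimal sketch with made-up counts (the values of n, r, k and i are illustrative, not from the source):

<code python>
# Made-up counts for a data set of the form above (illustration only)
n = 100  # total entries
r = 40   # entries with R=y
k = 25   # entries with B=y
i = 10   # entries with R=y and B=y

p_B_given_R = i / r   # p(B|R)
p_R = r / n           # p(R)
p_R_and_B = i / n     # p(R and B)

# Product rule: p(B,R) = p(B|R) * p(R)
assert abs(p_R_and_B - p_B_given_R * p_R) < 1e-12
</code>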

Bayes' theorem:
$P(B|R) = \frac{P(B,R)}{P(R)} = \frac{P(R|B)P(B)}{P(R)}$

Prior probability: $P(B)$
Likelihood: $P(R|B)$
Posterior probability: $P(B|R)$
===== Example =====
Question 1:
If a person has malaria (mp), there is a 90% chance that the blood test for the malarial parasite comes up positive (tp); however, 1% of the time the test gives a false positive (tp and mn). Also, there is a 1% chance of having malaria in general (mp).

Unfortunately, you happen to test positive. What is the chance that you have malaria?

Given:
  * $P(tp | mp) = 0.9$
  * $P(tp | mn) = 0.01$
  * $P(mp) = 0.01$

Wanted:
$P(mp|tp) = \frac{P(tp|mp) \cdot P(mp)}{P(tp)} = \frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.01 \cdot (1-0.01)} \approx 0.476$
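
A minimal sketch of this calculation (the function name posterior is my own; the denominator expands $P(tp)$ with the law of total probability):

<code python>
def posterior(p_t_given_mp, p_t_given_mn, p_mp):
    """P(mp|tp) via Bayes' theorem; P(tp) is expanded
    with the law of total probability."""
    p_tp = p_t_given_mp * p_mp + p_t_given_mn * (1 - p_mp)
    return p_t_given_mp * p_mp / p_tp

print(posterior(0.9, 0.01, 0.01))  # ~0.476
</code>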

Question 2:
Now suppose your doctor had employed a far superior, more expensive test, one with only a 0.1% chance of a false positive. (The other parameters are the same: 90% chance of a true positive, 1% chance of malaria in general.)

What is the chance that you have malaria if you test positive with this improved procedure?

Given:
  * $P(tp | mp) = 0.9$
  * $P(tp | mn) = 0.001$
  * $P(mp) = 0.01$

Wanted:
$P(mp|tp) = \frac{P(tp|mp) \cdot P(mp)}{P(tp)} = \frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.001 \cdot (1-0.01)} \approx 0.901$
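
Reusing the posterior sketch from question 1; only the false-positive rate changes:

<code python>
print(posterior(0.9, 0.001, 0.01))  # ~0.901
</code>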

====== Independence ======

Suppose B and R are independent of each other.

From the counts above: $P(R) = r/n$, $P(B) = k/n$,
$P(R|B) = i/k$, $P(B|R) = i/r$.

R and B are independent of each other if and only if
$i/k = r/n$ and $i/r = k/n$, i.e.

$P(R|B) = P(R)$ and $P(B|R) = P(B)$
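
A small sketch of this check, using the made-up counts from above (chosen so that the equalities happen to hold):

<code python>
# Made-up counts, chosen so that R and B come out independent
n, r, k, i = 100, 40, 25, 10

p_R, p_B = r / n, k / n
p_R_given_B, p_B_given_R = i / k, i / r

# Independent  <=>  P(R|B) = P(R) and P(B|R) = P(B)
independent = abs(p_R_given_B - p_R) < 1e-12 and abs(p_B_given_R - p_B) < 1e-12
print(independent)  # True: 10/25 == 40/100 and 10/40 == 25/100
</code>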

====== Naive Bayes ======

"Naive" because of the assumption: R and C are independent given B.

$$
P(B|R,C) \cdot P(R,C) = P(R,C|B) \cdot P(B) \\
= P(R|C,B) \cdot P(C|B) \cdot P(B) \quad \text{(chain rule)} \\
= P(R|B) \cdot P(C|B) \cdot P(B) \quad \text{(conditional independence)}
$$

Compute the ratio:
$$
\frac{p(r|B=y) \cdot p(c|B=y) \cdot p(B=y)}{p(r|B=n) \cdot p(c|B=n) \cdot p(B=n)}
$$

Predict B=y if the ratio is $> \alpha$ (e.g. 1), otherwise B=n.
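
A minimal sketch of this decision rule for two features (all probability values are made up for illustration):

<code python>
def classify(p_r_y, p_c_y, p_y, p_r_n, p_c_n, p_n, alpha=1.0):
    """Two-feature naive Bayes decision: compare the ratio
    p(r|B=y) p(c|B=y) p(B=y) / (p(r|B=n) p(c|B=n) p(B=n)) to alpha."""
    ratio = (p_r_y * p_c_y * p_y) / (p_r_n * p_c_n * p_n)
    return 'y' if ratio > alpha else 'n'

# Made-up probabilities: ratio = 0.24 / 0.03 = 8 > 1, so B=y
print(classify(0.8, 0.6, 0.5, 0.2, 0.3, 0.5))  # 'y'
</code>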

====== Naive Bayes for N features ======

$$
L = \prod_{i=1}^N \frac{p(x_i|B=y)}{p(x_i|B=n)} \cdot \frac{p(B=y)}{p(B=n)}
$$

Predict B=y if $L > \alpha$ (e.g. 1), otherwise B=n.

Log-likelihood => use addition instead of multiplication, to avoid rounding errors (floating-point underflow).
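
A sketch of the log-space version (the feature likelihoods below are made up):

<code python>
import math

def log_ratio(liks_y, liks_n, p_y, p_n):
    """log L = sum of log likelihood ratios plus the log prior ratio;
    summing logs avoids underflow from multiplying many small numbers."""
    s = math.log(p_y) - math.log(p_n)
    for ly, ln in zip(liks_y, liks_n):
        s += math.log(ly) - math.log(ln)
    return s

# Decide B=y if log L > log(alpha); with alpha = 1 this is log L > 0
liks_y = [0.8, 0.6, 0.7]  # made-up p(x_i|B=y)
liks_n = [0.2, 0.3, 0.4]  # made-up p(x_i|B=n)
print(log_ratio(liks_y, liks_n, 0.5, 0.5) > 0)  # True -> B=y
</code>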

====== Example: Sentiment analysis ======

Estimated from training data:
p(+), p(-),
p(like|+), p(enjoy|+), p(hate|+), ...
p(hate|-), p(enjoy|-), p(lot|-), ...

In the text: like, simple, lot

$$
L = \frac{p(like|+)p(lot|+)[1-p(hate|+)][1-p(waste|+)]p(simple|+)}{p(like|-)p(lot|-)[1-p(hate|-)][1-p(waste|-)]p(simple|-)} \cdot \frac{p(+)}{p(-)}
$$
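
A sketch of this computation with made-up priors and per-word likelihoods (a Bernoulli model, matching the $[1-p(w|\text{class})]$ factors for absent words; the estimates are not from the source):

<code python>
# Made-up class priors and per-word likelihoods (Bernoulli model)
p_pos, p_neg = 0.5, 0.5
p_w_pos = {'like': 0.6, 'lot': 0.4, 'hate': 0.05, 'waste': 0.05, 'simple': 0.3}
p_w_neg = {'like': 0.1, 'lot': 0.2, 'hate': 0.5,  'waste': 0.4,  'simple': 0.2}

present = {'like', 'simple', 'lot'}  # words occurring in the text

L = p_pos / p_neg
for w in p_w_pos:
    if w in present:
        L *= p_w_pos[w] / p_w_neg[w]              # word present
    else:
        L *= (1 - p_w_pos[w]) / (1 - p_w_neg[w])  # word absent

print('+' if L > 1 else '-')  # '+' for these values
</code>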
  