====== Bayes rule ======
  
**Dataset:**
  
^ R ^ B ^
| y | n |
| ... | ... |
  
  * n: entries
  * R=y for r cases
  * B=y for k cases
  * R=y and B=y for i cases
  
  * $p(B|R) = i/r$
  * $p(R) = r/n$
  * $p(R, B) = i/n = (i/r) * (r/n)$
  * $p(B,R) = p(B|R) p(R)$
  
Bayes Rule:
$P(B,R) = P(B|R) P(R) = P(R|B) P(B)$
  
Bayes Theorem:
$P(B|R) = \frac{P(B,R)}{P(R)} = \frac{P(R|B)P(B)}{P(R)}$

Prior probability: $P(B)$
Likelihood: $P(R|B)$
Posterior probability: $P(B|R)$
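
A minimal Python sketch (with an invented toy R/B dataset, not from the notes) that recovers these probabilities from the counts n, r, k, i and checks Bayes rule and Bayes theorem:

<code python>
# Toy R/B dataset (values invented here for illustration only).
data = [("y", "y"), ("y", "n"), ("y", "y"), ("n", "y"),
        ("n", "n"), ("y", "n"), ("n", "n"), ("y", "y")]

n = len(data)                                          # entries
r = sum(1 for R, B in data if R == "y")                # R=y cases
k = sum(1 for R, B in data if B == "y")                # B=y cases
i = sum(1 for R, B in data if R == "y" and B == "y")   # R=y and B=y cases

p_B_given_R = i / r          # p(B|R) = i/r
p_R_given_B = i / k          # p(R|B) = i/k
p_R, p_B = r / n, k / n      # p(R) = r/n, p(B) = k/n
p_joint = i / n              # p(B,R) = i/n

# Bayes rule: both factorisations give the same joint probability.
assert abs(p_joint - p_B_given_R * p_R) < 1e-12
assert abs(p_joint - p_R_given_B * p_B) < 1e-12

# Bayes theorem: posterior = likelihood * prior / evidence.
print(p_B_given_R, p_R_given_B * p_B / p_R)   # both values equal p(B|R)
</code>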
===== Example =====
Question 1:
If a person has malaria (mp), there is a 90% chance that the blood test for the malarial parasite comes up positive (tp); however, 1% of the time the test gives a false positive (tp and mn). Also, there is a 1% chance of getting malaria in general (mp).
  
Unfortunately, you happen to test positive. What is the chance of your having malaria?
  
Given:
  * $P(tp | mp) = 0.9$
  * $P(tp | mn) = 0.01$
  * $P(mp) = 0.01$
  
Find:
$P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.01 * (1-0.01)} = 0.476$
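
The same computation as a small Python sketch; `posterior` is a hypothetical helper name, not part of the original notes:

<code python>
def posterior(p_tp_given_mp, p_tp_given_mn, p_mp):
    """P(mp|tp) via Bayes theorem; the denominator expands P(tp)
    over the two cases mp and mn (law of total probability)."""
    p_tp = p_tp_given_mp * p_mp + p_tp_given_mn * (1 - p_mp)
    return p_tp_given_mp * p_mp / p_tp

print(posterior(0.9, 0.01, 0.01))   # ~0.476
</code>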
  
Question 2:
Now suppose your doctor had employed a far superior, more expensive test, one with only a 0.1% chance of a false positive. (The other parameters are the same: a 90% chance of a true positive, a 1% chance of malaria in general.)
  
What is the chance that you have malaria if you test positive with this improved procedure?
  
Given:
  * $P(tp | mp) = 0.9$
  * $P(tp | mn) = 0.001$
  * $P(mp) = 0.01$
  
Find:
$P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.001 * (1-0.01)} = 0.901$
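
Reusing the hypothetical `posterior` helper from the sketch above, only the false-positive rate changes:

<code python>
# Same hypothetical helper as above; only P(tp|mn) drops from 0.01 to 0.001.
print(posterior(0.9, 0.001, 0.01))   # ~0.901
</code>

A ten times smaller false-positive rate roughly doubles the posterior here (0.476 to 0.901).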

====== Independence ======

Assume B and R are independent of each other.

$P(R) = r/n$, $P(B) = k/n$
$P(R|B) = i/k$, $P(B|R) = i/r$

R and B are independent of each other if and only if
$i/k = r/n$ and $i/r = k/n$, i.e.

$P(R|B) = P(R)$; $P(B|R) = P(B)$
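
A small sketch of this check in Python, with invented counts chosen so that the factorisation holds exactly:

<code python>
# Invented counts: n entries, r with R=y, k with B=y, i with both.
n, r, k, i = 100, 40, 50, 20

# Independent  <=>  the joint frequency factorises into the marginals.
print(abs(i / n - (r / n) * (k / n)) < 1e-12)   # True

# Equivalent formulations: conditioning does not change the marginal.
print(abs(i / k - r / n) < 1e-12)               # P(R|B) = P(R)
print(abs(i / r - k / n) < 1e-12)               # P(B|R) = P(B)
</code>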

====== Naive Bayes ======

"Naive" because of the assumption: R and C are independent given B.

$$
P(B|R,C) * P(R,C) = P(R,C|B) * P(B)\\
= P(R|C,B) * P(C|B) * P(B) \qquad \text{(product rule)}\\
= P(R|B) * P(C|B) * P(B) \qquad \text{(conditional independence)}
$$

Compute the ratio:
$$
\frac{p(r|B=y) * p(c|B=y) * p(B=y)}{p(r|B=n) * p(c|B=n) * p(B=n)}
$$

Predict B=y if the ratio is $> \alpha$ (e.g. 1), otherwise B=n.
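
A minimal sketch of this two-feature decision rule in Python; all probability values below are invented placeholders, not estimates from data:

<code python>
# Invented conditional probabilities for the features r, c and the class B.
p_r_given = {"y": 0.7, "n": 0.2}   # p(r|B=y), p(r|B=n)
p_c_given = {"y": 0.6, "n": 0.3}   # p(c|B=y), p(c|B=n)
p_B       = {"y": 0.4, "n": 0.6}   # prior p(B=y), p(B=n)
alpha = 1.0

ratio = (p_r_given["y"] * p_c_given["y"] * p_B["y"]) / \
        (p_r_given["n"] * p_c_given["n"] * p_B["n"])

prediction = "y" if ratio > alpha else "n"
print(ratio, prediction)   # ratio ~4.67 -> predict B=y
</code>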

====== Naive Bayes for N features ======

$$
L = \prod_{i=1}^N \frac{p(x_i|B=y)}{p(x_i|B=n)} * \frac{p(B=y)}{p(B=n)}
$$
Predict B=y if $L > \alpha$ (e.g. 1), otherwise B=n.

Log-likelihood => replace the multiplication with a sum of logs to avoid rounding errors (underflow).
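
A sketch of the N-feature rule evaluated in log space, with invented per-feature likelihoods; the sum of log ratios replaces the product:

<code python>
import math

# Invented per-feature likelihoods p(x_i|B=y) and p(x_i|B=n).
p_given_y = [0.8, 0.6, 0.7, 0.9]
p_given_n = [0.3, 0.5, 0.4, 0.2]
prior_y, prior_n = 0.4, 0.6
alpha = 1.0

# Log space: a product of many small factors underflows,
# a sum of their logs does not.
log_L = sum(math.log(py / pn) for py, pn in zip(p_given_y, p_given_n))
log_L += math.log(prior_y / prior_n)

prediction = "y" if log_L > math.log(alpha) else "n"
print(log_L, prediction)
</code>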

====== Example: Sentiment Analysis ======
Model probabilities: $p(+)$, $p(-)$,
$p(like|+)$, $p(enjoy|+)$, $p(hate|+)$, ...
$p(hate|-)$, $p(enjoy|-)$, $p(lot|-)$, ...

Words in the text: like, simple, lot

$$
L = \frac{p(like|+)p(lot|+)[1-p(hate|+)][1-p(waste|+)]p(simple|+)}{p(like|-)p(lot|-)[1-p(hate|-)][1-p(waste|-)]p(simple|-)} * \frac{p(+)}{p(-)}
$$
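
A sketch of this sentiment ratio as a Bernoulli naive Bayes score in Python; the word probabilities are invented placeholders. Words present in the text contribute $p(w|class)$, absent vocabulary words contribute $1-p(w|class)$:

<code python>
# Invented word likelihoods p(word|class) for a tiny vocabulary.
p_word = {
    "+": {"like": 0.30, "enjoy": 0.25, "hate": 0.02, "waste": 0.01,
          "lot": 0.20, "simple": 0.10},
    "-": {"like": 0.10, "enjoy": 0.05, "hate": 0.30, "waste": 0.25,
          "lot": 0.08, "simple": 0.12},
}
p_class = {"+": 0.5, "-": 0.5}

text_words = {"like", "simple", "lot"}   # words observed in the text

def score(cls):
    s = p_class[cls]
    for word, p in p_word[cls].items():
        # Bernoulli model: present words contribute p, absent words 1-p.
        s *= p if word in text_words else (1 - p)
    return s

L = score("+") / score("-")
print(L, "+" if L > 1 else "-")
</code>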