Differences

This shows you the differences between two versions of the page.

--- data_mining:naive_bayes [2013/09/15 16:07] – [Naive Bayes] phreazer
+++ data_mining:naive_bayes [2015/08/04 15:02] (current) – [Bayes rule] phreazer
@@ Line 16: / Line 16: @@
 ====== Bayes rule ======
-Data set:
+**Data set:**
 ^ R ^ B |
@@ Line 23: / Line 23: @@
 | y | n |
-n: entries
+  * n: entries
-R=y in r cases
+  * R=y in r cases
-B=y in k cases
+  * B=y in k cases
-R=y and B=y in i cases
+  * R=y and B=y in i cases
-p(B|R) = i/r
+  * $p(B|R) = i/r$
-p(R) = r/n
+  * $p(R) = r/n$
-p(R and B) = i/n = (i/r) * (r/n)
+  * $p(R \text{ and } B) = i/n = (i/r) \cdot (r/n)$
-p(B,R) = p(B|R) p(R)
+  * $p(B,R) = p(B|R) p(R)$
 Bayes Rule:
 $P(B,R) = P(B|R) P(R) = P(R|B) P(B)$
+Bayes Theorem:
+$P(B|R) = \frac{P(B,R)}{P(R)} = \frac{P(R|B)P(B)}{P(R)}$
+Prior probability: $P(B)$
+Likelihood: $P(R|B)$
+Posterior probability: $P(B|R)$
 ===== Example =====
-If a person has malaria, there is a 90% chance that the blood test for malarial parasite comes up positive; however, 1% of the time the test gives a false positive. Also, there is a 1% chance of getting malaria in general.
+Question 1:
+If a person has malaria (mp), there is a 90% chance that the blood test for malarial parasite comes up positive (tp); however, 1% of the time the test gives a false positive (tp and mn). Also, there is a 1% chance of getting malaria in general (mp).
 Unfortunately, you happen to test positive. What is the chance of your having malaria?
-P(tp | mp) = 0.9
+Given:
-P(tp | mn) = 0.01
+  * $P(tp | mp) = 0.9$
-P(mp) = 0.01
+  * $P(tp | mn) = 0.01$
+  * $P(mp) = 0.01$
-Wanted
+Wanted:
 $P(mp|tp) = \frac{P(tp|mp) \cdot P(mp)}{P(tp)} = \frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.01 \cdot (1-0.01)} = 0.476$
-Test 2
+Question 2:
 Now suppose your doctor had employed a far superior, more expensive test, one with only a .1% chance of a false positive. (Other parameters are the same - 90% chance of a true positive, 1% chance of malaria in general.)
 What is the chance that you have malaria if you test positive with this improved procedure?
-P(tp | mp) = 0.9
+  * $P(tp | mp) = 0.9$
-P(tp | mn) = 0.001
+  * $P(tp | mn) = 0.001$
-P(mp) = 0.01
+  * $P(mp) = 0.01$
-Wanted
+Wanted:
 $P(mp|tp) = \frac{P(tp|mp) \cdot P(mp)}{P(tp)} = \frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.001 \cdot (1-0.01)} = 0.901$
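Both questions use the same calculation: the denominator $P(tp)$ is expanded with the law of total probability, $P(tp) = P(tp|mp) P(mp) + P(tp|mn) (1 - P(mp))$. A minimal Python sketch of that step follows; the function name `posterior` and its parameter names are illustrative assumptions and do not come from the wiki page.

```python
def posterior(p_tp_given_mp, p_tp_given_mn, p_mp):
    """Return P(mp | tp) via Bayes' theorem (hypothetical helper)."""
    # Law of total probability: P(tp) = P(tp|mp) P(mp) + P(tp|mn) P(mn)
    p_tp = p_tp_given_mp * p_mp + p_tp_given_mn * (1.0 - p_mp)
    return p_tp_given_mp * p_mp / p_tp


# Question 1: false-positive rate of 1%
q1 = posterior(0.9, 0.01, 0.01)
# Question 2: false-positive rate of 0.1%
q2 = posterior(0.9, 0.001, 0.01)

print(round(q1, 3), round(q2, 3))
```

Rounded to three decimals, the two results should match the values 0.476 and 0.901 given above.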
@@ Line 95: / Line 103: @@
 $$
 B=y if $>\alpha$ (e.g. 1), otherwise B=n
+Log-likelihood => use addition instead of multiplication to avoid rounding errors
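One numerical problem that the log-likelihood avoids is floating-point underflow when many small probabilities are multiplied. The following sketch illustrates this under assumed inputs: the list `probs` holds made-up per-word likelihoods and is not taken from the wiki page.

```python
import math

# Hypothetical per-word likelihoods (assumption, chosen only to provoke underflow).
probs = [1e-5] * 100

# Direct multiplication: the true value 1e-500 is not representable as a double.
product = 1.0
for p in probs:
    product *= p

# Log-likelihood: the same quantity as a sum of logarithms stays well in range.
log_sum = sum(math.log(p) for p in probs)

print(product, log_sum)
```

The product collapses to 0.0, so two classes could no longer be compared, while the sum of logarithms keeps the information. Because the logarithm is monotonic, comparing log values gives the same decision as comparing the original products.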
+====== Example: Sentiment Analysis ======
+p(+),p(-),
+p(like|+), p(enjoy|+), p(hate|+), ...
+p(hate|-), p(enjoy|-), p(lot|-), ...
+In the text: like, simple, lot
+$$
+L = \frac{p(like|+)p(lot|+)[1-p(hate|+)][1-p(waste|+)]p(simple|+)}{p(like|-)p(lot|-)[1-p(hate|-)][1-p(waste|-)]p(simple|-)} * \frac{p(+)}{p(-)}
+$$
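The ratio $L$ above could be evaluated as in the following Python sketch. All numeric probabilities are invented for illustration (the wiki page gives none), and the names `prior`, `p_word` and `log_score` are hypothetical. Words that occur in the text contribute $p(w|c)$, absent words contribute $1 - p(w|c)$, and the computation is done in log space.

```python
import math

# Hypothetical values, NOT from the source: class priors and per-word probabilities.
prior = {"+": 0.5, "-": 0.5}
p_word = {
    "+": {"like": 0.6, "lot": 0.3, "hate": 0.05, "waste": 0.05, "simple": 0.2},
    "-": {"like": 0.2, "lot": 0.3, "hate": 0.5, "waste": 0.4, "simple": 0.1},
}


def log_score(cls, words_in_text):
    """Log of p(cls) times the per-word factors for one class."""
    total = math.log(prior[cls])
    for word, p in p_word[cls].items():
        # Present words contribute p(w|cls), absent words 1 - p(w|cls).
        total += math.log(p if word in words_in_text else 1.0 - p)
    return total


text = {"like", "simple", "lot"}  # words observed in the text
log_L = log_score("+", text) - log_score("-", text)
L = math.exp(log_L)

alpha = 1.0  # decision threshold, as in "e.g. 1" above
label = "+" if L > alpha else "-"
print(L, label)
```

With these assumed numbers the ratio is well above 1, so the text would be classified as positive; different probabilities would of course give a different result.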