Differences
This shows you the differences between two versions of the page.
data_mining:naive_bayes [2013/09/15 15:54] – phreazer | data_mining:naive_bayes [2015/08/04 15:02] (current) – [Bayes rule] phreazer | ||
---|---|---|---|
Line 16: | Line 16: | ||
====== Bayes rule ====== | ====== Bayes rule ====== | ||
- | Dataset: | + | **Dataset:** |
^ R ^ B | | ^ R ^ B | | ||
Line 23: | Line 23: | ||
| y | n | | | y | n | | ||
- | n: entries | + | * n: entries |
- | R=y for r cases | + | |
- | B=y for k cases | + | |
- | R=y and B=y for i cases | + | |
- | p(B|R) = i/r | + | * $p(B|R) = i/r$ |
- | p(R) = r/n | + | * $p(R) = r/n$ |
- | p(R and B) = i/n = (i/r) * (r/n) | + | * $p(R \text{ and } B) = i/n = (i/r) * (r/n)$ |
- | p(B,R) = p(B|R) p(R) | + | * $p(B,R) = p(B|R) p(R)$ |
Bayes Rule: | Bayes Rule: | ||
$P(B,R) = P(B|R) P(R) = P(R|B) P(B)$ | $P(B,R) = P(B|R) P(R) = P(R|B) P(B)$ | ||
+ | Bayes Theorem: | ||
+ | $P(B|R) = \frac{P(B,R)}{P(R)} = \frac{P(R|B) P(B)}{P(R)}$ | ||
+ | |||
+ | Prior probability: $P(B)$ | ||
+ | Likelihood: $P(R|B)$ | ||
+ | Posterior probability: $P(B|R)$ | ||
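The count-based definitions above can be checked numerically. The following is a minimal sketch with made-up counts (the values of `n`, `r`, `k` and `i` are assumptions chosen for illustration, not taken from the dataset on this page); it confirms that both factorizations of the joint probability agree and that Bayes theorem recovers $p(B|R) = i/r$.

```python
from fractions import Fraction

# Hypothetical counts (not from the original page): n entries in total,
# R=y in r of them, B=y in k of them, both R=y and B=y in i of them.
n, r, k, i = 20, 8, 5, 4

p_R = Fraction(r, n)          # p(R)
p_B = Fraction(k, n)          # p(B)
p_B_given_R = Fraction(i, r)  # p(B|R)
p_R_given_B = Fraction(i, k)  # p(R|B)
p_joint = Fraction(i, n)      # p(B,R)

# Bayes rule: both factorizations give the joint probability.
assert p_B_given_R * p_R == p_joint
assert p_R_given_B * p_B == p_joint

# Bayes theorem: posterior = likelihood * prior / evidence.
posterior = p_R_given_B * p_B / p_R
assert posterior == p_B_given_R
print(posterior)  # 1/2
```

Exact fractions are used here so that the equalities hold without floating-point tolerance.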
===== Example ===== | ===== Example ===== |
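- | If a person has malaria, there is a 90% chance that the blood test for malarial parasite comes up positive; however, 1% of the time the test gives a false positive. Also, there is a 1% chance of getting malaria in general. | + | Question 1: |
+ | If a person has malaria, there is a 90% chance that the blood test for malarial parasite comes up positive; however, 1% of the time the test gives a false positive. Also, there is a 1% chance of getting malaria in general. | ||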
Unfortunately, | Unfortunately, | ||
- | P(tp | mp) = 0.9 | + | Given (tp: test positive, mp: malaria present, mn: malaria absent): |
- | P(tp | mn) = 0.01 | + | * $P(tp | mp) = 0.9$ |
- | P(mp) = 0.01 | + | * $P(tp | mn) = 0.01$ |
+ | * $P(mp) = 0.01$ | ||
- | Sought | + | Sought: |
$P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.01 * 0.99} \approx 0.476$ | $P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.01 * 0.99} \approx 0.476$ | ||
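The calculation for Question 1 can be sketched in a few lines of Python. The function name `posterior` and its parameter names are assumptions made for this illustration; the evidence term $P(tp)$ is expanded with the law of total probability, $P(tp) = P(tp|mp) P(mp) + P(tp|mn) P(mn)$ with $P(mn) = 1 - P(mp)$.

```python
def posterior(p_pos_given_d, p_pos_given_not_d, p_d):
    """P(disease | positive test) via Bayes theorem (illustrative helper)."""
    # Evidence P(tp) by the law of total probability.
    p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1.0 - p_d)
    return p_pos_given_d * p_d / p_pos

# Values from Question 1: P(tp|mp) = 0.9, P(tp|mn) = 0.01, P(mp) = 0.01
q1 = posterior(0.9, 0.01, 0.01)
print(round(q1, 3))  # 0.476
```

Even with a 90% true-positive rate, the low prior of 1% keeps the posterior below one half.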
- | Test 2 | + | Question |
Now suppose your doctor had employed a far superior, more expensive test, one with only a .1% chance of a false positive. (Other parameters are the same - 90% chance of a true positive, 1% chance of malaria in general.) | Now suppose your doctor had employed a far superior, more expensive test, one with only a .1% chance of a false positive. (Other parameters are the same - 90% chance of a true positive, 1% chance of malaria in general.) | ||
What is the chance that you have malaria if you test positive with this improved procedure? | What is the chance that you have malaria if you test positive with this improved procedure? | ||
- | P(tp | mp) = 0.9 | + | * $P(tp | mp) = 0.9$ |
- | P(tp | mn) = 0.001 | + | * $P(tp | mn) = 0.001$ |
- | P(mp) = 0.01 | + | * $P(mp) = 0.01$ |
- | Sought | + | Sought: |
$P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.001 * 0.99} \approx 0.901$ | $P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.001 * 0.99} \approx 0.901$ | ||
Line 71: | Line 79: | ||
P(R|B) = P(R); P(B|R) = P(B) | P(R|B) = P(R); P(B|R) = P(B) | ||
+ | |||
+ | ====== Naive Bayes ====== | ||
+ | |||
+ | Naive because of the assumption: R and C are independent given B | ||
+ | |||
+ | $$ | ||
+ | P(B|R,C) * P(R,C) = P(R,C|B) * P(B)\\ | ||
+ | = P(R|C,B) * P(C|B) * P(B) \quad \text{(Bayes rule)}\\ | ||
+ | = P(R|B) * P(C|B) * P(B) \quad \text{(independence of R and C given B)} | ||
+ | $$ | ||
+ | |||
+ | Compute the ratio: | ||
+ | $$ | ||
+ | \frac{p(r|B=y) * p(c|B=y) * p(B=y)}{p(r|B=n) * p(c|B=n) * p(B=n)} | ||
+ | $$ | ||
+ | |||
+ | B=y if the ratio is $> 1$ | ||
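The ratio test for two features can be sketched as follows. All probabilities are hypothetical values chosen for illustration, and the helper name `score` is an assumption; the sketch only shows how the numerator and denominator of the ratio are formed under the naive independence assumption.

```python
# Hypothetical conditional probabilities for one observation (r, c);
# none of these numbers come from the original page.
p_r_given = {"y": 0.8, "n": 0.3}   # p(r | B)
p_c_given = {"y": 0.6, "n": 0.5}   # p(c | B)
p_b = {"y": 0.25, "n": 0.75}       # p(B)

def score(b):
    # Unnormalized posterior p(r|B=b) * p(c|B=b) * p(B=b).
    return p_r_given[b] * p_c_given[b] * p_b[b]

ratio = score("y") / score("n")
prediction = "y" if ratio > 1 else "n"
print(prediction)  # y
```

The normalizing term $P(R,C)$ cancels in the ratio, which is why it never has to be computed.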
+ | |||
+ | ====== Naive Bayes for N features ====== | ||
+ | |||
+ | $$ | ||
+ | L = \prod_{i=1}^N \frac{p(x_i|B=y)}{p(x_i|B=n)} * \frac{p(B=y)}{p(B=n)} | ||
+ | $$ | ||
+ | B=y if $L > 1$ | ||
+ | |||
+ | log-likelihood => replace the multiplication with an addition of logarithms to avoid rounding errors (the product of many small probabilities underflows) | ||
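A minimal sketch of the log-likelihood variant, assuming per-feature likelihoods are already available as lists. The function name `log_ratio` and the numbers are hypothetical. In log space the decision rule becomes B=y if $\log L > 0$.

```python
import math

def log_ratio(likelihoods_y, likelihoods_n, prior_y, prior_n):
    """log L: sum of per-feature log ratios plus the log prior ratio."""
    total = math.log(prior_y) - math.log(prior_n)
    for p_y, p_n in zip(likelihoods_y, likelihoods_n):
        total += math.log(p_y) - math.log(p_n)
    return total

# Hypothetical values: 2000 features, each slightly favouring B=y.
lik_y = [1e-3] * 2000
lik_n = [5e-4] * 2000

log_l = log_ratio(lik_y, lik_n, 0.5, 0.5)

# The plain product of the same likelihoods underflows to 0.0 in
# double precision, so the ratio L could not be formed directly.
naive_numerator = math.prod(lik_y)

print(log_l > 0, naive_numerator)  # True 0.0
```

The sum stays well within floating-point range, while the direct product is lost to underflow.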
+ | |||
+ | ====== Example: Sentiment Analysis ====== | ||
+ | p(+),p(-), | ||
+ | p(like|+), p(enjoy|+), p(hate|+), ... | ||
+ | p(hate|-), p(enjoy|-), p(lot|-), ... | ||
+ | |||
+ | In the text: like, simple, lot | ||
+ | |||
+ | $$ | ||
+ | L = \frac{p(like|+)p(lot|+)[1-p(hate|+)][1-p(waste|+)]p(simple|+)}{p(like|-)p(lot|-)[1-p(hate|-)][1-p(waste|-)]p(simple|-)} * \frac{p(+)}{p(-)} | ||
+ | $$ | ||
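The sentiment ratio above can be sketched in Python. The word probabilities and class priors below are invented for illustration, and `log_score` is a hypothetical helper; the structure follows the formula: words present in the text contribute $p(w|c)$, absent vocabulary words contribute $1 - p(w|c)$.

```python
import math

# Hypothetical word probabilities p(word | class); illustrative only.
p_word = {
    "+": {"like": 0.6, "lot": 0.4, "hate": 0.05, "waste": 0.05, "simple": 0.3},
    "-": {"like": 0.2, "lot": 0.3, "hate": 0.5, "waste": 0.4, "simple": 0.2},
}
p_class = {"+": 0.5, "-": 0.5}  # hypothetical priors p(+), p(-)

def log_score(words_in_text, label):
    # Log of prior times the per-word factors for one class.
    total = math.log(p_class[label])
    for word, p in p_word[label].items():
        # Present words contribute p, absent words contribute 1 - p.
        total += math.log(p if word in words_in_text else 1.0 - p)
    return total

text = {"like", "simple", "lot"}
log_l = log_score(text, "+") - log_score(text, "-")
sentiment = "+" if log_l > 0 else "-"
print(sentiment)  # +
```

Working with the difference of log scores is equivalent to testing $L > 1$ and avoids multiplying many small factors.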
+ | |||