Differences
This shows you the differences between two versions of the page.
data_mining:naive_bayes [2013/09/15 16:07] – [Naive Bayes] phreazer | data_mining:naive_bayes [2015/08/04 15:02] (current) – [Bayes rule] phreazer | ||
---|---|---|---|
Line 16: | Line 16: | ||
====== Bayes rule ====== | ====== Bayes rule ====== | ||
- | Dataset: | + | **Dataset:** |
^ R ^ B | | ^ R ^ B | | ||
Line 23: | Line 23: | ||
| y | n | | | y | n | | ||
- | n: entries | + | * n: entries |
- | R=y in r cases | + | |
- | B=y in k cases | + | |
- | R=y and B=y in i cases | + | |
- | p(B|R) = i/r | + | * $p(B|R) = i/r$ |
- | p(R) = r/n | + | * $p(R) = r/n$ |
- | p(R and B) = i/n = (i/r) * (r/n) | + | * $p(R \text{ and } B) = i/n = (i/r) * (r/n)$ |
- | p(B,R) = p(B|R) p(R) | + | * $p(B,R) = p(B|R) p(R)$ |
Bayes Rule: | Bayes Rule: | ||
$P(B,R) = P(B|R) P(R) = P(R|B) P(B)$ | $P(B,R) = P(B|R) P(R) = P(R|B) P(B)$ | ||
+ | Bayes Theorem: | ||
+ | $P(B|R) = \frac{P(B,R)}{P(R)} = \frac{P(R|B) P(B)}{P(R)}$ | ||
+ | |||
+ | Prior probability: $P(B)$ | ||
+ | Likelihood: $P(R|B)$ | ||
+ | Posterior probability: $P(B|R)$ | ||
===== Example ===== | ===== Example ===== | ||
- | If a person has malaria, there is 90% chance that the blood test for malarial parasite comes up positive; however, 1% of the time the test gives a false positive. Also, there is a 1% chance of getting malaria in general. | + | Question 1: |
+ | If a person has malaria | ||
Unfortunately, | Unfortunately, | ||
- | P(tp | mp) = 0.9 | + | Given: |
- | P(tp | mn) = 0.01 | + | * $P(tp | mp) = 0.9$ |
- | P(mp) = 0.01 | + | * $P(tp | mn) = 0.01$ |
+ | * $P(mp) = 0.01$ | ||
- | Sought | + | Sought: |
$P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.01 * 0.99} \approx 0.476$ | $P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.01 * 0.99} \approx 0.476$ | ||
- | Test 2 | + | Question 2: |
Now suppose your doctor had employed a far superior, more expensive test, one with only a .1% chance of a false positive. (Other parameters are the same - 90% chance of a true positive, 1% chance of malaria in general.) | Now suppose your doctor had employed a far superior, more expensive test, one with only a .1% chance of a false positive. (Other parameters are the same - 90% chance of a true positive, 1% chance of malaria in general.) | ||
What is the chance that you have malaria if you test positive with this improved procedure? | What is the chance that you have malaria if you test positive with this improved procedure? | ||
- | P(tp | mp) = 0.9 | + | * $P(tp | mp) = 0.9$ |
- | P(tp | mn) = 0.001 | + | * $P(tp | mn) = 0.001$ |
- | P(mp) = 0.01 | + | * $P(mp) = 0.01$ |
- | Sought | + | Sought: |
$P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.001 * 0.99} \approx 0.901$ | $P(mp|tp) = \frac{P(tp|mp) * P(mp)}{P(tp)} = \frac{0.9 * 0.01}{0.9 * 0.01 + 0.001 * 0.99} \approx 0.901$ | ||
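The two malaria questions can be checked numerically. The following is a minimal sketch, not part of the original page. It assumes that tp stands for "test positive", mp for "malaria present" and mn for "malaria absent", and that $P(tp)$ is expanded with the law of total probability. The function name `posterior` is hypothetical.

```python
def posterior(p_pos_given_sick, p_pos_given_healthy, prior):
    """Return P(sick | positive test) using Bayes' theorem."""
    # Law of total probability: P(tp) = P(tp|mp) P(mp) + P(tp|mn) P(mn)
    evidence = p_pos_given_sick * prior + p_pos_given_healthy * (1.0 - prior)
    # Bayes' theorem: P(mp|tp) = P(tp|mp) P(mp) / P(tp)
    return p_pos_given_sick * prior / evidence


# Question 1: 1% false-positive rate
q1 = posterior(0.9, 0.01, 0.01)
# Question 2: 0.1% false-positive rate, other parameters unchanged
q2 = posterior(0.9, 0.001, 0.01)

print(f"Question 1: P(mp|tp) = {q1:.4f}")  # 0.4762
print(f"Question 2: P(mp|tp) = {q2:.4f}")  # 0.9009
```

With the cheaper test, fewer than half of the positive results belong to people who actually have malaria, because the disease is rare. Lowering the false-positive rate by a factor of ten raises the posterior to about 90%.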
Line 95: | Line 103: | ||
$$ | $$ | ||
B=y, if $> | B=y, if $> | ||
+ | |||
+ | Log-likelihood: taking logarithms turns the multiplications into additions, which avoids rounding errors (underflow) when many small probabilities are multiplied. | ||
+ | |||
+ | ====== Example: sentiment analysis ====== | ||
+ | p(+), p(-) | ||
+ | p(like|+), p(enjoy|+), p(hate|+), ... | ||
+ | p(hate|-), p(enjoy|-), p(lot|-), ... | ||
+ | |||
+ | In the text: like, simple, lot | ||
+ | |||
+ | $$ | ||
+ | L = \frac{p(like|+)p(lot|+)[1-p(hate|+)][1-p(waste|+)]p(simple|+)}{p(like|-)p(lot|-)[1-p(hate|-)][1-p(waste|-)]p(simple|-)} * \frac{p(+)}{p(-)} | ||
+ | $$ | ||
+ | |||
+ |
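The likelihood ratio L for the sentiment example can be evaluated in log space, as the log-likelihood note suggests. The sketch below is illustrative only. All probability values are invented for the example, the vocabulary is limited to the five words that appear in the formula, and the decision rule (choose + when L > 1, that is, when log L > 0) is an assumption. The names `prior`, `p_word`, `text` and `log_joint` are hypothetical.

```python
import math

# Hypothetical parameters, invented purely for illustration.
prior = {"+": 0.5, "-": 0.5}
p_word = {
    "+": {"like": 0.6, "lot": 0.3, "hate": 0.05, "waste": 0.05, "simple": 0.4},
    "-": {"like": 0.2, "lot": 0.3, "hate": 0.5, "waste": 0.4, "simple": 0.1},
}

# Words that occur in the text to classify.
text = {"like", "simple", "lot"}


def log_joint(label):
    """log p(label) plus the summed log-probabilities of all vocabulary words."""
    total = math.log(prior[label])
    for word, p in p_word[label].items():
        # Present words contribute p(word|label), absent words 1 - p(word|label).
        total += math.log(p if word in text else 1.0 - p)
    return total


# log L = log of numerator minus log of denominator: sums replace products.
log_ratio = log_joint("+") - log_joint("-")
prediction = "+" if log_ratio > 0 else "-"

print("log L =", log_ratio)
print("prediction:", prediction)
```

Each factor of the fraction becomes one term of a sum, so a long text with many small probabilities does not drive the product towards zero in floating-point arithmetic.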