Differences
This shows you the differences between two versions of the page.
data_mining:tf-idf [2013/09/15 15:01] – created phreazer | data_mining:tf-idf [2013/09/15 15:05] – [TF-IDF] phreazer | ||
---|---|---|---|
Line 4: | Line 4: | ||
===== TF-IDF ===== | ===== TF-IDF ===== | ||
+ | |||
+ | Heuristic | ||
+ | |||
+ | Web page content => TF-IDF => web page keywords | ||
Word such as " | Word such as " | ||
Rare words make better keywords | Rare words make better keywords | ||
- | IDF = Inverse document frequency of word $w = log_2\frac{N}{N_w} | + | IDF = Inverse document frequency of word $w = log_2\frac{N}{N_w}$ |
N: total number of documents | N: total number of documents | ||
N_w: number of documents that contain w | N_w: number of documents that contain w | ||
Line 16: | Line 20: | ||
More frequent words make better keywords | More frequent words make better keywords | ||
- | $n_w^d = frequency of w in document d | + | $n_w^d$ = frequency of w in document d |
+ | |||
+ | TF-IDF = term frequency x IDF = $n^d_w log_2 \frac{N}{N_w}$ | ||
- | TF-IDF = term frequency x IDF = n^d_w log_2 \frac{N}{N_w} | + | The mutual information between all pages and all words is proportional to $\sum_d \sum_w |
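
The definitions in this revision (IDF as $log_2\frac{N}{N_w}$, term frequency $n_w^d$, and their product) can be tried out with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `tf_idf`, the sample `pages`, and the already-tokenized input are hypothetical and not part of the wiki page.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs: list of token lists (tokenization is assumed to be done already).
    score(w, d) = n_w^d * log2(N / N_w), following the definitions above.
    """
    n_docs = len(docs)            # N: total number of documents
    doc_freq = Counter()          # N_w: number of documents that contain w
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document

    scores = []
    for tokens in docs:
        counts = Counter(tokens)  # n_w^d: frequency of w in document d
        scores.append(
            {w: n * math.log2(n_docs / doc_freq[w]) for w, n in counts.items()}
        )
    return scores


# Hypothetical sample pages, used only for illustration.
pages = [
    ["the", "cat", "sat", "with", "the", "cat"],
    ["the", "dog", "sat"],
    ["the", "dog", "barked"],
    ["the", "bird", "sang"],
]

scores = tf_idf(pages)
best_keyword = max(scores[0], key=scores[0].get)  # highest-scoring word of page 0
```

In this sample, "the" occurs on every page, so its IDF is $log_2\frac{4}{4} = 0$ and it scores zero no matter how often it appears, while "cat" is both rare across pages and frequent on the first page, which makes it the top keyword there.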