* Learn Kontext c (" | * Learn Kontext c (" | ||
* $o_c => E => e_c => o_{softmax} => \hat{y}$ | * $o_c => E => e_c => o_{softmax} => \hat{y}$ | ||
+ | * Softmax has $\Theta_t$ parameter | ||
+ | * $L(\hat{y}, | ||
+ | * $y$ is one hot vector (10000 dim) | ||
+ | |||
Problems with softmax classification: the denominator sums over the entire vocabulary (10,000 words) for every prediction, which is expensive to compute.

Solution: Hierarchical softmax: a tree of binary classifiers with cost $\log |V|$. Common words sit near the top; it is not a balanced tree.

=== How to sample context c? ===

When sampled uniformly at random: frequent words like "the", "of", "a", ... dominate the training pairs.

Heuristics are used for sampling to balance very frequent and less frequent words.

==== Negative Sampling ====

Generate a data set:
  * Pick 1 positive example: target = 1
  * Pick k negative examples
    * Choose random words from the dictionary that are not associated with the context word: target = 0
    * Sampling heuristic between the uniform and the observed distribution: $P(w_i) = \frac{f(w_i)^{3/4}}{\sum_j f(w_j)^{3/4}}$

Instead of one 10,000-way softmax this gives 10,000 binary classification problems, of which only $k+1$ are trained per example (see the sketch below).

==== GloVe word vectors ====

Global vectors for word representation.

$x_{ij}$: Number of times word i appears in the context of word j.

Minimize $\sum_{i=1}^{10000} \sum_{j=1}^{10000} f(x_{ij}) (\Theta_i^{T} e_j + b_i + b'_j - \log x_{ij})^2$

Weighting term $f(x_{ij})$: it is 0 if $x_{ij} = 0$ (with the convention $0 \log 0 = 0$) and balances the influence of very frequent and rare word pairs.

Since $\Theta_w$ and $e_w$ play symmetric roles, average them: $e^{final}_w = \frac{e_w + \Theta_w}{2}$

===== Application =====

==== Sentiment classification ====

=== Simple model ===

  * Extract the embedding vector for each word
  * Sum or average those vectors
  * Pass the result to a softmax to obtain the output (1-5 stars), as sketched below

Problem: Doesn't take word order into account; a negative review such as "completely lacking in good taste, good service, ..." still averages in many occurrences of "good".

=== RNN for sentiment classification ===

  * Extract the embedding vector for each word
  * Feed the sequence into a many-to-one RNN with a softmax output (see the sketch below)

===== Debiasing word embeddings =====

Bias (e.g. gender stereotypes) present in the training text is picked up by the learned embeddings.

Addressing bias in word embeddings:

  - Identify the bias direction (e.g. gender)
    * Take differences such as $e_{he} - e_{she}$ and average them
  - Neutralize: For every word that is not definitional (i.e. has no legitimate gender component), project it onto the subspace orthogonal to the bias direction
  - Equalize pairs: The only remaining difference should be gender (e.g. grandfather vs. grandmother), so both words end up equidistant from the neutralized words (see the sketch below)
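A toy sketch of the neutralize and equalize steps on random vectors; the equalize step here is a simplified version (shared non-gender part, equal and opposite components along the bias direction):

<code python>
import numpy as np

rng = np.random.default_rng(0)
D = 50
emb = {w: rng.normal(size=D) for w in
       ["he", "she", "doctor", "grandfather", "grandmother"]}

# 1. Bias direction: average difference vectors (only one pair here)
g = emb["he"] - emb["she"]
g /= np.linalg.norm(g)

def neutralize(e, g):
    # 2. remove the component of e along the bias direction g
    return e - (e @ g) * g

emb["doctor"] = neutralize(emb["doctor"], g)
print(emb["doctor"] @ g)              # ~0: no gender component left

# 3. Equalize a definitional pair (simplified): keep the shared
#    non-gender part, set equal and opposite gender components.
e1, e2 = emb["grandfather"], emb["grandmother"]
mu_orth = neutralize((e1 + e2) / 2, g)
beta = (abs(e1 @ g) + abs(e2 @ g)) / 2
emb["grandfather"] = mu_orth + beta * g
emb["grandmother"] = mu_orth - beta * g
</code>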