Man → Woman is like King → ?
Example: 4-dimensional embedding (gender, royal, age, food):
Goal: Find the word $w$ that maximizes $sim(e_w, e_{king} - e_{man} + e_{woman})$
Cosine similarity is often used as the similarity function
$sim(u,v) = \frac{u^T v}{||u||_2 ||v||_2}$
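The analogy search can be sketched in a few lines of plain Python. This is only an illustration: the 4-dimensional vectors (gender, royal, age, food) are made-up values, and `solve_analogy` is a hypothetical helper name, not part of any library.

```python
import math

def cosine_similarity(u, v):
    """sim(u, v) = u^T v / (||u||_2 * ||v||_2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Assumption: toy embeddings with dimensions (gender, royal, age, food).
embeddings = {
    "man":   [-1.00, 0.01, 0.03, 0.09],
    "woman": [1.00, 0.02, 0.02, 0.01],
    "king":  [-0.95, 0.93, 0.70, 0.02],
    "queen": [0.97, 0.95, 0.69, 0.01],
    "apple": [0.00, -0.01, 0.03, 0.95],
}

def solve_analogy(a, b, c, embeddings):
    """Return the word w maximizing sim(e_w, e_c - e_a + e_b).

    The three input words are excluded from the candidates.
    """
    target = [
        ec - ea + eb
        for ea, eb, ec in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

# Man -> Woman is like King -> ?
answer = solve_analogy("man", "woman", "king", embeddings)
```

With these toy values, `answer` is `"queen"`; with real embeddings the search runs over the whole vocabulary.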
Embedding matrix $E$: 300 x 10000 (300 features, vocabulary of 10000 words)
Embedding vector obtained from the one-hot encoding $o_j$: $E \cdot o_j = e_j$ (column $j$ of $E$)
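A minimal sketch of why multiplying by a one-hot vector is just a column lookup. The matrix here is a toy 3 x 5 example (3 features, 5 words) with made-up values; `one_hot` and `matvec` are hypothetical helper names.

```python
def one_hot(j, vocab_size):
    """One-hot vector o_j: 1 at position j, 0 elsewhere."""
    return [1.0 if i == j else 0.0 for i in range(vocab_size)]

def matvec(E, o):
    """Matrix-vector product E * o for a matrix stored as a list of rows."""
    return [sum(e_rc * o_c for e_rc, o_c in zip(row, o)) for row in E]

# Assumption: toy embedding matrix, rows = features, columns = words.
E = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [-1.0, -2.0, -3.0, -4.0, -5.0],
]

j = 3
e_j = matvec(E, one_hot(j, vocab_size=5))  # full product E * o_j
column_j = [row[j] for row in E]           # direct lookup of column j
```

Both give the same vector. Since almost all entries of $o_j$ are zero, the direct lookup avoids a lot of wasted multiplications, which is why an embedding layer is usually implemented as a lookup rather than a matrix product.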
Goal: Learn embedding matrix $E$.
Embedding layer in Keras
Given the previous 4 words in a sequence, predict the next word (with $E$ as a learned parameter).
Maximize likelihood with gradient descent.
Other context:
Can be used to learn a word embedding
Context: 4 words on the left and right, or the last 1 word, or 1 nearby word (“skip-gram”)
Context and Target
“I want a glass of orange juice to go along with my cereal.”
Context: orange. Pick the target at random within a window: juice or glass or …
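Generating context/target pairs from the example sentence could look like the sketch below. It assumes a window of ±4 words and one randomly chosen target per context word; `skipgram_pairs` is a hypothetical helper, and the fixed seed is only there to make the run repeatable.

```python
import random

def skipgram_pairs(tokens, window, rng):
    """For every context word, pick one target at random within +/- window."""
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens) - 1, i + window)
        # All positions inside the window except the context word itself.
        positions = [p for p in range(lo, hi + 1) if p != i]
        if not positions:
            continue
        target = tokens[rng.choice(positions)]
        pairs.append((context, target))
    return pairs

sentence = "I want a glass of orange juice to go along with my cereal"
tokens = sentence.lower().split()

# Assumption: window size 4 and a fixed seed for reproducibility.
pairs = skipgram_pairs(tokens, window=4, rng=random.Random(0))
```

Each element of `pairs` is a (context, target) tuple, for example ("orange", some word at most 4 positions away from "orange").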
Model:
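Problem with softmax classification: Slow, because the denominator sums over the whole vocabulary
Solution: Hierarchical softmax: Tree of binary classifiers, cost about $\log |V|$ instead of $|V|$. Common words near the top, so the tree is not balanced.
When the context word is sampled uniformly at random from the text, it is often a frequent word like “the, of, a, …”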
Heuristics are used for sampling
Generate data set
10000 binary classification problems (one per word) instead of one 10000-way softmax
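A minimal sketch of generating such a data set, assuming the usual negative-sampling recipe: one observed (context, target) pair gets label 1, and k randomly drawn words get label 0. The word counts are made up, and sampling proportionally to count^(3/4) is an assumed heuristic taken from the word2vec literature, not something stated in these notes.

```python
import random

def negative_sampling_examples(context, target, vocab_counts, k, rng):
    """Return one positive example and k negative examples for a context word."""
    words = list(vocab_counts)
    # Assumption: sample negatives proportionally to count^(3/4), a common
    # heuristic between uniform sampling and raw word frequency.
    weights = [vocab_counts[w] ** 0.75 for w in words]

    examples = [(context, target, 1)]  # observed pair, label 1
    for _ in range(k):
        negative = rng.choices(words, weights=weights, k=1)[0]
        examples.append((context, negative, 0))  # random word, label 0
    return examples

# Assumption: toy vocabulary with hypothetical corpus counts.
vocab_counts = {"the": 500, "of": 300, "a": 250, "king": 12, "book": 9, "juice": 4}

examples = negative_sampling_examples(
    "orange", "juice", vocab_counts, k=4, rng=random.Random(0)
)
```

A negative draw can occasionally coincide with the true target; this simple sketch does not filter that case out. Each training step then only updates the k + 1 binary classifiers that appear in `examples`.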
Global vectors for word representation
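$x_{ij}$: Number of times word $i$ appears in the context of word $j$
Minimize $\sum_{i=1}^{10000} \sum_{j=1}^{10000} f(x_{ij}) (\Theta_i^{T} e_j + b_i + b'_j - \log x_{ij})^2$
Weighting term $f(x_{ij})$: $f(0) = 0$ (so terms with $x_{ij} = 0$ drop out, using $0 \log 0 = 0$); limits the weight of very frequent words without neglecting infrequent ones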
$e^{final}_w = \frac{e_w + \Theta_w}{2}$
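The objective and the final averaging step can be sketched as follows, with both bias terms added as in the usual GloVe formulation. The concrete weighting function (capped power law with `x_max=100` and `alpha=0.75`) is an assumption borrowed from the GloVe literature, since the notes only say what $f$ should achieve; the function names are hypothetical.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Assumed weighting f(x): 0 at x = 0, (x / x_max)^alpha, capped at 1."""
    if x <= 0:
        return 0.0
    return min(1.0, (x / x_max) ** alpha)

def glove_loss(X, theta, e, b, b_prime):
    """Sum over i, j of f(x_ij) * (theta_i^T e_j + b_i + b'_j - log x_ij)^2."""
    total = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            x_ij = X[i][j]
            if x_ij == 0:
                continue  # f(0) = 0, so log(0) is never evaluated
            pred = sum(t * v for t, v in zip(theta[i], e[j])) + b[i] + b_prime[j]
            total += weight(x_ij) * (pred - math.log(x_ij)) ** 2
    return total

def final_embedding(e_w, theta_w):
    """e_final = (e_w + theta_w) / 2, element-wise."""
    return [(a + c) / 2 for a, c in zip(e_w, theta_w)]

# Toy check with a 2-word vocabulary and made-up co-occurrence counts.
X = [[0.0, 1.0], [1.0, 0.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
loss_zero = glove_loss(X, zeros, zeros, [0.0, 0.0], [0.0, 0.0])
loss_biased = glove_loss(X, zeros, zeros, [1.0, 0.0], [0.0, 0.0])
```

Because $\Theta$ and $e$ play symmetric roles in the objective, averaging them is a natural way to obtain one vector per word. In practice the loss would be minimized with gradient descent over $\Theta$, $e$, $b$ and $b'$.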
Problem: Doesn't include order/sequence of words
Bias in text
Addressing bias in word embeddings: