Word embeddings
Basics
Analogies
Man → Woman is like King → ?
Example: 4-dimensional embedding (gender, royal, age, food):
- $e_{man} - e_{woman} \approx (-2, 0, 0, 0)^T$
- $e_{king} - e_{queen} \approx (-2, 0, 0, 0)^T$
Goal: Find the word $w$ that maximizes $sim(e_w, e_{king} - e_{man} + e_{woman})$
Cosine similarity is often used as the similarity function:
$sim(u,v) = \frac{u^T v}{||u||_2 ||v||_2}$
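A minimal NumPy sketch of this analogy search; the toy 4-dimensional vectors below are made up for illustration and only roughly follow the (gender, royal, age, food) example above.

```python
import numpy as np

def cosine_sim(u, v):
    # sim(u, v) = u^T v / (||u||_2 ||v||_2)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def find_analogy(word_a, word_b, word_c, embeddings):
    # Solve "a is to b as c is to ?" by maximizing sim(e_w, e_c - e_a + e_b).
    target = embeddings[word_c] - embeddings[word_a] + embeddings[word_b]
    best_word, best_sim = None, -np.inf
    for w, e_w in embeddings.items():
        if w in (word_a, word_b, word_c):
            continue  # exclude the query words themselves
        s = cosine_sim(e_w, target)
        if s > best_sim:
            best_word, best_sim = w, s
    return best_word

# Toy embeddings over the dimensions (gender, royal, age, food)
embeddings = {
    "man":   np.array([-1.00, 0.01, 0.03, 0.09]),
    "woman": np.array([ 1.00, 0.02, 0.02, 0.01]),
    "king":  np.array([-0.95, 0.93, 0.70, 0.02]),
    "queen": np.array([ 0.97, 0.95, 0.69, 0.01]),
    "apple": np.array([ 0.00, -0.01, 0.03, 0.95]),
}
print(find_analogy("man", "woman", "king", embeddings))  # -> "queen"
```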
Embedding matrix
Dimensions: 10000 x 300
- Dictionary with 10000 entries
- 300 features (embedding dimensions)
Embedding vector obtained from the one-hot encoding $o_j$: $e_j = E^T o_j$ (i.e. row $j$ of $E$)
Goal: Learn embedding matrix $E$.
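A short sketch of the lookup, assuming $E$ is stored as a 10000 x 300 matrix as above; in practice the matrix-vector product is replaced by a direct row lookup.

```python
import numpy as np

vocab_size, emb_dim = 10000, 300
E = np.random.randn(vocab_size, emb_dim)  # embedding matrix, one row per word

j = 4257                       # index of some word in the dictionary
o_j = np.zeros(vocab_size)
o_j[j] = 1.0                   # one-hot encoding of word j

e_j = E.T @ o_j                # e_j = E^T o_j, a 300-dim embedding vector
assert np.allclose(e_j, E[j])  # equivalent to simply reading row j of E
```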
Embedding layer in Keras
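A minimal example of the Keras Embedding layer; the word indices are arbitrary placeholders. The layer stores $E$ as a trainable weight matrix and performs the lookup efficiently.

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Embedding layer holds E (here 10000 x 300) as a trainable weight matrix;
# feeding word indices performs the e_j lookup without building one-hot vectors.
embedding = Embedding(input_dim=10000, output_dim=300)

word_indices = np.array([[12, 503, 4257, 9]])  # a batch with one sequence of 4 word indices
vectors = embedding(word_indices)              # shape (1, 4, 300)
print(vectors.shape)
```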
Algorithms
Neural language model
Given the last 4 words of a sequence, predict the next word (with $E$ as a learned parameter).
Maximize likelihood with gradient descent.
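A sketch of such a neural language model in Keras, under the assumed layer sizes below (the hidden width of 512 is an arbitrary choice): embed the last 4 words, concatenate the embeddings, and predict a softmax over the next word. Minimizing the cross-entropy loss by gradient descent maximizes the likelihood and learns $E$ as a side effect.

```python
from tensorflow.keras import layers, models

vocab_size, emb_dim, context_size = 10000, 300, 4

model = models.Sequential([
    layers.Input(shape=(context_size,), dtype="int32"),      # indices of the last 4 words
    layers.Embedding(input_dim=vocab_size, output_dim=emb_dim),  # shared matrix E
    layers.Flatten(),                                         # concatenate the 4 embeddings
    layers.Dense(512, activation="relu"),                     # hidden layer (arbitrary size)
    layers.Dense(vocab_size, activation="softmax"),           # P(next word | context)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```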
Other contexts:
Other context windows can also be used to learn a word embedding:
- 4 words on the left and right
- the last 1 word
- 1 nearby word ("skip-gram", see the sketch below)
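A small sketch of how skip-gram training pairs could be generated: for each target word, sample one context word from within a few positions. The window size and example sentence are illustrative assumptions.

```python
import random

def skipgram_pairs(tokens, window=4):
    # For each target word, sample one context word within +/- `window` positions
    # (the "nearby 1 word" skip-gram context described above).
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        candidates = [tokens[j] for j in range(lo, hi) if j != i]
        if candidates:
            pairs.append((target, random.choice(candidates)))
    return pairs

tokens = "i want a glass of orange juice to go along with my cereal".split()
print(skipgram_pairs(tokens, window=4))
```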