====== Word embeddings ======

===== Word2Vec =====
  * Learn context c ("orange") => target t ("juice")
  * $o_c \Rightarrow E \Rightarrow e_c \Rightarrow \text{softmax} \Rightarrow \hat{y}$
  * The softmax has one parameter vector $\Theta_t$ per target word $t$
  * $L(\hat{y},y) = - \sum^{10000}_{i=1} y_i \log \hat{y}_i$ (see the sketch after this list)
  * $y$ is a one-hot vector (10,000-dimensional)
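
A minimal numpy sketch of this forward pass and loss (matrix names and shapes are assumptions, not from the page):

<code python>
import numpy as np

def skipgram_softmax_loss(c, t, E, Theta):
    """Softmax model for p(t | c); a sketch assuming
    E: (10000, d) embedding matrix, Theta: (10000, d) softmax parameters,
    c, t: integer word ids for context and target."""
    e_c = E[c]                  # o_c => E => e_c (embedding lookup)
    z = Theta @ e_c             # one score Theta_t^T e_c per target word
    y_hat = np.exp(z - z.max())
    y_hat /= y_hat.sum()        # softmax over the 10,000-word vocabulary
    return -np.log(y_hat[t])    # L(y_hat, y) with y one-hot at position t
</code>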

Problem with softmax classification: it is slow, because the normalization sums over the whole 10,000-word vocabulary.

Solution: hierarchical softmax, a tree of binary classifiers with cost $O(\log |V|)$ per prediction instead of $O(|V|)$. Common words are placed near the top, so the tree is not balanced.
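
A sketch of the tree-of-classifiers idea (the path encoding and all names are assumptions, not from the page): every internal node holds a binary classifier, and $p(t \mid c)$ is the product of the decisions along the root-to-leaf path of $t$, so only $O(\log |V|)$ classifiers are evaluated.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hierarchical_softmax_prob(e_c, path_nodes, path_signs, node_vecs):
    """p(target | context) as a product of binary decisions.
    path_nodes: internal-node ids on the root-to-leaf path of the target,
    path_signs: +1/-1 for branching left/right, node_vecs: (n_nodes, d)."""
    p = 1.0
    for n, s in zip(path_nodes, path_signs):
        p *= sigmoid(s * (node_vecs[n] @ e_c))  # one binary classifier per node
    return p
</code>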

=== How to sample context c? ===

When sampling uniformly at random, the drawn contexts are dominated by frequent words like "the", "of", "a", ...

In practice, heuristics are used that balance frequent and rare words.
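
One common such heuristic is the subsampling rule from the original word2vec paper (an assumption here; the page does not name a specific heuristic):

<code python>
import numpy as np

def keep_prob(freq, t=1e-5):
    """Probability of keeping a word with relative frequency `freq` when
    sampling contexts; very frequent words ("the", "of", ...) are kept rarely."""
    return np.minimum(1.0, np.sqrt(t / freq))
</code>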

==== Negative Sampling ====

Generate a data set (sketched below):
  * Pick 1 positive example (a context word and an actual target word): target = 1
  * Pick k negative examples
    * Choose random words from the dictionary that are not associated with the context word: target = 0
    * Sample from a heuristic distribution between the uniform and the observed word distribution, e.g. $P(w_i) \propto f(w_i)^{3/4}$

This replaces the 10,000-way softmax with 10,000 binary classification problems, of which only $k+1$ are trained per example.
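
A sketch of the data-set generation (integer word ids, unigram counts, and the rng setup are assumptions):

<code python>
import numpy as np

rng = np.random.default_rng(0)

def sampling_dist(counts):
    """Heuristic between the uniform and the observed distribution:
    f(w)^(3/4), normalized."""
    p = counts.astype(float) ** 0.75
    return p / p.sum()

def make_examples(context, target, counts, k=4):
    """1 positive pair (target = 1) plus k negative pairs (target = 0).
    For simplicity this may occasionally redraw the true target as a negative."""
    negs = rng.choice(len(counts), size=k, p=sampling_dist(counts))
    return [(context, target, 1)] + [(context, int(w), 0) for w in negs]
</code>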

==== GloVe word vectors ====

Global vectors for word representation.

$x_{ij}$: number of times word $i$ appears in the context of word $j$.

Minimize $\sum_{i=1}^{10000} \sum_{j=1}^{10000} f(x_{ij}) (\Theta_i^{T} e_j + b_i + b'_j - \log x_{ij})^2$

Weighting term $f(x_{ij})$: equals 0 when $x_{ij} = 0$ (so the $\log$ term is skipped) and balances the weight of very frequent and infrequent words.

$\Theta_w$ and $e_w$ play symmetric roles in the objective, so the final embedding averages them: $e^{final}_w = \frac{e_w + \Theta_w}{2}$
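
A numpy sketch of the objective (matrix shapes and names are assumptions):

<code python>
import numpy as np

def glove_objective(Theta, E, b, b_prime, X, f):
    """Weighted least-squares GloVe cost.
    Theta, E: (V, d) parameter/embedding matrices; b, b_prime: (V,) biases;
    X: (V, V) co-occurrence counts; f: (V, V) weights with f = 0 where X = 0."""
    logX = np.log(np.where(X > 0, X, 1.0))   # placeholder value where f == 0
    err = Theta @ E.T + b[:, None] + b_prime[None, :] - logX
    return np.sum(f * err ** 2)

def final_embeddings(Theta, E):
    """Theta and e are symmetric in the objective, so average them."""
    return (E + Theta) / 2
</code>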

===== Application =====

==== Sentiment classification ====

=== Simple model ===

  * Extract the embedding vector for each word
  * Sum or average those vectors
  * Pass the result to a softmax to obtain the output (1-5 stars)

Problem: the order/sequence of the words is ignored, so a review like "completely lacking in good taste" averages toward positive because of "good".
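
A sketch of this averaging model (shapes and names are assumptions):

<code python>
import numpy as np

def predict_stars(word_ids, E, W, c):
    """Average-embedding classifier. E: (V, d) embedding matrix,
    W: (5, d) softmax weights, c: (5,) biases; returns p(1..5 stars)."""
    avg = E[word_ids].mean(axis=0)   # average of the word embeddings
    z = W @ avg + c
    p = np.exp(z - z.max())
    return p / p.sum()               # softmax over the 5 ratings
</code>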

=== RNN for sentiment classification ===

  * Extract the embedding vector for each word
  * Feed the sequence into a many-to-one RNN with a softmax output at the last time step (sketch below)
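
A minimal many-to-one RNN sketch (all weight names are assumptions); because the words are fed in sequence, word order now matters:

<code python>
import numpy as np

def rnn_predict_stars(word_ids, E, Waa, Wax, ba, Wya, by):
    """E: (V, d) embeddings; Waa: (h, h), Wax: (h, d), ba: (h,) RNN weights;
    Wya: (5, h), by: (5,) output layer."""
    a = np.zeros(Waa.shape[0])
    for t in word_ids:                        # feed embeddings in sequence
        a = np.tanh(Waa @ a + Wax @ E[t] + ba)
    z = Wya @ a + by
    p = np.exp(z - z.max())
    return p / p.sum()                        # softmax at the final time step
</code>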

===== Debiasing word embeddings =====

Word embeddings pick up the biases present in the training text (e.g. gender stereotypes).

Addressing bias in word embeddings (a sketch of the projections follows after this list):

  - Identify the bias direction (e.g. gender)
    * e.g. average difference vectors such as $e_{he} - e_{she}$
  - Neutralize: for every word that is not definitional (i.e. has no legitimate gender component), project out the bias direction
  - Equalize pairs: for word pairs whose only difference should be gender (e.g. grandfather vs. grandmother), make both words equidistant from the bias-neutral subspace
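
A sketch of the neutralize and equalize projections (Bolukbasi et al. style; assumes unit-length embeddings and a known bias direction $g$, both assumptions):

<code python>
import numpy as np

def neutralize(e, g):
    """Remove the component of e along the bias direction g."""
    return e - (e @ g) / (g @ g) * g

def equalize(e1, e2, g):
    """Make a definitional pair differ only along g and sit equidistant
    from the bias-neutral subspace. Assumes unit-length embeddings."""
    g_unit = g / np.linalg.norm(g)
    mu_orth = neutralize((e1 + e2) / 2, g)        # shared, bias-free part
    r = np.sqrt(max(1.0 - mu_orth @ mu_orth, 0.0))
    s = 1.0 if (e1 - e2) @ g_unit >= 0 else -1.0  # keep each word on its side
    return mu_orth + r * s * g_unit, mu_orth - r * s * g_unit
</code>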
  
  