data_mining:neural_network:word_embeddings

$x_{ij}$: Number of times word $i$ appears in the context of word $j$

Minimize $\sum_{i=1}^{10000} \sum_{j=1}^{10000} f(x_{ij}) (\Theta_i^{T} e_j + b_i + b'_j - \log x_{ij})^2$

Weighting term $f(x_{ij})$: Weights frequent and infrequent words (and is $0$ for $x_{ij} = 0$, so the $\log$ term drops out)
$e^{final}_w = \frac{e_w + \Theta_w}{2}$
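The objective above can be evaluated directly. A minimal NumPy sketch, where the vocabulary size and the random co-occurrence counts are illustrative assumptions ($x_{max}=100$ and $\alpha=0.75$ are the constants used in the GloVe paper):

```python
import numpy as np

# Toy setup: V words, d-dimensional embeddings (all values illustrative).
rng = np.random.default_rng(0)
V, d = 50, 8
theta = rng.normal(scale=0.1, size=(V, d))       # word vectors Theta_i
e = rng.normal(scale=0.1, size=(V, d))           # context vectors e_j
b = np.zeros(V)                                  # biases b_i
b_prime = np.zeros(V)                            # biases b'_j
X = rng.poisson(1.0, size=(V, V)).astype(float)  # co-occurrence counts x_ij

def f(x, x_max=100.0, alpha=0.75):
    # Weighting term: down-weights rare pairs, caps frequent ones,
    # and is 0 for x == 0 so the log term drops out of the sum.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0) * (x > 0)

def glove_loss(theta, e, b, b_prime, X):
    logX = np.where(X > 0, np.log(np.maximum(X, 1e-12)), 0.0)
    pred = theta @ e.T + b[:, None] + b_prime[None, :]
    return np.sum(f(X) * (pred - logX) ** 2)

loss = glove_loss(theta, e, b, b_prime, X)
```

In practice the loss is minimized by gradient descent over $\Theta$, $e$ and the biases; because the roles of $\Theta_w$ and $e_w$ are symmetric, both are averaged at the end as shown above.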
  
===== Application =====

==== Sentiment classification ====

=== Simple model ===

  * Extract embedding vector for each word
  * Sum or average those vectors
  * Pass to softmax to obtain the output (1-5 stars)

Problem: Doesn't capture the order/sequence of the words
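The averaging model above in a few lines of NumPy; the tiny vocabulary and the (untrained) softmax weights are illustrative assumptions:

```python
import numpy as np

# Hypothetical 2-dimensional embeddings for a tiny vocabulary.
emb = {"the":   np.array([0.1, 0.2]),
       "movie": np.array([0.0, 0.5]),
       "was":   np.array([0.1, 0.0]),
       "great": np.array([0.9, 0.3])}

W = np.random.default_rng(1).normal(size=(5, 2))  # 5 classes: 1-5 stars
bias = np.zeros(5)

def predict_stars(words):
    avg = np.mean([emb[w] for w in words], axis=0)  # average the word vectors
    logits = W @ avg + bias
    p = np.exp(logits - logits.max())               # softmax over the classes
    return p / p.sum()

p = predict_stars(["the", "movie", "was", "great"])
```

Because only the average is used, "good, not bad" and "bad, not good" get the same representation, which is exactly the order problem noted above.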

=== RNN for sentiment classification ===

  * Extract embedding vector for each word
  * Feed into a many-to-one RNN with a softmax output
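A sketch of the many-to-one RNN with plain NumPy; all weights are random placeholders and the shapes are illustrative assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
d, h, classes = 4, 6, 5                        # embedding, hidden, output sizes
Wx = rng.normal(scale=0.1, size=(h, d))        # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(h, h))        # hidden-to-hidden weights
Wy = rng.normal(scale=0.1, size=(classes, h))  # hidden-to-output weights

def rnn_classify(embeddings):
    hid = np.zeros(h)
    for x in embeddings:                # one recurrent step per word vector
        hid = np.tanh(Wx @ x + Wh @ hid)
    logits = Wy @ hid                   # softmax only after the last word
    p = np.exp(logits - logits.max())
    return p / p.sum()

sentence = [rng.normal(size=d) for _ in range(7)]  # 7 word vectors
p = rnn_classify(sentence)
```

Since the hidden state is updated word by word, the prediction now depends on word order, unlike the averaging model.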
===== Debiasing word embeddings =====

Bias in the training text is reflected in the learned embeddings.

Addressing bias in word embeddings:

  - Identify bias direction (e.g. gender)
    * Differences such as $e_{he} - e_{she}$, averaged over several pairs
  - Neutralize: For every word that is not definitional (i.e. has no legitimate gender component), project it onto the non-bias axis
  - Equalize pairs: Only difference should be gender (e.g. grandfather vs. grandmother); make the pair equidistant from the non-bias axis
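The three steps above can be sketched on toy vectors; all embeddings here are made-up assumptions, not trained vectors, and the equalize step uses a simplified unit-norm convention:

```python
import numpy as np

# Hypothetical 2-dimensional embeddings.
e = {"he":          np.array([ 1.0, 0.2]),
     "she":         np.array([-1.0, 0.2]),
     "doctor":      np.array([ 0.3, 0.9]),
     "grandfather": np.array([ 0.8, 0.5]),
     "grandmother": np.array([-0.6, 0.5])}

# 1) Bias direction from a definitional pair (averaged over many pairs
#    in practice).
g = e["he"] - e["she"]
g = g / np.linalg.norm(g)

def neutralize(v, g):
    # 2) Remove the component of v along the bias direction g.
    return v - (v @ g) * g

def equalize(v1, v2, g):
    # 3) Keep the pair's shared (non-gender) part, then place both
    #    vectors at equal and opposite offsets along g.
    mu_perp = neutralize((v1 + v2) / 2, g)
    r = np.sqrt(max(1.0 - mu_perp @ mu_perp, 0.0))
    sign = 1.0 if (v1 - (v1 + v2) / 2) @ g >= 0 else -1.0
    return mu_perp + sign * r * g, mu_perp - sign * r * g

doc = neutralize(e["doctor"], g)                      # gender component ~ 0
gf, gm = equalize(e["grandfather"], e["grandmother"], g)
```

After these steps, "doctor" has no component along the gender direction, and "grandfather"/"grandmother" differ only along it, equidistant from every neutralized word.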
  
 • Last modified: 2018/06/09 18:40
  • by phreazer