data_mining:neural_network:word_embeddings

$x_{ij}$: Number of times word $i$ appears in the context of word $j$

Minimize $\sum_{i=1}^{10000} \sum_{j=1}^{10000} f(x_{ij}) (\Theta_i^{T} e_j + b_i + b'_j - \log x_{ij})^2$

Weighting term $f(x_{ij})$: Weights frequent and infrequent words (and is $0$ for $x_{ij} = 0$, so the $\log$ term drops out)
$e^{final}_w = \frac{e_w + \Theta_w}{2}$
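The objective above can be evaluated directly. A minimal NumPy sketch, where the vocabulary size and the random co-occurrence counts are illustrative assumptions ($x_{max}=100$ and $\alpha=0.75$ are the constants used in the GloVe paper):

```python
import numpy as np

# Toy setup: V words, d-dimensional embeddings (all values illustrative).
rng = np.random.default_rng(0)
V, d = 50, 8
theta = rng.normal(scale=0.1, size=(V, d))       # word vectors Theta_i
e = rng.normal(scale=0.1, size=(V, d))           # context vectors e_j
b = np.zeros(V)                                  # biases b_i
b_prime = np.zeros(V)                            # biases b'_j
X = rng.poisson(1.0, size=(V, V)).astype(float)  # co-occurrence counts x_ij

def f(x, x_max=100.0, alpha=0.75):
    # Weighting term: down-weights rare pairs, caps frequent ones,
    # and is 0 for x == 0 so the log term drops out of the sum.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0) * (x > 0)

def glove_loss(theta, e, b, b_prime, X):
    logX = np.where(X > 0, np.log(np.maximum(X, 1e-12)), 0.0)
    pred = theta @ e.T + b[:, None] + b_prime[None, :]
    return np.sum(f(X) * (pred - logX) ** 2)

loss = glove_loss(theta, e, b, b_prime, X)
```

In practice the loss is minimized by gradient descent over $\Theta$, $e$ and the biases; because the roles of $\Theta_w$ and $e_w$ are symmetric, both are averaged at the end as shown above.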
  
===== Application =====

==== Sentiment classification ====

=== Simple model ===

  * Extract embedding vector for each word
  * Sum or average those vectors
  * Pass to softmax to obtain the output (1-5 stars)

Problem: Doesn't capture the order/sequence of the words
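The averaging model above in a few lines of NumPy; the tiny vocabulary and the (untrained) softmax weights are illustrative assumptions:

```python
import numpy as np

# Hypothetical 2-dimensional embeddings for a tiny vocabulary.
emb = {"the":   np.array([0.1, 0.2]),
       "movie": np.array([0.0, 0.5]),
       "was":   np.array([0.1, 0.0]),
       "great": np.array([0.9, 0.3])}

W = np.random.default_rng(1).normal(size=(5, 2))  # 5 classes: 1-5 stars
bias = np.zeros(5)

def predict_stars(words):
    avg = np.mean([emb[w] for w in words], axis=0)  # average the word vectors
    logits = W @ avg + bias
    p = np.exp(logits - logits.max())               # softmax over the classes
    return p / p.sum()

p = predict_stars(["the", "movie", "was", "great"])
```

Because only the average is used, "good, not bad" and "bad, not good" get the same representation, which is exactly the order problem noted above.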

=== RNN for sentiment classification ===

  * Extract embedding vector for each word
  * Feed into a many-to-one RNN with a softmax output
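A sketch of the many-to-one RNN with plain NumPy; all weights are random placeholders and the shapes are illustrative assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
d, h, classes = 4, 6, 5                        # embedding, hidden, output sizes
Wx = rng.normal(scale=0.1, size=(h, d))        # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(h, h))        # hidden-to-hidden weights
Wy = rng.normal(scale=0.1, size=(classes, h))  # hidden-to-output weights

def rnn_classify(embeddings):
    hid = np.zeros(h)
    for x in embeddings:                # one recurrent step per word vector
        hid = np.tanh(Wx @ x + Wh @ hid)
    logits = Wy @ hid                   # softmax only after the last word
    p = np.exp(logits - logits.max())
    return p / p.sum()

sentence = [rng.normal(size=d) for _ in range(7)]  # 7 word vectors
p = rnn_classify(sentence)
```

Since the hidden state is updated word by word, the prediction now depends on word order, unlike the averaging model.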
===== Debiasing word embeddings =====

Bias in the training text is reflected in the learned embeddings.

Addressing bias in word embeddings:

  - Identify bias direction (e.g. gender)
    * Differences such as $e_{he} - e_{she}$, averaged over several pairs
  - Neutralize: For every word that is not definitional (i.e. has no legitimate gender component), project it onto the non-bias axis
  - Equalize pairs: Only difference should be gender (e.g. grandfather vs. grandmother); make the pair equidistant from the non-bias axis
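The three steps above can be sketched on toy vectors; all embeddings here are made-up assumptions, not trained vectors, and the equalize step uses a simplified unit-norm convention:

```python
import numpy as np

# Hypothetical 2-dimensional embeddings.
e = {"he":          np.array([ 1.0, 0.2]),
     "she":         np.array([-1.0, 0.2]),
     "doctor":      np.array([ 0.3, 0.9]),
     "grandfather": np.array([ 0.8, 0.5]),
     "grandmother": np.array([-0.6, 0.5])}

# 1) Bias direction from a definitional pair (averaged over many pairs
#    in practice).
g = e["he"] - e["she"]
g = g / np.linalg.norm(g)

def neutralize(v, g):
    # 2) Remove the component of v along the bias direction g.
    return v - (v @ g) * g

def equalize(v1, v2, g):
    # 3) Keep the pair's shared (non-gender) part, then place both
    #    vectors at equal and opposite offsets along g.
    mu_perp = neutralize((v1 + v2) / 2, g)
    r = np.sqrt(max(1.0 - mu_perp @ mu_perp, 0.0))
    sign = 1.0 if (v1 - (v1 + v2) / 2) @ g >= 0 else -1.0
    return mu_perp + sign * r * g, mu_perp - sign * r * g

doc = neutralize(e["doctor"], g)                      # gender component ~ 0
gf, gm = equalize(e["grandfather"], e["grandmother"], g)
```

After these steps, "doctor" has no component along the gender direction, and "grandfather"/"grandmother" differ only along it, equidistant from every neutralized word.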
  
 • Last modified: 2018/06/09 18:40
  • by phreazer