====== Loss functions ======

===== Binary cross entropy =====

$-(y \log(\hat{y}) + (1-y) \log(1-\hat{y}))$

or

$-\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i))$
  
(Negative log, since $\log(x) \leq 0$ for $x \in (0,1]$)
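
A minimal Python sketch of the single-example formula above; the function name and the ''eps'' clipping are my own additions, just to guard against $\log(0)$:

<code python>
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy for a single example.

    y     -- true label, 0 or 1
    y_hat -- predicted probability, e.g. a sigmoid output
    eps   -- small constant to avoid log(0)
    """
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Confident correct prediction -> small loss; confident wrong -> large loss
print(binary_cross_entropy(1, 0.99))  # ~0.01
print(binary_cross_entropy(1, 0.01))  # ~4.6
</code>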
  
For other distributions (and in general) with $C$ classes, the entropy of a distribution is $H(q) = - \sum_{c=1}^C q(y_c) \log(q(y_c))$

A small value of $q(y_c)$ yields a large negative log (which is then weighted by $q(y_c)$): math.log(0.01) = -4.6

A large value of $q(y_c)$ yields a small negative log: math.log(0.99) ≈ -0.01

The more classes, the higher the (maximum possible) entropy.
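
A quick sketch of the entropy computation in pure Python (the function name is illustrative):

<code python>
import math

def entropy(q):
    """Entropy H(q) of a discrete distribution q (a list of probabilities)."""
    return -sum(p * math.log(p) for p in q if p > 0)

# Uniform distributions: entropy grows with the number of classes (log C)
print(entropy([0.5, 0.5]))    # ~0.693 = log(2)
print(entropy([0.25] * 4))    # ~1.386 = log(4)
print(entropy([0.99, 0.01]))  # ~0.056, nearly deterministic -> low entropy
</code>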

Recap cross-entropy:

$H_p(q) = - \sum_{c=1}^C q(y_c) \log(p(y_c))$ where $p$ is another distribution

In general cross-entropy $\geq$ entropy (Gibbs' inequality), with equality exactly when $p = q$, i.e. $H_p(q) = H(q)$.

Recap KL divergence:

It is the difference between cross-entropy and entropy:

$D_{KL}(q \| p) = H_p(q) - H(q) = \sum_{c=1}^C q(y_c) (\log(q(y_c)) - \log(p(y_c)))$
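
A small numerical check of this identity (pure Python; function names and the example distributions are my own):

<code python>
import math

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def cross_entropy(q, p):
    """H_p(q): expected negative log of p under the true distribution q."""
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)

def kl_divergence(q, p):
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p) if qi > 0)

q = [0.7, 0.2, 0.1]  # true distribution
p = [0.5, 0.3, 0.2]  # model distribution

# D_KL(q||p) equals H_p(q) - H(q)
print(kl_divergence(q, p))                # ~0.085
print(cross_entropy(q, p) - entropy(q))   # same value
</code>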

For a learning algorithm, we need to find the closest $p$ to $q$ by minimizing the cross-entropy (since $H(q)$ is fixed, this also minimizes the KL divergence).

In training we have $N$ samples. For each particular example the true distribution is known: it puts all its mass on the class the example belongs to. The loss function should therefore minimize the average cross-entropy over the samples.

Outcome: scalar in $[0,1]$ using sigmoid
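
A sketch of the averaged loss above, with logits mapped through a sigmoid to a scalar in $[0,1]$ (names and example data are illustrative):

<code python>
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_binary_cross_entropy(ys, logits, eps=1e-12):
    """Average binary cross-entropy over N samples."""
    total = 0.0
    for y, z in zip(ys, logits):
        y_hat = min(max(sigmoid(z), eps), 1 - eps)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / len(ys)

print(mean_binary_cross_entropy([1, 0, 1], [2.0, -1.5, 0.3]))  # ~0.294
</code>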

===== Cross-entropy =====

$-\sum_{i=1}^C y_i \log(\hat{y}_i)$

$C$ is the number of classes.

Outcome: vector with entries in $[0,1]$ using softmax
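
A sketch of softmax followed by the categorical cross-entropy above, for a one-hot $y$ (all names are illustrative):

<code python>
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    """y is a one-hot vector, y_hat a probability vector from softmax."""
    return -sum(yi * math.log(max(yhi, eps)) for yi, yhi in zip(y, y_hat))

y_hat = softmax([2.0, 1.0, 0.1])  # ~[0.66, 0.24, 0.10]
print(categorical_cross_entropy([1, 0, 0], y_hat))  # ~0.42
</code>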

===== Binary cross entropy with multi labels =====

$-\sum_{i=1}^C (y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i))$

Outcome: vector with entries in $[0,1]$ using sigmoid
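
A sketch of the multi-label case: each class gets an independent sigmoid and the binary cross-entropy terms are summed over classes (all names illustrative):

<code python>
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(y, logits, eps=1e-12):
    """y is a multi-hot vector (several entries may be 1); one sigmoid per class."""
    loss = 0.0
    for yi, z in zip(y, logits):
        y_hat = min(max(sigmoid(z), eps), 1 - eps)
        loss += -(yi * math.log(y_hat) + (1 - yi) * math.log(1 - y_hat))
    return loss

# Example: an item tagged with classes 0 and 2, but not 1
print(multilabel_bce([1, 0, 1], [1.5, -2.0, 0.5]))  # ~0.80
</code>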