===== Binary cross entropy =====
$-(y log(\hat{y}) + (1-y) log(1-\hat{y}))$

or, averaged over $N$ examples,

$-\frac{1}{N} \sum_{i=1}^N (y_i log(\hat{y_i}) + (1-y_i) log(1-\hat{y_i}))$

(Negative log, since log of values in [0,1] is <= 0)
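As a quick check, a minimal NumPy sketch of both forms above (the per-example term and the average over $N$ examples); the variable names are only illustrative:

<code python>
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # Average of -(y*log(y_hat) + (1-y)*log(1-y_hat)) over all examples.
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0, 1.0])      # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.4])  # sigmoid outputs
print(binary_cross_entropy(y, y_hat))   # small value => good predictions
</code>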
More classes, higher entropy

Recap cross-entropy:

$H_p(q) = - \sum_{c=1}^C q(y_c) log(p(y_c))$ where $p$ is the other distribution

When $p = q$, then $H_p(q) = H(q)$; otherwise $H_p(q) > H(q)$, so cross-entropy >= entropy.

Recap KL-divergence:

It is the difference between cross-entropy and entropy:

$D_{KL}(q||p) = H_p(q) - H(q) = \sum_{c=1}^C q(y_c) (log(q(y_c)) - log(p(y_c)))$

For an algorithm we need to find the closest $p$ to $q$ by minimizing the cross-entropy.

In training we have $N$ samples. For one particular example the distribution is known, i.e. it is known to which class the example belongs. The loss function should therefore minimize the average cross-entropy over the samples.
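A small numeric sketch of the recap above (the two distributions $q$ and $p$ are made up): it checks that $H_p(q) \geq H(q)$ and that $D_{KL}(q||p) = H_p(q) - H(q)$; since $H(q)$ is fixed, minimizing the cross-entropy also minimizes the KL-divergence.

<code python>
import numpy as np

q = np.array([0.7, 0.2, 0.1])  # "true" distribution over C=3 classes
p = np.array([0.5, 0.3, 0.2])  # model distribution

entropy = -np.sum(q * np.log(q))                # H(q)
cross_entropy = -np.sum(q * np.log(p))          # H_p(q)
kl = np.sum(q * (np.log(q) - np.log(p)))        # D_KL(q||p)

print(cross_entropy >= entropy)                 # True
print(np.isclose(kl, cross_entropy - entropy))  # True
</code>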
Outcome: Scalar in [0,1] using sigmoid
===== Cross-entropy =====

$-\sum_{i=1}^C y_i log(\hat{y_i})$

$C$ is the number of classes.

Outcome: Vector in [0,1] using softmax
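A sketch of this multi-class case with made-up logits: softmax turns the raw outputs into a probability vector, and only the term of the true (one-hot) class contributes to the loss.

<code python>
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 0.5, -1.0])  # raw network outputs, C=3 classes
y = np.array([1.0, 0.0, 0.0])        # one-hot true class
y_hat = softmax(logits)
loss = -np.sum(y * np.log(y_hat))    # cross-entropy for one example
print(loss)
</code>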
===== Binary cross entropy with multiple labels =====

$-\sum_{i=1}^C (y_i log(\hat{y_i}) + (1-y_i) log(1-\hat{y_i}))$

Outcome: Vector in [0,1] using sigmoid
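A sketch of the multi-label case with made-up values: each class gets its own sigmoid, several labels can be active at once, and the binary cross-entropy terms are summed over the $C$ labels.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([3.0, -1.0, 0.5])  # one logit per label, C=3
y = np.array([1.0, 0.0, 1.0])        # multiple labels active at once
y_hat = sigmoid(logits)              # element-wise, each in [0,1]
loss = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(loss)
</code>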