data_mining:neural_network:loss_functions

===== Binary cross entropy =====
  
$-(y \log(\hat{y}) + (1-y) \log(1-\hat{y}))$

or, averaged over $N$ samples,

$-\frac{1}{N} \sum_{i=1}^N (y_i \log(\hat{y_i}) + (1-y_i) \log(1-\hat{y_i}))$
  
(Negative log, since $\log$ of values in $(0,1)$ is $< 0$, so negating gives a positive loss)
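A minimal NumPy sketch of the averaged form above (function and variable names are illustrative, not from the original page):

<code python>
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean of -(y*log(y_hat) + (1-y)*log(1-y_hat)) over N samples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # keep log() away from 0
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0, 1.0])      # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.6])  # sigmoid outputs
print(binary_cross_entropy(y, y_hat))   # low loss for good predictions
</code>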
  
$D_{KL}(q||p) = H_p(q) - H(q) = \sum_{c=1}^C q(y_c) (\log(q(y_c)) - \log(p(y_c)))$

For an algorithm we need to find the $p$ closest to $q$ by minimizing the cross-entropy.

In training we have $N$ samples. For each example the target distribution is known, i.e. which class it belongs to. The loss function should minimize the average cross-entropy over the samples.

Outcome: scalar in $[0,1]$ using sigmoid
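As a check on the relation above, a small NumPy sketch (names are illustrative) computes $D_{KL}(q||p)$ as $H_p(q) - H(q)$ for two discrete distributions; since $H(q)$ is fixed by the data, minimizing the cross-entropy $H_p(q)$ over $p$ also minimizes the KL divergence:

<code python>
import numpy as np

def cross_entropy(q, p):
    """H_p(q) = -sum_c q(y_c) * log(p(y_c))"""
    return -np.sum(q * np.log(p))

def entropy(q):
    """H(q) = -sum_c q(y_c) * log(q(y_c))"""
    return -np.sum(q * np.log(q))

def kl_divergence(q, p):
    """D_KL(q||p) = H_p(q) - H(q)"""
    return cross_entropy(q, p) - entropy(q)

q = np.array([0.7, 0.2, 0.1])  # known target distribution
p = np.array([0.6, 0.3, 0.1])  # model distribution
print(kl_divergence(q, p))     # >= 0, and 0 only if p == q
</code>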

===== Cross-entropy =====

$-\sum_{i=1}^C y_i \log(\hat{y_i})$

where $C$ is the number of classes.

Outcome: vector in $[0,1]$ using softmax
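A sketch of this categorical cross-entropy with a softmax output, assuming a one-hot target (illustrative names):

<code python>
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    """-sum_i y_i * log(y_hat_i) for a one-hot label y over C classes."""
    return -np.sum(y * np.log(np.clip(y_hat, eps, 1.0)))

logits = np.array([2.0, 0.5, -1.0])
y_hat = softmax(logits)        # vector in [0,1] that sums to 1
y = np.array([1.0, 0.0, 0.0])  # one-hot target
print(categorical_cross_entropy(y, y_hat))
</code>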

===== Binary cross entropy with multiple labels =====

$-\sum_{i=1}^C (y_i \log(\hat{y_i}) + (1-y_i) \log(1-\hat{y_i}))$

Outcome: vector in $[0,1]$ using sigmoid
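A sketch of the multi-label case, applying a sigmoid to each of the $C$ outputs independently and summing the per-label binary cross entropies (illustrative names):

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(y, y_hat, eps=1e-12):
    """-sum_i (y_i*log(y_hat_i) + (1-y_i)*log(1-y_hat_i)), one term per label."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

logits = np.array([1.5, -0.5, 2.0])
y_hat = sigmoid(logits)        # each entry in [0,1], independent per label
y = np.array([1.0, 0.0, 1.0])  # several labels can be active at once
print(multilabel_bce(y, y_hat))
</code>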