In training we have $N$ samples. For each example the true distribution over the classes is known, i.e. which class it belongs to. Training should minimize the average cross-entropy over all $N$ samples. | In training we have $N$ samples. For each example the true distribution over the classes is known, i.e. which class it belongs to. Training should minimize the average cross-entropy over all $N$ samples. | ||



+ | Outcome: Scalar [0,1] using sigmoid | ||



+ | ===== Cross-entropy ===== | ||



+ | $-\sum_{i=1}^{C} y_i \log(\hat{y}_i)$ | ||

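+ | $C$ is the number of classes, $y_i$ is the true label for class $i$ and $\hat{y}_i$ is the predicted probability for class $i$ | ||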



+ | Outcome: Vector [0,1] using softmax | ||
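As a rough illustration of the cross-entropy formula with a softmax output, here is a minimal sketch in plain Python. The function names (`softmax`, `cross_entropy`, `mean_cross_entropy`), the clipping constant `eps` and the example numbers are assumptions made for this illustration and do not come from the page. Targets are assumed to be one-hot vectors of length $C$.

```python
import math

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred, eps=1e-12):
    # -sum_i y_i * log(y_hat_i); eps (an assumed constant) avoids log(0).
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

def mean_cross_entropy(targets, logits_batch):
    # Average of the per-sample cross-entropy over the N samples.
    losses = [cross_entropy(y, softmax(z)) for y, z in zip(targets, logits_batch)]
    return sum(losses) / len(losses)

# Hypothetical batch: N = 2 samples, C = 3 classes, one-hot targets.
logits_batch = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
targets = [[1, 0, 0], [0, 1, 0]]
loss = mean_cross_entropy(targets, logits_batch)
```

Because the target is one-hot, only the term of the true class contributes to the sum, so the per-sample loss reduces to the negative log of the probability predicted for that class.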



+ | ===== Binary cross entropy with multi labels ===== | ||



+ | $-\sum_{i=1}^{C}\left(y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)\right)$ | ||



+ | Outcome: Vector [0,1] using sigmoid | ||
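The multi-label case can be sketched the same way, again in plain Python. The names `sigmoid` and `multilabel_bce`, the clipping constant `eps` and the example values are assumptions for illustration only. Each of the $C$ labels is treated as an independent binary decision, so the outputs do not need to sum to 1.

```python
import math

def sigmoid(z):
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_bce(y_true, y_pred, eps=1e-12):
    # -sum_i ( y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) )
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total

# Hypothetical sample with C = 3 labels, two of which are active.
logits = [5.0, -5.0, 5.0]
y_true = [1, 0, 1]
y_hat = [sigmoid(z) for z in logits]
loss = multilabel_bce(y_true, y_hat)
```

Unlike the softmax case, every label contributes a term here: active labels through $\log(\hat{y}_i)$ and inactive labels through $\log(1-\hat{y}_i)$.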

+ |