For an algorithm, we need to find the distribution $p$ closest to $q$ by minimizing the cross-entropy.

In training we have $N$ samples. For each example the true distribution is known, because we know which class the example belongs to. The loss function is therefore the average cross-entropy over the $N$ samples, and training minimizes it.
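The averaging can be sketched in a few lines of Python. This is a minimal illustration under assumptions not stated in the text: $q$ is taken to be the true distribution and $p$ the model's prediction, the known class is encoded as a one-hot $q$, and the function names `cross_entropy` and `average_cross_entropy`, the `eps` guard, and the sample values are all hypothetical.

```python
import math


def cross_entropy(q, p, eps=1e-12):
    """H(q, p) = -sum_k q[k] * log(p[k]) for two distributions over the same classes."""
    # Terms with q[k] == 0 contribute nothing; eps guards against log(0).
    return -sum(qk * math.log(max(pk, eps)) for qk, pk in zip(q, p) if qk > 0)


def average_cross_entropy(labels, predictions, eps=1e-12):
    """Mean cross-entropy over N samples, each with a known class index."""
    total = 0.0
    for y, p in zip(labels, predictions):
        # Assumption: the known class gives a one-hot true distribution q.
        q = [1.0 if k == y else 0.0 for k in range(len(p))]
        total += cross_entropy(q, p, eps)
    return total / len(labels)


# Hypothetical data: N = 2 samples, 3 classes.
labels = [0, 2]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
loss = average_cross_entropy(labels, predictions)
```

With a one-hot $q$, each sample's cross-entropy reduces to $-\log p_y$, the negative log of the probability assigned to the correct class, so the loss shrinks as the predicted $p$ moves toward $q$.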