Differences

This shows you the differences between two versions of the page.

--- data_mining:neural_network:short_overview [2018/04/22 00:15] – [Calculate predictions errors:] phreazer
+++ data_mining:neural_network:short_overview [2018/05/10 17:32] (current) – [Cost function] phreazer
@@ Line 96: / Line 96: @@
   * Sum over k: number of outputs
   * Sum over all $\theta_{ji}^{(l)}$ without bias units.
+  * Frobenius norm for regularization, also called //weight decay//
 ===== Backpropagation Algorithm =====
@@ Line 108: / Line 109: @@
 a^{(1)} = x \\
 z^{(2)} = \theta^{(1)} a^{(1)} \\
-a^{(2)} = g(z^{(2)}) \text{ füge } a_0^{(2)} \text{hinzu} \\
+a^{(2)} = g(z^{(2)}) \text{ add } a_0^{(2)} \\
 \dots
 $$
@@ Line 121: / Line 122: @@
 Vectorized: $\delta^{(4)} = a^{(4)} - y$
+$.*$ is element-wise multiplication
 $$\delta^{(3)} = (\theta^{(3)})^T\delta^{(4)} .* g'(z^{(3)}) \\
@@ Line 130: / Line 133: @@
 Algorithm
-$$\Delta_{ij}^{(l)} = 0 \text{für alle i,j,l} \\
+$$\text{Set } \Delta_{ij}^{(l)} = 0 \text{ for all } i,j,l \\
 \text{For i=1 to m:} \\
-\text{Set} a^{(1)} = x^{(i)}$$
+\text{Set } a^{(1)} = x^{(i)}$$
 Forward propagation to compute $a^{(l)}$ for $l=2,3,\dots,L$
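
The steps in the hunks above (set $\Delta = 0$, loop over the $m$ examples, forward propagation, output error $\delta^{(L)} = a^{(L)} - y$, element-wise product with $g'(z)$) can be sketched in plain Python. This is a minimal illustration, not the page author's implementation. It assumes a sigmoid activation, so that $g'(z) = a .* (1 - a)$, and it assumes the usual accumulation step $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$, which is not visible in this diff. The names `sigmoid`, `forward` and `accumulate_gradients` are hypothetical, as is the weight layout (one list of rows per layer, bias weight in column 0).

```python
import math


def sigmoid(z):
    """Assumed activation function g(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Forward propagation.

    Returns the activations of every layer. All layers except the
    output layer are stored with the bias unit a_0 = 1 prepended.
    """
    activations = []
    a = list(x)                                  # a^(1) = x
    for theta in thetas:
        a = [1.0] + a                            # add bias unit a_0
        activations.append(a)
        z = [sum(w * v for w, v in zip(row, a)) for row in theta]
        a = [sigmoid(v) for v in z]              # a^(l+1) = g(z^(l+1))
    activations.append(a)                        # output layer, no bias unit
    return activations


def accumulate_gradients(thetas, examples):
    """Accumulate Delta over all (x, y) examples (no 1/m, no regularization)."""
    # Set Delta_ij^(l) = 0 for all i, j, l
    Delta = [[[0.0] * len(row) for row in theta] for theta in thetas]
    for x, y in examples:                        # For i = 1 to m
        acts = forward(thetas, x)
        # Output layer error: delta^(L) = a^(L) - y
        delta = [a - t for a, t in zip(acts[-1], y)]
        for l in range(len(thetas) - 1, -1, -1):
            a_l = acts[l]                        # layer input, bias included
            # Assumed accumulation step: Delta^(l) += delta^(l+1) (a^(l))^T
            for i, d in enumerate(delta):
                for j, a in enumerate(a_l):
                    Delta[l][i][j] += d * a
            if l > 0:
                theta = thetas[l]
                # delta^(l) = (theta^(l))^T delta^(l+1) .* g'(z^(l)),
                # with g'(z) = a .* (1 - a); the bias component (j = 0) is dropped
                delta = [
                    sum(theta[i][j] * delta[i] for i in range(len(delta)))
                    * a_l[j] * (1.0 - a_l[j])
                    for j in range(1, len(a_l))
                ]
    return Delta
```

Under these assumptions, dividing each entry of `Delta` by $m$ gives the unregularized partial derivatives of the cross-entropy cost. The weight-decay term would then be added for every weight except those in the bias column.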