Differences

This shows you the differences between two versions of the page.

--- data_mining:neural_network:short_overview [2018/04/22 00:00] – [Cost function] phreazer
+++ data_mining:neural_network:short_overview [2018/05/10 17:32] (current) – [Cost function] phreazer
@@ Line 89: / Line 89: @@
 **Cost function of a neural net:**
-$J(\theta) = - \frac{1}{m} [\sum_{i=1}^m \sum_{k=1}^K y_k^{(i)} log(h_\theta(x^{(i)}))_k + (1-y_k^{(i)}) log(1-(h_\theta(x^{(i)}))_k)] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_l+1} (\theta_{ji}^{(l)})^2$
+$J(\theta) = - \frac{1}{m} [\sum_{i=1}^m \sum_{k=1}^K y_k^{(i)} \log(h_\theta(x^{(i)}))_k + (1-y_k^{(i)}) \log(1-(h_\theta(x^{(i)}))_k)]$
+$+ \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\theta_{ji}^{(l)})^2$
 Explanation:
-Sum over k: number of outputs
+  * Sum over k: number of outputs
-Sum over all $\theta_{ji}^{(l)}$ without bias units.
+  * Sum over all $\theta_{ji}^{(l)}$ without bias units.
+  * Regularization term: squared Frobenius norm of the weight matrices, also called //weight decay//
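
The cost function above can be sketched in NumPy as follows. This is a minimal illustration, not part of the original page: the function name ''nn_cost'', the argument layout, and the convention that the bias weights sit in column 0 of each weight matrix are all assumptions.

```python
import numpy as np

def nn_cost(h, y, thetas, lam):
    """Regularized cross-entropy cost J(theta) (illustrative sketch).

    h      : (m, K) network outputs h_theta(x^(i)), each strictly in (0, 1)
    y      : (m, K) one-hot labels y^(i)
    thetas : list of weight matrices theta^(l) with shape (s_{l+1}, s_l + 1);
             column 0 is assumed to hold the bias weights
    lam    : regularization strength lambda
    """
    m = y.shape[0]
    # Double sum over examples i and output units k.
    data_term = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    # Squared Frobenius norm of every weight matrix, skipping the bias column.
    reg_term = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return data_term + reg_term
```

With ''lam = 0'' only the cross-entropy term remains, which makes it easy to check the two parts of the formula separately.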
-==== Backpropagation Algorithmus ====
+===== Backpropagation Algorithm =====
-Wir wollen $\min_\theta J(\theta)$ und benötigen $J(\theta)$ und zugehörige partielle Ableitungen.
+The goal is $\min_\theta J(\theta)$. This requires:
+  * $J(\theta)$
+  * Partial derivatives
 Forward propagation:
@@ Line 104: / Line 109: @@
 a^{(1)} = x \\
 z^{(2)} = \theta^{(1)} a^{(1)} \\
-a^{(2)} = g(z^{(2)}) \text{ füge } a_0^{(2)} \text{hinzu} \\
+a^{(2)} = g(z^{(2)}) \quad \text{(add bias unit } a_0^{(2)} \text{)} \\
 \dots
 $$
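
A rough NumPy sketch of these forward propagation steps for a single example is shown below. It assumes a sigmoid activation $g$, a bias unit $a_0 = 1$ prepended to every layer except the output, and weight matrices of shape $(s_{l+1}, s_l + 1)$; the names ''sigmoid'' and ''forward'' are illustrative and do not come from the page.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z); assumed here, the page only writes g."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Return the activations a^(1..L) and pre-activations z^(2..L) for one example."""
    a = np.concatenate(([1.0], x))          # a^(1) = x with bias unit a_0 = 1
    activations, zs = [a], []
    for l, theta in enumerate(thetas):
        z = theta @ a                       # z^(l+1) = theta^(l) a^(l)
        a = sigmoid(z)                      # a^(l+1) = g(z^(l+1))
        if l < len(thetas) - 1:
            a = np.concatenate(([1.0], a))  # add a_0 to every hidden layer
        zs.append(z)
        activations.append(a)
    return activations, zs
```

The last entry of ''activations'' is the network output $h_\theta(x)$; the stored values are what the error terms $\delta^{(l)}$ are later computed from.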
-$\delta_j^{(l)}$: Fehler von Knoten j in Layer l.
+==== Calculation of prediction errors ====
-Für jede Outputeinheit (Layer L=4)
+$\delta_j^{(l)}$: Error of unit $j$ in layer $l$.
+For each output unit (layer L=4)
 $\delta_j^{(4)} = a_j^{(4)} - y_j$
-Vektorisiert: $\delta^{(4)} = a^{(4)} - y$
+Vectorized: $\delta^{(4)} = a^{(4)} - y$
+$.*$ is element-wise multiplication
 $$\delta^{(3)} = (\theta^{(3)})^T\delta^{(4)}.*g'(z^{(3)}) \\
@@ Line 124: / Line 133: @@
 Algorithm
-$$\Delta_{ij}^{(l)} = 0 \text{für alle i,j,l} \\
+$$\text{Set } \Delta_{ij}^{(l)} = 0 \text{ for all i,j,l} \\
 \text{For i=1 to m:} \\
-\text{Set} a^{(1)} = x^{(i)}$$
+\text{Set } a^{(1)} = x^{(i)}$$
 Forward propagation to compute $a^{(l)}$ for $l=2,3,\dots,L$
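
The loop can be sketched in NumPy as below. Only the initialization, the per-example forward pass, and the error formulas are taken from the text; the accumulation $\Delta^{(l)} \mathrel{+}= \delta^{(l+1)} (a^{(l)})^T$ and the final division by $m$ are the usual continuation of the algorithm and are assumptions here, as are the sigmoid activation, the omission of the regularization term, and the name ''backprop_gradients''.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z); assumed, with g'(z) = g(z) * (1 - g(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(X, Y, thetas):
    """Unregularized gradient of J with respect to every theta^(l) (sketch)."""
    m = X.shape[0]
    Deltas = [np.zeros_like(t) for t in thetas]   # Set Delta^(l) = 0 for all i, j, l
    for x, y in zip(X, Y):                        # For i = 1 to m
        a = np.concatenate(([1.0], x))            # Set a^(1) = x^(i), plus bias unit
        acts = [a]
        for l, theta in enumerate(thetas):        # forward propagation
            a = sigmoid(theta @ a)
            if l < len(thetas) - 1:
                a = np.concatenate(([1.0], a))
            acts.append(a)
        delta = acts[-1] - y                      # delta^(L) = a^(L) - y
        for l in range(len(thetas) - 1, -1, -1):
            # Assumed accumulation step: Delta^(l) += delta^(l+1) (a^(l))^T
            Deltas[l] += np.outer(delta, acts[l])
            if l > 0:
                # delta^(l) = (theta^(l))^T delta^(l+1) .* g'(z^(l)),
                # then drop the entry that belongs to the bias unit.
                back = thetas[l].T @ delta
                delta = (back * acts[l] * (1 - acts[l]))[1:]
    return [D / m for D in Deltas]
```

A finite-difference comparison against the cost function is a common way to sanity-check such an implementation before using it with an optimizer.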