Explanation (see the reconstructed formula below):

  * Sum over $k$: number of outputs
  * Sum over all $\theta_{ji}^{(l)}$ without bias units.
  * Frobenius norm for regularization.
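For reference, the points above describe the standard regularized cross-entropy cost for a network with $K$ output units, $L$ layers and $s_l$ units in layer $l$; this is a reconstruction in the usual notation (the regularization term is the squared Frobenius norm of each $\theta^{(l)}$, excluding the bias weights):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log (h_\theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log (1 - (h_\theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\theta_{ji}^{(l)}\right)^2$$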
===== Backpropagation algorithm =====

Goal is $\min_\theta J(\theta)$; needed parts:

  * $J(\theta)$
  * Partial derivatives $\frac{\partial}{\partial \theta_{ij}^{(l)}} J(\theta)$
Forward propagation:
$$
a^{(1)} = x \\
z^{(2)} = \theta^{(1)} a^{(1)} \\
a^{(2)} = g(z^{(2)}) \text{ (add bias unit } a_0^{(2)}\text{)} \\
\dots
$$
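A minimal numpy sketch of these forward steps, assuming a sigmoid activation $g$ and weight matrices `thetas[l]` whose first column multiplies the bias unit (names and bias handling are illustrative assumptions, not from the page):

<code python>
import numpy as np

def sigmoid(z):                          # activation g
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward propagation through all layers.

    thetas[l-1] maps layer l to layer l+1; each matrix already
    includes a first column for the bias unit.
    """
    activations, zs = [x], []
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # add bias unit a_0^(l)
        z = theta @ a                    # z^(l+1) = theta^(l) a^(l)
        a = sigmoid(z)                   # a^(l+1) = g(z^(l+1))
        zs.append(z)
        activations.append(a)
    return activations, zs
</code>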
==== Calculation of prediction errors ====

$\delta_j^{(l)}$: Error of unit $j$ in layer $l$.

For each output unit (layer $L = 4$):

$\delta_j^{(4)} = a_j^{(4)} - y_j$

Vectorized: $\delta^{(4)} = a^{(4)} - y$

For the hidden layers, $.*$ being element-wise multiplication:

$$\delta^{(3)} = (\theta^{(3)})^T \delta^{(4)} .* g'(z^{(3)})$$
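In the same numpy sketch, `*` takes the role of $.*$, and for the sigmoid $g'(z) = g(z)(1 - g(z))$; the bias component of $(\theta^{(3)})^T \delta^{(4)}$ is dropped so the shapes match $z^{(3)}$ (variable names are assumptions):

<code python>
delta4 = a4 - y                                # delta^(4) = a^(4) - y
g_prime3 = sigmoid(z3) * (1 - sigmoid(z3))     # g'(z^(3)) = g(z)(1 - g(z))
delta3 = (theta3.T @ delta4)[1:] * g_prime3    # drop bias row, then element-wise
</code>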
Algorithm:

$$\text{Set } \Delta_{ij}^{(l)} = 0 \text{ for all } i, j, l \\
\text{For i=1 to m:} \\
\quad \text{Set } a^{(1)} = x^{(i)}$$

Forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
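Under the same assumptions, a sketch of this accumulation loop for the 4-layer example, with the standard continuation that backpropagates the $\delta$ terms and accumulates $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$ (the helpers `forward` and `sigmoid` are the ones defined above):

<code python>
def backprop_accumulate(X, Y, thetas):
    """Delta-accumulation loop for a 4-layer network (L = 4).

    Sketch only: X holds one example x^(i) per row, Y the
    matching target vectors.
    """
    m = X.shape[0]
    Deltas = [np.zeros_like(t) for t in thetas]   # Delta_ij^(l) = 0 for all i, j, l
    for i in range(m):                            # for i = 1 to m
        activations, zs = forward(X[i], thetas)   # set a^(1) = x^(i), forward prop
        a4 = activations[3]
        z2, z3 = zs[0], zs[1]
        delta4 = a4 - Y[i]                        # output-layer error
        delta3 = (thetas[2].T @ delta4)[1:] * sigmoid(z3) * (1 - sigmoid(z3))
        delta2 = (thetas[1].T @ delta3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))
        for l, delta in enumerate((delta2, delta3, delta4)):
            a_bias = np.concatenate(([1.0], activations[l]))  # a^(l) with bias
            Deltas[l] += np.outer(delta, a_bias)  # Delta^(l) += delta^(l+1) (a^(l))^T
    return Deltas
</code>

The accumulated $\frac{1}{m}\Delta_{ij}^{(l)}$ then gives the partial derivatives $\frac{\partial}{\partial \theta_{ij}^{(l)}} J(\theta)$ (up to the regularization term).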