$$
===== Functions =====

Examples of how NNs can learn the AND, OR, XOR, and XNOR functions.
XNOR can be built from $x_1 \text{ AND } x_2$, $\text{NOT}(x_1) \text{ AND NOT}(x_2)$ and $x_1 \text{ OR } x_2$.
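
Not from the original page, but as a concrete illustration: a small numpy sketch in which single sigmoid units with the classic hand-picked weights compute AND, OR and $\text{NOT}(x_1) \text{ AND NOT}(x_2)$, and a second layer combines them into XNOR. The weight values are assumptions, chosen only so that the sigmoid saturates.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, x1, x2):
    # Single sigmoid unit with bias input x0 = 1.
    return sigmoid(theta @ np.array([1.0, x1, x2]))

# Hand-picked weights (assumed values):
AND      = np.array([-30.0,  20.0,  20.0])   # x1 AND x2
NOR_BOTH = np.array([ 10.0, -20.0, -20.0])   # NOT(x1) AND NOT(x2)
OR       = np.array([-10.0,  20.0,  20.0])   # x1 OR x2

def xnor(x1, x2):
    # Layer 2: two hidden units; layer 3: OR combines them into XNOR.
    a1 = unit(AND, x1, x2)
    a2 = unit(NOR_BOTH, x1, x2)
    return unit(OR, a1, a2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
</code>

Rounding the output reproduces the XNOR truth table (1, 0, 0, 1).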
===== Multi-Class Classification =====

4 output units, one for each class:

$$h_\theta(x) \approx 1,0,0,0$$
$$h_\theta(x) \approx 0,1,0,0$$
$$h_\theta(x) \approx 0,0,1,0$$
$$h_\theta(x) \approx 0,0,0,1$$

$y^{(i)}$ is now a 4-dimensional vector.
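
As a small illustrative sketch (not part of the page), labels can be turned into such 4-dimensional one-hot vectors, and the predicted class is the output unit with the largest activation; the arrays below are made-up values.

<code python>
import numpy as np

# Made-up labels for 4 classes (0..3) turned into 4-dimensional one-hot targets y^{(i)}.
labels = np.array([0, 2, 3, 1])
Y = np.eye(4)[labels]            # Y[0] = [1, 0, 0, 0], Y[1] = [0, 0, 1, 0], ...

# A made-up network output h_theta(x) for one example:
h = np.array([0.05, 0.10, 0.80, 0.05])
predicted_class = np.argmax(h)   # the output unit with the largest activation
print(predicted_class)           # 2
</code>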
===== Cost function =====

Notation:
  * $s_l$: number of units in layer $l$ (without bias unit)
  * $L$: total number of layers
  * For binary classification: 1 output unit ($K = 1$)
  * For K-nary classification: $K$ output units
Generalization of the cost function of logistic regression.

Cost function of logistic regression:
$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)}) \log(1-h_\theta(x^{(i)})) \right]$
$+ \frac{\lambda}{2m} \sum_{j=1}^{n} (\theta_{j})^2$
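
A minimal numpy sketch of this regularized cost (the function name and array shapes are assumptions); note that the bias parameter $\theta_0$ is not regularized, matching the sum starting at $j=1$.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    # X: (m, n+1) design matrix with a leading column of ones,
    # y: (m,) labels in {0, 1}, theta: (n+1,) parameters, lam: lambda.
    m = X.shape[0]
    h = sigmoid(X @ theta)
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)  # theta_0 is not regularized
    return data_term + reg_term
</code>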
**Cost function of a neural net:**

$J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^m \sum_{k=1}^K y_k^{(i)} \log(h_\theta(x^{(i)}))_k + (1-y_k^{(i)}) \log(1-(h_\theta(x^{(i)}))_k) \right]$
$+ \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\theta_{ji}^{(l)})^2$

Explanation:
  * Sum over $k$: the number of output units.
  * Sum over all $\theta_{ji}^{(l)}$, excluding the bias units.
  * Frobenius norm used for regularization.

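A minimal numpy sketch of this cost (assuming the network outputs and one-hot targets are already given as matrices; the names are placeholders): the data term sums over all $m$ examples and $K$ outputs, and the regularization term sums the squared non-bias weights of every layer.

<code python>
import numpy as np

def nn_cost(H, Y, thetas, lam):
    # H: (m, K) outputs h_theta(x^{(i)})_k, Y: (m, K) one-hot targets y_k^{(i)},
    # thetas: list of weight matrices theta^{(l)} with the bias weights in column 0.
    m = Y.shape[0]
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Squared Frobenius norm of the non-bias weights of every layer.
    reg_term = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return data_term + reg_term
</code>
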
===== Backpropagation Algorithm =====

Goal is $\min_\theta J(\theta)$. Needed parts:
  * $J(\theta)$
  * the partial derivatives $\frac{\partial}{\partial \theta_{ij}^{(l)}} J(\theta)$

Forward propagation:
$$
a^{(1)} = x \\
z^{(2)} = \theta^{(1)} a^{(1)} \\
a^{(2)} = g(z^{(2)}) \text{ (add } a_0^{(2)} \text{)} \\
\dots
$$
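
A short numpy sketch of these forward-propagation steps under assumed conventions (each $\theta^{(l)}$ stores the bias weights in its first column); it is an illustration, not the page's own code.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    # x: (n,) input vector; thetas: list of matrices theta^{(l)} of shape (s_{l+1}, s_l + 1).
    a = x                                  # a^{(1)} = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))     # add bias unit a_0^{(l)} = 1
        z = theta @ a                      # z^{(l+1)} = theta^{(l)} a^{(l)}
        a = sigmoid(z)                     # a^{(l+1)} = g(z^{(l+1)})
    return a                               # a^{(L)} = h_theta(x)
</code>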
==== Calculation of prediction errors ====

$\delta_j^{(l)}$: Error of unit $j$ in layer $l$.

For each output unit (layer $L = 4$):
$\delta_j^{(4)} = a_j^{(4)} - y_j$
Vectorized: $\delta^{(4)} = a^{(4)} - y$

$.*$ denotes element-wise multiplication.
$$\delta^{(3)} = (\theta^{(3)})^T\delta^{(4)} .* g'(z^{(3)})$$
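
A small numpy sketch of these error terms for a 4-layer network (names and shapes are assumptions), using $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$ for the sigmoid and dropping the bias column of $\theta^{(l)}$ when propagating the error backwards.

<code python>
import numpy as np

def deltas_4_layer(a2, a3, a4, y, theta2, theta3):
    # a2, a3: hidden activations without the bias entry; a4: output layer a^{(4)}.
    # theta2, theta3: weight matrices with the bias weights in column 0.
    delta4 = a4 - y                                        # delta^{(4)} = a^{(4)} - y
    # g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)}) for the sigmoid activation.
    delta3 = (theta3[:, 1:].T @ delta4) * a3 * (1 - a3)    # drop the bias column of theta^{(3)}
    delta2 = (theta2[:, 1:].T @ delta3) * a2 * (1 - a2)    # same for layer 2; no delta^{(1)}
    return delta2, delta3, delta4
</code>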
Algorithm:
$$\text{Set } \Delta_{ij}^{(l)} = 0 \text{ for all } i,j,l \\
\text{For i=1 to m:} \\
\text{Set } a^{(1)} = x^{(i)}$$

Forward propagation to compute $a^{(l)}$ for $l=2,3,\dots,L$
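
Putting the pieces together, a sketch of the accumulation loop under the same assumed conventions (bias weights in the first column of each $\theta^{(l)}$); the resulting gradient $D^{(l)}$ is the averaged $\Delta^{(l)}$ plus regularization on the non-bias weights.

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(X, Y, thetas, lam):
    # X: (m, n) inputs, Y: (m, K) one-hot targets,
    # thetas: list of matrices theta^{(l)} of shape (s_{l+1}, s_l + 1), bias weights in column 0.
    m = X.shape[0]
    Deltas = [np.zeros_like(t) for t in thetas]             # set Delta^{(l)}_{ij} = 0 for all i, j, l
    for x, y in zip(X, Y):                                  # for i = 1 to m
        activations, a = [], x                              # set a^{(1)} = x^{(i)}
        for theta in thetas:                                # forward prop: a^{(l)} for l = 2,...,L
            a = np.concatenate(([1.0], a))                  # add bias unit a_0 = 1
            activations.append(a)
            a = sigmoid(theta @ a)
        delta = a - y                                       # delta^{(L)} = a^{(L)} - y
        for l in range(len(thetas) - 1, -1, -1):
            Deltas[l] += np.outer(delta, activations[l])    # Delta^{(l)} += delta^{(l+1)} (a^{(l)})^T
            if l > 0:                                       # no delta^{(1)} is needed
                a_l = activations[l]
                delta = ((thetas[l].T @ delta) * a_l * (1 - a_l))[1:]
    # D^{(l)}: average of Delta^{(l)} plus regularization on the non-bias weights.
    grads = [D / m for D in Deltas]
    for g, t in zip(grads, thetas):
        g[:, 1:] += (lam / m) * t[:, 1:]
    return grads
</code>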