data_mining:neural_network:short_overview

  
===== Functions =====
  
Examples of how NNs can learn the AND, OR, XOR and XNOR functions.
E.g., $x_1 \text{ XNOR } x_2$ can be built from $x_1 \text{ AND } x_2$, $\text{NOT}(x_1) \text{ AND } \text{NOT}(x_2)$ and $x_1 \text{ OR } x_2$.
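
A minimal sketch of this idea, assuming sigmoid units and the usual hand-picked weights for this kind of example (the weight values and helper names below are illustrative, not part of the original note):

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(weights, x1, x2):
    """One sigmoid unit; weights = [bias, w1, w2]."""
    return sigmoid(weights[0] + weights[1] * x1 + weights[2] * x2)

AND_W = np.array([-30.0,  20.0,  20.0])  # fires only if x1 = x2 = 1
OR_W  = np.array([-10.0,  20.0,  20.0])  # fires if x1 = 1 or x2 = 1
NOR_W = np.array([ 10.0, -20.0, -20.0])  # NOT(x1) AND NOT(x2)

def xnor(x1, x2):
    a1 = unit(AND_W, x1, x2)   # hidden unit 1: x1 AND x2
    a2 = unit(NOR_W, x1, x2)   # hidden unit 2: NOT(x1) AND NOT(x2)
    return unit(OR_W, a1, a2)  # output unit: a1 OR a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))  # prints 1, 0, 0, 1
</code>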
  
===== Multi-class classification =====

Four output units, one for each class:
$$h_\theta(x) \approx 1,0,0,0$$
$$h_\theta(x) \approx 0,1,0,0$$
$$h_\theta(x) \approx 0,0,1,0$$
$$h_\theta(x) \approx 0,0,0,1$$
  
$y^{(i)}$ is now a $K$-dimensional (one-hot) vector, with a 1 in the position of the correct class.
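
A small sketch of this encoding, assuming integer labels $0 \dots K-1$ (the helper name is illustrative):

<code python>
import numpy as np

def one_hot(y, K):
    """Map integer labels y (shape (m,)) to one-hot target vectors (shape (m, K))."""
    Y = np.zeros((len(y), K))
    Y[np.arange(len(y)), y] = 1.0
    return Y

print(one_hot(np.array([0, 2, 3, 1]), 4))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
</code>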
  
===== Cost function =====
  
Notation:
  * $s_l$: Number of units (without the bias unit) in layer $l$
  * $L$: Total number of layers in the network
  * For binary classification: $s_L = 1$, $K = 1$
  * For $K$-class classification: $s_L = K$

The cost function is a generalization of the cost function of logistic regression.
  
Cost function of logistic regression:
  
$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$
  
$+ \frac{\lambda}{2m} \sum_{j=1}^{n} (\theta_{j})^2$
  
**Cost function of a neural net:**
  
$J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^m \sum_{k=1}^K y_k^{(i)} \log(h_\theta(x^{(i)}))_k + (1-y_k^{(i)}) \log(1-(h_\theta(x^{(i)}))_k) \right]$
  
$+ \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\theta_{ji}^{(l)})^2$

Explanation:
  * The sum over $k$ runs over the $K$ output units
  * The regularization sum runs over all $\theta_{ji}^{(l)}$, excluding the bias units
  * The regularization term is the squared Frobenius norm of the weight matrices, also called //weight decay//
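
A minimal numpy sketch of this cost, assuming the outputs $h_\theta(x^{(i)})$ have already been computed by forward propagation and the targets are one-hot encoded (the names `h`, `Y`, `thetas` are illustrative):

<code python>
import numpy as np

def nn_cost(h, Y, thetas, lam):
    """
    h:      (m, K) outputs h_theta(x^(i)) for each example
    Y:      (m, K) one-hot targets y^(i)
    thetas: list of weight matrices theta^(l), bias column first
    lam:    regularization strength lambda
    """
    m = Y.shape[0]
    eps = 1e-12  # numerical safety for the logs
    data_term = -np.sum(Y * np.log(h + eps) + (1 - Y) * np.log(1 - h + eps)) / m
    # Regularization: squared weights of all layers, bias columns excluded
    reg_term = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return data_term + reg_term
</code>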
===== Backpropagation Algorithm =====

Goal: $\min_\theta J(\theta)$. For this we need:
  * $J(\theta)$
  * The partial derivatives $\frac{\partial}{\partial \theta_{ij}^{(l)}} J(\theta)$
  
Forward propagation:
$$
a^{(1)} = x \\
z^{(2)} = \theta^{(1)} a^{(1)} \\
a^{(2)} = g(z^{(2)}) \text{ (add } a_0^{(2)}\text{)} \\
\dots
$$
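
A sketch of this forward pass with sigmoid activations; the bias unit is prepended to each layer's activation vector before the next multiplication (list indices are 0-based, so `thetas[0]` plays the role of $\theta^{(1)}$):

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """x: input vector a^(1); thetas: list of matrices theta^(1), ..., theta^(L-1)."""
    activations = []
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # add bias unit a_0^(l)
        activations.append(a)
        z = theta @ a                   # z^(l+1) = theta^(l) a^(l)
        a = sigmoid(z)                  # a^(l+1) = g(z^(l+1))
    activations.append(a)               # output layer a^(L) = h_theta(x)
    return activations
</code>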
  
==== Calculation of prediction errors ====
  
$\delta_j^{(l)}$: error of unit $j$ in layer $l$.

For each output unit (in the output layer, here $L=4$):
  
$\delta_j^{(4)} = a_j^{(4)} - y_j$
  
Vectorized: $\delta^{(4)} = a^{(4)} - y$

For the hidden layers, where $.*$ denotes element-wise multiplication:
  
$$\delta^{(3)} = (\theta^{(3)})^T\delta^{(4)} .* g'(z^{(3)}) \\
\delta^{(2)} = (\theta^{(2)})^T\delta^{(3)} .* g'(z^{(2)})$$
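
Assuming $g$ is the sigmoid activation, its derivative can be expressed through the activations already stored during forward propagation:

$$g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$$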
Algorithm:
  
$$\text{Set } \Delta_{ij}^{(l)} = 0 \text{ for all } i,j,l \\
\text{For } i=1 \text{ to } m: \\
\quad \text{Set } a^{(1)} = x^{(i)}$$
  
Forward propagation to compute $a^{(l)}$ for $l=2,3,\dots,L$
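
A minimal sketch of the full loop, assuming sigmoid activations and the standard continuation of the algorithm (accumulate $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$, then average and regularize); the function and variable names are illustrative:

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Same forward pass as sketched above; returns a^(1), ..., a^(L)."""
    acts, a = [], x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # add bias unit
        acts.append(a)
        a = sigmoid(theta @ a)
    acts.append(a)
    return acts

def backprop_gradients(X, Y, thetas, lam):
    """X: (m, n) inputs, Y: (m, K) one-hot targets, thetas: weight matrices (bias column first)."""
    m = X.shape[0]
    Deltas = [np.zeros_like(t) for t in thetas]       # Delta^(l) = 0 for all i, j, l
    for i in range(m):                                # for i = 1 to m
        acts = forward(X[i], thetas)                  # a^(1), ..., a^(L)
        delta = acts[-1] - Y[i]                       # delta^(L) = a^(L) - y
        for l in range(len(thetas) - 1, -1, -1):
            a = acts[l]
            Deltas[l] += np.outer(delta, a)           # Delta^(l) += delta^(l+1) (a^(l))^T
            if l > 0:                                 # no delta for the input layer
                # delta^(l) = (theta^(l))^T delta^(l+1) .* g'(z^(l)), bias entry dropped
                delta = (thetas[l].T @ delta)[1:] * a[1:] * (1 - a[1:])
    grads = []
    for t, D in zip(thetas, Deltas):
        D = D / m
        D[:, 1:] += (lam / m) * t[:, 1:]              # regularize, bias column excluded
        grads.append(D)                               # one gradient matrix per theta^(l)
    return grads
</code>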