Dendritic Tree
Axon
Spike generation
Synapses
Transmitter molecules diffuse across the synaptic cleft and bind to receptor molecules in the membrane of the post-synaptic neuron (changing their shape). This opens holes that allow specific ions in or out → changes depolarization.
Effectiveness of synapses can be changed:
Synapses are very small and very low power; they adapt using locally available signals.
10^11 neurons with 10^4 weights each (high bandwidth)
Different bits of the cortex do different things. Specific tasks increase blood flow to specific regions.
Cortex is made of general purpose stuff that has the ability to turn into special purpose hardware in response to experience
$y=b+\sum_{i} x_{i} w_{i}$
Squared error loss function:
$\frac{1}{2}(w^Tx_1-t_1)^2 + \frac{1}{2}(w^Tx_2-t_2)^2$
Error surface $E$ of $x_1=(1,-1)$ $x_2=(0,1)$ $t_1=0$ $t_2=1$.
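A minimal NumPy sketch (function name chosen for illustration) that evaluates this squared-error loss for the two training cases above:

```python
import numpy as np

def squared_error(w, xs, ts):
    """Squared error E = 1/2 * sum_n (w^T x_n - t_n)^2 for a linear neuron (bias omitted)."""
    return 0.5 * sum((w @ x - t) ** 2 for x, t in zip(xs, ts))

# Training cases from the notes: x1=(1,-1), t1=0 and x2=(0,1), t2=1.
xs = [np.array([1.0, -1.0]), np.array([0.0, 1.0])]
ts = [0.0, 1.0]

print(squared_error(np.array([0.0, 0.0]), xs, ts))  # 0.5 (second case contributes 1/2)
print(squared_error(np.array([1.0, 1.0]), xs, ts))  # 0.0 (both constraints satisfied)
```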
Batch learning / online learning
Online learning: there is a constraint plane for each training case. Each update moves perpendicular to one constraint (zig-zag). Learning can be slow if the ellipse of the error surface is very elongated.
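A sketch of the two update styles for a linear neuron, assuming the squared-error loss above (NumPy; function names are my own, for illustration):

```python
import numpy as np

def batch_step(w, X, t, lr):
    """One batch gradient-descent step: average gradient over all training cases."""
    grad = (X @ w - t) @ X / len(t)      # d/dw of 1/2*sum_n (w^T x_n - t_n)^2, averaged
    return w - lr * grad

def online_step(w, x, t, lr):
    """One online (per-case) step: move perpendicular to that case's constraint plane."""
    return w - lr * (w @ x - t) * x

X = np.array([[1.0, -1.0], [0.0, 1.0]])
t = np.array([0.0, 1.0])
w = np.zeros(2)
for _ in range(100):
    for x_n, t_n in zip(X, t):
        w = online_step(w, x_n, t_n, lr=0.1)
print(w)  # zig-zags toward the solution w ≈ (1, 1), where w^T x1 = 0 and w^T x2 = 1
```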
McCulloch - Pitts (1943)
$z=b+\sum_{i} x_{i} w_{i}$
$y = \begin{cases} 1, & \text{if } z \geq 0 \\ 0, & \text{otherwise}\end{cases}$
$\theta = -b$
$\theta$ : threshold
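A small sketch of the binary threshold unit (NumPy; the AND-gate weights below are just an illustrative choice):

```python
import numpy as np

def binary_threshold(x, w, b):
    """McCulloch-Pitts unit: output 1 if z = b + sum_i x_i w_i >= 0, else 0."""
    z = b + np.dot(w, x)
    return 1 if z >= 0 else 0

# Equivalently: output 1 iff sum_i x_i w_i >= theta, with theta = -b.
print(binary_threshold(np.array([1, 1]), np.array([1.0, 1.0]), b=-1.5))  # AND gate: 1
print(binary_threshold(np.array([1, 0]), np.array([1.0, 1.0]), b=-1.5))  # 0
```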
Aka ReLU (Rectified Linear Unit)
$z=b+\sum_{i} x_{i} w_{i}$
$y = \begin{cases} z, & \text{if } z > 0 \\ 0, & \text{otherwise}\end{cases} = \max(0,z)$
Above 0 it is linear; at and below 0 the output is 0.
Faster computation, since the slope never becomes very small or very large (no saturation).
Leaky ReLU:
$y =\max(0.01 z,z)$
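A short sketch of both activations (NumPy; `alpha=0.01` matches the leaky slope above):

```python
import numpy as np

def relu(z):
    """Rectified linear unit: linear above 0, zero at and below 0."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: small slope alpha below 0 instead of an exactly zero gradient."""
    return np.maximum(alpha * z, z)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0.  0.  3.]
print(leaky_relu(z))  # [-0.02  0.    3.  ]
```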
Sigmoid neurons give a real-valued output that is a smooth, bounded function of their total input. Typically the logistic function, which has nice derivatives.
$z=b+\sum_{i} x_{i} w_{i}$
$y = \frac{1}{1+e^{-z}}$
$\lim_{z \to -\infty} \frac{1}{1+e^{-z}} = 0$
$\lim_{z \to \infty} \frac{1}{1+e^{-z}} = 1$
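A minimal sketch of the logistic unit, showing the two limits numerically (NumPy):

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) output y = 1 / (1 + exp(-z)), bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(logistic(np.array([-10.0, 0.0, 10.0])))  # ≈ [0.000045, 0.5, 0.999955]
```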
Switching from sigmoid to ReLU led to performance improvements (the slope of the sigmoid gradually shrinks to zero).
Tanh works better than the sigmoid function.
$y = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$
Its output is centered at 0.
Exception: Output layer, since output should be in [0,1].
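A quick numeric comparison (NumPy) illustrating that the tanh output is centered at 0, while the logistic output is centered at 0.5:

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])
y_tanh = np.tanh(z)                 # = (e^z - e^-z) / (e^z + e^-z)
y_logistic = 1 / (1 + np.exp(-z))
print(y_tanh)      # [-0.964  0.     0.964]  -- symmetric around 0
print(y_logistic)  # [ 0.119  0.5    0.881]  -- centered around 0.5
```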
The logistic function's output is used for classification between two target classes 0/1. The softmax function is a generalized form of the logistic function that outputs a multiclass categorical probability distribution.
Derivatives:
Compute derivative of the logit with respect to the inputs and the weights.
$\frac{\partial z}{\partial w_i} = x_i$
$\frac{\partial z}{\partial x_i} = w_i$
Compute the derivative of the output with respect to the logit, expressed in terms of the output $y$:
$\frac{\partial y}{\partial z} = y(1-y)$
Derivative of the output with respect to each weight:
$\frac{\partial y}{\partial w_i} = \frac{\partial z}{\partial w_i} \frac{d y}{d z} = x_i y (1-y)$
$\frac{\partial E}{\partial w_i} = \sum_n \frac{\partial y^n}{\partial w_i} \frac{d E}{d y^n} = - \sum_n x_i^n y^n (1-y^n) (t^n -y^n)$
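A sketch that implements this gradient and checks it against a finite-difference gradient of the squared error (NumPy; the data and parameter values are arbitrary illustrations):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_E_wrt_w(X, t, w, b):
    """dE/dw_i = -sum_n x_i^n * y^n * (1 - y^n) * (t^n - y^n), from the chain rule above."""
    y = logistic(X @ w + b)
    return -(X.T @ (y * (1 - y) * (t - y)))

X = np.array([[1.0, -1.0], [0.0, 1.0]])
t = np.array([0.0, 1.0])
w, b = np.array([0.5, -0.5]), 0.1

# Numerical check against E = 1/2 * sum_n (t^n - y^n)^2.
eps = 1e-6
num = np.zeros_like(w)
for i in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps; wm[i] -= eps
    Ep = 0.5 * np.sum((t - logistic(X @ wp + b)) ** 2)
    Em = 0.5 * np.sum((t - logistic(X @ wm + b)) ** 2)
    num[i] = (Ep - Em) / (2 * eps)
print(grad_E_wrt_w(X, t, w, b), num)  # the two gradients should agree closely
```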
Problems with squared error.
Better: Force the outputs to represent a probability distribution
Solution: Softmax / Softmax group
$y_i=\frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$
$\frac{\partial y_i}{\partial z_i}=y_i (1- y_i)$
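A minimal softmax sketch (NumPy; subtracting the max logit is a standard numerical-stability trick, not part of the notes):

```python
import numpy as np

def softmax(z):
    """Softmax over a group of logits; outputs are positive and sum to 1."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
y = softmax(z)
print(y, y.sum())  # ≈ [0.090 0.245 0.665], 1.0
```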
Treat the output of the logistic as the probability of producing a spike in a short time window.
$z=b+\sum_{i} x_{i} w_{i}$
$p(s=1) = \frac{1}{1+e^{-z}}$
Output 0 or 1.
Also possible for rectified linear units: the output is treated as the Poisson rate for spikes.
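A sketch of both stochastic variants (NumPy; the random seed and the unit-length time window are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binary(z):
    """Emit a spike (1) with probability p(s=1) = logistic(z), else 0."""
    p = 1.0 / (1.0 + np.exp(-z))
    return int(rng.random() < p)

def poisson_spikes(z, window=1.0):
    """Rectified linear unit as a Poisson rate: spike count ~ Poisson(max(0, z) * window)."""
    rate = max(0.0, z) * window
    return rng.poisson(rate)

print([stochastic_binary(0.5) for _ in range(10)])  # mostly 1s, since p ≈ 0.62
print([poisson_spikes(2.0) for _ in range(10)])     # spike counts with mean ≈ 2
```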