
Biological neurons

Dendritic Tree

Axon

Spike generation

Synapses

Transmitter molecules diffuse across the synaptic cleft and bind to receptor molecules in the membrane of the post-synaptic neuron (changing their shape). This opens holes that allow specific ions in or out → changes depolarization.

Effectiveness of synapses can be changed.

Synapses are very small and very low power; they adapt using locally available signals.

About $10^{11}$ neurons, each with about $10^4$ weights (high bandwidth).

Different bits of the cortex do different things. Specific tasks increase blood flow to specific regions.

Cortex is made of general-purpose stuff that has the ability to turn into special-purpose hardware in response to experience.

Artificial neurons

Linear neurons

$y=b+\sum_{i} x_{i} w_{i}$
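A minimal NumPy sketch of a linear neuron (the example values are illustrative, not from the notes):

```python
import numpy as np

def linear_neuron(x, w, b):
    """Linear neuron: y = b + sum_i x_i * w_i."""
    return b + np.dot(w, x)

# Illustrative values
x = np.array([1.0, -1.0])
w = np.array([0.5, 0.3])
print(linear_neuron(x, w, b=0.1))  # 0.1 + (0.5*1 + 0.3*(-1)) ≈ 0.3
```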

Squared error loss function:

$\frac{1}{2}(w^Tx_1-t_1)^2 + \frac{1}{2}(w^Tx_2-t_2)^2$

Error surface $E$ for the training cases $x_1=(1,-1)$, $t_1=0$ and $x_2=(0,1)$, $t_2=1$.

Batch learning / online learning

Online learning: each training case defines a constraint plane. Every update moves the weights perpendicular to one constraint plane, so the trajectory zig-zags. Learning can be slow if the error-surface ellipse is very elongated.
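A sketch of both update styles for the linear neuron on the two training cases above (the learning rate and iteration count are arbitrary choices, not from the notes):

```python
import numpy as np

# Training cases from the error-surface example above.
X = np.array([[1.0, -1.0],
              [0.0,  1.0]])
t = np.array([0.0, 1.0])

def batch_step(w, lr=0.1):
    # Batch learning: sum the gradients over all cases, then update once.
    grad = sum((w @ x - ti) * x for x, ti in zip(X, t))
    return w - lr * grad

def online_step(w, lr=0.1):
    # Online learning: update after each case; every step moves the weights
    # perpendicular to that case's constraint plane, giving a zig-zag path.
    for x, ti in zip(X, t):
        w = w - lr * (w @ x - ti) * x
    return w

w_batch = w_online = np.zeros(2)
for _ in range(200):
    w_batch = batch_step(w_batch)
    w_online = online_step(w_online)
print(w_batch, w_online)  # both approach (1, 1), where w·x1 = 0 and w·x2 = 1
```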

Binary threshold neurons

McCulloch-Pitts (1943)

  1. Compute weighted sum of inputs
  2. Send a fixed-size spike of activity if the weighted sum exceeds a threshold. The spike is like the truth value of a proposition, and each neuron combines truth values to compute the truth value of another proposition.
  3. Output 0 or 1

$z=b+\sum_{i} x_{i} w_{i}$

$y = \begin{cases} 1, & \text{if } z \geq 0 \\ 0, & \text{otherwise}\end{cases}$

$\theta = -b$, where $\theta$ is the threshold.
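A minimal sketch of a binary threshold unit; the AND example is a standard illustration, and the particular weights and threshold are my choice, not from the notes:

```python
import numpy as np

def binary_threshold(x, w, b):
    """McCulloch-Pitts unit: output 1 if z = b + w·x >= 0,
    i.e. if the weighted sum reaches the threshold theta = -b."""
    z = b + np.dot(w, x)
    return 1 if z >= 0 else 0

# Logical AND with threshold theta = 1.5 (so b = -1.5):
print(binary_threshold(np.array([1, 1]), np.array([1.0, 1.0]), b=-1.5))  # 1
print(binary_threshold(np.array([1, 0]), np.array([1.0, 1.0]), b=-1.5))  # 0
```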

Rectified Linear Neurons

Aka ReLU (Rectified Linear Unit)

$z=b+\sum_{i} x_{i} w_{i}$

$y = \begin{cases} z, & \text{if } z > 0 \\ 0, & \text{otherwise}\end{cases} = \max(0,z)$

Above 0 it is linear; at or below 0 the output is 0.

Faster training, since the slope never becomes very small or very large (it is either 0 or 1).

Leaky ReLU:

$y =\max(0.01 z,z)$
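Both variants in a short NumPy sketch (the 0.01 slope matches the leaky ReLU above; the test values are illustrative):

```python
import numpy as np

def relu(z):
    # max(0, z), elementwise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # max(alpha*z, z): small non-zero slope for z < 0
    return np.maximum(alpha * z, z)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [ 0.    0.    3.  ]
print(leaky_relu(z))  # [-0.02  0.    3.  ]
```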

Sigmoid Neuron

Real-valued output that is a smooth, bounded function of the total input. Typically the logistic function, which has nice derivatives.

$z=b+\sum_{i} x_{i} w_{i}$

$y = \frac{1}{1+e^{-z}}$

$\lim_{z \to -\infty} \frac{1}{1+e^{-z}} = 0$

$\lim_{z \to \infty} \frac{1}{1+e^{-z}} = 1$

Switching from sigmoid to ReLU leads to a performance improvement (the slope of the sigmoid gradually shrinks to zero).
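A minimal sketch of the logistic function (test values are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: smooth, bounded in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))  # [~4.5e-05, 0.5, ~0.99995]: approaches 0 and 1 at the extremes
```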

tanh

Works better than the sigmoid function.

$y = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$

Its outputs are centered at 0 (range $(-1,1)$), which keeps the data for the next layer centered.

Exception: the output layer, since the output should be in $[0,1]$.
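A small check of the centering point, assuming zero-mean random inputs (an illustration, not from the notes):

```python
import numpy as np

z = np.random.default_rng(0).normal(size=10_000)  # zero-mean inputs
print((1.0 / (1.0 + np.exp(-z))).mean())  # ~0.5: sigmoid outputs are not centered
print(np.tanh(z).mean())                  # ~0.0: tanh keeps the data centered at 0
```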

Softmax group

The logistic function's output is used for classification between two target classes (0/1). The softmax function is a generalization of the logistic function that outputs a multiclass categorical probability distribution.

Derivatives:

Compute the derivative of the logit with respect to the inputs and the weights.

$\frac{\partial z}{\partial w_i} = x_i$

$\frac{\partial z}{\partial x_i} = w_i$

Compute the derivative of the output with respect to the logit, expressed in terms of the output $y$:

$\frac{\partial y}{\partial z} = y(1-y)$

Derivatives of the output with respect to each weight:

$\frac{\partial y}{\partial w_i} = \frac{\partial z}{\partial w_i} \frac{d y}{d z} = x_i y (1-y)$

$\frac{\partial E}{\partial w_i} = \sum_n \frac{\partial y^n}{\partial w_i} \frac{d E}{d y^n} = - \sum_n x_i^n y^n (1-y^n) (t^n -y^n)$
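The same gradient as a NumPy sketch, reusing the two training cases from above for illustration (logistic output instead of linear):

```python
import numpy as np

def squared_error_grad(X, t, w, b):
    """dE/dw_i = -sum_n x_i^n * y^n (1 - y^n) * (t^n - y^n)
    for a logistic neuron with squared error."""
    y = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -(X.T @ (y * (1 - y) * (t - y)))

X = np.array([[1.0, -1.0], [0.0, 1.0]])
t = np.array([0.0, 1.0])
print(squared_error_grad(X, t, w=np.zeros(2), b=0.0))  # [ 0.125 -0.25 ]
```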

Problems with squared error: if the output is very wrong (e.g. target 1, output near 0), the logistic is almost flat and there is almost no gradient; also the outputs are not constrained to sum to 1.

Better: force the outputs to represent a probability distribution.

Solution: softmax / softmax group

$y_i=\frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$

$\frac{\partial y_i}{\partial z_i}=y_i (1- y_i)$
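A minimal softmax sketch (subtracting the max is a standard numerical-stability trick, not mentioned in the notes):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability; result is unchanged
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
y = softmax(z)
print(y, y.sum())  # probabilities summing to 1, largest for the largest logit
```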

Stochastic binary neurons

Treat the output of the logistic as the probability of producing a spike in a short time window.

$z=b+\sum_{i} x_{i} w_{i}$

$p(s=1) = \frac{1}{1+e^{-z}}$

Output 0 or 1.

Also possible for rectified linear units: the output is treated as the Poisson rate for spikes.
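A sketch of both stochastic variants (the random seed and input values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binary(x, w, b):
    # Spike (1) with probability p(s=1) = logistic(z), else 0.
    p = 1.0 / (1.0 + np.exp(-(b + np.dot(w, x))))
    return int(rng.random() < p)

def stochastic_relu(x, w, b):
    # Rectified linear output treated as a Poisson rate for spike counts.
    rate = max(0.0, b + np.dot(w, x))
    return rng.poisson(rate)

x, w = np.array([1.0, -1.0]), np.array([0.5, 0.3])
print(stochastic_binary(x, w, b=0.1))  # 0 or 1
print(stochastic_relu(x, w, b=0.1))    # non-negative integer spike count
```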