# Biological neurons

Dendritic Tree

• Collects input from other neurons

Axon

• Branches
• Contacts dendritic trees at synapses

Spike generation

• The axon hillock generates outgoing spikes whenever enough charge has flowed in at synapses to depolarize the cell membrane

Synapses

• When a spike of activity travels along an axon and arrives at a synapse, it causes vesicles of transmitter chemical to be released. There are several kinds of transmitter (acting like positive and negative weights).

Transmitter molecules diffuse across the synaptic cleft and bind to receptor molecules in the membrane of the post-synaptic neuron (changing their shape). This opens holes that allow specific ions in or out → changes the depolarization.

The effectiveness of synapses can be changed:

• Different number of vesicles of transmitter
• Different number of receptor molecules

Synapses are very small and use very little power. They adapt using locally available signals.

About $10^{11}$ neurons, each with about $10^{4}$ weights (high bandwidth)

Different bits of the cortex do different things. Specific tasks increase blood flow to specific regions.

Cortex is made of general purpose stuff that has the ability to turn into special purpose hardware in response to experience

• Early brain damage makes functions relocate.

# Artificial neurons

$y=b+\sum_{i} x_{i} w_{i}$

• y: output
• b: bias
• $x_{i}$: i-th input
• $w_{i}$: weight on i-th input
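A minimal sketch of the formula above in plain Python. The function name `linear_neuron` and the input values are chosen here for illustration and are not part of the notes.

```python
def linear_neuron(x, w, b):
    """Return y = b + sum_i x_i * w_i for equal-length sequences x and w."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return b + sum(xi * wi for xi, wi in zip(x, w))


# Illustrative values (assumed, not from the notes).
y = linear_neuron(x=[1.0, 2.0, 3.0], w=[0.5, -1.0, 0.25], b=0.1)
print(y)  # about -0.65
```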

Squared error loss function:

$E = \frac{1}{2}(w^Tx_1-t_1)^2 + \frac{1}{2}(w^Tx_2-t_2)^2$

Error surface $E$ for $x_1=(1,-1)$, $x_2=(0,1)$, $t_1=0$, $t_2=1$.
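With these two training cases the error reduces to $\frac{1}{2}(w_1-w_2)^2 + \frac{1}{2}(w_2-1)^2$, which is zero at $w=(1,1)$. The sketch below evaluates the surface at two points; `squared_error` and `cases` are names assumed here, and the neuron is taken to have no bias, as in the formula above.

```python
def squared_error(w, cases):
    """E(w) = sum over training cases of 0.5 * (w . x - t)^2 (no bias term)."""
    total = 0.0
    for x, t in cases:
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        total += 0.5 * (prediction - t) ** 2
    return total


# The two training cases (x, t) given in the notes.
cases = [((1.0, -1.0), 0.0), ((0.0, 1.0), 1.0)]

print(squared_error((0.0, 0.0), cases))  # 0.5
print(squared_error((1.0, 1.0), cases))  # 0.0, the bottom of the bowl
```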

Batch learning / online learning

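Online learning: each training case defines a constraint plane in weight space. Each update moves perpendicular to the constraint plane of the current case, so the weights zig-zag toward the solution. Learning can be slow if the ellipse of the error surface is very elongated.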
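The difference between the two regimes can be sketched with the delta rule for a linear neuron without bias. This is an illustration under assumptions: the function names, the learning rate `lr=0.1`, and the number of epochs are chosen here; the training cases are the ones from the error-surface example.

```python
def online_epoch(w, cases, lr):
    """One pass of the delta rule, updating after every training case."""
    w = list(w)
    for x, t in cases:
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Step along x, i.e. perpendicular to the constraint plane w . x = t.
        w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
    return w


def batch_step(w, cases, lr):
    """One gradient step on the error summed over all training cases."""
    grad = [0.0] * len(w)
    for x, t in cases:
        y = sum(wi * xi for wi, xi in zip(w, x))
        for i, xi in enumerate(x):
            grad[i] += -(t - y) * xi
    return [wi - lr * gi for wi, gi in zip(w, grad)]


cases = [((1.0, -1.0), 0.0), ((0.0, 1.0), 1.0)]

w = [0.0, 0.0]
for _ in range(2000):
    w = online_epoch(w, cases, lr=0.1)
print(w)  # approaches [1.0, 1.0], where both constraints are satisfied
```

Both variants head for the same minimum here; they differ in the path taken through weight space.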

Binary threshold neuron, McCulloch-Pitts (1943)

1. Compute weighted sum of inputs
2. Send a fixed-size spike of activity if the weighted sum exceeds a threshold. Each spike is like the truth value of a proposition, and each neuron combines truth values to compute the truth value of another proposition.
3. Output 0 or 1

$z=b+\sum_{i} x_{i} w_{i}$

$y = \begin{cases} 1, & \text{if } z \geq 0 \\ 0, & \text{otherwise}\end{cases}$

$\theta = -b$

$\theta$ : threshold
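As a sketch of how such a unit combines truth values, the snippet below implements the rule above and configures it as a logical AND. The weights $(1, 1)$ and threshold $\theta = 1.5$ (so $b = -1.5$) are one illustrative choice made here, not values from the notes.

```python
def binary_threshold_neuron(x, w, b):
    """Output 1 if z = b + sum_i x_i * w_i is >= 0, otherwise 0."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= 0 else 0


# Logical AND: both inputs must be on for the weighted sum to reach theta = 1.5.
weights = [1.0, 1.0]
bias = -1.5  # bias = -theta

for a in (0, 1):
    for c in (0, 1):
        print(a, c, binary_threshold_neuron([a, c], weights, bias))
```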

Rectified linear neuron, aka ReLU (Rectified Linear Unit)

$z=b+\sum_{i} x_{i} w_{i}$

$y = \begin{cases} z, & \text{if } z > 0 \\ 0, & \text{otherwise}\end{cases} = \max(0,z)$

Above 0 the output is linear in $z$; at or below 0 the output is 0.

Faster learning, since the slope does not become very small for large positive $z$.

Leaky ReLU:

$y =\max(0.01 z,z)$
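Both rectifiers can be sketched in a few lines. The function names and the `slope` parameter are assumptions made here; the default of 0.01 is the value from the formula above.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Leaky ReLU: max(slope * z, z), so negative inputs keep a small slope."""
    return max(slope * z, z)


print(relu(-2.0), relu(3.0))              # 0.0 3.0
print(leaky_relu(-2.0), leaky_relu(3.0))  # -0.02 3.0
```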

Sigmoid neurons give a real-valued output that is a smooth and bounded function of their total input. Typically the logistic function is used, which has nice derivatives.

$z=b+\sum_{i} x_{i} w_{i}$

$y = \frac{1}{1+e^{-z}}$

$\lim_{z \to -\infty} \frac{1}{1+e^{-z}} = 0$

$\lim_{z \to \infty} \frac{1}{1+e^{-z}} = 1$
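A sketch of the logistic function that also behaves at the two limits. Splitting on the sign of $z$ is a common way to avoid overflow in `math.exp` for large negative inputs; it is an implementation choice made here, not something the notes prescribe.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, so use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0 in floating point
print(sigmoid(1000.0))   # 1.0 in floating point
```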

Switching from sigmoid to ReLU led to performance improvements (the slope of the sigmoid gradually shrinks to zero as $|z|$ grows).

The tanh function usually works better than the sigmoid function:

$y = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$

Its output is centered around 0, which centers the data passed on to the next layer.
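The centering can be seen by relating tanh to the logistic function. A short derivation, writing $\sigma(z)=\frac{1}{1+e^{-z}}$ as shorthand (notation introduced here, not used elsewhere in the notes):

$\tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}} = \frac{1-e^{-2z}}{1+e^{-2z}} = \frac{2}{1+e^{-2z}} - 1 = 2\sigma(2z) - 1$

So tanh is a logistic function rescaled to the range $(-1,1)$, with $\tanh(0)=0$.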

Exception: the output layer of a binary classifier, since its output should lie in $[0,1]$.

The logistic function's output is used for classification between two target classes 0/1. The softmax function is a generalization of the logistic function that can output a multiclass categorical probability distribution.

Derivatives:

Compute derivative of the logit with respect to the inputs and the weights.

$\frac{\partial z}{\partial w_i} = x_i$

$\frac{\partial z}{\partial x_i} = w_i$

Compute the derivative of the output with respect to the logit, expressed in terms of the output $y$:

$\frac{\partial y}{\partial z} = y(1-y)$

Derivatives of the output with respect to each weight:

$\frac{\partial y}{\partial w_i} = \frac{\partial z}{\partial w_i} \frac{d y}{d z} = x_i y (1-y)$

$\frac{\partial E}{\partial w_i} = \sum_n \frac{\partial y^n}{\partial w_i} \frac{d E}{d y^n} = - \sum_n x_i^n y^n (1-y^n) (t^n -y^n)$
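The last formula can be checked numerically. The sketch below assumes the squared error $E = \sum_n \frac{1}{2}(t^n - y^n)^2$ with a logistic output, and compares the analytic gradient against central finite differences. All function names, the data, and the weights are illustrative choices made here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output(w, b, x):
    """Logistic neuron: y = sigmoid(b + sum_i x_i * w_i)."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def error(w, b, data):
    """Squared error E = sum_n 0.5 * (t^n - y^n)^2."""
    return sum(0.5 * (t - output(w, b, x)) ** 2 for x, t in data)


def analytic_grad(w, b, data):
    """dE/dw_i = -sum_n x_i^n * y^n * (1 - y^n) * (t^n - y^n)."""
    grad = [0.0] * len(w)
    for x, t in data:
        y = output(w, b, x)
        for i, xi in enumerate(x):
            grad[i] -= xi * y * (1.0 - y) * (t - y)
    return grad


def numeric_grad(w, b, data, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grad.append((error(w_plus, b, data) - error(w_minus, b, data)) / (2 * eps))
    return grad


data = [((1.0, -1.0), 0.0), ((0.0, 1.0), 1.0)]  # illustrative (x, t) pairs
w, b = [0.3, -0.2], 0.1                          # illustrative parameters
print(analytic_grad(w, b, data))
print(numeric_grad(w, b, data))  # should agree to several decimal places
```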

Problems with squared error.

• If desired output = 1 and actual output = 0.0000001, almost no gradient to fix up the error
• If probabilities are assigned to mutually exclusive class labels, the outputs should sum to 1 (the network should know this).

Better: force the outputs to represent a probability distribution.
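For the first point, a worked number, assuming a logistic output unit and the squared error $E=\frac{1}{2}(t-y)^2$, with $t=1$ and $y=10^{-7}$:

$\frac{\partial E}{\partial z} = \frac{\partial y}{\partial z}\frac{\partial E}{\partial y} = -y(1-y)(t-y) = -10^{-7}\,(1-10^{-7})^2 \approx -10^{-7}$

The factor $y(1-y)$ is tiny on the plateau of the logistic, so the gradient almost vanishes even though the error is close to its maximum.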

Solution: Softmax / Softmax group

$y_i=\frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$

$\frac{\partial y_i}{\partial z_i}=y_i (1- y_i)$
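A minimal softmax sketch for one group of logits. Subtracting the maximum logit before exponentiating is a standard trick to avoid overflow and does not change the result; it is an implementation choice made here rather than something taken from the notes.

```python
import math


def softmax(z):
    """Return y_i = exp(z_i) / sum_j exp(z_j) for a non-empty list of logits."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]  # shifting by m cancels in the ratio
    total = sum(exps)
    return [e / total for e in exps]


y = softmax([2.0, 1.0, 0.1])
print(y, sum(y))              # the outputs are positive and sum to 1
print(softmax([0.0, 0.0]))    # [0.5, 0.5]
```

With two logits $(z, 0)$ the first output equals the logistic function of $z$, which is the sense in which softmax generalizes it.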

Stochastic binary neurons: treat the output of the logistic as the probability of producing a spike in a short time window.

$z=b+\sum_{i} x_{i} w_{i}$

$p(s=1) = \frac{1}{1+e^{-z}}$

Output 0 or 1.

This is also possible for rectified linear units: the output is treated as the Poisson rate for spikes.
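The logistic version can be sketched by sampling a 0/1 output with probability $p(s=1)$. The function name, inputs, and seed are illustrative assumptions, and the plain logistic formula is used on the assumption that $z$ stays moderate in size.

```python
import math
import random


def stochastic_binary_neuron(x, w, b, rng):
    """Emit 1 with probability p(s=1) = 1 / (1 + exp(-z)), otherwise 0."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [
    stochastic_binary_neuron([1.0, 2.0], [0.5, -0.25], 0.0, rng)
    for _ in range(10000)
]
# z = 0 for these inputs, so the spike frequency should be near 0.5.
print(sum(samples) / len(samples))
```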
