====== Biological neurons ======

Dendritic tree
  * Collects input from other neurons

Axon
  * Branches
  * Contacts dendritic trees at synapses

Spike generation
  * The axon hillock generates outgoing spikes whenever enough charge has flowed in at synapses to depolarize the cell membrane

Synapses
  * When a spike of activity travels along an axon and arrives at a synapse, vesicles of transmitter chemical are released (several kinds of transmitter, corresponding to positive and negative weights)
  * Transmitter molecules diffuse across the synaptic cleft and bind to receptor molecules in the membrane of the post-synaptic neuron, changing their shape. This opens holes that allow specific ions in or out -> changes the depolarization

The effectiveness of synapses can be changed:
  * Different number of vesicles of transmitter
  * Different number of receptor molecules

Synapses are very small and very low power. They adapt using locally available signals.

**10^11 neurons** with **10^4 weights** each (high bandwidth).

Different bits of the cortex do different things; specific tasks increase blood flow to specific regions. The cortex is made of general-purpose stuff that has the ability to turn into special-purpose hardware in response to experience.
  * Early brain damage makes functions relocate.

====== Artificial neurons ======

===== Linear neurons =====

$y=b+\sum_{i} x_{i} w_{i}$
  * $y$: output
  * $b$: bias
  * $x_{i}$: i-th input
  * $w_{i}$: weight on i-th input

Squared error loss function: $E = \frac{1}{2}(w^T x_1-t_1)^2 + \frac{1}{2}(w^T x_2-t_2)^2$

Example error surface $E$ for $x_1=(1,-1)$, $x_2=(0,1)$, $t_1=0$, $t_2=1$.

Batch learning / online learning. Online learning: there is a constraint plane for each training case, and each update moves the weights perpendicular to one constraint (zig-zag path). Learning can be slow if the ellipse of the error surface is very elongated.

===== Binary threshold neurons =====

McCulloch-Pitts (1943):
  - Compute a weighted sum of the inputs.
  - Send a fixed-size spike of activity if the weighted sum exceeds a threshold.
  - Output is 0 or 1.

A spike is like the boolean value of a proposition, and each neuron combines boolean values to compute the boolean value of another proposition.

$z=b+\sum_{i} x_{i} w_{i}$

$y = \begin{cases} 1, & \text{if } z \geq 0 \\ 0, & \text{otherwise}\end{cases}$

$\theta = -b$, where $\theta$ is the threshold.

===== Rectified Linear Neurons =====

Aka ReLU (Rectified Linear Unit).

$z=b+\sum_{i} x_{i} w_{i}$

$y = \begin{cases} z, & \text{if } z > 0 \\ 0, & \text{otherwise}\end{cases} = \max(0,z)$

Above 0 it is linear; at or below 0 the output is 0.

Faster computation, since the slope never gets very small or very large.

Leaky ReLU: $y =\max(0.01 z, z)$

===== Sigmoid Neuron =====

Real-valued output that is a smooth and bounded function of the total input. Typically the logistic function, which has nice derivatives.

$z=b+\sum_{i} x_{i} w_{i}$

$y = \frac{1}{1+e^{-z}}$

$\lim_{z \to -\infty} \frac{1}{1+e^{-z}} = 0$

$\lim_{z \to \infty} \frac{1}{1+e^{-z}} = 1$

Switching from sigmoid to ReLU led to performance improvements (the slope of the sigmoid gradually shrinks to zero).

===== tanh =====

Works better than the sigmoid function.

$y = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$

Centers the data around 0. Exception: the output layer, since the output should be in $[0,1]$.
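The units above all compute the same logit $z$ and differ only in the nonlinearity applied to it. A minimal sketch in Python/NumPy (function names and example values are illustrative, not from the notes):

<code python>
import numpy as np

def logit(x, w, b):
    """z = b + sum_i x_i * w_i"""
    return b + np.dot(w, x)

def binary_threshold(z):
    """McCulloch-Pitts unit: 1 if z >= 0, else 0."""
    return np.where(z >= 0, 1.0, 0.0)

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: max(alpha * z, z)."""
    return np.maximum(alpha * z, z)

def sigmoid(z):
    """Logistic function, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent, bounded in (-1, 1) and centered around 0."""
    return np.tanh(z)

# One neuron with two inputs.
x = np.array([1.0, -1.0])
w = np.array([0.5, 2.0])
b = 0.5
z = logit(x, w, b)   # 0.5 + 0.5*1.0 + 2.0*(-1.0) = -1.0
print(binary_threshold(z), relu(z), leaky_relu(z), sigmoid(z), tanh(z))
</code>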
===== Softmax group =====

The logistic output is used for classification between two target classes (0/1). The softmax function is a generalized type of logistic function that can output a **multiclass** categorical **probability distribution**.

Derivatives (for a logistic neuron): first compute the derivative of the logit with respect to the weights and the inputs:

$\frac{\partial z}{\partial w_i} = x_i$ , $\frac{\partial z}{\partial x_i} = w_i$

Derivative of the output with respect to the logit, expressed through the output $y$:

$\frac{\partial y}{\partial z} = y(1-y)$

Derivative of the output with respect to each weight:

$\frac{\partial y}{\partial w_i} = \frac{\partial z}{\partial w_i} \frac{d y}{d z} = x_i y (1-y)$

$\frac{\partial E}{\partial w_i} = \sum_n \frac{\partial y^n}{\partial w_i} \frac{d E}{d y^n} = - \sum_n x_i^n y^n (1-y^n) (t^n -y^n)$

Problems with squared error:
  * If the desired output is 1 and the actual output is 0.0000001, there is almost no gradient to fix up the error.
  * If probabilities are assigned to mutually exclusive class labels, the outputs should sum to 1 (the network should know this).

Better: force the outputs to represent a probability distribution. Solution: softmax / softmax group.

$y_i=\frac{e^{z_i}}{\sum_{j \in \text{group}} e^{z_j}}$

$\frac{\partial y_i}{\partial z_i}=y_i (1- y_i)$

===== Stochastic binary neurons =====

Treat the output of the logistic as the probability of producing a spike in a short time window.

$z=b+\sum_{i} x_{i} w_{i}$

$p(s=1) = \frac{1}{1+e^{-z}}$

Output is 0 or 1.

Also possible for rectified linear units: the output is treated as the Poisson rate for spikes.
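A similar sketch for the softmax group and the stochastic units (NumPy, illustrative names): it computes a softmax over a group of logits and checks that the outputs sum to 1, samples a 0/1 spike from a logistic unit, and treats a rectified linear output as a Poisson rate for spikes.

<code python>
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j); shifting by max(z) keeps exp() from overflowing."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def stochastic_binary(z):
    """Treat the logistic output as p(s=1) and sample a 0/1 spike."""
    p = 1.0 / (1.0 + np.exp(-z))
    return float(rng.random() < p)

def stochastic_relu(z, window=1.0):
    """Treat max(0, z) as the Poisson rate and draw a spike count for one time window."""
    rate = max(0.0, z) * window
    return rng.poisson(rate)

z_group = np.array([1.0, 2.0, -0.5])
y = softmax(z_group)
print(y, y.sum())               # a categorical distribution over the group, summing to 1

print(stochastic_binary(0.3))   # 1 with probability ~0.57, else 0
print(stochastic_relu(2.5))     # Poisson-distributed spike count with mean 2.5
</code>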