Hopfield nets
Architecture:
- Recurrent connections
- Binary theshold units
Idea: If connections are symmetric, there is a global energy function.
Each binary configuration has an energy. Binary threshold decision rule causes network to settle to a minimum of this energy function.
Energy function
Global energy:
$E = - \sum_i s_i b_i - \sum_{i<j} s_i s_j w_ij$
- $w_ij$ is connection weight.
- Binary states of two neurons
Local computation for each unit:
$\Delta E_i = E(s_i=0) - E(s_i=1) = b_i \sum_j s_j w_{ij}$
Storing memories
* memories could be energy minima of a neural net. * Binary threshold decision rule can the nbe used to clean up incomplete or corrupted memories.
- Activities of 1 and -1 to store a binary state vector (by incrementing weight betweend any two units by the product of their activities). Bias is permantly on unit.
- $\Delta w_{ij} = s_i s_j$
Storage capacity
N units = 0.15 N memories (At N bits per memory this is only 0.15N^2).
- Net has $N^2$ weights and biases.
- After storing $M$ memories, each connection weights has an integer value in the range of $[-M, M]$.
- Number of bits required to store weights and biases: $N^2 log(2M+1)$.
Spurious minima limit capactiy
When new configuration is memorizes, we hope to create a new energy minimum (if two minima merge, capacity decreases).
Avoiding spurious minima by unlearning
Let net settle from random initial state, then do unlearning.
Better storage rule: Instead of trying to store vectors in one shot, cycle through training set many times. (Pseudo likelihood technique).
Hopfield nets with hidden units
Instead of memories, store interpretations of sensory input.
- Input represented as visible units
- Intepretation represented as hidden units.
- Badness of interpreation rep. as energy.
Issues:
- How to avoid getting trapped in poor local minima of the energy function?
- How to learn the weights on the connections to the hidden units and between the hidden units?
Improve search with stochastic units
Hopfield net always reduces energy (trapped in local minima). Random noise:
- Lot of noise, easy to cross barriers
- Slowly reduce noise so that system ends up in a deep minimum (Simulated annealing)
Stochastic binary units
* Replace binary threshold units with binary stochastic units that make biased random decisions (“temperature” controls noise amount; raising noise is equivalen to decreasing all the energy gaps betweend configurations)
$p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$
Thermal equilibrium
Thermal equi. at temperature of 1
Reaching thermal equilibrium is difficult concept. Probability distribution over configurations settles down to statonary distribution.
Intuitively: Huge ensemble of systems that have same energy function. Probabiltiy of configuration is just fraction of the systems that have configuration.
Approaching equilibrium:
* Start with any distribution over all identical systems * Apply stochastic update rule, to pick next configuration for each individual system * May reach situation where fraction of systems in each configuration remains constant.
- This stationary distribution is called thermal equilibrium.
- Any given system keeps changing its configuration, but the fraction of systems in each configuration does not change.
Boltzman machine
Given: Training set of binary vectors. Fit model that will assign a probability to every possible binary vector.
Useful for deciding if other binary vectors come from some distribution (e.g. to detect unusual behavious).