Architecture:
Idea: If connections are symmetric, there is a global energy function.
Each binary configuration has an energy. Binary threshold decision rule causes network to settle to a minimum of this energy function.
Global energy:
$E = - \sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}$
Local computation for each unit:
$\Delta E_i = E(s_i=0) - E(s_i=1) = b_i + \sum_j s_j w_{ij}$
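A minimal sketch of the two quantities above, assuming states are 0/1, weights are symmetric and there are no self-connections. The 3-unit network (`b`, `w`) and the helper names `energy` and `energy_gap` are made up for illustration. It checks that the local quantity $b_i + \sum_j s_j w_{ij}$ equals the global difference $E(s_i=0) - E(s_i=1)$.

```python
def energy(s, b, w):
    """Global energy E = -sum_i s_i b_i - sum_{i<j} s_i s_j w_ij."""
    n = len(s)
    bias_term = sum(s[i] * b[i] for i in range(n))
    pair_term = sum(s[i] * s[j] * w[i][j]
                    for i in range(n) for j in range(i + 1, n))
    return -bias_term - pair_term


def energy_gap(i, s, b, w):
    """Local energy gap for unit i: b_i + sum_j s_j w_ij (j != i)."""
    return b[i] + sum(s[j] * w[i][j] for j in range(len(s)) if j != i)


# Hypothetical 3-unit network: symmetric weights, zero diagonal.
b = [0.5, -1.0, 0.0]
w = [[0.0, 2.0, -1.0],
     [2.0, 0.0, 3.0],
     [-1.0, 3.0, 0.0]]

# Same configuration with unit 1 switched off and on.
s_off = [1, 0, 1]
s_on = [1, 1, 1]

gap_global = energy(s_off, b, w) - energy(s_on, b, w)
gap_local = energy_gap(1, s_off, b, w)
print(gap_global, gap_local)  # both 4.0 for this network
```

The point of the local form is that a unit only needs its own bias and the states of its neighbours to decide whether turning on lowers the global energy.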
* Memories could be energy minima of a neural net.
* Binary threshold decision rule can then be used to clean up incomplete or corrupted memories.
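A rough sketch of the clean-up idea. The notes do not spell out a storage rule, so the Hebbian-style rule in `store` (add $(2s_i-1)(2s_j-1)$ to $w_{ij}$ for each stored pattern) is an assumption, as are the function names, the zero biases and the 6-unit example.

```python
def store(patterns):
    """Assumed Hebbian-style storage rule for 0/1 patterns (not from the notes)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += (2 * p[i] - 1) * (2 * p[j] - 1)
    return w


def settle(s, w, b, max_sweeps=100):
    """Apply the binary threshold rule one unit at a time until nothing changes."""
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            gap = b[i] + sum(s[j] * w[i][j] for j in range(n) if j != i)
            new_state = 1 if gap > 0 else 0  # turn on only if that lowers the energy
            if new_state != s[i]:
                s[i] = new_state
                changed = True
        if not changed:
            break
    return s


memory = [1, 0, 1, 1, 0, 1]
corrupted = [1, 1, 1, 0, 0, 1]  # units 1 and 3 flipped
w = store([memory])
b = [0.0] * len(memory)
recovered = settle(corrupted, w, b)
print(recovered == memory)
```

In this small example the corrupted vector settles back to the stored pattern. Units are updated one at a time, which is what guarantees that each step cannot increase the energy.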
Capacity: N units store about 0.15N memories (at N bits per memory this is only 0.15N^2 bits).
When a new configuration is memorized, we hope to create a new energy minimum (if two minima merge, capacity decreases).
Let net settle from random initial state, then do unlearning.
Better storage rule: instead of trying to store vectors in one shot, cycle through the training set many times (pseudo-likelihood technique).
Instead of memories, store interpretations of sensory input.
Issues:
A Hopfield net always makes decisions that reduce the energy, so it can get trapped in local minima. Random noise can help it escape:
Stochastic binary units
* Replace binary threshold units with binary stochastic units that make biased random decisions (“temperature” controls the amount of noise; raising the noise is equivalent to decreasing all the energy gaps between configurations).
$p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$
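A small sketch of the stochastic decision rule, to show how temperature flattens the decision. The function names `p_on` and `sample_unit` and the example gap of 2.0 are illustrative choices, not from the notes.

```python
import math
import random


def p_on(delta_e, temperature=1.0):
    """Probability that a unit turns on, given its energy gap and the temperature."""
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))


def sample_unit(delta_e, temperature=1.0, rng=random):
    """Biased random decision: 1 with probability p_on, else 0."""
    return 1 if rng.random() < p_on(delta_e, temperature) else 0


gap = 2.0
for temperature in (0.5, 1.0, 10.0):
    # Higher temperature pushes the probability towards 0.5 (more noise).
    print(temperature, round(p_on(gap, temperature), 4))
```

Dividing the gap by T is why raising the temperature acts like shrinking every energy gap. As T goes to 0, the rule approaches the deterministic binary threshold rule.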
Thermal equilibrium at a temperature of 1
Reaching thermal equilibrium is a difficult concept. The probability distribution over configurations settles down to a stationary distribution.
Intuitively: a huge ensemble of systems that all have the same energy function. The probability of a configuration is just the fraction of the systems that have that configuration.
Approaching equilibrium:
* Start with any distribution over all the identical systems.
* Apply the stochastic update rule to pick the next configuration for each individual system.
* May reach a situation where the fraction of systems in each configuration remains constant.
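The ensemble picture can be sketched exactly for a tiny network by tracking the fraction of systems in each configuration instead of simulating individual systems. The 3-unit network (`B`, `W`), the choice of updating one uniformly chosen unit per step, and the comparison with the Boltzmann distribution $p(s) \propto e^{-E(s)/T}$ (the standard stationary distribution for this update rule) are additions for illustration, not part of the notes.

```python
import itertools
import math

# Hypothetical 3-unit network: biases and symmetric weights with zero diagonal.
B = [0.5, -0.5, 0.0]
W = [[0.0, 1.0, -1.0],
     [1.0, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
N = len(B)
T = 1.0
CONFIGS = list(itertools.product([0, 1], repeat=N))


def energy(s):
    bias = sum(s[i] * B[i] for i in range(N))
    pairs = sum(s[i] * s[j] * W[i][j]
                for i in range(N) for j in range(i + 1, N))
    return -bias - pairs


def step(dist):
    """Apply one stochastic update to every system in the ensemble.

    Each system picks a unit uniformly at random and turns it on with
    probability 1 / (1 + exp(-gap / T)).
    """
    new = {c: 0.0 for c in CONFIGS}
    for c, frac in dist.items():
        for i in range(N):
            gap = B[i] + sum(c[j] * W[i][j] for j in range(N) if j != i)
            p = 1.0 / (1.0 + math.exp(-gap / T))
            on = c[:i] + (1,) + c[i + 1:]
            off = c[:i] + (0,) + c[i + 1:]
            new[on] += frac * p / N
            new[off] += frac * (1.0 - p) / N
    return new


# Start with every system in the all-zeros configuration.
dist = {c: 0.0 for c in CONFIGS}
dist[(0,) * N] = 1.0
for _ in range(5000):
    dist = step(dist)

# At equilibrium one more update leaves the fractions unchanged.
one_more = step(dist)
max_change = max(abs(one_more[c] - dist[c]) for c in CONFIGS)

# Compare with the Boltzmann distribution at temperature T.
z = sum(math.exp(-energy(c) / T) for c in CONFIGS)
boltzmann = {c: math.exp(-energy(c) / T) / z for c in CONFIGS}
max_diff = max(abs(dist[c] - boltzmann[c]) for c in CONFIGS)
print(max_change, max_diff)
```

Individual systems keep changing configuration at equilibrium. Only the fractions stop changing.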
Given: Training set of binary vectors. Fit model that will assign a probability to every possible binary vector.
Useful for deciding if other binary vectors come from the same distribution (e.g. to detect unusual behaviour).