Table of Contents

Hopfield nets

Architecture:

Idea: If connections are symmetric, there is a global energy function.

Each binary configuration has an energy. Binary threshold decision rule causes network to settle to a minimum of this energy function.

Energy function

Global energy:

$E = - \sum_i s_i b_i - \sum_{i<j} s_i s_j w_ij$

Local computation for each unit:

$\Delta E_i = E(s_i=0) - E(s_i=1) = b_i \sum_j s_j w_{ij}$

Storing memories

* memories could be energy minima of a neural net. * Binary threshold decision rule can the nbe used to clean up incomplete or corrupted memories.

Storage capacity

N units = 0.15 N memories (At N bits per memory this is only 0.15N^2).

Spurious minima limit capactiy

When new configuration is memorizes, we hope to create a new energy minimum (if two minima merge, capacity decreases).

Avoiding spurious minima by unlearning

Let net settle from random initial state, then do unlearning.

Better storage rule: Instead of trying to store vectors in one shot, cycle through training set many times. (Pseudo likelihood technique).

Hopfield nets with hidden units

Instead of memories, store interpretations of sensory input.

Issues:

Improve search with stochastic units

Hopfield net always reduces energy (trapped in local minima). Random noise:

Stochastic binary units

* Replace binary threshold units with binary stochastic units that make biased random decisions (“temperature” controls noise amount; raising noise is equivalen to decreasing all the energy gaps betweend configurations)

$p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$

Thermal equilibrium

Thermal equi. at temperature of 1

Reaching thermal equilibrium is difficult concept. Probability distribution over configurations settles down to statonary distribution.

Intuitively: Huge ensemble of systems that have same energy function. Probabiltiy of configuration is just fraction of the systems that have configuration.

Approaching equilibrium:

* Start with any distribution over all identical systems * Apply stochastic update rule, to pick next configuration for each individual system * May reach situation where fraction of systems in each configuration remains constant.

Boltzman machine

Given: Training set of binary vectors. Fit model that will assign a probability to every possible binary vector.

Useful for deciding if other binary vectors come from some distribution (e.g. to detect unusual behavious).