data_mining:neural_network:hopfield (phreazer, 2017/04/09)
====== Hopfield nets ======
Architecture:
  * Recurrent connections
  * Binary threshold units

Idea:
If the connections are **symmetric**, there is a global energy function.

Each binary configuration has an energy. The binary threshold decision rule causes the network to settle to a **minimum of this energy function**.

===== Energy function =====
**Global energy:**

$E = - \sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}$

  * $w_{ij}$ is the connection weight between units $i$ and $j$.
  * $s_i, s_j$ are the binary states of the two neurons.
  * $b_i$ is the bias of unit $i$.

Local computation for each unit (energy gap):

$\Delta E_i = E(s_i=0) - E(s_i=1) = b_i + \sum_j s_j w_{ij}$
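As a minimal sketch, the global energy and the energy gap can be computed directly from these two formulas; the weights ''W'', biases ''b'', and state ''s'' below are made-up toy values, not from the original:

```python
import numpy as np

def energy(s, W, b):
    """Global energy E = -sum_i s_i b_i - sum_{i<j} s_i s_j w_ij."""
    # W is symmetric with zero diagonal, so the i<j sum is 0.5 * s W s
    return -s @ b - 0.5 * s @ W @ s

def energy_gap(i, s, W, b):
    """Delta E_i = E(s_i=0) - E(s_i=1) = b_i + sum_j s_j w_ij."""
    return b[i] + W[i] @ s  # W[i, i] == 0, so s_i does not contribute

# toy symmetric weight matrix with zero diagonal
W = np.array([[0., 1., -2.],
              [1., 0., 3.],
              [-2., 3., 0.]])
b = np.array([0.5, -1., 0.])
s = np.array([1., 0., 1.])

# check: the gap equals the energy difference from flipping unit i
for i in range(3):
    s0, s1 = s.copy(), s.copy()
    s0[i], s1[i] = 0., 1.
    assert np.isclose(energy_gap(i, s, W, b),
                      energy(s0, W, b) - energy(s1, W, b))
```

The gap only needs the states of a unit's neighbours, which is why each unit can decide locally whether turning on lowers the global energy.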
+ | |||
===== Storing memories =====

  * Memories could be energy minima of a neural net.
  * The binary threshold decision rule can then be used to clean up incomplete or corrupted memories.

To store a binary state vector, use activities of 1 and -1 and increment the weight between any two units by the product of their activities (the bias is treated as a weight from a permanently-on unit):

  * $\Delta w_{ij} = s_i s_j$
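A minimal sketch of this one-shot storage rule plus threshold clean-up, using ±1 activities and zero biases; the ''store''/''recall'' helpers and the example pattern are illustrative, not from the original:

```python
import numpy as np

def store(patterns):
    """One-shot Hebbian storage: increment w_ij by s_i * s_j per pattern."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)  # no self-connections
    return W

def recall(W, s, sweeps=20):
    """Settle with binary threshold updates (states +1/-1, zero biases)."""
    s = s.copy()
    rng = np.random.default_rng(0)
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):  # asynchronous updates
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# store one pattern, corrupt one bit, then clean it up
pattern = np.array([1, -1, 1, -1, 1, -1])
W = store(pattern[None, :])
corrupted = pattern.copy()
corrupted[0] = -corrupted[0]
print(np.array_equal(recall(W, corrupted), pattern))  # → True
```

Asynchronous (one-unit-at-a-time) updates matter here: they guarantee the energy never increases, so the net settles into the stored minimum.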
+ | |||
===== Storage capacity =====
A net of $N$ units can store about $0.15N$ memories with the one-shot rule (at $N$ bits per memory this is only $0.15N^2$ bits).

  * The net has $N^2$ weights and biases.
  * After storing $M$ memories, each connection weight has an integer value in the range $[-M, M]$.
  * Number of bits required to store the weights and biases: $N^2 \log_2(2M+1)$.
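The arithmetic behind these figures can be checked directly; the ''capacity_summary'' helper below is just an illustration of the rule of thumb, not a formula from the original:

```python
import math

def capacity_summary(N):
    """Rough capacity numbers for a Hopfield net of N binary units,
    using the ~0.15N one-shot storage rule of thumb."""
    M = round(0.15 * N)                          # memories storable
    content_bits = M * N                         # N bits per stored memory
    weight_bits = N * N * math.log2(2 * M + 1)   # bits to describe the weights
    return M, content_bits, weight_bits

M, content, weights = capacity_summary(100)
print(M)        # → 15
print(content)  # → 1500
```

For $N = 100$ the weights take roughly 50,000 bits to describe but hold only about 1,500 bits of memory content, which is the inefficiency the section is pointing at.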
+ | |||
===== Spurious minima limit capacity =====
When a new configuration is memorized, we hope to create a new energy minimum (if two minima merge, capacity decreases).

==== Avoiding spurious minima by unlearning ====
Let the net settle from a random initial state, then do unlearning on the state it settles to.

Better storage rule: instead of trying to store vectors in one shot, cycle through the training set many times (pseudo-likelihood technique).

===== Hopfield nets with hidden units =====
Instead of memories, store interpretations of sensory input.

  * The input is represented by the visible units.
  * The interpretation is represented by the hidden units.
  * The badness of an interpretation is represented by the energy.

Issues:
  * How to avoid getting trapped in poor local minima of the energy function?
  * How to learn the weights on the connections to the hidden units and between the hidden units?

===== Improving search with stochastic units =====

A Hopfield net always reduces the energy, so it gets trapped in local minima.
Random noise helps escape them:
  * With a lot of noise it is easy to cross energy barriers.
  * Slowly reduce the noise so that the system ends up in a deep minimum (simulated annealing).

Stochastic binary units:

  * Replace the binary threshold units with binary stochastic units that make biased random decisions (the "temperature" controls the amount of noise):

$p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$
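A small sketch of this stochastic decision rule; the function names are illustrative, and the example values just show how temperature shifts the decision between random and deterministic:

```python
import math
import random

def prob_on(delta_E, T):
    """Probability that a stochastic binary unit turns on:
    p(s_i = 1) = 1 / (1 + exp(-Delta E_i / T))."""
    return 1.0 / (1.0 + math.exp(-delta_E / T))

def stochastic_update(i, s, W, b, T, rng):
    """Update unit i in place using the logistic probability (states 0/1)."""
    gap = b[i] + sum(W[i][j] * s[j] for j in range(len(s)))
    s[i] = 1 if rng.random() < prob_on(gap, T) else 0

# high temperature: nearly a coin flip, so energy barriers are easy to cross
print(round(prob_on(2.0, 100.0), 3))  # → 0.505
# low temperature: approaches the deterministic binary threshold rule
print(round(prob_on(2.0, 0.1), 3))    # → 1.0
```

Simulated annealing is then just calling ''stochastic_update'' repeatedly while gradually lowering ''T''.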
+ | |||
===== Thermal equilibrium =====
Thermal equilibrium at a temperature of 1.

Reaching thermal equilibrium is a difficult concept: it does not mean that the states stop changing, but that the probability distribution over configurations settles down to a stationary distribution.

Intuitively: imagine a huge ensemble of identical systems.

Approaching equilibrium:

  * Start with any distribution over all the identical systems.
  * Apply the stochastic update rule to pick the next configuration for each individual system.
  * We may reach a situation where the fraction of systems in each configuration remains constant.
  * This stationary distribution is called thermal equilibrium.
  * Any given system keeps changing its configuration, but the fraction of systems in each configuration does not change.
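The ensemble picture can be simulated as a sketch: run many identical copies of a tiny two-unit net with the stochastic rule at $T=1$ and watch the fraction of copies in each configuration settle down. All numbers below (weights, ensemble size, sweep count) are made-up toy values:

```python
import math
import random

# two 0/1 units joined by one symmetric weight, zero biases, T = 1
W = [[0.0, 1.0], [1.0, 0.0]]
b = [0.0, 0.0]
T = 1.0
rng = random.Random(0)

# a large ensemble of identical systems in random initial configurations
systems = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(10000)]

def sweep(s):
    """One stochastic update of each unit: p(s_i=1) = 1/(1+exp(-gap/T))."""
    for i in range(2):
        gap = b[i] + sum(W[i][j] * s[j] for j in range(2))
        s[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-gap / T)) else 0

def fractions():
    """Fraction of the ensemble currently in each of the 4 configurations."""
    counts = {}
    for s in systems:
        counts[tuple(s)] = counts.get(tuple(s), 0) + 1
    return {k: v / len(systems) for k, v in sorted(counts.items())}

for _ in range(20):
    for s in systems:
        sweep(s)

# every individual system keeps flipping, but these fractions are now stable
# and roughly match the Boltzmann distribution: p(1,1) = e/(3+e) ≈ 0.48
print(fractions())
```

Running more sweeps changes every individual system's configuration but leaves the printed fractions essentially unchanged, which is exactly the stationary-distribution idea above.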
+ | |||
===== Boltzmann machine =====

Given: a training set of binary vectors. Fit a model that assigns a probability to every possible binary vector.

Useful for deciding whether other binary vectors come from the same distribution (e.g. to detect unusual behaviour).