data_mining:neural_network:hopfield — revision 2017/04/09 11:10 (current) by phreazer
* After storing $M$ memories, each connection weight has an integer value in the range $[-M, M]$.
* Number of bits required to store the weights and biases: $N^2 \log_2(2M+1)$.
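As a quick sanity check of the bit-count formula, a small helper (the function name is mine, not from the notes):

```python
import math

def hopfield_storage_bits(n_units: int, n_memories: int) -> float:
    """Bits needed to store all weights and biases of a Hopfield net
    with n_units units after storing n_memories memories: each of the
    ~N^2 weights takes one of the 2M+1 integer values in [-M, M],
    i.e. log2(2M+1) bits per weight."""
    return n_units ** 2 * math.log2(2 * n_memories + 1)

# e.g. 100 units after storing 10 memories
bits = hopfield_storage_bits(100, 10)
```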
+ | |||
+ | ===== Spurious minima limit capactiy ===== | ||
+ | When new configuration is memorizes, we hope to create a new energy minimum (if two minima merge, capacity decreases). | ||
+ | |||
+ | ==== Avoiding spurious minima by unlearning ==== | ||
+ | Let net settle from random initial state, then do unlearning. | ||
+ | |||
+ | Better storage rule: Instead of trying to store vectors in one shot, cycle through training set many times. (Pseudo likelihood technique). | ||
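One way to read the "cycle through the training set many times" rule is a perceptron-style storage procedure: train each unit to reproduce its state in every training vector, given the states of all the other units. A minimal sketch under that assumption (function name and details are mine, using bipolar ±1 units with zero self-connections):

```python
import numpy as np

def train_hopfield_perceptron(patterns, epochs=100):
    """Cycle through the training set many times; whenever a unit would
    not settle into its target state given the other units, apply a
    perceptron-style weight update for that unit."""
    patterns = np.asarray(patterns)
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for p in patterns:
            for i in range(n):
                # local field of unit i, excluding any self-connection
                local_field = W[i] @ p - W[i, i] * p[i]
                if np.sign(local_field) != p[i]:  # unit would flip away
                    W[i] += p[i] * p              # perceptron step
                    W[i, i] = 0.0                 # keep no self-weight
    return W
```

After training, each stored pattern should be a fixed point of the usual threshold dynamics.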
+ | |||
+ | ===== Hopfield nets with hidden units ===== | ||
+ | Instead of memories, store interpretations of sensory input. | ||
+ | |||
+ | * Input represented as visible units | ||
+ | * Intepretation represented as hidden units. | ||
+ | * Badness of interpreation rep. as energy. | ||
+ | |||
+ | Issues: | ||
+ | * How to avoid getting trapped in poor local minima of the energy function? | ||
+ | * How to learn the weights on the connections to the hidden units and between the hidden units? | ||
+ | |||
+ | |||
+ | ===== Improve search with stochastic units ===== | ||
+ | |||
+ | Hopfield net always reduces energy (trapped in local minima). | ||
+ | Random noise: | ||
+ | * Lot of noise, easy to cross barriers | ||
+ | * Slowly reduce noise so that system ends up in a deep minimum (Simulated annealing) | ||
+ | |||
+ | Stochastic binary units | ||
+ | |||
+ | * Replace binary threshold units with binary stochastic units that make biased random decisions (" | ||
+ | |||
+ | $p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$ | ||
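The update rule above translates directly into code; a minimal sketch (the helper name is mine), where $\Delta E_i$ is the energy gap of unit $i$:

```python
import math
import random

def stochastic_unit(delta_E: float, T: float, rng=random) -> int:
    """Binary stochastic unit: turn on with probability
    p(s_i = 1) = 1 / (1 + exp(-delta_E / T)), where delta_E is the
    energy gap of the unit and T the temperature.
    High T -> nearly coin-flip decisions (easy to cross barriers);
    T -> 0 -> the deterministic binary threshold unit."""
    p_on = 1.0 / (1.0 + math.exp(-delta_E / T))
    return 1 if rng.random() < p_on else 0
```

Simulated annealing amounts to calling this with a temperature that is slowly lowered over time.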
+ | |||
+ | ===== Thermal equilibrium ===== | ||
+ | Thermal equi. at temperature of 1 | ||
+ | |||
+ | Reaching thermal equilibrium is difficult concept. Probability distribution over configurations settles down to statonary distribution. | ||
+ | |||
+ | Intuitively: | ||
+ | |||
+ | Approaching equilibrium: | ||
+ | |||
+ | * Start with any distribution over all identical systems | ||
+ | * Apply stochastic update rule, to pick next configuration for each individual system | ||
+ | * May reach situation where fraction of systems in each configuration remains constant. | ||
+ | * This stationary distribution is called thermal equilibrium. | ||
+ | * Any given system keeps changing its configuration, | ||
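The ensemble picture can be simulated: run many identical tiny systems with the stochastic update rule and watch the fraction of systems in each configuration settle down. A toy sketch with an assumed two-unit network (all names, weights, and biases are illustrative):

```python
import math
import random
from collections import Counter

def settle(W, b, T=1.0, steps=200, rng=None):
    """One system: start from a random configuration and repeatedly
    apply the stochastic update rule to a randomly chosen unit, using
    the energy gap delta_E_i = b_i + sum_j W_ij s_j."""
    rng = rng or random.Random()
    n = len(b)
    s = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        delta_E = b[i] + sum(W[i][j] * s[j] for j in range(n) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-delta_E / T))
        s[i] = 1 if rng.random() < p_on else 0
    return tuple(s)

# Ensemble of identical systems: the fraction of systems found in each
# configuration approaches a stationary (equilibrium) distribution,
# even though every individual system keeps changing its state.
rng = random.Random(0)
counts = Counter(settle([[0, 2], [2, 0]], [-1, -1], rng=rng)
                 for _ in range(5000))
```

For these weights the two low-energy configurations (both units off, or both on) should each attract roughly 37% of the ensemble, the other two about 13% each, matching $e^{-E}/Z$.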
+ | |||
+ | ===== Boltzman machine ===== | ||
+ | |||
+ | Given: Training set of binary vectors. Fit model that will assign a probability to every possible binary vector. | ||
+ | |||
+ | |||
+ | Useful for deciding if other binary vectors come from some distribution (e.g. to detect unusual behavious). |
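For a fully visible network (no hidden units) and very small $n$, the probability the model assigns to a binary vector can be computed exactly by brute force, which makes the idea concrete; a sketch (function names are mine, and real Boltzmann machines avoid this exponential sum):

```python
import itertools
import math

def config_energy(v, W, b):
    """Energy of binary configuration v under symmetric weights W and
    biases b: E(v) = -sum_i b_i v_i - sum_{i<j} W_ij v_i v_j."""
    n = len(v)
    e = -sum(b[i] * v[i] for i in range(n))
    e -= sum(W[i][j] * v[i] * v[j]
             for i in range(n) for j in range(i + 1, n))
    return e

def model_probability(v, W, b):
    """p(v) = e^{-E(v)} / Z, with Z summed over all 2^n configurations
    (tractable only for tiny n)."""
    n = len(v)
    Z = sum(math.exp(-config_energy(c, W, b))
            for c in itertools.product([0, 1], repeat=n))
    return math.exp(-config_energy(v, W, b)) / Z
```

A vector that is unusual under the fitted model simply receives a low `model_probability`.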