data_mining:neural_network:hopfield — revision 2017/04/09 11:10 (current) by phreazer
* After storing $M$ memories, each connection weight has an integer value in the range $[-M, M]$.
* Number of bits required to store the weights and biases: $N^2 \log_2(2M+1)$.
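As a quick sanity check of the bit-count formula, a small helper (the function name is mine, not from the notes):

```python
import math

def hopfield_storage_bits(n_units: int, n_memories: int) -> float:
    """Bits needed to store all weights and biases of a Hopfield net
    with n_units units after storing n_memories memories: each of the
    ~N^2 weights takes one of the 2M+1 integer values in [-M, M],
    i.e. log2(2M+1) bits per weight."""
    return n_units ** 2 * math.log2(2 * n_memories + 1)

# e.g. 100 units after storing 10 memories
bits = hopfield_storage_bits(100, 10)
```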
+ | |||
+ | ===== Spurious minima limit capactiy ===== | ||
+ | When new configuration is memorizes, we hope to create a new energy minimum (if two minima merge, capacity decreases). | ||
+ | |||
+ | ==== Avoiding spurious minima by unlearning ==== | ||
+ | Let net settle from random initial state, then do unlearning. | ||
+ | |||
+ | Better storage rule: Instead of trying to store vectors in one shot, cycle through training set many times. (Pseudo likelihood technique). | ||
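One way to read the "cycle through the training set many times" rule is a perceptron-style storage procedure: train each unit to reproduce its state in every training vector, given the states of all the other units. A minimal sketch under that assumption (function name and details are mine, using bipolar ±1 units with zero self-connections):

```python
import numpy as np

def train_hopfield_perceptron(patterns, epochs=100):
    """Cycle through the training set many times; whenever a unit would
    not settle into its target state given the other units, apply a
    perceptron-style weight update for that unit."""
    patterns = np.asarray(patterns)
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for p in patterns:
            for i in range(n):
                # local field of unit i, excluding any self-connection
                local_field = W[i] @ p - W[i, i] * p[i]
                if np.sign(local_field) != p[i]:  # unit would flip away
                    W[i] += p[i] * p              # perceptron step
                    W[i, i] = 0.0                 # keep no self-weight
    return W
```

After training, each stored pattern should be a fixed point of the usual threshold dynamics.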
+ | |||
+ | ===== Hopfield nets with hidden units ===== | ||
+ | Instead of memories, store interpretations of sensory input. | ||
+ | |||
+ | * Input represented as visible units | ||
+ | * Intepretation represented as hidden units. | ||
+ | * Badness of interpreation rep. as energy. | ||
+ | |||
+ | Issues: | ||
+ | * How to avoid getting trapped in poor local minima of the energy function? | ||
+ | * How to learn the weights on the connections to the hidden units and between the hidden units? | ||
+ | |||
+ | |||
+ | ===== Improve search with stochastic units ===== | ||
+ | |||
+ | Hopfield net always reduces energy (trapped in local minima). | ||
+ | Random noise: | ||
+ | * Lot of noise, easy to cross barriers | ||
+ | * Slowly reduce noise so that system ends up in a deep minimum (Simulated annealing) | ||
+ | |||
+ | Stochastic binary units | ||
+ | |||
+ | * Replace binary threshold units with binary stochastic units that make biased random decisions (" | ||
+ | |||
+ | $p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$ | ||
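The update rule above translates directly into code; a minimal sketch (the helper name is mine), where $\Delta E_i$ is the energy gap of unit $i$:

```python
import math
import random

def stochastic_unit(delta_E: float, T: float, rng=random) -> int:
    """Binary stochastic unit: turn on with probability
    p(s_i = 1) = 1 / (1 + exp(-delta_E / T)), where delta_E is the
    energy gap of the unit and T the temperature.
    High T -> nearly coin-flip decisions (easy to cross barriers);
    T -> 0 -> the deterministic binary threshold unit."""
    p_on = 1.0 / (1.0 + math.exp(-delta_E / T))
    return 1 if rng.random() < p_on else 0
```

Simulated annealing amounts to calling this with a temperature that is slowly lowered over time.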
+ | |||
+ | ===== Thermal equilibrium ===== | ||
+ | Thermal equi. at temperature of 1 | ||
+ | |||
+ | Reaching thermal equilibrium is difficult concept. Probability distribution over configurations settles down to statonary distribution. | ||
+ | |||
+ | Intuitively: | ||
+ | |||
+ | Approaching equilibrium: | ||
+ | |||
+ | * Start with any distribution over all identical systems | ||
+ | * Apply stochastic update rule, to pick next configuration for each individual system | ||
+ | * May reach situation where fraction of systems in each configuration remains constant. | ||
+ | * This stationary distribution is called thermal equilibrium. | ||
+ | * Any given system keeps changing its configuration, | ||
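The ensemble picture can be simulated: run many identical tiny systems with the stochastic update rule and watch the fraction of systems in each configuration settle down. A toy sketch with an assumed two-unit network (all names, weights, and biases are illustrative):

```python
import math
import random
from collections import Counter

def settle(W, b, T=1.0, steps=200, rng=None):
    """One system: start from a random configuration and repeatedly
    apply the stochastic update rule to a randomly chosen unit, using
    the energy gap delta_E_i = b_i + sum_j W_ij s_j."""
    rng = rng or random.Random()
    n = len(b)
    s = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        delta_E = b[i] + sum(W[i][j] * s[j] for j in range(n) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-delta_E / T))
        s[i] = 1 if rng.random() < p_on else 0
    return tuple(s)

# Ensemble of identical systems: the fraction of systems found in each
# configuration approaches a stationary (equilibrium) distribution,
# even though every individual system keeps changing its state.
rng = random.Random(0)
counts = Counter(settle([[0, 2], [2, 0]], [-1, -1], rng=rng)
                 for _ in range(5000))
```

For these weights the two low-energy configurations (both units off, or both on) should each attract roughly 37% of the ensemble, the other two about 13% each, matching $e^{-E}/Z$.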
+ | |||
+ | ===== Boltzman machine ===== | ||
+ | |||
+ | Given: Training set of binary vectors. Fit model that will assign a probability to every possible binary vector. | ||
+ | |||
+ | |||
+ | Useful for deciding if other binary vectors come from some distribution (e.g. to detect unusual behavious). |
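For a fully visible network (no hidden units) and very small $n$, the probability the model assigns to a binary vector can be computed exactly by brute force, which makes the idea concrete; a sketch (function names are mine, and real Boltzmann machines avoid this exponential sum):

```python
import itertools
import math

def config_energy(v, W, b):
    """Energy of binary configuration v under symmetric weights W and
    biases b: E(v) = -sum_i b_i v_i - sum_{i<j} W_ij v_i v_j."""
    n = len(v)
    e = -sum(b[i] * v[i] for i in range(n))
    e -= sum(W[i][j] * v[i] * v[j]
             for i in range(n) for j in range(i + 1, n))
    return e

def model_probability(v, W, b):
    """p(v) = e^{-E(v)} / Z, with Z summed over all 2^n configurations
    (tractable only for tiny n)."""
    n = len(v)
    Z = sum(math.exp(-config_energy(c, W, b))
            for c in itertools.product([0, 1], repeat=n))
    return math.exp(-config_energy(v, W, b)) / Z
```

A vector that is unusual under the fitted model simply receives a low `model_probability`.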