====== Hopfield nets ======

===== Energy function =====
Local computation for each unit:
  
$\Delta E_i = E(s_i=0) - E(s_i=1) = b_i + \sum_j s_j w_{ij}$
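
A minimal sketch of this local computation in Python (names are illustrative; 0/1 states, ''W'' symmetric with zero diagonal, ''b'' the bias vector):

<code python>
import numpy as np

def energy_gap(s, W, b, i):
    """Energy gap of unit i: Delta E_i = E(s_i=0) - E(s_i=1)."""
    return b[i] + W[i] @ s   # w_ii = 0, so unit i's own state does not contribute

def threshold_update(s, W, b, i):
    """Binary threshold rule: turn unit i on iff that lowers the energy."""
    s[i] = 1 if energy_gap(s, W, b, i) > 0 else 0
    return s
</code>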

===== Storing memories =====

  * Memories could be energy minima of a neural net.
  * The binary threshold decision rule can then be used to clean up incomplete or corrupted memories.

  * Activities of 1 and -1 are used to store a binary state vector, by incrementing the weight between any two units by the product of their activities; the bias is treated as a weight from a permanently-on unit. See the sketch below.
  * $\Delta w_{ij} = s_i s_j$
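
A minimal sketch of the storage rule and the clean-up step, assuming ±1 activities and omitting biases for brevity (function names are illustrative):

<code python>
import numpy as np

def store(W, patterns):
    """Hopfield storage rule: Delta w_ij = s_i * s_j for each memory."""
    for v in patterns:                 # each v is a vector of +/-1 activities
        W += np.outer(v, v)
    np.fill_diagonal(W, 0)             # no self-connections
    return W

def clean_up(W, s, sweeps=20, seed=0):
    """Settle a corrupted memory using the binary threshold decision rule."""
    rng = np.random.default_rng(seed)
    for _ in range(sweeps * len(s)):
        i = rng.integers(len(s))       # pick a random unit to update
        s[i] = 1 if W[i] @ s > 0 else -1
    return s
</code>

Below capacity, settling from a memory with a few flipped bits should recover the stored pattern.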

===== Storage capacity =====
With Hopfield's storage rule, a net of $N$ units can store only about $0.15N$ memories (at $N$ bits per memory this is only $0.15 N^2$ bits).

  * The net has $N^2$ weights and biases.
  * After storing $M$ memories, each connection weight has an integer value in the range $[-M, M]$.
  * Number of bits required to store the weights and biases: $N^2 \log_2(2M+1)$; a worked example follows below.
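
Worked example: with $N = 100$ units the net can store about $M = 15$ memories, i.e. roughly $0.15 \cdot 100^2 = 1500$ bits of information, yet writing down the weights and biases takes about $100^2 \log_2(31) \approx 50000$ bits.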

===== Spurious minima limit capacity =====
When a new configuration is memorized, we hope to create a new energy minimum; if two nearby minima merge, capacity decreases.

==== Avoiding spurious minima by unlearning ====
Let the net settle from a random initial state, then do unlearning (a sketch follows below).
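
A minimal sketch of one unlearning pass, assuming unlearning means applying the negative of the storage rule to whatever state the net settles into (it reuses ''clean_up'' from the sketch above; the learning rate ''epsilon'' is an illustrative choice):

<code python>
import numpy as np

def unlearn(W, epsilon=0.01, seed=None):
    """Settle from a random state, then unlearn the configuration found there."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=len(W))   # random initial state
    s = clean_up(W, s)                     # settle to a (possibly spurious) minimum
    W = W - epsilon * np.outer(s, s)       # negative of the storage rule
    np.fill_diagonal(W, 0)
    return W
</code>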

Better storage rule: instead of trying to store vectors in one shot, cycle through the training set many times (pseudo-likelihood technique).

===== Hopfield nets with hidden units =====
Instead of memories, store interpretations of sensory input.

  * The input is represented by the visible units.
  * An interpretation is represented by the hidden units.
  * The badness of an interpretation is represented by the energy.

Issues:
  * How to avoid getting trapped in poor local minima of the energy function?
  * How to learn the weights on the connections to the hidden units and between the hidden units?

===== Improve search with stochastic units =====

A Hopfield net always reduces the energy, so it gets trapped in local minima.
Random noise helps:
  * With a lot of noise it is easy to cross energy barriers.
  * Slowly reduce the noise so that the system ends up in a deep minimum (simulated annealing).

Stochastic binary units:

  * Replace the binary threshold units with binary stochastic units that make biased random decisions ("temperature" controls the amount of noise; raising the noise is equivalent to decreasing all the energy gaps between configurations). See the sketch below.

$p(s_i=1) = \frac{1}{1+e^{-\Delta E_i/T}}$
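
A minimal sketch of the stochastic update and a simple annealing schedule (the temperature schedule is an illustrative choice; 0/1 states as above):

<code python>
import numpy as np

def stochastic_update(s, W, b, T, rng):
    """Logistic rule: p(s_i = 1) = 1 / (1 + exp(-Delta E_i / T))."""
    i = rng.integers(len(s))
    gap = b[i] + W[i] @ s                  # energy gap of unit i
    s[i] = 1 if rng.random() < 1 / (1 + np.exp(-gap / T)) else 0
    return s

def anneal(s, W, b, T_start=10.0, T_end=0.5, steps=5000, seed=0):
    """Simulated annealing: lots of noise at first, then slowly reduce it."""
    rng = np.random.default_rng(seed)
    for T in np.geomspace(T_start, T_end, steps):
        s = stochastic_update(s, W, b, T, rng)
    return s
</code>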

===== Thermal equilibrium =====
Thermal equilibrium at a temperature of 1.

Reaching thermal equilibrium is a difficult concept: it does not mean the system stops changing; it means the probability distribution over configurations settles down to a stationary distribution.

Intuitively: imagine a huge ensemble of systems that all have the same energy function. The probability of a configuration is just the fraction of the systems that are in that configuration.

Approaching equilibrium (simulated in the sketch after this list):

  * Start with any distribution over all the identical systems.
  * Apply the stochastic update rule to pick the next configuration for each individual system.
  * We may reach a situation where the fraction of systems in each configuration remains constant.
    * This stationary distribution is called thermal equilibrium.
    * Any given system keeps changing its configuration, but the fraction of systems in each configuration does not change.
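
A minimal sketch of the ensemble picture: run many identical systems with the stochastic update rule and count the fraction in each configuration (ensemble size and sweep count are illustrative):

<code python>
import numpy as np
from collections import Counter

def ensemble_fractions(W, b, T=1.0, n_systems=5000, sweeps=50, seed=0):
    """Fraction of systems in each configuration after many stochastic updates."""
    rng = np.random.default_rng(seed)
    n = len(b)
    states = rng.integers(0, 2, size=(n_systems, n))   # any initial distribution
    for _ in range(sweeps):
        for k in range(n_systems):                     # update each system
            i = rng.integers(n)
            gap = b[i] + W[i] @ states[k]
            states[k, i] = rng.random() < 1 / (1 + np.exp(-gap / T))
    counts = Counter(map(tuple, states))
    return {cfg: c / n_systems for cfg, c in counts.items()}
</code>

Running this from two different initial distributions should give approximately the same fractions (the stationary distribution), even though each individual system keeps changing.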

===== Boltzmann machine =====

Given: a training set of binary vectors. Fit a model that assigns a probability to every possible binary vector.

Useful for deciding whether other binary vectors come from the same distribution (e.g. to detect unusual behaviour), as in the sketch below.
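
For a net small enough to enumerate, the probability the model assigns to each visible vector can be computed exactly; a minimal sketch (energy convention as in the formulas above; only feasible for tiny nets):

<code python>
import itertools
import numpy as np

def boltzmann_probs(W, b, n_visible, n_hidden):
    """Exact p(v) = sum_h exp(-E(v,h)) / Z; visible units come first in W and b."""
    def energy(s):
        s = np.array(s)
        return -(b @ s) - 0.5 * s @ W @ s   # 0.5: s'Ws counts each pair twice
    unnorm = {}
    for v in itertools.product([0, 1], repeat=n_visible):
        unnorm[v] = sum(np.exp(-energy(v + h))
                        for h in itertools.product([0, 1], repeat=n_hidden))
    Z = sum(unnorm.values())
    return {v: p / Z for v, p in unnorm.items()}
</code>

Vectors that get very low probability under a model fitted to normal data are candidates for unusual behaviour.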