data_mining:neural_network:perceptron

==== What perceptrons can't do ====
  
A separate feature unit for each of the many binary vectors gives any possible discrimination, but exponentially many features are necessary.
This type of table look-up won't generalize (a guess, due to the binary encoding).
  
- The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.
  
Pos and neg cases cannot be separated by a plane => not linearly separable.
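The classic example of positive and negative cases that no weight plane can separate is XOR. A minimal sketch, brute-forcing a coarse grid of candidate weights and biases for a single binary threshold unit (the grid search is only illustrative; the real impossibility argument comes from summing the four constraint inequalities):

```python
# Sketch: no single binary threshold unit (one weight plane)
# separates the four XOR cases. Grid values here are arbitrary.
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, b):
    """True if the unit (output 1 iff w1*x1 + w2*x2 + b > 0) matches XOR."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in XOR.items())

grid = [x / 2 for x in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False
```

Summing the two positive-case constraints gives w1 + w2 + 2b > 0, while summing the two negative-case constraints gives w1 + w2 + 2b <= 0, a contradiction for any weights, not just those on the grid.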
  
=== Discriminate patterns under translation with wrap-around ===

A binary threshold neuron can't discriminate between different patterns that have the same number of on pixels, if the patterns can translate with wrap-around.

Proof:

Pattern A over all possible translations (4 on pixels):
the total input, summed over all translations, is 4x the sum of all the weights.

Pattern B over all possible translations (4 on pixels):
the total input, summed over all translations, is also 4x the sum of all the weights.

But to discriminate, every single case of pattern A would have to provide more input than every single case of pattern B, which is impossible when the totals over all translations are equal.
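The equal-totals step can be checked numerically. A minimal sketch, assuming a 1-D pixel ring of length 10 with arbitrary weights (each on pixel visits every position exactly once over all shifts, so the total is always 4x the sum of the weights):

```python
# Sketch: summing a threshold unit's total input over all wrap-around
# translations gives (number of on pixels) * sum(weights), regardless
# of the pattern's shape. Ring length and weights are arbitrary.
N = 10
weights = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1, 1.5, -0.9, 0.4, 0.8]

def total_input_over_translations(pattern):
    """Sum of w . x over all N wrap-around shifts of the pattern."""
    total = 0.0
    for shift in range(N):
        shifted = [pattern[(i - shift) % N] for i in range(N)]
        total += sum(w * x for w, x in zip(weights, shifted))
    return total

pattern_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 4 on pixels, contiguous
pattern_b = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]  # 4 on pixels, spread out

ta = total_input_over_translations(pattern_a)
tb = total_input_over_translations(pattern_b)
print(ta, tb)  # equal: both are 4 * sum(weights)
```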
  
Case:
Line 72: Line 86:
  
Any example with 3 pixels on: activation = -0.5

Any example with 1 pixel on: activation = -2.5

Any example with 4 pixels on: activation = 0.5

=> correctly classified.
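These activations match, for instance, a unit with all weights equal to 1 and bias -3.5 (threshold 3.5), which fires only when 4 or more pixels are on. This is an assumed reconstruction of the case, not stated explicitly above:

```python
# Sketch under the assumption of unit weights and bias -3.5
# (i.e. threshold 3.5): activation = 1 * (pixels on) + bias.
bias = -3.5

def activation(pixels_on):
    """Total input of the assumed unit for a pattern with this many on pixels."""
    return 1.0 * pixels_on + bias

print(activation(3))  # -0.5
print(activation(1))  # -2.5
print(activation(4))  #  0.5
```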
 +
The whole point of pattern recognition is to recognize patterns despite transformations like translation. Minsky and Papert ("Group Invariance Theorem"): the part of a perceptron that **learns** cannot learn to do this if the transformations form a **group**:
  * Translations with **wrap-around** form a group.
To deal with such transformations, the tricky part of recognition must be solved by hand-coded **feature detectors** (not by the learning procedure).

Conclusion (after 20 yrs): a neural network has to learn the feature detectors (not only the weights).
  * More layers of linear units do not help (still linear).
  * Fixed output non-linearities are not enough.

=> Need multiple layers of **adaptive**, **non-linear** hidden units. This requires:

  * An efficient way of adapting all the weights.
  * Learning the weights going into hidden units is equivalent to learning features.
  * No one is telling us what the feature vectors should be.
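The "more layers of linear units do not help" point can be illustrated directly: composing two linear layers collapses into a single linear layer. A minimal sketch with arbitrary 2x2 matrices:

```python
# Sketch: W2 @ (W1 @ x) equals (W2 @ W1) @ x, so stacked linear
# layers have no more representational power than one linear layer.
# Matrix values are arbitrary illustrative numbers.
W1 = [[1.0, 2.0], [3.0, 4.0]]   # first linear layer
W2 = [[0.5, -1.0], [2.0, 0.0]]  # second linear layer

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [1.0, -2.0]
two_layers = matvec(W2, matvec(W1, x))  # layer by layer
collapsed = matvec(matmul(W2, W1), x)   # single equivalent layer
print(two_layers, collapsed)  # identical outputs
```

A fixed non-linearity on the output alone does not fix this either; the hidden representation itself must be non-linear and learned.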
 +
  • Last modified: 2017/02/03 22:59
  • by phreazer