data_mining:neural_network:perceptron

====== Geometrical Interpretation ======
  
===== Weight-Space view =====
  
  * 1 dimension for each weight
  * A point represents a setting of all the weights
  * Leaving the threshold out, each **training case** can be represented as a **hyperplane** through the **origin**; the inputs represent planes (constraints).
    * For a particular training case: the weights must lie on one side of this hyperplane to get the answer correct.
  
The plane goes through the **origin** and is perpendicular to the input vector. For a training case whose correct answer is 1, a good weight vector needs to be on the same side of the hyperplane as the input vector: the scalar product of weight vector and input vector is positive (angle < 90°). For a correct answer of 0, the scalar product must be negative (angle > 90°).
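A minimal sketch of this constraint check (NumPy; the example vectors are made up for illustration): for a training case with correct answer 1 the scalar product of weight and input vector must be positive, for answer 0 it must be negative.

<code python>
import numpy as np

def satisfies_constraint(w, x, target):
    """Is weight vector w on the correct side of the hyperplane
    through the origin defined by input vector x?"""
    s = np.dot(w, x)                  # scalar product = |w||x|cos(angle)
    return s > 0 if target == 1 else s < 0

x = np.array([1.0, 2.0])              # input vector (defines the constraint plane)
w_good = np.array([0.5, 1.0])         # angle < 90 degrees -> positive scalar product
w_bad  = np.array([-2.0, 0.5])        # angle > 90 degrees -> negative scalar product

print(satisfies_constraint(w_good, x, target=1))   # True
print(satisfies_constraint(w_bad,  x, target=1))   # False
</code>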
**Cone of feasible solutions**

We need to find a point that is on the right side of all the planes (training cases); such a point might not exist.
If there are weight vectors that get the right answer for all training cases, they lie in a hyper-cone with its apex at the origin.
The average of two good weight vectors is again a good weight vector => it is a convex problem.
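Why the averaging claim holds (a generic argument, not from the original notes): if $w_1$ and $w_2$ both satisfy a constraint $w \cdot x > 0$, then so does their average,

$$\tfrac{1}{2}(w_1 + w_2) \cdot x = \tfrac{1}{2}\, w_1 \cdot x + \tfrac{1}{2}\, w_2 \cdot x > 0,$$

and likewise for the $w \cdot x < 0$ constraints, so the feasible region is an intersection of half-spaces through the origin, i.e. a convex cone.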
==== Why learning works ====

Proof idea: show that the squared distance between a feasible weight vector and the current weight vector gets smaller with every mistake. Hopeful claim: every time the perceptron makes a mistake, the current weight vector gets closer to **all** feasible weight vectors. This is not true in general.

Instead, consider "generously feasible" weight vectors: those that lie within the feasible region by a **margin** at least as great as the **length of the input vector** that defines **each constraint plane**.

Every time the perceptron makes a mistake, the squared distance to every **generously feasible** weight vector decreases by at least the **squared length** of the **update vector** (the input vector).

After a finite number of mistakes, the weight vector must therefore lie in the **feasible** region (if that region exists).
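A small sketch of this argument with the standard perceptron update rule (NumPy; the two training cases and the generously feasible vector w_star are made up for illustration): after every mistake the squared distance to w_star shrinks by at least the squared length of the input vector used in the update.

<code python>
import numpy as np

# Made-up training cases through the origin (no bias term).
X = np.array([[1.0,  1.0],    # correct answer 1: want w.x > 0
              [1.0, -1.0]])   # correct answer 0: want w.x < 0
t = np.array([1, 0])

# Hand-picked "generously feasible" weight vector: it satisfies every
# constraint with a margin of at least ||x||^2.
w_star = np.array([0.0, 4.0])

w = np.zeros(2)                        # current weight vector
dist = np.sum((w_star - w) ** 2)       # squared distance to w_star

for epoch in range(10):
    for x, target in zip(X, t):
        y = 1 if np.dot(w, x) > 0 else 0
        if y != target:                # mistake -> perceptron update
            w = w + x if target == 1 else w - x
            new_dist = np.sum((w_star - w) ** 2)
            # the drop is at least the squared length of the update vector
            print(dist - new_dist, ">=", np.dot(x, x))
            dist = new_dist
</code>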

==== What perceptrons can't do ====

With a separate feature unit for each of the exponentially many binary input vectors, any possible discrimination can be made, but exponentially many feature units are necessary.

This type of table look-up won't generalize (presumably because of the binary encoding).

A binary threshold output unit can't tell whether two single-bit features are the same:
=> the constraints contradict each other (written out below).
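The four constraints, written out (weights $w_1, w_2$, threshold $\theta$; output should be 1 exactly when the two bits are equal):

$$\begin{aligned}
(1,1) \mapsto 1:&\quad w_1 + w_2 \ge \theta\\
(0,0) \mapsto 1:&\quad 0 \ge \theta\\
(1,0) \mapsto 0:&\quad w_1 < \theta\\
(0,1) \mapsto 0:&\quad w_2 < \theta
\end{aligned}$$

Adding the last two gives $w_1 + w_2 < 2\theta$; combined with the first, $\theta \le w_1 + w_2 < 2\theta$, so $\theta > 0$, which contradicts $0 \ge \theta$.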

=== Data-Space ===

  * Each input vector is a point in the space.
  * The weight vector defines a plane.
  * The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.

Positive and negative cases cannot be separated by such a plane => the problem is not linearly separable.

=== Discriminate patterns under translation with wrap-around ===

A binary threshold neuron can't discriminate between different patterns that have the same number of on-pixels, if the patterns are allowed to translate with wrap-around.

Proof sketch:

Pattern A in all of its possible translations (4 pixels on): summed over all these cases, the total input is 4x the sum of all the weights (each pixel is on in exactly 4 of the translations).

Pattern B in all of its possible translations (also 4 pixels on): summed over all these cases, the total input is likewise 4x the sum of all the weights.

But to discriminate, every single case of pattern A would have to provide more input than every single case of pattern B, which is impossible when the two sums over all cases are equal.
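A quick numeric check of the counting argument (NumPy; ring size and patterns are made up): for any weights, the total input summed over all wrap-around translations depends only on the number of on-pixels.

<code python>
import numpy as np

N = 10                                  # pixels arranged on a ring (wrap-around)
rng = np.random.default_rng(0)
w = rng.normal(size=N)                  # arbitrary weights, one per pixel

# Two different patterns with the same number of on-pixels (4).
pattern_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pattern_b = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 0])

def total_input_over_translations(pattern):
    """Sum of the unit's total input over all wrap-around translations."""
    return sum(np.dot(w, np.roll(pattern, k)) for k in range(N))

# Both sums equal 4 * w.sum(): each pixel is on in exactly 4 translations,
# so no weights can make every case of A exceed every case of B.
print(total_input_over_translations(pattern_a))
print(total_input_over_translations(pattern_b))
print(4 * w.sum())
</code>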

A case that a binary threshold unit **can** solve:

3 patterns, 2 classes.
The first class contains the pattern with 4 pixels on.
The second class contains the patterns with either 1 or 3 pixels on.

Weight from each pixel = 1.
Bias = -3.5

Any example with 3 pixels on: activation = 3 - 3.5 = -0.5
Any example with 1 pixel on: activation = 1 - 3.5 = -2.5
Any example with 4 pixels on: activation = 4 - 3.5 = 0.5

=> correctly classified, because here the classes differ in the number of on-pixels.
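The same arithmetic as a tiny check (weight 1 per pixel, bias -3.5, as above):

<code python>
# Activation of a binary threshold unit with weight 1 from every pixel
# and bias -3.5, for the pixel counts used in the example.
for n_on in (1, 3, 4):
    activation = 1.0 * n_on - 3.5
    print(n_on, activation, activation >= 0)   # only n_on = 4 crosses the threshold
</code>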

The whole point of pattern recognition is to recognize patterns despite transformations like translation. Minsky and Papert ("Group Invariance Theorem"): the part of a perceptron that **learns** cannot learn to do this if the transformations form a **group**:
  * Translations with **wrap-around** form a group.
To deal with such transformations, the tricky part of recognition must be solved by hand-coded **feature detectors** (not by the learning procedure).

Conclusion (reached about 20 years later): a neural network has to learn the feature detectors, not only the weights.
  * More layers of linear units do not help (the result is still linear); see the sketch below.
  * Fixed output non-linearities are not enough.
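A quick illustration of the first bullet (made-up matrices): stacking linear layers collapses into a single linear layer, so nothing is gained.

<code python>
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))        # first linear layer (no non-linearity)
W2 = rng.normal(size=(2, 4))        # second linear layer
x  = rng.normal(size=3)

# Two stacked linear layers compute the same function as one linear layer
# with weight matrix W2 @ W1 -- extra linear layers add no expressive power.
deep    = W2 @ (W1 @ x)
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))   # True
</code>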

=> We need multiple layers of **adaptive**, **non-linear** hidden units.

  * We need an efficient way of adapting all the weights.
  * Learning the weights going into hidden units is equivalent to learning features.
  * No one is telling us what the feature vectors should be.
  