data_mining:neural_network:perceptron

==== What perceptrons can't do ====
  
A separate feature unit for each of the many binary vectors gives any possible discrimination, but exponentially many features are necessary.
This type of table look-up won't generalize (a guess, due to the binary encoding).
  
- The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.
  
Pos and neg cases cannot be separated by a plane => not linearly separable.
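The classic example of positive and negative cases that no weight plane can separate is XOR. A minimal sketch, brute-forcing a coarse grid of candidate weights and biases for a single binary threshold unit (the grid search is only illustrative; the real impossibility argument comes from summing the four constraint inequalities):

```python
# Sketch: no single binary threshold unit (one weight plane)
# separates the four XOR cases. Grid values here are arbitrary.
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, b):
    """True if the unit (output 1 iff w1*x1 + w2*x2 + b > 0) matches XOR."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in XOR.items())

grid = [x / 2 for x in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False
```

Summing the two positive-case constraints gives w1 + w2 + 2b > 0, while summing the two negative-case constraints gives w1 + w2 + 2b <= 0, a contradiction for any weights, not just those on the grid.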
  
=== Discriminate patterns under translation with wrap-around ===

A binary threshold neuron can't discriminate between different patterns that have the same number of on pixels, if the patterns can translate with wrap-around.

Proof:

Pattern A over all possible translations (4 on pixels):
the total input, summed over all translations, is 4x the sum of all the weights.

Pattern B over all possible translations (4 on pixels):
the total input, summed over all translations, is also 4x the sum of all the weights.

But to discriminate, every single case of pattern A would have to provide more input than every single case of pattern B, which is impossible when the totals over all translations are equal.
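The equal-totals step can be checked numerically. A minimal sketch, assuming a 1-D pixel ring of length 10 with arbitrary weights (each on pixel visits every position exactly once over all shifts, so the total is always 4x the sum of the weights):

```python
# Sketch: summing a threshold unit's total input over all wrap-around
# translations gives (number of on pixels) * sum(weights), regardless
# of the pattern's shape. Ring length and weights are arbitrary.
N = 10
weights = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1, 1.5, -0.9, 0.4, 0.8]

def total_input_over_translations(pattern):
    """Sum of w . x over all N wrap-around shifts of the pattern."""
    total = 0.0
    for shift in range(N):
        shifted = [pattern[(i - shift) % N] for i in range(N)]
        total += sum(w * x for w, x in zip(weights, shifted))
    return total

pattern_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 4 on pixels, contiguous
pattern_b = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]  # 4 on pixels, spread out

ta = total_input_over_translations(pattern_a)
tb = total_input_over_translations(pattern_b)
print(ta, tb)  # equal: both are 4 * sum(weights)
```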
  
Case:
Line 72: Line 86:
  
Any example with 3 pixels on: activation = -0.5

Any example with 1 pixel on: activation = -2.5

Any example with 4 pixels on: activation = 0.5

=> correctly classified.
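These activations match, for instance, a unit with all weights equal to 1 and bias -3.5 (threshold 3.5), which fires only when 4 or more pixels are on. This is an assumed reconstruction of the case, not stated explicitly above:

```python
# Sketch under the assumption of unit weights and bias -3.5
# (i.e. threshold 3.5): activation = 1 * (pixels on) + bias.
bias = -3.5

def activation(pixels_on):
    """Total input of the assumed unit for a pattern with this many on pixels."""
    return 1.0 * pixels_on + bias

print(activation(3))  # -0.5
print(activation(1))  # -2.5
print(activation(4))  #  0.5
```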
 +
The whole point of pattern recognition is to recognize patterns despite transformations like translation. Minsky and Papert ("Group Invariance Theorem"): the part of a perceptron that **learns** cannot learn to do this if the transformations form a **group**:
  * Translations with **wrap-around** form a group.
To deal with such transformations, the tricky part of recognition must be solved by hand-coded **feature detectors** (not by the learning procedure).

Conclusion (after 20 yrs): a neural network has to learn the feature detectors (not only the weights).
  * More layers of linear units do not help (still linear).
  * Fixed output non-linearities are not enough.

=> Need multiple layers of **adaptive**, **non-linear** hidden units. This requires:

  * An efficient way of adapting all the weights.
  * Learning the weights going into hidden units is equivalent to learning features.
  * No one is telling us what the feature vectors should be.
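The "more layers of linear units do not help" point can be illustrated directly: composing two linear layers collapses into a single linear layer. A minimal sketch with arbitrary 2x2 matrices:

```python
# Sketch: W2 @ (W1 @ x) equals (W2 @ W1) @ x, so stacked linear
# layers have no more representational power than one linear layer.
# Matrix values are arbitrary illustrative numbers.
W1 = [[1.0, 2.0], [3.0, 4.0]]   # first linear layer
W2 = [[0.5, -1.0], [2.0, 0.0]]  # second linear layer

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [1.0, -2.0]
two_layers = matvec(W2, matvec(W1, x))  # layer by layer
collapsed = matvec(matmul(W2, W1), x)   # single equivalent layer
print(two_layers, collapsed)  # identical outputs
```

A fixed non-linearity on the output alone does not fix this either; the hidden representation itself must be non-linear and learned.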
 +
  • Last modified: 2017/02/03 22:59
  • by phreazer