==== What perceptrons can't do ====
With a separate feature unit for each of the many possible binary input vectors, a perceptron can make any possible discrimination.
This type of table look-up won't generalize (a guess: due to the binary encoding).
- Weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.
Positive and negative cases cannot be separated by a plane => not linearly separable.
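A minimal sketch of this limitation (not from the original notes): the XOR function is the classic case where the positive and negative cases cannot be separated by a plane, so perceptron learning can never reach zero errors on it.

```python
# Plain binary threshold unit plus the perceptron learning rule,
# applied to XOR -- a data set that is not linearly separable.

def predict(w, b, x):
    """Fire (output 1) if the weighted input exceeds the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=100, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(w, b, x)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(xor)
errors = sum(predict(w, b, x) != t for x, t in xor)
print(errors)  # always at least 1: no weight plane separates the cases
```

However many epochs are run, at least one case stays misclassified, because no linear separator for XOR exists.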
=== Discriminate patterns under translation with wrap-around ===

A binary threshold neuron can't discriminate between different patterns that have the same number of on-pixels (if the patterns can translate with wrap-around).

Proof:

Pattern A in all possible translations (4 on-pixels):
the total input, summed over all translations, is 4x the sum of all the weights.

Pattern B in all possible translations (4 on-pixels):
the total input, summed over all translations, is likewise 4x the sum of all the weights.

But to discriminate, every single case of pattern A must provide more input than every single case of pattern B. That is impossible, since the summed totals are equal.
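The counting step of the proof can be checked directly (the weight values below are arbitrary illustrative numbers, not from the notes): summed over all wrap-around translations, any pattern with k on-pixels feeds the neuron a total input of k times the sum of the weights, regardless of the pattern's shape.

```python
# Each on-pixel visits every position exactly once over the n translations,
# so the summed input is k * sum(weights) for any pattern with k on-pixels.

def total_input_over_translations(pattern, weights):
    n = len(weights)
    total = 0.0
    for shift in range(n):  # every wrap-around translation
        shifted = [pattern[(i - shift) % n] for i in range(n)]
        total += sum(w * p for w, p in zip(weights, shifted))
    return total

weights = [0.3, -1.2, 0.7, 2.0, 0.5, -0.4]   # arbitrary example weights
pattern_a = [1, 1, 1, 1, 0, 0]               # 4 on-pixels, contiguous
pattern_b = [1, 0, 1, 0, 1, 1]               # 4 on-pixels, different shape

print(total_input_over_translations(pattern_a, weights))  # 4 * sum(weights)
print(total_input_over_translations(pattern_b, weights))  # the same total
```

Since the totals are tied, pattern A cannot beat pattern B in every single case.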
Case:

Any example with 3 pixels on: activation = -0.5

Any example with 1 pixel on: activation = -2.5

Any example with 4 pixels on: activation = 0.5
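One concrete weight setting that reproduces these numbers (an assumption chosen for illustration, not stated in the notes): every pixel weight equal to 1 and a threshold of 3.5. The activation then depends only on how many pixels are on, never on which ones, which is exactly why the neuron cannot tell the patterns apart.

```python
# Assumed setting: uniform pixel weight 1.0, threshold 3.5.
# Activation = (number of on-pixels) - threshold, so pixel identity is lost.

def activation(pixels, weight=1.0, threshold=3.5):
    return weight * sum(pixels) - threshold

print(activation([1, 1, 1, 0, 0]))  # 3 pixels on -> -0.5
print(activation([1, 0, 0, 0, 0]))  # 1 pixel on  -> -2.5
print(activation([1, 1, 1, 1, 0]))  # 4 pixels on ->  0.5
```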
Conclusion (after 20 years): a neural network has to learn the feature detectors (not only the weights).
  * More layers of linear units do not help (still linear).
  * Fixed output non-linearities are not enough.

=> Need multiple layers of **adaptive**, non-linear hidden units.

  * Efficient way of adapting all the weights is needed.
  * Learning the weights going into hidden units is equivalent to learning features.
  * No one is telling us what the feature vectors should be.
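A minimal sketch of why hidden units help: with one layer of non-linear hidden units, the XOR problem (unsolvable for a single perceptron) becomes solvable. The hidden units act as feature detectors; here their weights are written by hand for illustration rather than learned.

```python
# Two binary threshold hidden units turn XOR into a linearly separable
# problem for the output unit. Weights chosen by hand for this sketch.

def step(x):
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden feature: "at least one input on" (OR)
    h2 = step(x1 + x2 - 1.5)    # hidden feature: "both inputs on" (AND)
    return step(h1 - h2 - 0.5)  # output: OR and not AND = XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # outputs 0, 1, 1, 0
```

Writing the hidden weights by hand is exactly what we want to avoid, which is why an efficient procedure for adapting all the weights (i.e. learning the features) is the key requirement.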