Perceptron

  • Popularized by Frank Rosenblatt (1960s)
  • Used for tasks with very large feature vectors

Decision unit: binary threshold neuron.

The bias can be learned like a weight: treat it as an extra weight whose input is always 1.
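
A minimal sketch of such a decision unit (illustrative Python; names and the convention of appending the constant 1 to the input are my own, not from the lecture):

<code python>
import numpy as np

def binary_threshold(w, x):
    """Binary threshold neuron: output 1 if w.x_ext >= 0, else 0.
    The bias is the last element of w; x is extended with a constant 1,
    so the bias is learned exactly like any other weight."""
    x_ext = np.append(x, 1.0)          # constant input for the bias
    return 1 if np.dot(w, x_ext) >= 0 else 0
</code>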

Perceptron convergence

  • If output correct ⇒ no weight changes
  • If output unit incorrectly outputs 0 ⇒ add input vector to weight vector.
  • If output unit incorrectly outputs 1 ⇒ subtract input vector from the weight vector.

This procedure finds a set of weights that gets the right answer for all training cases, if such a set exists. ⇒ Choosing the features is the crucial part.
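
A compact sketch of the convergence procedure above (illustrative Python; the function name, the epoch count and the assumption that each input row already ends with a constant 1 for the bias are my own):

<code python>
import numpy as np

def train_perceptron(X, targets, epochs=100):
    """Perceptron convergence procedure.
    X: training inputs, one row per case (last column is the constant 1 for the bias).
    targets: desired binary outputs (0 or 1)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) >= 0 else 0
            if y == t:
                continue           # output correct -> no weight change
            elif t == 1:           # incorrectly output 0 -> add input vector
                w = w + x
            else:                  # incorrectly output 1 -> subtract input vector
                w = w - x
    return w
</code>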

Geometrical Interpretation

  • 1 dimension for each weight
  • Point represents a setting of all weights
  • Leaving out the threshold, each training case can be represented as a hyperplane through the origin. Inputs correspond to planes (or constraints)
    • For a particular training case: the weights must lie on one side of this hyperplane to get the answer correct.

The plane goes through the origin and is perpendicular to the input vector (with correct answer = 1 (or 0)). A good weight vector needs to be on the correct side of the hyperplane: the scalar product of the weight vector and the input vector must be positive (angle < 90°).
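
As an illustration of the scalar-product test (the vectors below are made-up example values):

<code python>
import numpy as np

w = np.array([0.4, -0.2])   # candidate weight vector (example values)
x = np.array([1.0, 0.5])    # input vector of a case with correct answer 1

# w is on the correct side of this case's constraint plane
# iff the scalar product is positive (angle between w and x below 90°).
print(np.dot(w, x) > 0)     # True here: 0.4*1.0 + (-0.2)*0.5 = 0.3
</code>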

Cone of feasible solutions

We need to find a point on the right side of all the planes (training cases); such a point might not exist. If there are weight vectors that are on the right side for all cases, they lie in a hyper-cone with its apex at the origin. The average of two good weight vectors is a good weight vector ⇒ convex problem.
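
Why averaging works: if w1 and w2 both lie on the correct side of a constraint plane, i.e. w1·x > 0 and w2·x > 0 for an input x with correct answer 1, then (½(w1 + w2))·x = ½(w1·x) + ½(w2·x) > 0, so their average satisfies the same constraint. Since this holds for every training case, the set of good weight vectors is convex (a cone with apex at the origin).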

Proof idea: show that the squared distance between a feasible weight vector and the current weight vector gets smaller. Hopeful claim: every time the perceptron makes a mistake, the current weight vector gets closer to all feasible weight vectors. This is not true in general.

Fix: consider “generously feasible” weight vectors, which lie within the feasible region by a margin at least as great as the length of the input vector that defines each constraint plane.

Every time the perceptron makes a mistake, the squared distance to all of the generously feasible weight vectors decreases by at least the squared length of the update vector.

After a finite number of mistakes, the weight vector must lie in the feasible region, if such a region exists.
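
Written out (informally), with w* a generously feasible weight vector, w_old the current weights and x the misclassified input used for the update: ||w* − w_new||² ≤ ||w* − w_old||² − ||x||². Since every mistake removes at least ||x||² from a finite initial squared distance, only finitely many mistakes can occur (assuming the input vectors have bounded, non-zero length).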

With a separate feature unit for each of the many possible binary input vectors ⇒ any possible discrimination can be made. But this type of table look-up won't generalize (presumably due to the binary encoding).

A binary threshold output unit can't tell whether two single-bit features are the same ⇒ the training cases produce contradicting constraints.
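
Worked example of the contradiction (the “are these two bits the same?” task): positive cases (1,1) and (0,0), negative cases (1,0) and (0,1). With weights w1, w2 and threshold θ they demand w1 + w2 ≥ θ, 0 ≥ θ, w1 < θ and w2 < θ. Adding the first two gives w1 + w2 ≥ 2θ, adding the last two gives w1 + w2 < 2θ, which is a contradiction, so no binary threshold unit can solve this task.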

Data-Space

  • Each input vector is a point in space
  • The weight vector defines a plane
  • The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.

Positive and negative cases cannot be separated by a plane.

Case:

Three patterns, two classes. The first class contains patterns with 4 pixels on; the second class contains patterns with either 1 or 3 pixels on.

Weight from each pixel = 1. Bias = -3.5

  • Any example with 1 pixel on: activation = -2.5
  • Any example with 3 pixels on: activation = -0.5
  • Any example with 4 pixels on: activation = 0.5

⇒ correctly classified.
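
A quick numeric check of the case above (illustrative Python):

<code python>
weight_per_pixel = 1.0
bias = -3.5

for pixels_on in (1, 3, 4):
    activation = pixels_on * weight_per_pixel + bias
    label = 1 if activation >= 0 else 0   # 1 = first class (4 pixels on)
    print(pixels_on, activation, label)
# 1 -> -2.5 -> class 0
# 3 -> -0.5 -> class 0
# 4 ->  0.5 -> class 1
</code>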
