See Perceptron
First layer is input, last layer is output, hidden layers in between. Deep network: more than one hidden layer.
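A minimal sketch of that layout in NumPy; the layer sizes, weight scale, and tanh activation are illustrative assumptions, not a prescribed architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer -> two hidden layers -> output layer.
# With more than one hidden layer this already counts as "deep".
layer_sizes = [4, 8, 8, 2]               # assumed sizes, purely illustrative
weights = [rng.normal(0, 0.1, (m, n))    # one weight matrix per layer pair
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Propagate one input vector through all layers."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(a @ W + b)           # non-linear activation at each layer
    return a

print(forward(rng.normal(size=4)))       # 2-dimensional output
```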
Transformations that change the similarity of the input cases (e.g. different voices, same words): the activities of the neurons in each layer are a non-linear function of the activities in the layer below.
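A small demonstration of that claim; the input dimension, random weights, and ReLU non-linearity are assumptions chosen only to show the effect, not to model speech:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two inputs that are nearly identical in the raw input space
# (stand-in for the same word spoken by two different voices).
x1 = rng.normal(size=16)
x2 = x1 + 0.1 * rng.normal(size=16)     # small perturbation of x1

W = rng.normal(size=(16, 16))
b = rng.normal(size=16)

def layer(x):
    return np.maximum(0.0, x @ W + b)   # one non-linear (ReLU) layer

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# The non-linear layer re-maps the inputs, so their similarity changes.
print("similarity before:", cosine(x1, x2))
print("similarity after: ", cosine(layer(x1), layer(x2)))
```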
Natural for modeling sequential data: directed cycles in the connection graph give the hidden state a memory of earlier inputs, and the same weights are reused at every time step.
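A minimal sketch of one recurrent layer, assuming tanh units and arbitrary toy dimensions; the weight names (W_xh, W_hh) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

n_in, n_hidden = 3, 5                            # assumed toy sizes
W_xh = rng.normal(0, 0.1, (n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden (the cycle)
b = np.zeros(n_hidden)

def run(sequence):
    h = np.zeros(n_hidden)                       # initial hidden state
    for x_t in sequence:                         # same weights at every step
        h = np.tanh(x_t @ W_xh + h @ W_hh + b)   # state carries past context
    return h

sequence = rng.normal(size=(10, n_in))           # toy sequence of 10 steps
print(run(sequence))
```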
Like an RNN, but the connections between units are symmetrical (same weights in both directions).
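A sketch of why symmetry matters, in the Hopfield-net style: with symmetric weights and no self-connections the network has an energy function that asynchronous updates never increase. The network size, random weights, and update count are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 6
A = rng.normal(size=(n, n))
W = (A + A.T) / 2                      # symmetrize: same weight both directions
np.fill_diagonal(W, 0.0)               # no self-connections

def energy(s):
    """Hopfield-style energy; symmetric W means updates never raise it."""
    return -0.5 * s @ W @ s

s = rng.choice([-1.0, 1.0], size=n)    # binary (+1/-1) unit states
for _ in range(20):                    # asynchronous updates, one unit at a time
    i = rng.integers(n)
    s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    print(energy(s))                   # non-increasing sequence
```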