Sequence learning

Simple model without memory:

Introduce hidden state:
* Best we can do is infer a probability distribution over the space of hidden state vectors.

Linear dynamical system

Hidden state:

Optional: Driving inputs, which directly influence the hidden state.

 time ->
   o   o   o
   |   |   |
 > h > h > h
   |   |   |
   di  di  di

 (o = output, h = hidden state, di = driving input)

To predict the next output, we need to infer the hidden state.

A linearly transformed Gaussian ⇒ still Gaussian. The distribution over the hidden state given the data so far is Gaussian. Can be computed using “Kalman filtering”.

Exact estimation of this Gaussian distribution is possible.
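A minimal sketch of one Kalman filtering step for such a linear dynamical system (assuming a transition matrix A, driving-input matrix B, observation matrix C and noise covariances Q, R; all names are illustrative, not from the notes):

import numpy as np

def kalman_step(mu, Sigma, u, y, A, B, C, Q, R):
    """One predict/update step; the posterior over the hidden state stays Gaussian."""
    # Predict: push the current Gaussian through the linear dynamics, plus driving input u.
    mu_pred = A @ mu + B @ u
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: condition on the new observation y.
    S = C @ Sigma_pred @ C.T + R              # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new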

Hidden Markov Models

http://wiki.movedesign.de/doku.php?id=data_mining:hmm

Limitation: At each time step, it selects one of its hidden states. With N hidden states, it can only remember log2(N) bits about what it generated so far.

Consider the information that the first half of an utterance contains about the second half:

E.g. if 100 bits are needed, 2^100 hidden states would be required.
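A small illustrative calculation of this limit (the state counts are made up):

import math

# With N hidden states, the generator's entire memory at any moment is one of N symbols,
# so it can carry at most log2(N) bits about everything generated so far.
for n_states in (16, 1024, 2**20):
    print(f"{n_states} states -> at most {math.log2(n_states):.0f} bits of memory")

# Conversely, carrying 100 bits from the first half of an utterance into the second
# half would require about 2^100 hidden states.
print(f"100 bits -> ~{2**100:.3e} states")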

Recurrent neural networks

Example applications:

Derived models:

Problems: Vanishing/exploding gradients make long-range dependencies hard to learn (a vanilla RNN step is sketched below).
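For comparison with the HMM, a minimal sketch of a vanilla recurrent step (weight names W_xh, W_hh, W_hy are illustrative): the hidden state is a real-valued vector reused through time, not one of N discrete states.

import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y."""
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:                                # same weights reused at every time step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # real-valued hidden state vector
        ys.append(W_hy @ h + b_y)
    return ys, h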

Long Short-Term Memory (LSTM) model

One solution to the problems mentioned above.

4 elements: memory cell, write (input) gate, keep (forget) gate, read (output) gate.

Gates are logistic functions (nice derivatives).
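A minimal sketch of one LSTM step built from these four elements (weight and bias names are illustrative); the logistic gates control what gets written to, kept in, and read from the memory cell:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_i, W_f, W_o, W_c, b_i, b_f, b_o, b_c):
    """One LSTM step; the logistic gates decide what is written, kept and read."""
    z = np.concatenate([x, h_prev])       # current input and previous hidden state
    i = sigmoid(W_i @ z + b_i)            # write (input) gate
    f = sigmoid(W_f @ z + b_f)            # keep (forget) gate
    o = sigmoid(W_o @ z + b_o)            # read (output) gate
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate content for the memory cell
    c = f * c_prev + i * c_tilde          # memory cell update
    h = o * np.tanh(c)                    # what is read out of the cell
    return h, c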

Language modelling

Perplexity measure: target is low perplexity, i.e. high confidence in the correct next word. Data sets: Penn Treebank.
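A small sketch of how perplexity is computed from the probabilities a model assigns to the correct next words (the probability values are made up):

import math

# Probabilities the model assigns to the actual next word at each position of a
# held-out text (illustrative numbers, not real model output).
probs = [0.2, 0.05, 0.5, 0.1]

# Perplexity = exp of the average negative log-probability per word;
# low perplexity means high confidence in the correct words.
perplexity = math.exp(-sum(math.log(p) for p in probs) / len(probs))
print(perplexity)   # ~6.7 here; a uniform guess over a 10,000-word vocabulary gives 10000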

Word embedding: an n-dimensional vector of real numbers (n > 100). Words used in similar contexts end up at similar positions in the vector space. Can be visualized with t-SNE (dimensionality reduction).
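A tiny sketch of the “similar contexts ⇒ similar positions” idea using made-up embedding vectors and cosine similarity (everything here is illustrative):

import numpy as np

# Illustrative 4-dimensional embeddings (made-up numbers; real models learn
# vectors with n > 100 dimensions from data).
embeddings = {
    "monday":  np.array([0.9, 0.1, 0.0, 0.2]),
    "tuesday": np.array([0.8, 0.2, 0.1, 0.2]),
    "banana":  np.array([0.0, 0.9, 0.7, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words used in similar contexts should end up close together:
print(cosine(embeddings["monday"], embeddings["tuesday"]))  # high (~0.99)
print(cosine(embeddings["monday"], embeddings["banana"]))   # low (~0.10)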

Sources