
Sequence to sequence models

Image captioning:

Picking most likely sentence

Language Model: $P(y^{1}, \dots, y^{T_y})$

Machine translation: has an encoder network that first reads in the input sentence

“Conditional language model”: $P(y^{1}, \dots, y^{T_y} \mid x^{1}, \dots, x^{T_x})$

P(english | french)

$\arg\max_{y^{1}, \dots, y^{T_y}} P(y^{1}, \dots, y^{T_y} \mid x^{1}, \dots, x^{T_x})$

Solution: Beam Search

Why not greedy?

Jane is visiting … Jane is going …

P(Jane is going | x) > P(Jane is visiting | x)

“going” is the more probable follow-up word after “Jane is”, but the completed sentence is not a better translation

So the most likely overall output has to be searched for, rather than chosen greedily one word at a time.

Step 1:

Vocabulary of 10000 words

$P(y^1 | x)$

Encoder network reads x ⇒ first decoder step outputs $\hat{y}^{1}$

Parameter: beam width $B = 3$

Keep track of B most likely words
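
A minimal sketch of step 1, assuming the decoder's first softmax output is available as a numpy array (the function name and shapes are my own, not from the notes): keep the B most likely first words and their log-probabilities.

```python
import numpy as np

def beam_search_step1(p_y1_given_x, beam_width=3):
    """Keep the beam_width most likely first words.

    p_y1_given_x: softmax output of the first decoder step,
                  shape (vocab_size,), e.g. vocab_size = 10000.
    Returns a list of (word_index, log_probability) pairs.
    """
    top = np.argsort(p_y1_given_x)[::-1][:beam_width]  # indices of the B largest probabilities
    return [(int(i), float(np.log(p_y1_given_x[i]))) for i in top]
```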

Step 2:

For each of the B most likely choices:

Hardwire $\hat{y}^{1}$ to the candidate word and feed it into the next decoder step to get $\hat{y}^{2}$, i.e. $P(y^{2} \mid x, y^{1})$ (e.g. $P(y^{2} \mid x, \text{“in”})$ when the candidate first word is “in”)

$P(y^{1}, y^{2} \mid x) = P(y^{1} \mid x) \cdot P(y^{2} \mid x, y^{1})$

Evaluate all 10,000 vocabulary options for each of the B candidate first words (30,000 combinations in total)

Keep only the 3 most likely choices of first and second word; repeat the same procedure at every later step

$B=1$ ⇒ Greedy
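
A compact sketch of the whole procedure under an assumed `decoder_step(x, prefix)` helper (hypothetical, not from the notes) that returns the softmax distribution $P(y^{t} \mid x, y^{1}, \dots, y^{t-1})$ over the vocabulary; with `beam_width=1` it collapses to greedy search.

```python
import numpy as np

def beam_search(x, decoder_step, max_len, beam_width=3, eos_id=0):
    """Keep the beam_width most likely prefixes at every step.

    decoder_step(x, prefix) -> array of shape (vocab_size,), the
    distribution over the next word given the input x and the prefix.
    """
    beams = [([], 0.0)]                               # (prefix, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            if prefix and prefix[-1] == eos_id:       # finished sentence: carry over unchanged
                candidates.append((prefix, logp))
                continue
            probs = decoder_step(x, prefix)
            # Only the top beam_width continuations of each beam can reach the global
            # top beam_width, so this equals scoring all beam_width * vocab_size options.
            for w in np.argsort(probs)[::-1][:beam_width]:
                candidates.append((prefix + [int(w)], logp + float(np.log(probs[w]))))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]
```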

Beam search refinement

Length normalization:

Long sentences get lower probability because more factors (each < 1) are multiplied, so unnormalized beam search unfairly prefers short outputs; sum log-probabilities and normalize by length instead
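
The normalized objective from the lecture: maximize the length-normalized sum of log-probabilities, with $\alpha \approx 0.7$ as a common heuristic ($\alpha = 1$ is full normalization, $\alpha = 0$ none):

$\frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P(y^{t} \mid x, y^{1}, \dots, y^{t-1})$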

How to choose B? Larger B: better results, but slower and more memory; smaller B: worse results, but faster

Unlike exact search algorithms such as BFS or DFS, beam search is not guaranteed to find the exact maximum

Attribute error to RNN or Beam search?

RNN computes P(y|x)

Compute $P(y^* \mid x)$ for the human (good) translation

Compute $P(\hat{y} \mid x)$ for the algorithm’s translation

If $P(y^* \mid x) > P(\hat{y} \mid x)$, beam search is at fault; otherwise the RNN is at fault
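
A minimal sketch of this decision rule, assuming a hypothetical `log_prob(x, y)` helper that returns the RNN's $\log P(y \mid x)$ for a complete sentence:

```python
def attribute_error(log_prob, x, y_star, y_hat):
    """Decide whether the RNN or beam search caused the bad output.

    y_star: the human (good) translation, y_hat: the algorithm's translation.
    log_prob(x, y) is a hypothetical helper returning log P(y | x) under the RNN.
    """
    if log_prob(x, y_star) > log_prob(x, y_hat):
        # The RNN prefers the good translation, but search failed to find it.
        return "beam search at fault"
    # The RNN assigns the worse sentence a higher probability.
    return "RNN at fault"
```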

Bleu score

There can be multiple equally good translations ⇒ Bleu computes a single score for how close the machine translation is to the human references (modified n-gram precision).
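
A toy sketch of the modified unigram precision that Bleu is built on (the real Bleu score combines modified 1- to 4-gram precisions with a brevity penalty; libraries such as nltk or sacrebleu implement the full metric):

```python
from collections import Counter

def modified_unigram_precision(candidate, references):
    """Count candidate words, clipping each count at its maximum count in any reference."""
    cand_counts = Counter(candidate)
    clipped = 0
    for word, count in cand_counts.items():
        max_ref = max(ref.count(word) for ref in references)
        clipped += min(count, max_ref)
    return clipped / max(len(candidate), 1)

# Classic example: a degenerate candidate only gets credit for "the" twice.
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
cand = "the the the the the the the".split()
print(modified_unigram_precision(cand, refs))  # 2/7
```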

Attention model

Attend to the relevant part of the input sentence for each output word, instead of making the encoder memorize the complete (possibly very long) sentence in one fixed vector.
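
A minimal numpy sketch of one attention step, under the setup from the lecture: energies $e^{\langle t,t' \rangle}$ (scored by a small learned network, omitted here) are turned into attention weights $\alpha^{\langle t,t' \rangle}$ by a softmax, and the context vector is the weighted sum of the encoder activations $a^{\langle t' \rangle}$.

```python
import numpy as np

def attention_context(energies, encoder_states):
    """Compute the context vector for one decoder step.

    energies: shape (Tx,), relevance scores e^<t,t'> of each input position
              for the current output step (produced by a small learned network
              in the full model; treated as given here).
    encoder_states: shape (Tx, n_a), the encoder activations a^<t'>.
    """
    alphas = np.exp(energies - energies.max())
    alphas = alphas / alphas.sum()        # softmax -> attention weights, sum to 1
    context = alphas @ encoder_states     # alpha-weighted sum of encoder activations
    return context, alphas
```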