data_mining:neural_network:partitioning

Data set partitioning

Train / Dev / Test set

Traditionally: 60 / 20 / 20

Big Data (>1 mio cases): 95.5 / 0.4 / 0.1

Another problem: Make sure that dev and test set came from same distribution (e.g. user uploaded potato pictures vs. high res).