data_mining:error_analysis, revision of 2018/05/21 19:55 by phreazer (previous revision: 2018/05/21 19:46; section: Mislabeled data)
  * Also examine the examples the algorithm got right, not only the ones it got wrong
  * Train and dev/test data may come from different distributions (not a problem if they differ only slightly)
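
The advice to re-check correctly classified examples as well as errors can be sketched as a small sampling helper. Reviewing both groups means a mislabeled example can be found whether or not the model happened to agree with its wrong label. This is a minimal sketch, not a prescribed procedure: the function name ''sample_for_label_review'', the integer class labels, the per-group sample size and the fixed seed are all hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sample_for_label_review(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_per_group: int,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: pick dev-set indices to re-label by hand.

    Draws up to n_per_group examples the model got wrong AND up to
    n_per_group examples it got right, so both groups get checked.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    right = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t == p]
    rng = random.Random(seed)  # fixed seed so the review sample is reproducible
    # random.sample requires k <= population size, hence the min()
    wrong_sample = sorted(rng.sample(wrong, min(n_per_group, len(wrong))))
    right_sample = sorted(rng.sample(right, min(n_per_group, len(right))))
    return wrong_sample, right_sample


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 1]
    wrong, right = sample_for_label_review(labels, preds, n_per_group=2)
    print("re-check (misclassified):", wrong)
    print("re-check (classified correctly):", right)
```

Any labels corrected this way would be fixed in both the dev and the test set, so that the two keep coming from the same distribution.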

====== Mismatched train and dev/test sets ======

Example data:
  * 200,000 high-quality images
  * 10,000 low-quality, blurry images (the kind of data the application has to perform well on)

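  * Option 1: combine all 210,000 images, shuffle them randomly, and split them into train/dev/test sets
    * Advantage: train, dev, and test sets all come from the same distribution
    * Disadvantage: only about 5% of the pooled images (10,000 of 210,000) are low quality, so the dev/test sets mostly measure performance on high-quality images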
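  * Option 2:
    * Train set: 205,000 images (all 200,000 high-quality plus 5,000 low-quality); dev and test sets: 2,500 low-quality images each
    * Advantage: the dev/test sets consist of the data that matters (low-quality images), so optimization aims at the right target
    * Disadvantage: the training distribution differs from the dev/test distribution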
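
The two splitting options can be compared with a short simulation in which each image is replaced by a ''(source, index)'' tuple. This is a sketch under stated assumptions: the function names, the fixed seeds, and the 2,500/2,500 dev/test sizes for Option 1 are hypothetical (the notes above give sizes only for Option 2). It reports the set sizes and the share of low-quality images in the dev set for each option.

```python
import random
from typing import Dict, List, Tuple

# Placeholder for an image: ("high" or "low", index). Hypothetical stand-in.
Example = Tuple[str, int]


def make_data(n_high: int = 200_000, n_low: int = 10_000) -> Tuple[List[Example], List[Example]]:
    high = [("high", i) for i in range(n_high)]
    low = [("low", i) for i in range(n_low)]
    return high, low


def option1_shuffle(high: List[Example], low: List[Example], n_dev: int = 2_500,
                    n_test: int = 2_500, seed: int = 0) -> Dict[str, List[Example]]:
    """Option 1: pool all images, shuffle, then cut into dev/test/train."""
    pool = high + low  # new list, inputs are not modified
    random.Random(seed).shuffle(pool)
    return {"train": pool[n_dev + n_test:],
            "dev": pool[:n_dev],
            "test": pool[n_dev:n_dev + n_test]}


def option2_target_dev_test(high: List[Example], low: List[Example], n_dev: int = 2_500,
                            n_test: int = 2_500, seed: int = 0) -> Dict[str, List[Example]]:
    """Option 2: dev/test drawn only from low-quality images; the remaining
    low-quality images are added to the high-quality training images."""
    low = low[:]  # copy so the caller's list is not shuffled in place
    random.Random(seed).shuffle(low)
    return {"train": high + low[n_dev + n_test:],
            "dev": low[:n_dev],
            "test": low[n_dev:n_dev + n_test]}


def low_fraction(examples: List[Example]) -> float:
    """Share of low-quality images in a split."""
    return sum(1 for src, _ in examples if src == "low") / len(examples)


if __name__ == "__main__":
    high, low = make_data()
    for name, split in (("Option 1", option1_shuffle(high, low)),
                        ("Option 2", option2_target_dev_test(high, low))):
        sizes = {k: len(v) for k, v in split.items()}
        print(name, sizes, "low-quality share of dev: %.3f" % low_fraction(split["dev"]))
```

With these sizes, both options yield a 205,000-image training set. Under Option 1 the low-quality share of the dev set should land near 10,000/210,000 (about 4.8%), while under Option 2 it is 1.0 by construction.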