data_mining:error_analysis

Differences

This shows you the differences between two versions of the page.

data_mining:error_analysis [2018/05/21 19:46] – [Misslabeled data] phreazer
data_mining:error_analysis [2018/05/21 19:55] – [Misslabeled data] phreazer
Line 142:
   * Also look at the examples the algorithm got right (not only the wrong ones)
   * Train and dev/test data may come from different distributions (no problem if they differ only slightly)
 +
 +====== Mismatched train and dev/test sets ======
 +
 +  * 200,000 high-quality pictures
 +  * 10,000 low-quality, blurry pictures
 +
 +  * Option 1: Combine all images and shuffle them randomly into the train/dev/test sets
 +    * Advantage: Train, dev and test sets share the same distribution
 +    * Disadvantage: Most images come from the high-quality pictures, so most of the effort goes into optimizing for high-quality pictures
 +  * Option 2:
 +    * Train set: 205,000 pictures (200,000 high quality + 5,000 low quality); dev and test sets: 2,500 low-quality pictures each
 +    * Advantage: Optimizes for the right data (the low-quality pictures)
 +    * Disadvantage: The train distribution differs from the dev and test distribution
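
The two options above can be sketched in plain Python to make the set sizes concrete. This is a minimal illustration, not part of the original notes: the function names, the `("high", i)` / `("low", i)` placeholder records and the fixed seed are hypothetical, and the dev/test size of 2,500 for Option 1 is an assumption (the notes only give sizes for Option 2).

```python
import random


def split_option_1(high_quality, low_quality, n_eval=2500, seed=0):
    """Option 1: pool everything, shuffle, then cut dev/test/train.

    All three sets share one distribution, dominated by high-quality data.
    n_eval is an assumed dev/test size; the notes do not specify it.
    """
    rng = random.Random(seed)
    pool = list(high_quality) + list(low_quality)
    rng.shuffle(pool)
    dev = pool[:n_eval]
    test = pool[n_eval:2 * n_eval]
    train = pool[2 * n_eval:]
    return train, dev, test


def split_option_2(high_quality, low_quality, n_eval=2500, seed=0):
    """Option 2: dev/test contain only low-quality (target) data.

    The remaining low-quality examples are added to the training set.
    """
    rng = random.Random(seed)
    low = list(low_quality)
    rng.shuffle(low)
    dev = low[:n_eval]
    test = low[n_eval:2 * n_eval]
    train = list(high_quality) + low[2 * n_eval:]
    rng.shuffle(train)
    return train, dev, test


# Placeholder records standing in for real images (hypothetical data).
high = [("high", i) for i in range(200_000)]
low = [("low", i) for i in range(10_000)]

train1, dev1, test1 = split_option_1(high, low)
train2, dev2, test2 = split_option_2(high, low)

# Share of low-quality pictures in each dev set.
low_share_dev1 = sum(1 for kind, _ in dev1 if kind == "low") / len(dev1)
low_share_dev2 = sum(1 for kind, _ in dev2 if kind == "low") / len(dev2)
print(len(train2), len(dev2), len(test2))
print(low_share_dev1, low_share_dev2)
```

With Option 1 only a small share of the dev set is low quality (10,000 of 210,000 pictures overall, roughly 5%), whereas with Option 2 the dev and test sets consist entirely of the low-quality pictures the model is meant to handle.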
  • data_mining/error_analysis.txt
  • Last modified: 2018/05/21 22:24
  • by phreazer