data_mining:error_analysis

data_mining:error_analysis [2018/05/21 19:45] – [Working on most promising problems] phreazer
data_mining:error_analysis [2018/05/21 22:24] (current) – [Problems with different train and dev/test set dist] phreazer
Line 139:
 Principles when fixing labels:
 
-Apply same process to dev and test set (same distribution)
-Also see what examples algo got right (not only wrong)
-Train and dev/test data may come from different distribution (no problem if slightly different)
 +  * Apply same process to dev and test set (same distribution)
 +  * Also see what examples algo got right (not only wrong)
 +  * Train and dev/test data may come from different distribution (no problem if slightly different)
 + 
 +====== Mismatched train and dev/test set ====== 
 + 
 +  * 200,000 high-quality pictures 
 +  * 10,000 low-quality, blurry pictures 
 + 
 +  * Option 1: Combine all images and shuffle them randomly into train/dev/test sets 
 +    * Advantage: Same distribution in all three sets 
 +    * Disadvantage: Most dev/test images are high-quality pictures, so most time is spent optimizing for high-quality pictures 
 +  * Option 2: 
 +    * Train set: 205,000 pictures (all 200,000 high-quality plus 5,000 low-quality); dev and test sets: 2,500 low-quality pictures each 
 +    * Advantage: Optimizing for the right data 
 +    * Disadvantage: The train distribution differs from the dev/test distribution 
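
The Option 2 split above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `split_option_2`, the `(kind, index)` tuples standing in for pictures, and the fixed seed are all hypothetical and not part of the original notes.

```python
import random

def split_option_2(high_quality, low_quality, n_low_in_train=5000, seed=0):
    """Option 2: all high-quality data plus part of the low-quality data go
    into the train set; dev and test contain only low-quality data."""
    rng = random.Random(seed)
    low = list(low_quality)
    rng.shuffle(low)

    low_train = low[:n_low_in_train]   # low-quality share of the train set
    held_out = low[n_low_in_train:]    # remaining low-quality pictures
    half = len(held_out) // 2

    train = list(high_quality) + low_train
    rng.shuffle(train)
    dev = held_out[:half]
    test = held_out[half:]
    return train, dev, test

# Placeholder records instead of real images (assumption for illustration).
high = [("high", i) for i in range(200_000)]
low = [("low", i) for i in range(10_000)]

train, dev, test = split_option_2(high, low)
print(len(train), len(dev), len(test))  # 205000 2500 2500
```

Dev and test are drawn from the same shuffled pool, so they share one distribution, while the train set deliberately does not.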
 + 
 +====== Problems with different train and dev/test set dist ====== 
 + 
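 +Using different distributions for train and dev is not always a good idea: the gap between train and dev error becomes hard to interpret. 
 + 
 +  * Human error: ~0% 
 +  * Train: 1% 
 +  * Dev: 10% 
 + 
 +Here it is unclear whether the gap comes from variance or from the different dev distribution. A training-dev set helps: it has the same distribution as the training set, but is not used for training. 
 + 
 +  * Train: 1% 
 +  * Train-dev: 9% 
 +  * Dev: 10% 
 + 
 +The gap between train and train-dev is still large => variance problem 
 + 
 +If train and train-dev error were close instead, with dev error still much higher => data mismatch problem. 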
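
A training-dev set can be carved out of the training data before training starts. The sketch below is a minimal illustration; the function name `carve_train_dev`, the size of 10,000 examples, and the fixed seed are assumptions, not values from the notes.

```python
import random

def carve_train_dev(train_examples, n_train_dev=10_000, seed=0):
    """Hold out n_train_dev randomly chosen training examples.

    The held-out part has the same distribution as the training set,
    but must not be used for training."""
    shuffled = list(train_examples)
    if n_train_dev > len(shuffled):
        raise ValueError("n_train_dev exceeds the number of training examples")
    random.Random(seed).shuffle(shuffled)
    train_dev = shuffled[:n_train_dev]
    remaining_train = shuffled[n_train_dev:]
    return remaining_train, train_dev

# Integers stand in for training examples (assumption for illustration).
remaining_train, train_dev = carve_train_dev(range(205_000))
print(len(remaining_train), len(train_dev))  # 195000 10000
```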
 + 
 +Summary (each sub-item names the gap between the error above it and the error below it): 
 +  * Human level: 4% 
 +    * Avoidable bias 
 +  * Train: 7% 
 +    * Variance 
 +  * Train-dev: 10% 
 +    * Data mismatch 
 +  * Dev: 12% 
 +    * Degree of overfitting to dev set (if too high => use a bigger dev set) 
 +  * Test: 12% 
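
The summary amounts to subtracting consecutive error levels. A small sketch using the numbers from the list above; the function name `decompose_errors` and the dictionary keys are illustrative choices, and errors are given in percentage points.

```python
def decompose_errors(human, train, train_dev, dev, test):
    """Return the gap (in percentage points) between consecutive error levels."""
    return {
        "avoidable bias": train - human,
        "variance": train_dev - train,
        "data mismatch": dev - train_dev,
        "overfitting to dev set": test - dev,
    }

gaps = decompose_errors(human=4.0, train=7.0, train_dev=10.0, dev=12.0, test=12.0)
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f}")
# avoidable bias: 3.0
# variance: 3.0
# data mismatch: 2.0
# overfitting to dev set: 0.0
```

The largest gap suggests where to focus first; with these numbers, avoidable bias and variance are equally large and data mismatch is slightly smaller.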
 + 
 +====== Data mismatch problems ====== 
 + 
 +  * Carry out error analysis to understand the differences between the training set and the dev/test sets 
 +  * Make the training data more similar to the dev/test sets, or collect more data similar to them (e.g. simulate the audio environment) 
 +    * Artificial data synthesis 
 +      * Problem: the synthesized data may be sampled from too small a subset of the possible examples (to a human it may still appear fine)
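
As a rough illustration of artificial data synthesis for audio, the sketch below overlays a randomly chosen noise clip on a clean signal. Everything here is hypothetical: the function name `synthesize`, the `noise_gain` parameter, and the use of plain lists of floats as audio are assumptions made for the example. If `noise_clips` holds only a few recordings, every synthesized example reuses the same noise, which is the sampling problem noted above.

```python
import random

def synthesize(clean, noise_clips, noise_gain=0.3, rng=None):
    """Overlay one randomly chosen noise clip on a clean signal.

    Signals are lists of float samples. A noise clip shorter than the
    clean signal is repeated from its start."""
    rng = rng or random.Random(0)
    noise = rng.choice(noise_clips)
    return [c + noise_gain * noise[i % len(noise)] for i, c in enumerate(clean)]

# Toy signals (assumption for illustration): silence plus one short noise clip.
clean = [0.0, 0.0, 0.0, 0.0]
noise_clips = [[1.0, -1.0]]
mixed = synthesize(clean, noise_clips, noise_gain=0.5)
print(mixed)  # [0.5, -0.5, 0.5, -0.5]
```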
  • data_mining/error_analysis.1526924741.txt.gz
  • Last modified: 2018/05/21 19:45
  • by phreazer