data_mining:error_analysis

Bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two).
  
====== Working on most promising problems ======
  
Best case performance if no false positives?
Result: Calculate the percentage of each problem category (the potential improvement "ceiling").
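
The tally can be sketched in a few lines of Python; the categories and counts below are hypothetical:

```python
# Tally hypothetical problem categories found while manually reviewing
# 100 misclassified dev-set examples; each category's share is the
# improvement "ceiling" you could gain by fixing that category completely.
from collections import Counter

error_categories = (
    ["great_cats"] * 43   # hypothetical: confused big cats with cats
    + ["blurry"] * 51     # hypothetical: blurry images
    + ["mislabeled"] * 6  # hypothetical: wrong ground-truth label
)

counts = Counter(error_categories)
total = len(error_categories)
for category, n in counts.most_common():
    print(f"{category}: {n}/{total} -> ceiling {n / total:.0%}")
```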
  
General rule: Build your first system quickly, then iterate (dev/test setup, build system, bias/variance & error analysis).

====== Mislabeled data ======

DL algos: If the percentage of errors is //low// and the errors are //random//, they are robust to mislabeled data.

Add another column "incorrectly labeled" to the error-analysis spreadsheet.

Principles when fixing labels:

  * Apply the same process to the dev and test set (same distribution)
  * Also look at examples the algo got right (not only the wrong ones)
  * Train and dev/test data may come from different distributions (no problem if slightly different)

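Whether fixing labels is a promising problem follows the same ceiling logic; a back-of-the-envelope sketch with hypothetical numbers:

```python
# Hypothetical numbers: overall dev error, and the share of the
# inspected errors that traced back to incorrectly labeled examples.
dev_error = 0.10         # 10% overall dev-set error
share_mislabeled = 0.06  # 6% of inspected errors were label mistakes

error_from_labels = dev_error * share_mislabeled       # 0.6% of examples
error_from_other = dev_error * (1 - share_mislabeled)  # 9.4% of examples

# Fixing labels can at best remove 0.6 percentage points of error here,
# so the other categories are likely more promising to work on first.
```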
====== Mismatched train and dev/test sets ======

Example: two sources of images:

  * 200,000 high-quality pics
  * 10,000 low-quality, blurry pics

Options for splitting:

  * Option 1: Combine all images and randomly shuffle them into train/dev/test sets
    * Advantage: Same distribution everywhere
    * Disadvantage: Most images are high-quality pics, so most optimization effort is spent on them
  * Option 2: Train set: 205,000 images with high and low quality; dev & test set: 2,500 low-quality images each
    * Advantage: Optimizing for the right data
    * Disadvantage: Train distribution differs from the dev and test sets

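The two options can be sketched as follows (the split sizes mirror the numbers above; the example IDs are hypothetical):

```python
import random

random.seed(0)
high_qual = [f"hq_{i}" for i in range(200_000)]  # 200,000 high-quality pics
low_qual = [f"lq_{i}" for i in range(10_000)]    # 10,000 low-quality, blurry pics

# Option 1: shuffle everything together -- same distribution everywhere,
# but dev/test end up dominated by high-quality pics.
combined = high_qual + low_qual
random.shuffle(combined)
train1 = combined[:205_000]
dev1, test1 = combined[205_000:207_500], combined[207_500:]

# Option 2: reserve all dev/test slots for low-quality pics -- you optimize
# for the data you care about, but train and dev/test distributions differ.
random.shuffle(low_qual)
dev2, test2 = low_qual[:2_500], low_qual[2_500:5_000]
train2 = high_qual + low_qual[5_000:]            # 205,000 mixed examples
```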
====== Problems with different train and dev/test set dist ======

It is not always a good idea to use different distributions for train and dev. Example error rates:

  * Human error: ~0%
  * Train: 1%
  * Dev: 10%

Is the train-dev gap due to variance or to the distribution shift? To tell them apart, carve out a training-dev set: same distribution as the training set, but not used for training.

  * Train: 1%
  * Train-dev: 9%
  * Dev: 10%

A large gap between train and train-dev error => variance problem.

If train and train-dev error were close (and dev much higher) => data-mismatch problem.

Summary (example error rates and the gap each step measures):

  * Human level: 4%
    * Avoidable bias
  * Train: 7%
    * Variance
  * Train-dev: 10%
    * Data mismatch
  * Dev: 12%
    * Degree of overfitting to the dev set (if too high => use a bigger dev set)
  * Test: 12%

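The summary can be turned into a small diagnostic helper (error rates in percent; the function name is my own):

```python
def diagnose_gaps(human, train, train_dev, dev, test):
    """Decompose consecutive error gaps as in the summary above."""
    return {
        "avoidable bias": train - human,
        "variance": train_dev - train,
        "data mismatch": dev - train_dev,
        "dev-set overfitting": test - dev,
    }

gaps = diagnose_gaps(human=4, train=7, train_dev=10, dev=12, test=12)
# The largest gap points at the most promising problem to work on next.
worst = max(gaps, key=gaps.get)
```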
====== Data mismatch problems ======

  * Use error analysis to understand the differences between the training and dev/test sets
  * Make training data more similar / collect more data similar to the dev/test set (e.g. simulate the audio environment)
    * Artificial data synthesis
      * Problem: You may be sampling from too little data; to a human it might seem ok, but the model can overfit to it
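
A minimal sketch of artificial data synthesis for the audio case, assuming the signals are NumPy arrays (all names and numbers here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

def add_background_noise(clean, noise, snr_db=10.0):
    """Overlay a (possibly shorter) noise clip onto clean speech at a target SNR."""
    reps = -(-len(clean) // len(noise))         # ceiling division
    noise = np.tile(noise, reps)[: len(clean)]  # loop noise to full length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_clean / p_scaled_noise hits the target SNR
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

clean = rng.standard_normal(16_000)  # 1 s of stand-in "speech" at 16 kHz
noise = rng.standard_normal(4_000)   # short noise clip, looped via tiling
noisy = add_background_noise(clean, noise)
```

The caveat from the last bullet applies here: if the same short noise clip is looped under hours of speech, a human listener may not notice, but the model can overfit to that one clip.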
  • Last modified: 2018/05/21 19:30
  • by phreazer