===== Basic recipe for training NNs =====

Recommended **order**:

  - High **bias** (look at train set performance):
    * Bigger network (more hidden layers / units)
    * Train longer
    * Advanced optimization algorithms
    * Better NN //architecture//
  - High **variance** (look at dev set performance):
    * More data (won't help for high bias problems)
    * Regularization
    * Better NN //architecture//

A bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two).
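
A minimal sketch of this recipe as a simple diagnostic, assuming scalar train/dev error rates are already available (the function name and threshold values are arbitrary examples):

<code python>
def diagnose(train_err, dev_err, human_err=0.0,
             bias_gap=0.02, variance_gap=0.02):
    """Suggest next steps; error rates are fractions in [0, 1].
    bias_gap / variance_gap are example thresholds for what counts as "high"."""
    suggestions = []
    if train_err - human_err > bias_gap:        # high bias: poor fit on the train set
        suggestions += ["bigger network", "train longer",
                        "advanced optimizer", "better architecture"]
    if dev_err - train_err > variance_gap:      # high variance: large train->dev gap
        suggestions += ["more data", "regularization", "better architecture"]
    return suggestions or ["both gaps small -- revisit metric / dev set"]

print(diagnose(train_err=0.01, dev_err=0.10))                  # variance problem
print(diagnose(train_err=0.15, dev_err=0.16, human_err=0.01))  # bias problem
</code>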
  
====== Working on most promising problems ======

What would the best-case performance be if one category of errors (e.g. all false positives) were eliminated?

E.g. out of 100 mislabeled dev set examples, how many are dog images (when training a cat classifier)? At 50% it could be worth working on the problem (with the error currently at 10%, the best case is 5%).

Evaluate multiple ideas in parallel:
  * Fix false positives
  * Fix false negatives
  * Improve performance on blurry images

Create a spreadsheet: one row per mislabeled image, one column per problem category.

Result: calculate the percentage of each problem category (the potential improvement "ceiling"), e.g. with the sketch below.
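
A minimal sketch of that tally, assuming the spreadsheet is exported as a CSV with one row per mislabeled dev example and an "x" in each applicable problem column (file name and column names are made up for illustration):

<code python>
import csv
from collections import Counter

CATEGORIES = ("dog", "blurry", "incorrectly labeled")   # hypothetical columns
counts, total = Counter(), 0

with open("error_analysis.csv") as f:                   # hypothetical export
    for row in csv.DictReader(f):
        total += 1
        for cat in CATEGORIES:
            if row.get(cat, "").strip().lower() == "x":
                counts[cat] += 1

dev_error = 0.10   # current overall dev error (10%)
for cat, n in counts.most_common():
    share = n / total
    # "ceiling": dev error if this category were fixed completely
    print(f"{cat}: {share:.0%} of errors, ceiling {dev_error * (1 - share):.1%}")
</code>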

General rule: Build your first system quickly, then iterate (dev/test setup, build the system, bias/variance & error analysis).
====== Mislabeled data ======

DL algorithms: if the % of errors is //low// and the errors are //random//, they are robust to mislabeled data.

Add another column "incorrectly labeled" to the error analysis spreadsheet.

Principles when fixing labels:

  * Apply the same process to dev and test set (same distribution)
  * Also look at examples the algorithm got right (not only the wrong ones)
  * Train and dev/test data may then come from slightly different distributions (not a problem if the difference is small)
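
A back-of-the-envelope example with made-up numbers: use the "incorrectly labeled" column to check whether label noise is actually the biggest error source before spending time on it.

<code python>
dev_error = 0.10              # overall dev error (10%)
frac_label_errors = 0.06      # share of mislabeled dev examples caused by bad labels

error_from_labels = dev_error * frac_label_errors   # 0.6% of the dev set
error_from_rest   = dev_error - error_from_labels   # 9.4% of the dev set

print(f"due to wrong labels: {error_from_labels:.1%}")
print(f"due to everything else: {error_from_rest:.1%}")
# Other error sources dominate here, so fixing labels is not yet the
# highest-leverage problem.
</code>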

====== Mismatched train and dev/test sets ======

Example: two data sources of different quality:

  * 200,000 high-quality pics
  * 10,000 low-quality, blurry pics

  * Option 1: Combine the images and randomly shuffle them into train/dev/test sets
    * Advantage: same distribution everywhere
    * Disadvantage: most images are high-quality pics (most time is spent optimizing for high-quality pics)
  * Option 2 (see the split sketch below):
    * Train set: 205,000 with high and low quality; dev & test: 2,500 low-quality each
    * Advantage: optimizing for the right data
    * Disadvantage: train distribution differs from the dev and test sets
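
A minimal sketch of the Option 2 split, with placeholder file names standing in for the two image sources:

<code python>
import random

high_qual = [f"web_{i}.jpg" for i in range(200_000)]   # high-quality pics
low_qual  = [f"app_{i}.jpg" for i in range(10_000)]    # low-quality, blurry pics

random.seed(0)
random.shuffle(low_qual)

dev   = low_qual[:2_500]               # dev and test contain only the
test  = low_qual[2_500:5_000]          # distribution we actually care about
train = high_qual + low_qual[5_000:]   # 200,000 + 5,000 = 205,000
random.shuffle(train)

print(len(train), len(dev), len(test))   # 205000 2500 2500
</code>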

====== Problems with different train and dev/test set dist ======

Using different distributions for train and dev is not always a good idea, because the usual bias/variance reading breaks down:

  * Human error ~ 0%
  * Train: 1%
  * Dev: 10%

Is the 9% train-to-dev gap variance, or just the distribution change? To separate the two, carve out a **training-dev set**: same distribution as the training set, but not used for training.

  * Train: 1%
  * Train-dev: 9%
  * Dev: 10%

A still-high gap between train and train-dev error => variance problem.

If train and train-dev were close (and the jump happened only at dev) => data-mismatch problem.

Summary (each sub-item names the gap to the next line; computed in the sketch below):
  * Human level: 4%
    * Avoidable bias
  * Train: 7%
    * Variance
  * Train-dev: 10%
    * Data mismatch
  * Dev: 12%
    * Degree of overfitting to the dev set (if too high => bigger dev set)
  * Test: 12%
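
A minimal sketch of reading those numbers off as gap sizes (what counts as "too high" for each gap is a judgment call):

<code python>
errors = {
    "human":     0.04,
    "train":     0.07,
    "train_dev": 0.10,
    "dev":       0.12,
    "test":      0.12,
}

avoidable_bias  = errors["train"]     - errors["human"]      # 3%
variance        = errors["train_dev"] - errors["train"]      # 3%
data_mismatch   = errors["dev"]       - errors["train_dev"]  # 2%
dev_overfitting = errors["test"]      - errors["dev"]        # 0% -> dev set large enough

print(f"avoidable bias:   {avoidable_bias:.0%}")
print(f"variance:         {variance:.0%}")
print(f"data mismatch:    {data_mismatch:.0%}")
print(f"dev overfitting:  {dev_overfitting:.0%}")
</code>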