data_mining:error_analysis

====== Bias / Variance ======
  
  * High Bias (underfit): High train and validation error at a similar level (e.g. train error: 15% | val error: 16%)
  * High Variance (overfit): Low train, high validation error (e.g. train error: 1% | val error: 11%)
  * High Bias and High Variance: High train error, significantly higher validation error (e.g. train error: 15% | val error: 30%)
  
Plot: Error vs. degree of the polynomial (with training and cross-validation error)
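
The three regimes above can be told apart mechanically from the two error rates. A minimal sketch; the `diagnose` helper, the `base_err` baseline, and the 5% gap threshold are illustrative assumptions, not from these notes:

```python
def diagnose(train_err, val_err, base_err=0.0, gap=0.05):
    """Label the bias/variance regime from error rates given as fractions.

    base_err: acceptable baseline (e.g. human-level error), assumed 0 here.
    gap: how much worse an error must be to count as "high", assumed 5%.
    """
    high_bias = (train_err - base_err) > gap      # underfitting signal
    high_variance = (val_err - train_err) > gap   # overfitting signal
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfit)"
    if high_variance:
        return "high variance (overfit)"
    return "ok"

# The three example cases from above:
print(diagnose(0.15, 0.16))  # high bias (underfit)
print(diagnose(0.01, 0.11))  # high variance (overfit)
print(diagnose(0.15, 0.30))  # high bias and high variance
```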
  
  
  
  
===== Basic recipe for ML =====

  - High Bias:
    * Additional features
    * Additional polynomial features
    * Decrease Lambda (regularization parameter)
  - High Variance:
    * More data
    * Smaller number of features
    * Increase Lambda (regularization parameter)

===== Basic recipe for training NNs =====

Recommended **order**:

  - High **bias** (look at train set performance):
    * Bigger network (more hidden layers / units)
    * Train longer
    * Advanced optimization algorithms
    * Better NN //architecture//
  - High **variance** (look at dev set performance):
    * More data (won't help for high bias problems)
    * Regularization
    * Better NN //architecture//

A bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two).

====== Working on most promising problems ======

What would the best-case performance be if there were no false positives?

E.g. among 100 mislabeled dev set examples, how many are dog images (when training a cat classifier)? At 50% it could be worth working on the problem (if the error is currently at 10%, it could drop to 5%).

Evaluate multiple ideas in parallel:
  * Fix false positives
  * Fix false negatives
  * Improve performance on blurry images

Create a spreadsheet: Image / Problem

Result: Calculate the percentage of each problem category (potential improvement "ceiling")

General rule: Build your first system quickly, then iterate (dev/test setup, build system, bias/variance & error analysis)
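
The "ceiling" calculation from the spreadsheet can be sketched as follows; the `ceilings` helper and the category labels are made up for illustration:

```python
from collections import Counter

def ceilings(problems, overall_error):
    """problems: one category label per mislabeled dev example.

    Returns, per category, the maximum error reduction achievable
    if that category were fixed completely (the "ceiling")."""
    counts = Counter(problems)
    n = len(problems)
    return {cat: overall_error * c / n for cat, c in counts.items()}

# 100 mislabeled examples, 50 of them dog images, 10% overall error:
labels = ["dog"] * 50 + ["blurry"] * 30 + ["other"] * 20
print(ceilings(labels, 0.10))
```

Fixing the dog-image problem completely would reduce the 10% error by at most 5 percentage points, matching the example above.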
====== Mislabeled data ======

DL algorithms are robust to label errors if the % of errors is //low// and the errors are //random//.

Add another column "incorrectly labeled" to the error analysis spreadsheet.

Principles when fixing labels:

  * Apply the same process to dev and test set (same distribution)
  * Also look at examples the algorithm got right (not only wrong)
  * Train and dev/test data may come from different distributions (no problem if slightly different)

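The extra "incorrectly labeled" column tells you how much of the measured error is label noise rather than model mistakes. A toy sketch; the row layout and values are invented:

```python
# One row per mislabeled dev example from the error-analysis spreadsheet.
rows = [
    {"image": "img01", "dog": True,  "incorrectly_labeled": False},
    {"image": "img02", "dog": False, "incorrectly_labeled": True},
    {"image": "img03", "dog": False, "incorrectly_labeled": True},
    {"image": "img04", "dog": True,  "incorrectly_labeled": False},
]

# Fraction of analysed errors explained by bad labels, not by the model.
frac_mislabeled = sum(r["incorrectly_labeled"] for r in rows) / len(rows)
print(f"{frac_mislabeled:.0%} of analysed errors are label noise")  # 50%
```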
====== Mismatched train and dev/test set ======

Example data:
  * 200,000 high quality pics
  * 10,000 low quality, blurry pics

  * Option 1: Combine the images, random shuffle into train/dev/test set
    * Advantage: Same distribution
    * Disadvantage: Most images come from the high quality pics (most time is spent on optimizing for high quality pics)
  * Option 2:
    * Train set: 205,000 with high and low quality; Dev & Test: 2,500 low quality each
    * Advantage: Optimizing on the right data
    * Disadvantage: Train distribution is different from the dev and test set

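Option 2 can be sketched as below; the `option2_split` helper is an assumption, and the counts are scaled down (200 and 10 stand in for 200,000 and 10,000):

```python
import random

def option2_split(high_qual, low_qual, seed=0):
    """All high-quality images plus half the low-quality ones go to train;
    the remaining low-quality images are split between dev and test."""
    rng = random.Random(seed)
    low = low_qual[:]
    rng.shuffle(low)
    half = len(low) // 2
    train = high_qual + low[:half]
    rest = low[half:]                 # dev/test see only low-quality pics
    dev = rest[:len(rest) // 2]
    test = rest[len(rest) // 2:]
    return train, dev, test

high = [("hq", i) for i in range(200)]  # stand-in for 200,000 pics
low = [("lq", i) for i in range(10)]    # stand-in for 10,000 pics
train, dev, test = option2_split(high, low)
print(len(train), len(dev), len(test))  # 205 2 3
```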
====== Problems with different train and dev/test set dist ======

It is not always a good idea to use different distributions in train and dev. Example error rates:

  * Human error: ~ 0%
  * Train: 1%
  * Dev: 10%

Training-dev set: same distribution as the training set, but not used for training.

  * Train: 1%
  * Train-dev: 9%
  * Dev: 10%

The still high gap between train and train-dev error => variance problem.

If train and train-dev error were close => data-mismatch problem.

Summary of the gaps:
  * Human level: 4%
    * Avoidable bias
  * Train: 7%
    * Variance
  * Train-dev: 10%
    * Data mismatch
  * Dev: 12%
    * Degree of overfitting to the dev set (if too high => bigger dev set)
  * Test: 12%

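The summary above is just a chain of differences; a minimal sketch (the `decompose` helper is an assumption):

```python
def decompose(human, train, train_dev, dev):
    """Split the error chain into the gaps named in the summary."""
    return {
        "avoidable bias": train - human,       # human level -> train
        "variance": train_dev - train,         # train -> train-dev
        "data mismatch": dev - train_dev,      # train-dev -> dev
    }

# The summary's numbers: 4% / 7% / 10% / 12%
print(decompose(human=0.04, train=0.07, train_dev=0.10, dev=0.12))
```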
====== Data mismatch problems ======

  * Error analysis to understand the difference between training and dev/test set
  * Make training data more similar / collect more data similar to the dev/test set (e.g. simulate the audio environment)
    * Artificial data synthesis
      * Problem: you may be sampling from too small a subset of all cases (to a human the synthesized data might appear ok)
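
A common form of artificial data synthesis for audio is mixing clean speech with background noise. A hedged sketch, assuming numpy is available; the `mix_at_snr` helper, signals, and SNR framing are illustrative, not from these notes:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mix has the requested signal-to-noise ratio in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(np.linspace(0, 100, 16000))  # stand-in for 1 s of clean speech
noise = rng.standard_normal(16000) * 0.1     # stand-in for background noise
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

The pitfall named above applies here: if all synthesized examples reuse the same short noise clip, the model may overfit to that clip even though the audio sounds fine to a human.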
  • data_mining/error_analysis.1503161374.txt.gz
  • Last modified: 2017/08/19 18:49
  • by phreazer