data_mining:error_analysis

====== Bias / Variance ======
  
  * High Bias (underfit): High train and validation error at a similar level (e.g. train error: 15% | val error: 16%)
  * High Variance (overfit): Low train, high validation error (e.g. train error: 1% | val error: 11%)
  * High Bias and High Variance: High train error, significantly higher validation error (e.g. train error: 15% | val error: 30%)
  
Plot: Error vs. degree of the polynomial (with training and cross-validation error)
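
The three regimes above can be told apart mechanically from the two error rates. A minimal sketch; the `diagnose` helper, the `base_err` baseline, and the 5% gap threshold are illustrative assumptions, not from these notes:

```python
def diagnose(train_err, val_err, base_err=0.0, gap=0.05):
    """Label the bias/variance regime from error rates given as fractions.

    base_err: acceptable baseline (e.g. human-level error), assumed 0 here.
    gap: how much worse an error must be to count as "high", assumed 5%.
    """
    high_bias = (train_err - base_err) > gap      # underfitting signal
    high_variance = (val_err - train_err) > gap   # overfitting signal
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfit)"
    if high_variance:
        return "high variance (overfit)"
    return "ok"

# The three example cases from above:
print(diagnose(0.15, 0.16))  # high bias (underfit)
print(diagnose(0.01, 0.11))  # high variance (overfit)
print(diagnose(0.15, 0.30))  # high bias and high variance
```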
  
  
  
  
===== Basic recipe for ML =====

  - High Bias:
    * Additional features
    * Additional polynomial features
    * Decrease Lambda (regularization parameter)
  - High Variance:
    * More data
    * Smaller number of features
    * Increase Lambda (regularization parameter)

===== Basic recipe for training NNs =====

Recommended **order**:

  - High **bias** (look at train set performance):
    * Bigger network (more hidden layers / units)
    * Train longer
    * Advanced optimization algorithms
    * Better NN //architecture//
  - High **variance** (look at dev set performance):
    * More data (won't help for high bias problems)
    * Regularization
    * Better NN //architecture//

A bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two).

====== Working on most promising problems ======

What would the best-case performance be if there were no false positives?

E.g. among 100 mislabeled dev set examples, how many are dog images (when training a cat classifier)? At 50% it could be worth working on the problem (if the error is currently at 10%, it could drop to 5%).

Evaluate multiple ideas in parallel:
  * Fix false positives
  * Fix false negatives
  * Improve performance on blurry images

Create a spreadsheet: Image / Problem

Result: Calculate the percentage of each problem category (potential improvement "ceiling")

General rule: Build your first system quickly, then iterate (dev/test setup, build system, bias/variance & error analysis)
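
The "ceiling" calculation from the spreadsheet can be sketched as follows; the `ceilings` helper and the category labels are made up for illustration:

```python
from collections import Counter

def ceilings(problems, overall_error):
    """problems: one category label per mislabeled dev example.

    Returns, per category, the maximum error reduction achievable
    if that category were fixed completely (the "ceiling")."""
    counts = Counter(problems)
    n = len(problems)
    return {cat: overall_error * c / n for cat, c in counts.items()}

# 100 mislabeled examples, 50 of them dog images, 10% overall error:
labels = ["dog"] * 50 + ["blurry"] * 30 + ["other"] * 20
print(ceilings(labels, 0.10))
```

Fixing the dog-image problem completely would reduce the 10% error by at most 5 percentage points, matching the example above.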
====== Mislabeled data ======

DL algorithms are robust to label errors if the % of errors is //low// and the errors are //random//.

Add another column "incorrectly labeled" to the error analysis spreadsheet.

Principles when fixing labels:

  * Apply the same process to dev and test set (same distribution)
  * Also look at examples the algorithm got right (not only wrong)
  * Train and dev/test data may come from different distributions (no problem if slightly different)

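The extra "incorrectly labeled" column tells you how much of the measured error is label noise rather than model mistakes. A toy sketch; the row layout and values are invented:

```python
# One row per mislabeled dev example from the error-analysis spreadsheet.
rows = [
    {"image": "img01", "dog": True,  "incorrectly_labeled": False},
    {"image": "img02", "dog": False, "incorrectly_labeled": True},
    {"image": "img03", "dog": False, "incorrectly_labeled": True},
    {"image": "img04", "dog": True,  "incorrectly_labeled": False},
]

# Fraction of analysed errors explained by bad labels, not by the model.
frac_mislabeled = sum(r["incorrectly_labeled"] for r in rows) / len(rows)
print(f"{frac_mislabeled:.0%} of analysed errors are label noise")  # 50%
```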
====== Mismatched train and dev/test set ======

Example data:
  * 200,000 high quality pics
  * 10,000 low quality, blurry pics

  * Option 1: Combine the images, random shuffle into train/dev/test set
    * Advantage: Same distribution
    * Disadvantage: Most images come from the high quality pics (most time is spent on optimizing for high quality pics)
  * Option 2:
    * Train set: 205,000 with high and low quality; Dev & Test: 2,500 low quality each
    * Advantage: Optimizing on the right data
    * Disadvantage: Train distribution is different from the dev and test set

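Option 2 can be sketched as below; the `option2_split` helper is an assumption, and the counts are scaled down (200 and 10 stand in for 200,000 and 10,000):

```python
import random

def option2_split(high_qual, low_qual, seed=0):
    """All high-quality images plus half the low-quality ones go to train;
    the remaining low-quality images are split between dev and test."""
    rng = random.Random(seed)
    low = low_qual[:]
    rng.shuffle(low)
    half = len(low) // 2
    train = high_qual + low[:half]
    rest = low[half:]                 # dev/test see only low-quality pics
    dev = rest[:len(rest) // 2]
    test = rest[len(rest) // 2:]
    return train, dev, test

high = [("hq", i) for i in range(200)]  # stand-in for 200,000 pics
low = [("lq", i) for i in range(10)]    # stand-in for 10,000 pics
train, dev, test = option2_split(high, low)
print(len(train), len(dev), len(test))  # 205 2 3
```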
====== Problems with different train and dev/test set dist ======

It is not always a good idea to use different distributions in train and dev. Example error rates:

  * Human error: ~ 0%
  * Train: 1%
  * Dev: 10%

Training-dev set: same distribution as the training set, but not used for training.

  * Train: 1%
  * Train-dev: 9%
  * Dev: 10%

The still high gap between train and train-dev error => variance problem.

If train and train-dev error were close => data-mismatch problem.

Summary of the gaps:
  * Human level: 4%
    * Avoidable bias
  * Train: 7%
    * Variance
  * Train-dev: 10%
    * Data mismatch
  * Dev: 12%
    * Degree of overfitting to the dev set (if too high => bigger dev set)
  * Test: 12%

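The summary above is just a chain of differences; a minimal sketch (the `decompose` helper is an assumption):

```python
def decompose(human, train, train_dev, dev):
    """Split the error chain into the gaps named in the summary."""
    return {
        "avoidable bias": train - human,       # human level -> train
        "variance": train_dev - train,         # train -> train-dev
        "data mismatch": dev - train_dev,      # train-dev -> dev
    }

# The summary's numbers: 4% / 7% / 10% / 12%
print(decompose(human=0.04, train=0.07, train_dev=0.10, dev=0.12))
```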
====== Data mismatch problems ======

  * Error analysis to understand the difference between training and dev/test set
  * Make training data more similar / collect more data similar to the dev/test set (e.g. simulate the audio environment)
    * Artificial data synthesis
      * Problem: you may be sampling from too small a subset of all cases (to a human the synthesized data might appear ok)
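
A common form of artificial data synthesis for audio is mixing clean speech with background noise. A hedged sketch, assuming numpy is available; the `mix_at_snr` helper, signals, and SNR framing are illustrative, not from these notes:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mix has the requested signal-to-noise ratio in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(np.linspace(0, 100, 16000))  # stand-in for 1 s of clean speech
noise = rng.standard_normal(16000) * 0.1     # stand-in for background noise
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

The pitfall named above applies here: if all synthesized examples reuse the same short noise clip, the model may overfit to that clip even though the audio sounds fine to a human.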
  • data_mining/error_analysis.1503161374.txt.gz
  • Last modified: 2017/08/19 18:49
  • by phreazer