====== Bias / Variance ======

  * High Bias (underfit): High train and validation error at a similar level (e.g. train error: 15% | val error: 16%)
  * High Variance (overfit): Low train, high validation error (e.g. train error: 1% | val error: 11%)
  * High Bias and High Variance: High train error, significantly higher validation error (e.g. train error: 15% | val error: 30%)

Plot: Error vs. degree of polynomial (with training and cross-validation error)
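
A minimal sketch of this diagnosis as code, assuming error rates in [0, 1]; the ''target_err'' baseline and the 2-percentage-point tolerance are illustrative assumptions, not from these notes:

<code python>
def diagnose(train_err, val_err, target_err=0.0, tol=0.02):
    """Classify a model as high bias, high variance, or both,
    given train/validation error rates in [0, 1]."""
    high_bias = (train_err - target_err) > tol   # poor fit even on train data
    high_variance = (val_err - train_err) > tol  # large generalization gap
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfit)"
    if high_variance:
        return "high variance (overfit)"
    return "ok"

# The three examples from the list above:
print(diagnose(0.15, 0.16))  # -> high bias (underfit)
print(diagnose(0.01, 0.11))  # -> high variance (overfit)
print(diagnose(0.15, 0.30))  # -> high bias and high variance
</code>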
- | |||
- | ====== Basic recipe ====== | ||
Line 89: | Line 87: | ||

===== Basic recipe for ML =====

  - High Bias:
    * Additional features
    * Additional polynomial features
    * Decrease Lambda (regularization parameter)
  - High Variance:
    * More data
    * Smaller number of features
    * Increase Lambda (regularization parameter)

===== Basic recipe for training NNs =====

Recommended **order**:

  - High **bias** (look at train set performance)
    * Bigger network (more hidden layers / units)
    * Train longer
    * Advanced optimization algorithms
    * Better NN //architecture//
  - High **variance** (look at dev set performance)
    * More data (won't help for high bias problems)
    * Regularization
    * Better NN //architecture//

Bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two).
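
The recommended order as a tiny decision rule; the 2-percentage-point thresholds and the ''human_err'' baseline are illustrative assumptions:

<code python>
def next_step(human_err, train_err, dev_err):
    """Return the recommended next action, following the order above."""
    if train_err - human_err > 0.02:   # 1. high bias: check train set first
        return ("bigger network / train longer / "
                "better optimizer / better architecture")
    if dev_err - train_err > 0.02:     # 2. high variance: check dev set next
        return "more data / regularization / better architecture"
    return "done"

print(next_step(0.01, 0.10, 0.11))  # -> attack bias first
print(next_step(0.01, 0.02, 0.12))  # -> attack variance
</code>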

====== Working on most promising problems ======

Best-case performance if there were no false positives (or none of some other error category)?

E.g. look at 100 mislabeled dev set examples and count how many are dog images (when training a cat classifier). If it is 50%, it could be worth working on the problem (if the error is currently at 10% => 5%).

Evaluate multiple ideas in parallel:
  - Fix false positives
  - Fix false negatives
  - Improve performance on blurry images

Create a spreadsheet: Image / Problem

Result: Calculate the percentage of each problem category (potential improvement "ceiling").
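
A sketch of that tally; the per-image problem tags and the 10% current error are made-up assumptions:

<code python>
# Tally error categories over mislabeled dev examples and compute each
# category's share -- the potential improvement "ceiling".
from collections import Counter

problems_per_image = [                       # assumed spreadsheet rows
    ["dog"], ["blurry"], ["dog", "blurry"], ["false negative"],
    ["dog"], ["incorrectly labeled"], ["blurry"], ["dog"],
]

counts = Counter(tag for tags in problems_per_image for tag in tags)
n = len(problems_per_image)
overall_error = 0.10                         # assumed current dev error

for tag, c in counts.most_common():
    share = c / n
    # Fixing this category entirely could reduce the error by at most:
    print(f"{tag:20s} {share:5.1%} of errors -> "
          f"ceiling: {overall_error * share:.1%} error reduction")
</code>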

General rule: Build your first system quickly, then iterate (dev/test setup, build system, bias/variance & error analysis).
====== Mislabeled data ======

DL algos: If the % of errors is //low// and the errors are //random//, they are robust.

Add another column "incorrectly labeled" to the error analysis spreadsheet.
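
A worked (illustrative) example of how that column is used, assuming 100 inspected errors and an overall dev error of 10%; all numbers are made up:

<code python>
overall_dev_error = 0.10   # 10% dev error (assumed)
n_inspected = 100          # mislabeled dev examples looked at
n_incorrect_labels = 6     # rows ticked in the "incorrectly labeled" column

error_due_to_labels = overall_dev_error * n_incorrect_labels / n_inspected
error_due_to_other = overall_dev_error - error_due_to_labels
print(f"due to incorrect labels: {error_due_to_labels:.1%}")  # 0.6%
print(f"due to other causes:     {error_due_to_other:.1%}")   # 9.4% -> fix these first
</code>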

Principles when fixing labels:

  * Apply the same process to dev and test set (same distribution)
  * Also look at examples the algo got right (not only wrong)
  * Train and dev/test data may come from different distributions (no problem if slightly different)

====== Mismatched train and dev/test set ======

E.g. two data sources:

  * 200,000 high quality pics
  * 10,000 low quality, blurry pics

  * Option 1: Combine images, random shuffle into train/dev/test
    * Advantage: Same distribution
    * Disadvantage: Dev/test sets are dominated by high quality pics rather than the low quality distribution you actually care about
  * Option 2:
    * Train set: 205,000 with high and low qual; Dev & Test: 2,500 low quality each
    * Advantage: Optimizing on the right data
    * Disadvantage: Train distribution differs from the dev/test distribution
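
A sketch of Option 2 with plain NumPy; the arrays are placeholder ids, the counts come from the example above:

<code python>
# Keep dev/test purely low quality (the distribution you care about),
# put everything else into train.
import numpy as np

high_qual = np.arange(200_000)   # placeholder ids for 200,000 web images
low_qual = np.arange(10_000)     # placeholder ids for 10,000 blurry images

rng = np.random.default_rng(0)
low_qual = rng.permutation(low_qual)

dev = low_qual[:2_500]                                  # 2,500 low quality
test = low_qual[2_500:5_000]                            # 2,500 low quality
train = np.concatenate([high_qual, low_qual[5_000:]])   # 205,000 mixed

print(len(train), len(dev), len(test))  # 205000 2500 2500
</code>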

====== Problems with different train and dev/test set dist ======

It is not always a good idea to use different distributions in train and dev. Example:

  * Human error: ~ 0%
  * Train error: 1%
  * Dev error: 10%

Is the gap a variance problem or a data-mismatch problem? Introduce a **training-dev set**: same distribution as the training set, but not used for training.

  * Train: 1%
  * Train-dev: 9%
  * Dev: 10%

Still a high gap between train and train-dev => variance problem.

If train and train-dev were closer (and the gap to dev large) => data-mismatch problem.
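
Carving out a train-dev set is just one more split of the training data; a sketch (ids and counts are placeholders):

<code python>
# Hold out a slice of the training set as "train-dev" --
# same distribution as train, but never used for training.
import numpy as np

rng = np.random.default_rng(0)
train_ids = rng.permutation(205_000)  # placeholder training example ids

train_dev = train_ids[:5_000]         # evaluate here to isolate variance
train = train_ids[5_000:]             # actually train on the rest
# Gap train vs. train-dev  -> variance problem
# Gap train-dev vs. dev    -> data-mismatch problem
</code>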

Summary:

  * Human level: 4%
    * Avoidable bias (gap to train)
  * Train: 7%
    * Variance (gap to train-dev)
  * Train-dev: 10%
    * Data mismatch (gap to dev)
  * Dev: 12%
    * Degree of overfitting to the dev set (gap to test; if too high => bigger dev set)
  * Test: 12%
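
The same summary as arithmetic over the listed error levels:

<code python>
# Decompose the error gaps from the summary above into the four causes.
errors = {"human": 0.04, "train": 0.07, "train_dev": 0.10,
          "dev": 0.12, "test": 0.12}

print(f"avoidable bias:  {errors['train'] - errors['human']:.0%}")      # 3%
print(f"variance:        {errors['train_dev'] - errors['train']:.0%}")  # 3%
print(f"data mismatch:   {errors['dev'] - errors['train_dev']:.0%}")    # 2%
print(f"dev overfitting: {errors['test'] - errors['dev']:.0%}")         # 0%
</code>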

====== Data mismatch problems ======

  * Error analysis to understand the difference between training and dev/test set
  * Make training data more similar / collect more data similar to the dev/test set (e.g. simulate the audio environment)
    * Artificial data synthesis (see the sketch below)
    * Problem: you may be sampling from too little data (to a human it might sound or look OK, but the model can overfit the small synthesized subset)
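
A sketch of artificial data synthesis for an audio example; the waveforms are placeholder arrays, and the deliberately tiny noise pool illustrates the caveat in the last bullet:

<code python>
# Synthesize "in-car" style training audio by mixing clean speech
# with car noise.
import numpy as np

rng = np.random.default_rng(0)
clean_speech = rng.normal(size=16_000)  # placeholder: 1 s of speech at 16 kHz
car_noise_pool = [rng.normal(size=16_000) for _ in range(3)]  # only 3 clips!

def synthesize(speech, noise_pool, snr=0.3):
    """Overlay a randomly chosen noise clip on the speech."""
    noise = noise_pool[rng.integers(len(noise_pool))]
    return speech + snr * noise

synthetic = [synthesize(clean_speech, car_noise_pool) for _ in range(10_000)]
# Caveat from above: 10,000 examples drawn from only 3 noise clips may
# sound fine to a human but lets the model overfit the tiny noise subset.
</code>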