====== Evaluation metrics and train/dev/test sets ======

===== Using a single evaluation metric =====

Precision (% of examples recognized as class 1 that actually were class 1)
Use a **dev set** + a **single-number evaluation metric** to speed up iterative improvement
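A common single-number metric that combines precision and recall is F1, their harmonic mean. A minimal sketch (function and variable names are illustrative, not from these notes):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean collapses the two numbers into one ranking criterion.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

Ranking models by F1 avoids having to juggle two numbers per model when iterating.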
===== Metric tradeoffs =====
Maximize accuracy, subject to runningTime <= 100ms
N metrics: 1 optimizing, N-1 satisficing (reaching some threshold)
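This selection rule can be sketched as: filter by the satisficing metric, then maximize the optimizing one. The model list and its field names are assumptions for illustration:

```python
def pick_model(models, max_running_time_ms=100):
    """Optimizing metric: accuracy. Satisficing metric: runningTime <= 100ms.
    Returns the most accurate model among those meeting the constraint."""
    feasible = [m for m in models if m["running_time_ms"] <= max_running_time_ms]
    return max(feasible, key=lambda m: m["accuracy"]) if feasible else None
```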
+ | |||
+ | ===== Train/ | ||
+ | |||
+ | Dev set / holdout set: Try ideas on dev set | ||
+ | |||
+ | Goal: Train and esp. dev and test set should come from **same distribution** | ||
+ | |||
+ | Solution: Random shuffle (or stratified sample) | ||
+ | |||
+ | ==== Sizes ==== | ||
+ | * For 100 - 10.000 samples: 70 Train 30 Test, or 60% Train 20% Dev 20 % Test | ||
+ | * For 1.000.000 (NNs): 98% Train, 1% Dev, 1% Test | ||
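The shuffle-then-split idea with the 98/1/1 sizes above can be sketched as follows (the fractions and seed are parameters, not prescriptions):

```python
import random

def shuffled_split(samples, train_frac=0.98, dev_frac=0.01, seed=0):
    """Shuffle first so train/dev/test come from the same distribution,
    then cut the list at the chosen fractions."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * train_frac)
    n_dev = int(len(samples) * dev_frac)
    return (samples[:n_train],
            samples[n_train:n_train + n_dev],
            samples[n_train + n_dev:])
```

For stratified sampling (preserving class proportions per split), a library routine such as scikit-learn's `train_test_split` with `stratify=` is the usual choice.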
+ | |||

===== Change dev/test set and metric =====

Change the metric if the rank ordering it produces doesn't match what you actually prefer

One solution: use weights for certain errors
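A weighted error metric can encode that some mistakes matter more than others; a sketch assuming per-example weights are supplied alongside the labels:

```python
def weighted_error(y_true, y_pred, weights):
    """Fraction of weighted mass that is misclassified; a large weight
    makes that example's error dominate the metric."""
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / sum(weights)
```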
+ | |||
+ | Two steps: | ||
+ | |||
+ | - Place the target (eval metric) | ||
+ | - How to shoot at target (how to optimize metric) | ||
+ | |||
+ | E.g. high quality images in dev/test set, user upload low quality images. => change metric and/or dev/test set | ||
+ | |||
+ | ====== Human level performance ====== | ||
+ | |||
+ | Bayes optimal error (best optimal error) | ||
+ | |||
+ | Human level error could be used as an estimate for Bayes error (e.g. in Computer Vision) | ||
+ | |||
+ | * H: 1%, Train: 8%, Dev: 10% => bias reduction | ||
+ | * H: 7,5%, Train: 8, Dev: 10% => variance reduction (more data, regularization) | ||
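The two cases above can be turned into a small diagnostic, using human-level error as a proxy for Bayes error (comparing the two gaps directly is a simplification):

```python
def diagnose(human_error, train_error, dev_error):
    """Avoidable bias = train - human; variance = dev - train.
    Suggests focusing on whichever gap is larger."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    return "reduce bias" if avoidable_bias > variance else "reduce variance"
```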
+ | |||
+ | What's human-level error? Best performance possible as a human / usefullness | ||
+ | |||
+ | Measure of error between Human Error, Train Error and Dev error | ||
+ | |||

  * Avoidable bias: human level <> training error
    * Train a bigger model
    * Train longer / use better optimization algorithms
    * Try another NN architecture / hyperparameter search
  * Variance: training error <> dev error
    * More data
    * Regularization
    * Try another NN architecture / hyperparameter search