Differences

This shows you the differences between two versions of the page.

--- data_mining:strategy [2018/05/21 16:59] – [Metric tradeoffs] phreazer
+++ data_mining:strategy [2018/05/21 18:50] (current) – [Human level performance] phreazer
@@ Line 1: / Line 1: @@
-====== Using a single metric evaluation metric ======
+====== Evaluation metrics and train/dev/test set ======
+===== Using a single number evaluation metric =====
 Precision (% of examples recognized as class 1 that actually were class 1)
@@ Line 12: / Line 13: @@
 Use **Dev set** + **single number evaluation** metric to speed-up iterative improvement
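A minimal sketch of collapsing precision and recall into one number. The choice of F1 (harmonic mean) as the single number is an assumption here, and `precision_recall_f1` is a hypothetical helper name:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against empty denominators (no predicted / no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative dev-set labels and predictions (made-up data).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Comparing classifiers by `f1` on the dev set gives a single ranking instead of two numbers that may disagree.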
-====== Metric tradeoffs ======
+===== Metric tradeoffs =====
 Maximize accuracy, subject to runningTime <= 100ms
@@ Line 18: / Line 19: @@
 N metrics: 1 optimizing, N-1 satisficing (reaching some threshold)
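The optimizing/satisficing idea could be sketched as follows. The candidate models, their numbers, and the name `pick_model` are made up for illustration; only the 100 ms threshold comes from the note above:

```python
def pick_model(candidates, max_runtime_ms=100.0):
    """Maximize accuracy (optimizing) subject to runtime <= threshold (satisficing)."""
    feasible = [c for c in candidates if c["runtime_ms"] <= max_runtime_ms]
    if not feasible:
        return None  # no candidate meets the satisficing threshold
    return max(feasible, key=lambda c: c["accuracy"])


# Hypothetical candidates: C is most accurate but far too slow.
candidates = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80.0},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95.0},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500.0},
]
best = pick_model(candidates)
```

With more satisficing metrics, the filter step would simply check every threshold before the `max` over the optimizing metric.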
-====== Train/Dev/Test set ======
+===== Train/Dev/Test set =====
 Dev set / holdout set: Try ideas on dev set
-Goal: Train and test set should come from **same distribution**
+Goal: Train and especially dev and test sets should come from the **same distribution**
+Solution: Random shuffle (or stratified sample)
+==== Sizes ====
+  * For 100 - 10,000 samples: 70% Train, 30% Test, or 60% Train, 20% Dev, 20% Test
+  * For 1,000,000 samples (NNs): 98% Train, 1% Dev, 1% Test
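A rough sketch of the random-shuffle split using only the standard library. `split_dataset` is a hypothetical helper, and a stratified sample would need per-class handling that is not shown here:

```python
import random


def split_dataset(samples, train_frac, dev_frac, seed=0):
    """Shuffle once, then cut into train/dev/test; test gets the remainder."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_train = round(len(items) * train_frac)
    n_dev = round(len(items) * dev_frac)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


# Small data set: 60% / 20% / 20%.
train, dev, test = split_dataset(range(1000), 0.6, 0.2)

# Large data set: 98% / 1% / 1%.
big_train, big_dev, big_test = split_dataset(range(1_000_000), 0.98, 0.01)
```

Because all three parts are cut from the same shuffled list, they come from the same distribution by construction.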
+===== Change dev/test set and metric =====
+Change the metric if its rank ordering of models isn't "right"
+One solution: Use weights for certain errors
+Two steps:
+  - Place the target (eval metric)
+  - How to shoot at target (how to optimize metric)
+E.g. high-quality images in dev/test set, but users upload low-quality images => change metric and/or dev/test set
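One way to weight certain errors, sketched under the assumption that each example carries a per-example weight. The weight of 10 and the name `weighted_error` are illustrative only:

```python
def weighted_error(y_true, y_pred, weights):
    """Weighted misclassification rate: sum of weights on errors / sum of all weights."""
    total = sum(weights)
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / total


# Made-up data: the second example is a mistake we consider 10x worse.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 0]
weights = [1, 10, 1, 1]

plain = weighted_error(y_true, y_pred, [1, 1, 1, 1])  # ordinary error rate
weighted = weighted_error(y_true, y_pred, weights)    # penalizes the costly mistake
```

With uniform weights this reduces to the usual error rate, so the weighted form is a drop-in replacement for the eval metric.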
+====== Human level performance ======
+Bayes optimal error (best possible error)
+Human level error could be used as an estimate for Bayes error (e.g. in Computer Vision)
+  * H: 1%, Train: 8%, Dev: 10% => bias reduction
+  * H: 7.5%, Train: 8%, Dev: 10% => variance reduction (more data, regularization)
+What's human-level error? Best performance possible for a human / level needed for usefulness
+Compare the gaps between human-level error, training error, and dev error
+  * Avoidable bias: Human level <> Training Error
+    * Train bigger model
+    * Train longer / use better optimization algorithms
+    * NN architecture/hyperparam search
+  * Variance: Training Error <> Dev Error
+    * More data
+    * Regularization
+    * NN architecture/hyperparam search
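The two gaps above can be computed directly. This is a minimal sketch using the two numeric examples from these notes; `diagnose` is a hypothetical helper, and resolving ties toward variance is an arbitrary choice:

```python
def diagnose(human_err, train_err, dev_err):
    """Return (avoidable_bias, variance, focus); errors are in percent."""
    avoidable_bias = train_err - human_err  # human level <> training error
    variance = dev_err - train_err          # training error <> dev error
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# H: 1%, Train: 8%, Dev: 10%
case_a = diagnose(1.0, 8.0, 10.0)
# H: 7.5%, Train: 8%, Dev: 10%
case_b = diagnose(7.5, 8.0, 10.0)
```

In the first case the avoidable bias (7 points) dominates, in the second the variance (2 points) does, which matches the recommendations listed above.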