Differences
This shows you the differences between two versions of the page.
data_mining:error_analysis [2018/05/21 17:55] – [Misslabeled data] phreazer | data_mining:error_analysis [2018/05/21 20:24] (current) – [Problems with different train and dev/test set dist] phreazer | ||
---|---|---|---|
Line 154: | Line 154: | ||
* Train set: 205,000 with high and low quality; Dev & Test: 2500 low quality | * Train set: 205,000 with high and low quality; Dev & Test: 2500 low quality | ||
* Advantage: Optimizing for the right data | * Advantage: Optimizing for the right data | ||
- | * Disadvantage: | + | * Disadvantage: |
+ | |||
+ | ====== Problems with different train and dev/test set distributions ====== | ||
+ | |||
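+ | It is not always a good idea to use different distributions for the train and dev sets: a gap between train and dev error can then come from either variance or data mismatch. | ||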
+ | |||
+ | * Human error ~ 0 | ||
+ | * Train 1% | ||
+ | * Dev 10% | ||
+ | |||
+ | Training-dev set: same distribution as training set, but not used for training | ||
+ | |||
+ | * Train 1% | ||
+ | * Train-dev: 9% | ||
+ | * Dev: 10% | ||
+ | |||
+ | Still a large gap between train and train-dev => variance problem | ||
+ | |||
+ | If train and train-dev errors were close and the gap opened between train-dev and dev => data-mismatch problem. | ||
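A training-dev set as described above can be carved out of the training data before training starts. The following is a minimal sketch, not a method prescribed by these notes: the function name `split_train_traindev`, the 10% fraction and the fixed seed are all illustrative assumptions.

```python
import random


def split_train_traindev(examples, traindev_fraction=0.05, seed=0):
    """Hold out a random slice of the training data as a training-dev set.

    Both parts come from the same distribution; the held-out part is
    only evaluated on, never trained on. (Hypothetical helper.)
    """
    if not 0.0 < traindev_fraction < 1.0:
        raise ValueError("traindev_fraction must be between 0 and 1")
    shuffled = list(examples)
    # Seeded shuffle so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_traindev = max(1, int(len(shuffled) * traindev_fraction))
    # First slice is held out, the remainder is used for training.
    return shuffled[n_traindev:], shuffled[:n_traindev]


# Stand-in data: integer ids instead of real training examples.
train, train_dev = split_train_traindev(range(1000), traindev_fraction=0.1)
```

Because the split is random, the error on `train_dev` differs from the training error only through generalization, not through a change in distribution.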
+ | |||
+ | Summary: | ||
+ | * Human level 4% | ||
+ | * Avoidable bias | ||
+ | * Train 7% | ||
+ | * Variance | ||
+ | * Train-dev: 10% | ||
+ | * Data mismatch | ||
+ | * Dev: 12% | ||
+ | * Degree of overfitting to dev set (if too high => bigger dev set) | ||
+ | * Test: 12% | ||
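The summary above amounts to subtracting neighbouring error rates. A minimal sketch of that arithmetic follows; the function names `error_gaps` and `largest_gap` are hypothetical, and errors are given in whole percentage points to keep the arithmetic exact.

```python
def error_gaps(human, train, train_dev, dev, test):
    """Return the gap between each pair of neighbouring error levels.

    All arguments are error rates in percentage points.
    """
    return {
        "avoidable_bias": train - human,
        "variance": train_dev - train,
        "data_mismatch": dev - train_dev,
        "dev_overfitting": test - dev,
    }


def largest_gap(gaps):
    """Name of the largest gap (the first one listed wins on a tie)."""
    return max(gaps, key=gaps.get)


# Numbers from the summary above.
summary_gaps = error_gaps(human=4, train=7, train_dev=10, dev=12, test=12)

# Numbers from the earlier train / train-dev / dev example (human error ~ 0).
example_gaps = error_gaps(human=0, train=1, train_dev=9, dev=10, test=10)
```

For the summary numbers the gaps are 3, 3, 2 and 0 points. For the earlier example the variance gap (8 points) dominates the data-mismatch gap (1 point); the test error of 10 there is an assumption, since the notes give no test figure for that example.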
+ | |||
+ | ====== Data mismatch problems ====== | ||
+ | |||
+ | * Carry out error analysis to understand the differences between the training and dev/test sets | ||
+ | * Make the training data more similar to the dev/test set, or collect more data similar to it (e.g. simulate the audio environment) | ||
+ | * Artificial data synthesis | ||
+ | * Problem: the synthesized data may be sampled from too small a subset of the possible examples (to a human it might still appear fine) | ||
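Artificial data synthesis for the audio case could look roughly like the sketch below, which mixes clean clips with background noise. Everything here is an illustrative assumption: the function name `synthesize`, the `noise_gain` parameter, and the representation of clips as plain lists of float samples.

```python
import random


def synthesize(clean_clips, noise_clips, noise_gain=0.3, seed=0):
    """Overlay a randomly chosen noise clip on every clean clip.

    Clips are lists of float samples. Noise shorter than the clean
    clip is looped. (Hypothetical helper, not from the notes.)
    """
    rng = random.Random(seed)
    mixed = []
    for clean in clean_clips:
        noise = rng.choice(noise_clips)
        mixed.append(
            [s + noise_gain * noise[i % len(noise)] for i, s in enumerate(clean)]
        )
    return mixed


clean = [[0.0, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
noise = [[1.0, -1.0]]  # a single noise clip: every output shares this background
mixed = synthesize(clean, noise, noise_gain=0.5)
```

With only one entry in `noise`, every synthesized clip carries the same background, however many clean clips are mixed. This is the sampling problem noted above: the result may sound varied enough to a human while covering only a small part of the possible noise conditions.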