Differences
This shows you the differences between two versions of the page.
data_mining:error_analysis [2018/05/21 19:28] – [False positive data] phreazer | data_mining:error_analysis [2018/05/21 19:46] – [Misslabeled data] phreazer | ||
---|---|---|---|
Line 114: | Line 114: | ||
Bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two). | Bigger network almost always improves bias and more data improves variance (//not necessarily a tradeoff// between the two). | ||
- | ===== False positive data ===== | + | ====== Working on most promising problems ====== |
Best case performance if no false positives? | Best case performance if no false positives? | ||
Line 130: | Line 130: | ||
Result: Calc percentage of problem category (potential improvement " | Result: Calc percentage of problem category (potential improvement " | ||
+ | General rule: Build your first system quickly, then iterate (dev/test setup, build system, bias/ | ||
+ | ====== Mislabeled data ====== | ||
+ | |||
+ | DL algos: Robust to label errors if the % of errors is //low// and the errors are //random// | ||
+ | |||
+ | Add another col " | ||
+ | |||
+ | Principles when fixing labels: | ||
+ | |||
+ | * Apply the same process to the dev and test sets (so both keep the same distribution) | ||
+ | * Also examine the examples the algo got right (not only the ones it got wrong) | ||
+ | * Train and dev/test data may come from different distributions (no problem if only slightly different) | ||
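
The per-category percentage from the error analysis above can be sketched in a few lines of Python. This is only a minimal illustration: the tag names (`blurry`, `dog`, `mislabeled`), the sample review data, and both helper functions are hypothetical and not taken from the page. The idea is that the share of misclassified dev examples in a category, multiplied by the overall dev error, gives a ceiling on how much fixing that category could help.

```python
from collections import Counter


def error_category_shares(tagged_errors):
    """Return the fraction of misclassified examples carrying each tag.

    tagged_errors: list of sets, one per misclassified dev example, holding
    the (hypothetical) category tags assigned during manual review.
    An example may carry several tags, so shares can sum to more than 1.
    """
    total = len(tagged_errors)
    if total == 0:
        return {}
    counts = Counter(tag for tags in tagged_errors for tag in tags)
    return {tag: n / total for tag, n in counts.items()}


def error_ceiling(overall_error, share):
    """Upper bound on the error reduction from fully fixing one category."""
    return overall_error * share


# Hypothetical manual review of 10 misclassified dev examples.
reviewed = [
    {"blurry"}, {"mislabeled"}, {"blurry", "dog"}, {"dog"}, {"blurry"},
    {"blurry"}, {"dog"}, {"blurry"}, {"mislabeled"}, {"blurry"},
]

shares = error_category_shares(reviewed)

# Rank categories by share to see which problem is most promising.
for tag, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {share:.0%} of reviewed errors")
```

With an assumed overall dev error of 10%, `error_ceiling(0.10, shares["mislabeled"])` estimates the most that fixing labels alone could recover, which helps decide whether relabeling is worth the effort compared with the other categories.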