--- data_mining:error_analysis [2018/05/21 19:30] – [False positive data] phreazer
+++ data_mining:error_analysis [2018/05/21 19:38] – [Working on most promising problems] phreazer
@@ Line 114: / Line 114: @@
 A bigger network almost always reduces bias and more data reduces variance (//not necessarily a tradeoff// between the two).
-===== Working on most promising problems =====
+====== Working on most promising problems ======
 Best case performance if no false positives?
@@ Line 130: / Line 130: @@
 Result: Calculate the percentage of errors in each problem category (potential improvement "ceiling")
+====== Mislabeled data ======
+DL algorithms: If the % of label errors is //low// and the errors are //random//, they are robust
+Add another column "incorrectly labeled" to the error analysis spreadsheet.
+Principles when fixing labels:
+- Apply same process to dev and test set (same distribution)
+- Also check the examples the algorithm got right (not only the ones it got wrong)
+- Train and dev/test data may then come from different distributions (no problem if only slightly different)
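
The per-category percentages and the extra "incorrectly labeled" column can be tallied with a few lines of Python. This is a minimal sketch, not the notes' own tooling: the row layout, the category names, and the 10% dev error are made-up illustrations, and ''category_ceilings'' is a hypothetical helper.

```python
from collections import Counter

# Hypothetical error-analysis sheet: one row per misclassified dev example,
# one boolean column per problem category (all names are illustrative).
rows = [
    {"id": 1, "false_positive": True,  "blurry": False, "incorrectly_labeled": False},
    {"id": 2, "false_positive": True,  "blurry": True,  "incorrectly_labeled": False},
    {"id": 3, "false_positive": False, "blurry": False, "incorrectly_labeled": True},
    {"id": 4, "false_positive": False, "blurry": False, "incorrectly_labeled": False},
]
categories = ["false_positive", "blurry", "incorrectly_labeled"]


def category_ceilings(rows, categories, dev_error):
    """Return, per category, its share of the reviewed errors and the
    'ceiling': the most the overall dev error could drop if every error
    in that category were fixed."""
    counts = Counter()
    for row in rows:
        for cat in categories:
            if row.get(cat):
                counts[cat] += 1
    n = len(rows)
    return {
        cat: {"share": counts[cat] / n, "ceiling": dev_error * counts[cat] / n}
        for cat in categories
    }


# Assumed overall dev error of 10%, for illustration only.
report = category_ceilings(rows, categories, dev_error=0.10)
for cat, stats in report.items():
    print(f"{cat}: {stats['share']:.0%} of errors, ceiling {stats['ceiling']:.1%}")
```

An example can fall into several categories at once, so the shares need not sum to 100%. A small share in the "incorrectly labeled" column suggests that fixing labels is a lower priority than the larger categories.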