data_mining:error_analysis

Differences

This shows you the differences between two versions of the page.

data_mining:error_analysis [2018/05/21 19:30] – [False positive data] phreazer
data_mining:error_analysis [2018/05/21 19:38] – [Working on most promising problems] phreazer
Line 114: Line 114:
 A bigger network almost always reduces bias, and more data reduces variance (//not necessarily a tradeoff// between the two).
  
-===== Working on most promising problems =====
+====== Working on most promising problems ======
  
 What would the best-case performance be if there were no false positives?
Line 130: Line 130:
 Result: Calculate the percentage of errors that fall into each problem category (the potential improvement "ceiling").
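
The per-category percentage above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the category names, the counts, and the 10% overall error rate are hypothetical, and each misclassified example is assumed to carry exactly one category tag.

```python
from collections import Counter


def improvement_ceilings(error_categories, overall_error_rate):
    """For each category, return its share of all errors and the largest
    absolute error reduction possible if that category were fully fixed."""
    counts = Counter(error_categories)
    total = len(error_categories)
    return {
        category: {
            "share": n / total,
            "ceiling": overall_error_rate * n / total,
        }
        for category, n in counts.items()
    }


# Hypothetical tally of 100 misclassified dev-set examples.
tagged = ["false_positive"] * 5 + ["blurry"] * 45 + ["other"] * 50
ceilings = improvement_ceilings(tagged, overall_error_rate=0.10)

for category, stats in ceilings.items():
    print(f"{category}: {stats['share']:.0%} of errors, "
          f"ceiling {stats['ceiling']:.3f} absolute error")
```

With these made-up numbers, eliminating every false positive would lower the overall error by at most 0.005, so a larger category is probably the more promising problem to work on.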
  
 +====== Mislabeled data ======
 +
 +DL algorithms: If the percentage of label errors is //low// and the errors are //random//, the algorithms are robust to them
 +
 +Add another column, "incorrectly labeled", to the error analysis spreadsheet.
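
A minimal sketch of how the extra column might be used, assuming the spreadsheet has been loaded as a list of rows with a boolean ''incorrectly_labeled'' field. The field name, the function name, and all numbers are hypothetical.

```python
def label_error_breakdown(rows, overall_error_rate):
    """Split the overall error into the part caused by wrong labels
    and the part caused by everything else."""
    mislabeled = sum(1 for row in rows if row["incorrectly_labeled"])
    fraction = mislabeled / len(rows)
    return {
        "fraction_mislabeled": fraction,
        "error_from_labels": overall_error_rate * fraction,
        "error_from_other": overall_error_rate * (1 - fraction),
    }


# Hypothetical sheet: 100 misclassified examples, 6 of them with a wrong label.
rows = [{"incorrectly_labeled": i < 6} for i in range(100)]
breakdown = label_error_breakdown(rows, overall_error_rate=0.10)
print(breakdown)
```

If the error attributable to labels is small compared with the error from other causes, fixing labels is probably not the most promising next step.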
 +
 +Principles when fixing labels:
 +
 +- Apply the same process to both the dev and the test set (so they keep the same distribution)
 +- Also check the examples the algorithm got right (not only the ones it got wrong)
 +- Train and dev/test data may then come from different distributions (no problem if they differ only slightly)
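
The first two principles could be sketched as follows. This is an assumption-laden illustration, not a prescribed procedure: the helper name, the sample size, and the toy labels and predictions are all hypothetical. The same sampling routine is run on the dev and the test set, and it draws from correctly classified examples as well as misclassified ones.

```python
import random


def sample_for_label_review(labels, predictions, k_per_group, seed=0):
    """Pick up to k_per_group indices from the misclassified examples and
    from the correctly classified ones, so both groups get their labels checked."""
    rng = random.Random(seed)
    pairs = list(zip(labels, predictions))
    wrong = [i for i, (y, p) in enumerate(pairs) if y != p]
    right = [i for i, (y, p) in enumerate(pairs) if y == p]
    return {
        "wrong": rng.sample(wrong, min(k_per_group, len(wrong))),
        "right": rng.sample(right, min(k_per_group, len(right))),
    }


# Toy data; real sets would be far larger.
splits = {
    "dev": ([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]),
    "test": ([0, 0, 1, 1], [0, 1, 1, 1]),
}

# Same process for dev and test, so both stay on the same distribution.
review = {
    name: sample_for_label_review(y, p, k_per_group=2)
    for name, (y, p) in splits.items()
}
print(review)
```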
  • data_mining/error_analysis.txt
  • Last modified: 2018/05/21 22:24
  • by phreazer