data_mining:error_analysis

Differences

This shows you the differences between two versions of the page.

data_mining:error_analysis [2018/05/21 19:38] – [Working on most promising problems] phreazer
data_mining:error_analysis [2018/05/21 19:46] – [Misslabeled data] phreazer
Line 130: Line 130:
 Result: Calculate the percentage of errors in each problem category (the potential improvement "ceiling")
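A minimal sketch of the "ceiling" calculation, assuming each misclassified dev example has been tagged by hand with one or more problem categories. The function name `category_ceilings`, the tag names, and the 10% dev error rate are hypothetical and not taken from the page.

```python
from collections import Counter


def category_ceilings(error_tags, dev_error_rate):
    """Share of errors per category and the resulting improvement ceiling.

    error_tags: one list of category tags per misclassified dev example
                (an example may carry several tags).
    dev_error_rate: overall dev-set error rate, e.g. 0.10 for 10%.
    """
    n_errors = len(error_tags)
    if n_errors == 0:
        return {}
    # Count each category at most once per example.
    counts = Counter(tag for tags in error_tags for tag in set(tags))
    return {
        tag: {
            # Fraction of all reviewed errors that fall into this category.
            "share_of_errors": count / n_errors,
            # Best-case drop in dev error if this category were fully fixed.
            "ceiling": dev_error_rate * count / n_errors,
        }
        for tag, count in counts.items()
    }


# Hypothetical tags for four misclassified examples.
tags = [["blurry"], ["blurry", "mislabeled"], ["dog"], ["blurry"]]
result = category_ceilings(tags, dev_error_rate=0.10)
```

Because an example can carry several tags, the shares may add up to more than 100%. The ceiling is only an upper bound on what fixing a category could gain.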
  
 +General rule: Build your first system quickly, then iterate (dev/test setup, build system, bias/variance & error analysis)
 ====== Mislabeled data ======
  
Line 138: Line 139:
 Principles when fixing labels:
  
-Apply same process to dev and test set (same distribution)
+  * Apply same process to dev and test set (same distribution)
-Also see what examples algo got right (not only wrong)
+  * Also see what examples algo got right (not only wrong)
-Train and dev/test data may come from different distribution (no problem if slightly different)
+  * Train and dev/test data may come from different distribution (no problem if slightly different)
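The first two principles can be sketched as one review-sampling helper that is called in exactly the same way on the dev set and on the test set, and that draws from correctly predicted examples as well as wrong ones. This is only an illustration: `sample_for_label_review`, its parameters, and the toy labels are assumptions, not part of the original notes.

```python
import random


def sample_for_label_review(labels, predictions, n_per_group, seed=0):
    """Pick example indices whose labels should be re-checked by hand.

    Samples from both misclassified and correctly classified examples,
    so label fixes are not biased toward cases the model got wrong.
    """
    rng = random.Random(seed)
    pairs = list(zip(labels, predictions))
    wrong = [i for i, (y, p) in enumerate(pairs) if y != p]
    right = [i for i, (y, p) in enumerate(pairs) if y == p]

    def pick(indices):
        # Never ask for more items than the group contains.
        return sorted(rng.sample(indices, min(n_per_group, len(indices))))

    return {"wrong": pick(wrong), "right": pick(right)}


# Hypothetical labels and predictions; the same call is used for both sets
# so dev and test keep going through an identical correction process.
dev_review = sample_for_label_review([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], 2)
test_review = sample_for_label_review([0, 0, 1, 1], [0, 1, 1, 1], 2)
```

The training set is deliberately left out of this sketch, in line with the third principle that a slightly different training distribution is not a problem.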
  • data_mining/error_analysis.txt
  • Last modified: 2018/05/21 22:24
  • by phreazer