data_mining:error_analysis

data_mining:error_analysis [2018/05/21 19:55] – [Misslabeled data] phreazer
data_mining:error_analysis [2018/05/21 22:05] – [Missmatched train and dev/test set] phreazer
Line 154:
     * Train set: 205,000 with high and low quality; Dev & Test: 2,500 low quality
     * Advantage: Optimizing for the right data
     * Disadvantage: Train distribution differs from the dev and test set distribution
 + 
 +====== Problems with different train and dev/test set distributions ====== 
 + 
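 +It is not always a good idea to use different distributions for the train and dev sets: a gap between train and dev error can then come from variance or from the distribution mismatch, and the numbers below cannot tell the two apart. 
 + 
 +  * Human error: ~0% 
 +  * Train error: 1% 
 +  * Dev error: 10% 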
 + 
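 +Training-dev set: same distribution as the training set, but not used for training 
 + 
 +  * Train error: 1% 
 +  * Train-dev error: 9% 
 +  * Dev error: 10% 
 + 
 +The gap between train and train-dev error is still large (8 points) => variance problem; only the remaining 1 point between train-dev and dev can be attributed to the distribution mismatch 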
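
The reasoning above can be sketched in a few lines of Python. This is only an illustration: the function names (''carve_train_dev'', ''error_gaps'', ''largest_gap'') and the gap labels are assumptions made for this sketch and do not come from the page or from any library. Errors are given in percentage points.

```python
import random


def carve_train_dev(train_examples, train_dev_size, seed=0):
    """Hold out a random slice of the training data as a train-dev set.

    The held-out slice has the same distribution as the training set
    but is not used for training (hypothetical helper).
    """
    examples = list(train_examples)
    if not 0 < train_dev_size < len(examples):
        raise ValueError("train_dev_size must be between 1 and len(train) - 1")
    random.Random(seed).shuffle(examples)
    # First slice becomes train-dev, the rest stays as training data.
    return examples[train_dev_size:], examples[:train_dev_size]


def error_gaps(human, train, train_dev, dev):
    """Split the total error into three gaps (all in percentage points)."""
    return {
        "avoidable_bias": train - human,   # human level -> train
        "variance": train_dev - train,     # train -> train-dev (same distribution)
        "data_mismatch": dev - train_dev,  # train-dev -> dev (different distribution)
    }


def largest_gap(gaps):
    """Return the name of the largest gap, i.e. the first thing to work on."""
    return max(gaps, key=gaps.get)


# Numbers from the example above: human ~0%, train 1%, train-dev 9%, dev 10%.
gaps = error_gaps(human=0.0, train=1.0, train_dev=9.0, dev=10.0)
print(gaps)
print(largest_gap(gaps))
```

With the numbers from the example, the train to train-dev gap (8 points) dominates the train-dev to dev gap (1 point), which matches the variance diagnosis above.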
  • data_mining/error_analysis.txt
  • Last modified: 2018/05/21 22:24
  • by phreazer