data_mining:xgboost

Differences

This shows you the differences between two versions of the page.

data_mining:xgboost [2019/05/03 01:09] phreazer
data_mining:xgboost [2019/05/03 01:13] – [Taylor expansion] phreazer
Line 20: Line 20:
 $$

-Gradient boosting
+===== Gradient boosting =====

 $F$ is the space of functions containing all regression trees
Line 59: Line 59:
   * Logistic loss $l(y_i,\hat{y}_i)=y_i \ln(1+e^{-\hat{y}_i})+(1-y_i)\ln(1+e^{\hat{y}_i})$ (LogitBoost)

-Stochastic Gradient Descent can not be applied, since trees are used.
+Stochastic Gradient descent can not be applied, since trees are used.

 The solution is **additive training**: start with a constant prediction, then add a new function each round.
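Additive training can be sketched in a few lines of plain Python. This is a minimal illustration and not XGBoost's actual implementation: it assumes square loss, one-dimensional inputs, and one-split stumps as the added functions. The names `fit_stump` and `additive_training` are hypothetical, and the `learning_rate` shrinkage factor is an extra assumption that the text above does not mention.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump on 1-D inputs by least squares."""
    mean_all = sum(targets) / len(targets)
    best = None  # (squared error, threshold, left value, right value)
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # all inputs identical: fall back to a constant
        return lambda x: mean_all
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def additive_training(xs, ys, rounds=10, learning_rate=0.5):
    """Start from a constant prediction and add one new function per round."""
    base = sum(ys) / len(ys)           # constant starting prediction
    functions = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        # Under square loss the new function is fitted to the residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        f_t = fit_stump(xs, residuals)
        functions.append(f_t)
        preds = [p + learning_rate * f_t(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(f(x) for f in functions)

    return predict


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]
predict = additive_training(xs, ys)
```

Each round leaves the earlier functions untouched and only fits the new one, which is what lets trees be trained without gradient descent over their parameters.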
Line 82: Line 82:
  
  
-Taylor expansion approximation of loss
+==== Taylor expansion ====

 Use a Taylor expansion to approximate a function by a power series (polynomial).
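As a small numeric illustration of the second-order case, the sketch below approximates $e^{x+\Delta x}$ around $x=0$ using $f(x+\Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2}f''(x)\Delta x^2$. The helper name `taylor2` and the choice of the exponential function are assumptions made only for this example.

```python
import math


def taylor2(f0, f1, f2, dx):
    """Second-order Taylor polynomial from f(x), f'(x), f''(x) and a step dx."""
    return f0 + f1 * dx + 0.5 * f2 * dx ** 2


x, dx = 0.0, 0.1
# For exp, the function and its first two derivatives coincide.
approx = taylor2(math.exp(x), math.exp(x), math.exp(x), dx)
exact = math.exp(x + dx)
error = abs(approx - exact)  # shrinks roughly like dx**3
```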
Line 95: Line 95:
 $$\sum^n_{i=1} [l(y_i,\hat{y}_i^{(t-1)}) + g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)]$$ with $g_i=\partial_{\hat{y}_i^{(t-1)}} l(y_i,\hat{y}_i^{(t-1)})$ and $h_i=\partial^2_{\hat{y}_i^{(t-1)}} l(y_i,\hat{y}_i^{(t-1)})$

-With removed constants
+With removed constants (and square loss)
 $$\sum^n_{i=1} [g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$
 So the loss function only enters through $g_i$ and $h_i$, while the rest of the objective stays the same.
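To make the role of $g_i$ and $h_i$ concrete, the sketch below computes them for two losses and plugs them into the same second-order approximation. It assumes the square loss is $l(y,\hat{y})=(y-\hat{y})^2$ (the page does not spell it out here) and uses the logistic loss from the list above. All function names are hypothetical.

```python
import math


def square_loss(y, y_hat):
    return (y - y_hat) ** 2


def square_grad_hess(y, y_hat):
    # First and second derivative of (y - y_hat)^2 with respect to y_hat.
    return 2.0 * (y_hat - y), 2.0


def logistic_loss(y, y_hat):
    return (y * math.log(1 + math.exp(-y_hat))
            + (1 - y) * math.log(1 + math.exp(y_hat)))


def logistic_grad_hess(y, y_hat):
    p = 1.0 / (1.0 + math.exp(-y_hat))  # sigmoid of the raw score
    return p - y, p * (1.0 - p)


def approx_loss(loss, grad_hess, y, y_hat_prev, f_t):
    """l(y, y_hat_prev) + g * f_t + 1/2 * h * f_t^2 for a single example."""
    g, h = grad_hess(y, y_hat_prev)
    return loss(y, y_hat_prev) + g * f_t + 0.5 * h * f_t ** 2


# Square loss is quadratic, so the expansion reproduces it exactly.
sq_exact = square_loss(3.0, 1.0 + 0.5)
sq_approx = approx_loss(square_loss, square_grad_hess, 3.0, 1.0, 0.5)

# Logistic loss is only approximated; the gap is small for a small step.
lg_exact = logistic_loss(1.0, 0.0 + 0.1)
lg_approx = approx_loss(logistic_loss, logistic_grad_hess, 1.0, 0.0, 0.1)
```

Only the `*_grad_hess` function changes between the two losses; `approx_loss` itself is identical, which mirrors the statement that the loss enters the objective through $g_i$ and $h_i$ alone.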
  • data_mining/xgboost.txt
  • Last modified: 2020/08/02 16:12
  • by phreazer