====== XGBoost ======
XGBoost is short for //Extreme Gradient Boosting//.

Literature: Greedy Function Approximation: A Gradient Boosting Machine, by Friedman
  
===== CART =====
$$
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F
$$
  
===== Gradient boosting =====

  * $F$ is the space of functions containing all regression trees
  * $K$ is the number of trees
  * $f_k(x_i)$ is a regression tree that maps the attributes to a score
  
Learn functions (trees) instead of weights in $R^d$.
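A minimal sketch of this idea (the stump functions and values below are illustrative assumptions): each $f_k$ maps a feature vector to a leaf score, and the ensemble prediction is the sum of the $K$ tree outputs.

<code python>
# Toy ensemble of K = 2 regression "trees" (here simple stumps, made up for illustration).
def tree_1(x):
    return 2.0 if x[0] < 3.0 else -1.0   # split on feature 0

def tree_2(x):
    return 0.5 if x[1] < 1.0 else 0.9    # split on feature 1

trees = [tree_1, tree_2]                  # the learned functions f_k

def predict(x):
    # y_hat = sum over k of f_k(x)
    return sum(f(x) for f in trees)

print(predict([2.5, 0.3]))                # 2.0 + 0.5 = 2.5
</code>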
  * Logistic loss $l(y_i,\hat{y}_i)=y_i \ln(1+e^{-\hat{y}_i})+(1-y_i)\ln(1+e^{\hat{y}_i})$ (LogitBoost); see the numeric check below
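As a sanity check (illustrative values, added here): with $p_i = 1/(1+e^{-\hat{y}_i})$ this loss equals the usual cross-entropy $-[y_i \ln p_i + (1-y_i)\ln(1-p_i)]$.

<code python>
import math

def logistic_loss(y, y_hat):
    # l(y, y_hat) = y*ln(1 + e^(-y_hat)) + (1-y)*ln(1 + e^(y_hat))
    return y * math.log(1 + math.exp(-y_hat)) + (1 - y) * math.log(1 + math.exp(y_hat))

def cross_entropy(y, y_hat):
    # Same value, written via the sigmoid probability p = 1/(1 + e^(-y_hat)).
    p = 1 / (1 + math.exp(-y_hat))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(logistic_loss(1, 0.7), cross_entropy(1, 0.7))  # both approx. 0.4032
</code>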
  
Stochastic gradient descent cannot be applied, since trees are learned rather than numeric weights.
  
Solution is **additive training**: Start with a constant prediction and add a new function each round.
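Written out (standard boosting notation; the constant start $\hat{y}_i^{(0)} = 0$ is an assumed convention):

$$\hat{y}_i^{(0)} = 0$$
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) = \sum_{k=1}^{t} f_k(x_i)$$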
  
  
==== Taylor expansion ====
  
Use a Taylor expansion to approximate a function by a power series (polynomial):
$$Tf(x;a) = \sum_{n=0}^\infty\frac{f^{(n)}(a)}{n!}(x-a)^n$$
  
is the Taylor series of $f$ around the development point $a$; $f^{(n)}$ is the $n$-th derivative.

(The series has the same derivative values as $f$ at the development point $a$.)

  
$$f(x+\Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2} f''(x)\Delta x^2$$
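A quick numeric check of this second-order approximation ($f(x)=x^3$ and the values are made-up examples):

<code python>
# Check f(x + dx) ~ f(x) + f'(x)*dx + 0.5*f''(x)*dx^2 for f(x) = x^3.
f   = lambda x: x ** 3
df  = lambda x: 3 * x ** 2   # first derivative
d2f = lambda x: 6 * x        # second derivative

x, dx = 2.0, 0.1
exact  = f(x + dx)                                    # 9.261
approx = f(x) + df(x) * dx + 0.5 * d2f(x) * dx ** 2   # 8 + 1.2 + 0.06 = 9.26
print(exact, approx)
</code>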
Applying this expansion to the loss around the previous prediction $\hat{y}^{(t-1)}$ (with $\Delta x = f_t(x_i)$) gives
$$\sum^n_{i=1} [l(y_i,\hat{y}_i^{(t-1)}) + g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)]$$ with $g_i=\partial_{\hat{y}^{(t-1)}} l(y_i,\hat{y}^{(t-1)})$ and $h_i=\partial^2_{\hat{y}^{(t-1)}} l(y_i,\hat{y}^{(t-1)})$
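For example, with the square loss $l(y_i,\hat{y}) = (y_i - \hat{y})^2$ (a worked case for this common choice):
$$g_i = \partial_{\hat{y}^{(t-1)}} (y_i - \hat{y}^{(t-1)})^2 = 2(\hat{y}^{(t-1)} - y_i), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}} (y_i - \hat{y}^{(t-1)})^2 = 2$$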

With constants removed (here for the square loss):
$$\sum^n_{i=1} [g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$
So the learning of $f_t$ depends on the loss function only through $g_i$ and $h_i$; everything else stays the same.

Complexity $\Omega(f_t)$ (a small numeric sketch follows the list):
  * $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum^T_{j=1} w^2_j$
  * $T$: number of leaves
  * $w_j$: leaf scores, penalized via their squared L2 norm
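A tiny numeric sketch of this penalty ($\gamma$, $\lambda$ and the leaf scores are made-up example values):

<code python>
# Omega(f_t) = gamma * T + 0.5 * lambda * sum_j w_j^2
gamma, lam = 1.0, 0.5
w = [2.0, -1.0, 0.5]   # leaf scores of a toy tree
T = len(w)             # number of leaves

omega = gamma * T + 0.5 * lam * sum(wj ** 2 for wj in w)
print(omega)           # 1.0*3 + 0.25*(4 + 1 + 0.25) = 4.3125
</code>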