====== XGBoost ======
//Extreme Gradient Boosting//

Literature: Greedy Function Approximation: A Gradient Boosting Machine (Friedman)
$$\hat{y}_i = \sum^K_{k=1} f_k(x_i), \quad f_k \in F$$

===== Gradient boosting =====

  * $F$ is the space of functions containing all regression trees
  * $K$ is the number of trees
  * $f_k(x_i)$ is a regression tree that maps attributes to a score
Learn functions (trees) instead of weights in $R^d$.
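
For intuition, a minimal Python sketch of the model $\hat{y}_i = \sum^K_{k=1} f_k(x_i)$, using two hypothetical hand-written trees in place of learned ones:

<code python>
# Toy illustration: an ensemble prediction is the sum of each tree's leaf score.
def tree_1(x):
    # hypothetical regression tree: one split on feature 0
    return 2.0 if x[0] < 3.0 else -1.0

def tree_2(x):
    # hypothetical regression tree: one split on feature 1
    return 0.5 if x[1] < 1.0 else -0.3

ensemble = [tree_1, tree_2]  # K = 2 trees, each f_k in F

def predict(x):
    return sum(f(x) for f in ensemble)  # y_hat = sum over k of f_k(x)

print(predict([2.0, 4.0]))  # 2.0 + (-0.3) = 1.7
</code>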
Learning objective:

  * Training loss: fit of the functions to the data points
  * Regularization: complexity of the functions (trees)

Objective:
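
Written out in the standard form, with $l$ the training loss and $\Omega$ the complexity penalty:

$$obj = \sum^n_{i=1} l(y_i, \hat{y}_i) + \sum^K_{k=1} \Omega(f_k)$$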
  * Logistic loss $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i) \ln(1 + e^{\hat{y}_i})$
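
For illustration, this logistic loss can be supplied to xgboost as a custom objective via its first and second derivatives w.r.t. the raw prediction, $\sigma(\hat{y}) - y$ and $\sigma(\hat{y})(1 - \sigma(\hat{y}))$. A minimal sketch (function and variable names are illustrative):

<code python>
import numpy as np
import xgboost as xgb

def logregobj(preds, dtrain):
    """Custom objective: per-example gradient and hessian of the logistic loss."""
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of the raw margin
    grad = p - labels                   # first derivative w.r.t. prediction
    hess = p * (1.0 - p)                # second derivative w.r.t. prediction
    return grad, hess

# Usage sketch (assumes a feature matrix X and 0/1 labels y exist):
# dtrain = xgb.DMatrix(X, label=y)
# bst = xgb.train({"max_depth": 2}, dtrain, num_boost_round=10, obj=logregobj)
</code>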
Stochastic gradient descent can't be used to find the $f_k$, since they are trees (functions), not numerical weight vectors.
Solution is **additive training**: Start with a constant prediction, add a new function each time.
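
Written out, the standard additive schedule:

$$\hat{y}_i^{(0)} = 0$$
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$

At round $t$ all earlier trees are kept fixed; only the new tree $f_t$ is learned.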
==== Taylor expansion ====
Use a Taylor expansion to approximate a function through a power series (polynomial).
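
Applied to the loss around the previous prediction $\hat{y}_i^{(t-1)}$ (the standard second-order expansion from the XGBoost derivation, added here since $g_i$ and $h_i$ are used below):

$$l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) \approx l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)$$

$$g_i = \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)}), \quad h_i = \partial^2_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$$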
$$\sum^n_{i=1} [l(y_i, \hat{y}_i^{(t-1)}) + g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$
With the constants removed ($l(y_i, \hat{y}_i^{(t-1)})$ does not depend on $f_t$):
$$\sum^n_{i=1} [g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$
So the choice of loss function only influences $g_i$ and $h_i$, while the rest of the objective stays the same.
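
A small sketch of that point: the quadratic objective is computed the same way for any loss; only the $g_i$ and $h_i$ vectors differ (toy numbers, $\Omega$ omitted):

<code python>
import numpy as np

y = np.array([1.0, 0.0, 1.0])        # labels
y_prev = np.array([0.2, -0.1, 0.4])  # previous-round predictions y_hat^(t-1)

# square loss l = (y - y_hat)^2:  g = 2*(y_hat - y), h = 2
g_sq, h_sq = 2 * (y_prev - y), np.full_like(y, 2.0)

# logistic loss:  g = sigmoid(y_hat) - y,  h = sigmoid * (1 - sigmoid)
p = 1 / (1 + np.exp(-y_prev))
g_log, h_log = p - y, p * (1 - p)

def quad_obj(g, h, f_t):
    """sum_i [g_i * f_t(x_i) + 0.5 * h_i * f_t(x_i)^2] for candidate tree outputs."""
    return np.sum(g * f_t + 0.5 * h * f_t ** 2)

f_t = np.array([0.3, -0.2, 0.1])     # hypothetical new-tree outputs
print(quad_obj(g_sq, h_sq, f_t), quad_obj(g_log, h_log, f_t))
</code>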