====== XGBoost ======
XGBoost is short for //Extreme Gradient Boosting//.

Literature: Greedy Function Approximation: A Gradient Boosting Machine (Friedman).
===== CART =====
$$\hat{y}_i = \sum^K_{k=1} f_k(x_i), \quad f_k \in F$$

===== Gradient boosting =====

  * $F$ is the space of functions containing all regression trees
  * $K$ is the number of trees
  * $f_k(x_i)$ is a regression tree that maps the attributes to a score
Learn functions (trees) instead of weights in $R^d$.
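A minimal sketch of this model form (not XGBoost itself; scikit-learn's ''DecisionTreeRegressor'' and the toy data are assumptions for illustration): the prediction is simply the sum of the per-tree scores.

<code python>
# Sketch: prediction of an additive tree ensemble, y_hat_i = sum_k f_k(x_i).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy attribute vectors
y = X[:, 0] + 0.5 * X[:, 1] ** 2         # toy target

# Stand-ins for K already-learned trees f_1 ... f_K.
trees = [DecisionTreeRegressor(max_depth=2).fit(X, y) for _ in range(3)]

# Ensemble prediction: sum of the individual tree scores.
y_hat = np.sum([f_k.predict(X) for f_k in trees], axis=0)
print(y_hat[:5])
</code>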
Learning objective:
  * Training loss: fit of the functions to the training points
  * Regularization: complexity of the trees
Objective:
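Written out (stated here in the standard form, as an assumption consistent with the loss $l$ and the regularizer $\Omega$ used below):

$$Obj = \sum^n_{i=1} l(y_i, \hat{y}_i) + \sum^K_{k=1} \Omega(f_k)$$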
  * Logistic loss: $l(y_i, \hat{y}_i) = y_i\ln(1+e^{-\hat{y}_i}) + (1-y_i)\ln(1+e^{\hat{y}_i})$
Stochastic gradient descent cannot be used here, since we learn trees (functions) rather than numerical weight vectors.

Solution is **additive training**: start with a constant prediction and add a new function each time.
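A minimal sketch of additive training for squared loss (not XGBoost's exact algorithm; the learning rate, tree depth and toy data are assumed values): each round fits a new tree to the residuals of the current prediction and adds it.

<code python>
# Sketch of additive training: start from a constant prediction and add
# one tree per round, each fit to the current residuals (squared loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=200)

y_hat = np.full_like(y, y.mean())   # round 0: constant prediction
eta = 0.3                           # learning rate (shrinkage), assumed

for t in range(10):                 # add one new function f_t per round
    residuals = y - y_hat           # negative gradient of the squared loss
    f_t = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    y_hat = y_hat + eta * f_t.predict(X)

print("training MSE:", np.mean((y - y_hat) ** 2))
</code>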
==== Taylor expansion ====
Use a Taylor expansion to approximate a function through a power series (polynomial).
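The second-order form used here is the standard identity

$$f(x + \Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2} f''(x)\Delta x^2$$

Applied to the objective, $x$ plays the role of the previous prediction $\hat{y}_i^{(t-1)}$, $\Delta x$ the new tree's output $f_t(x_i)$, and $g_i$, $h_i$ are the first and second derivatives of $l$ with respect to $\hat{y}_i^{(t-1)}$.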
$$\sum^n_{i=1} [l(y_i, \hat{y}_i^{(t-1)}) + g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$

With the constants removed:

$$\sum^n_{i=1} [g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$

So the learning of the new function $f_t$ depends on the objective only through $g_i$ and $h_i$; everything else stays the same, whatever loss is used.
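This is why custom losses can be plugged in: the training loop only needs $g_i$ and $h_i$ per example. A sketch using the xgboost Python package's custom-objective hook (dataset, parameters and number of rounds are assumptions for illustration):

<code python>
# Sketch: supplying g_i and h_i for the logistic loss as a custom objective.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
dtrain = xgb.DMatrix(X, label=y)

def logistic_obj(preds, dtrain):
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of the raw scores
    grad = p - labels                  # g_i: first derivative of the loss
    hess = p * (1.0 - p)               # h_i: second derivative of the loss
    return grad, hess

booster = xgb.train({"max_depth": 2, "eta": 0.3}, dtrain,
                    num_boost_round=10, obj=logistic_obj)
print(booster.predict(dtrain)[:5])     # raw scores, since a custom obj is used
</code>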