====== XGBoost ======
XGBoost is short for //Extreme Gradient Boosting//.

Literature: //Greedy Function Approximation: A Gradient Boosting Machine// (Friedman)
===== CART =====
$$\hat{y}_i = \sum_{k=1}^K f_k(x_i), \quad f_k \in F$$

===== Gradient boosting =====

  * $F$ is the space of functions containing all regression trees
  * $K$ is the number of trees
  * $f_k(x_i)$ is a regression tree that maps an attribute vector to a score

Learn functions (trees) instead of weights in $R^d$.
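A minimal sketch of this idea (hypothetical hand-written trees; in practice the trees are learned): each $f_k$ maps an attribute vector to a leaf score, and the prediction is the sum of the $K$ scores.

<code python>
# Sketch with two hand-written "regression trees" (hypothetical splits and scores):
# each tree maps an attribute vector x to a score, the model sums the scores.
def tree_1(x):
    return 2.0 if x[0] < 0.5 else -1.0

def tree_2(x):
    return 0.3 if x[1] < 2.0 else 0.9

trees = [tree_1, tree_2]                 # K = 2

def predict(x):
    return sum(f_k(x) for f_k in trees)  # \hat{y} = sum_k f_k(x)

print(predict([0.2, 3.0]))               # 2.0 + 0.9 = 2.9
</code>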
Learning objective:
  * Training loss: fit of the functions to the data points
  * Regularization: complexity of the trees
Objective:
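In the usual notation (a sketch: $n$ training examples, $K$ trees, $l$ the training loss, $\Omega$ the complexity term defined further below):

$$obj = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)$$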
Possible choices for the loss $l(y_i, \hat{y}_i)$:
  * Square loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
  * Logistic loss $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i) \ln(1 + e^{\hat{y}_i})$

Stochastic Gradient Descent cannot be applied, since functions (trees) have to be learned instead of numerical weight vectors.
Solution is **additive training**: Start with constant prediction, add a new function each time.
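A minimal sketch of additive training under square loss (toy data; plain sklearn regression trees standing in for XGBoost's actual tree learner): each round fits a new tree to the current residuals and updates $\hat{y}^{(t)}_i = \hat{y}^{(t-1)}_i + f_t(x_i)$.

<code python>
# Additive-training sketch (assumed toy data, square loss, sklearn trees):
# start from a constant prediction, then add one regression tree per round.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

prediction = np.full_like(y, y.mean())        # \hat{y}^{(0)}: constant prediction
trees = []
for t in range(10):                           # 10 boosting rounds
    residual = y - prediction                 # for square loss: (scaled) negative gradient
    f_t = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(f_t)
    prediction = prediction + f_t.predict(X)  # \hat{y}^{(t)} = \hat{y}^{(t-1)} + f_t(x)

print(np.mean((y - prediction) ** 2))         # training loss shrinks over the rounds
</code>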

==== Taylor expansion ====

Use Taylor expansion to approximate a function through a power series:

$$Tf(x;a) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n$$

is the Taylor series of $f$ around the development point $a$; $f^{(n)}$ is the $n$-th derivative.
(At the development point $a$, the series has the same derivative values as $f$.)

The second-order approximation:

$$f(x+\Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2} f''(x)\Delta x^2$$

Applying this to the objective, with the previous prediction $\hat{y}^{(t-1)}_i$ as development point and $\Delta x = f_t(x_i)$:

$$\sum^n_{i=1} [l(y_i, \hat{y}^{(t-1)}_i) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)$$

with $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$.
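As a worked example (assuming square loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$), the two derivatives are:

$$g_i = 2(\hat{y}^{(t-1)}_i - y_i), \qquad h_i = 2$$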
With the constant term $l(y_i, \hat{y}^{(t-1)}_i)$ removed (it does not depend on $f_t$):

$$\sum^n_{i=1} [g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)$$

The loss function thus influences the learning of $f_t$ only through $g_i$ and $h_i$; the rest of the objective stays the same.

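A small numeric sketch of this point (assumed toy labels and scores): switching from square loss to the logistic loss above only changes how $g_i$ and $h_i$ are computed; everything that consumes them stays identical.

<code python>
# Sketch (assumed toy labels and scores): g_i and h_i for two different losses.
import numpy as np

y = np.array([1.0, 0.0, 1.0])             # labels
y_hat_prev = np.array([0.3, -0.5, 2.0])   # previous predictions \hat{y}^{(t-1)}

# Square loss l = (y - y_hat)^2
g_square = 2.0 * (y_hat_prev - y)
h_square = 2.0 * np.ones_like(y)

# Logistic loss (as given above), with p = sigmoid(y_hat)
p = 1.0 / (1.0 + np.exp(-y_hat_prev))
g_logistic = p - y
h_logistic = p * (1.0 - p)

# Only these lines differ between the losses; the tree-learning step
# that consumes g and h is exactly the same in both cases.
print(g_square, h_square)
print(g_logistic, h_logistic)
</code>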
Complexity $\Omega(f_t)$:

  * $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum^T_{j=1} w^2_j$
  * $\gamma T$ penalizes the number of leaves $T$
  * $\frac{1}{2} \lambda \sum^T_{j=1} w^2_j$ is the (squared) L2 norm of the leaf scores $w_j$
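A small worked example (assumed values): a tree with $T = 3$ leaves, leaf scores $w = (2, 0.1, -1)$, $\gamma = 1$ and $\lambda = 1$ gives

$$\Omega(f_t) = 1 \cdot 3 + \frac{1}{2} \cdot 1 \cdot (2^2 + 0.1^2 + (-1)^2) = 3 + 2.505 = 5.505$$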