data_mining:xgboost

===== Gradient boosting =====
  
  * $F$ is the space of functions containing all regression trees
  * $K$ is the number of trees
  * $f_k(x_i)$ is a regression tree that maps an attribute vector to a score
  
Learn functions (trees) instead of weights in $R^d$.
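
A minimal sketch of this idea, using hypothetical depth-1 trees (stumps): the ensemble prediction is the sum of the $K$ tree outputs, $\hat{y}(x) = \sum_k f_k(x)$.

<code python>
# Sketch: an additive tree ensemble. Each f_k maps an attribute
# vector x to a score; the prediction is the sum over all K trees.
def stump(feature, threshold, left_score, right_score):
    """A depth-1 regression tree (hypothetical example trees)."""
    def f(x):
        return left_score if x[feature] < threshold else right_score
    return f

trees = [stump(0, 2.5, -0.4, 0.7), stump(1, 1.0, 0.2, -0.1)]  # K = 2

def predict(x):
    return sum(f(x) for f in trees)

print(predict([3.0, 0.5]))  # 0.7 + 0.2 = 0.9
</code>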
Second-order Taylor approximation of the objective at step $t$:

$$\sum^n_{i=1} [l(y_i,\hat{y}_i^{(t-1)}) + g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)]$$ with $g_i=\partial_{\hat{y}^{(t-1)}} l(y_i,\hat{y}^{(t-1)})$ and $h_i=\partial^2_{\hat{y}^{(t-1)}} l(y_i,\hat{y}^{(t-1)})$
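
For example, with square loss $l(y,\hat{y}) = (y-\hat{y})^2$ these derivatives are $g_i = 2(\hat{y}^{(t-1)}_i - y_i)$ and $h_i = 2$. A minimal numeric sketch (values made up for illustration):

<code python>
import numpy as np

# g_i and h_i for square loss, evaluated at the previous round's
# predictions y_hat^(t-1).
y = np.array([1.0, 0.0, 2.0])
y_hat_prev = np.array([0.8, 0.3, 1.5])

g = 2.0 * (y_hat_prev - y)    # first derivative of l w.r.t. y_hat
h = np.full_like(y, 2.0)      # second derivative (constant for square loss)
print(g)  # [-0.4  0.6 -1. ]
print(h)  # [2. 2. 2.]
</code>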
  
With constants removed (and square loss):
$$\sum^n_{i=1} [g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)] + \Omega(f_t)$$
So the loss function enters learning only through $g_i$ and $h_i$; the rest of the procedure stays the same.
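
This is also how custom objectives work in the xgboost library: you supply a function that returns $g_i$ and $h_i$ per example, and training proceeds unchanged. A minimal sketch (synthetic data, square loss re-implemented as a custom objective):

<code python>
import numpy as np
import xgboost as xgb

def square_loss(preds, dtrain):
    """Custom objective: return g_i and h_i for each example."""
    y = dtrain.get_label()
    grad = 2.0 * (preds - y)          # g_i
    hess = np.full_like(preds, 2.0)   # h_i
    return grad, hess

# Synthetic regression data, made up for illustration.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(100)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 2, "eta": 0.3}, dtrain,
                    num_boost_round=20, obj=square_loss)
print(booster.predict(dtrain)[:5])
</code>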