====== Approximating full Bayesian learning in a NN ======

  * NN with few parameters: put a grid over the parameter space and evaluate $p(W|D)$ at each grid point (expensive, but no local optimum issues).
  * After evaluating each grid point, we use all of them to make predictions on test data.
    * Expensive, but works much better than maximum likelihood learning when the posterior is vague or multimodal (i.e., when data is scarce); see the sketch after this list.
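
A minimal sketch of this grid-based approach for a toy one-parameter model; the model, prior, data, and grid are illustrative assumptions:

<code python>
import numpy as np

# Toy model: y = w * x with Gaussian noise; a single weight w,
# so a grid over the parameter space is still feasible.
w_grid = np.linspace(-3.0, 3.0, 61)

def log_likelihood(w, x, y, sigma=1.0):
    """Gaussian log-likelihood of the data given weight w."""
    return -0.5 * np.sum((y - w * x) ** 2) / sigma ** 2

# Illustrative training data.
x_train = np.array([0.5, 1.0, 1.5])
y_train = np.array([1.1, 1.9, 3.2])

# Unnormalized log-posterior log p(W|D) = log p(D|W) + log p(W),
# assuming a standard Gaussian prior over w.
log_post = np.array([log_likelihood(w, x_train, y_train) - 0.5 * w ** 2
                     for w in w_grid])
post = np.exp(log_post - log_post.max())
post /= post.sum()  # normalize over the grid points

# Prediction on test data: use ALL grid points, weighted by p(W|D).
x_test = 2.0
y_pred = np.sum(post * w_grid * x_test)
print(y_pred)
</code>

Because every grid point contributes in proportion to its posterior probability, the prediction reflects the whole posterior rather than a single best-fitting weight vector.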

===== Monte Carlo method =====

Idea: It might be good enough to sample weight vectors according to their posterior probabilities.

$p(y_{\text{test}} | \text{input}_\text{test}, D) = \sum_i p(W_i|D) \, p(y_{\text{test}} | \text{input}_\text{test}, W_i)$

Instead of evaluating the sum over every possible weight vector, sample weight vectors $W_i$ according to $p(W_i|D)$ and average their predictions, as in the sketch below.
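
A minimal sketch of this Monte Carlo estimate; the placeholder samples and the toy `predict` function are assumptions standing in for real posterior samples and a real network:

<code python>
import numpy as np

rng = np.random.default_rng(0)

def predict(w, x):
    """Toy stand-in for a network's forward pass with weights w."""
    return w * x

# Placeholder: pretend these were drawn from the posterior p(W|D),
# e.g. by the noisy-update procedure described below.
w_samples = rng.normal(loc=2.0, scale=0.3, size=100)

# Monte Carlo estimate: the weighted sum over all weight vectors becomes
# a plain average over sampled ones, since sampling already accounts
# for the posterior weights p(W_i|D).
x_test = 2.0
y_pred = np.mean([predict(w, x_test) for w in w_samples])
print(y_pred)
</code>

With enough samples, this average converges to the weighted sum above.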

In backpropagation, we keep moving the weights in the direction that decreases the cost.

With sampling: add some Gaussian noise to the weight vector after each update.

Markov Chain Monte Carlo:

If we use just the right amount of noise, and if we let the weight vector wander around for long enough before we take a sample, we get an unbiased sample from the true posterior over weight vectors. A sketch of this procedure follows.
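
A minimal sketch of such noisy updates in the spirit of Langevin-style MCMC; the toy model, learning rate, noise scale, burn-in, and thinning interval are all illustrative assumptions:

<code python>
import numpy as np

rng = np.random.default_rng(0)

x_train = np.array([0.5, 1.0, 1.5])
y_train = np.array([1.1, 1.9, 3.2])

def grad_cost(w, x, y):
    """Gradient of the squared-error cost for the toy model y = w * x."""
    return -2.0 * np.sum((y - w * x) * x)

w = 0.0                          # initial weight
lr = 0.01                        # learning rate
noise_scale = np.sqrt(2.0 * lr)  # "just the right amount" of noise (assumption)
burn_in, n_samples, thin = 1000, 100, 50

samples = []
for step in range(burn_in + n_samples * thin):
    # The usual backpropagation step: move downhill on the cost ...
    w -= lr * grad_cost(w, x_train, y_train)
    # ... plus Gaussian noise after each update, so that w keeps
    # wandering around the posterior instead of settling in one optimum.
    w += noise_scale * rng.normal()
    # Let the chain wander long enough, then keep occasional samples.
    if step >= burn_in and (step - burn_in) % thin == 0:
        samples.append(w)

print(np.mean(samples), np.std(samples))
</code>

The collected samples can then be plugged into the Monte Carlo prediction above.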

There are more sophisticated and effective methods than this basic MCMC approach: they don't need to wander the space for as long before yielding samples.

If we compute the gradient of the cost function on a **random mini-batch**, we get an unbiased estimate of the full-batch gradient, with sampling noise, as the sketch below illustrates.
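
A minimal sketch checking that claim numerically on an assumed toy dataset: averaging many mini-batch gradients recovers the full-batch gradient.

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dataset for the toy model y = w * x.
x_train = rng.normal(size=1000)
y_train = 2.0 * x_train + rng.normal(scale=0.1, size=1000)

def grad_cost(w, x, y):
    """Gradient of the mean squared-error cost for y = w * x."""
    return -2.0 * np.mean((y - w * x) * x)

w = 0.5
full_grad = grad_cost(w, x_train, y_train)

# Gradients on random mini-batches: individually noisy, unbiased on average.
batch_grads = []
for _ in range(2000):
    idx = rng.choice(len(x_train), size=32, replace=False)
    batch_grads.append(grad_cost(w, x_train[idx], y_train[idx]))

print(full_grad, np.mean(batch_grads))  # the two values are close
</code>

This is what makes mini-batch gradients usable as cheap, noisy stand-ins for the full-batch gradient.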

====== Dropout ======
See [[data_mining:neural_network:regularization|Regularization]]