  * Expensive, but works much better than maximum likelihood (ML) learning when the posterior is vague or multimodal (data is scarce).

Monte Carlo method

Idea: Might be good enough to sample weight vectors according to their posterior probabilities.
$p(y_{\text{test}} | \text{input}_{\text{test}}, D) = \sum_i p(W_i|D) \, p(y_{\text{test}} | \text{input}_{\text{test}}, W_i)$

Sample weight vectors $W_i$ with probability $p(W_i|D)$.
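As a rough illustration (not from the notes): a minimal sketch of this Monte Carlo average, assuming a hypothetical ''predict'' function and weight vectors already sampled according to $p(W_i|D)$.

<code python>
import numpy as np

rng = np.random.default_rng(0)

def predict(weights, x):
    # Hypothetical forward pass of a tiny one-layer network (placeholder).
    W, b = weights
    return np.tanh(x @ W + b)

def monte_carlo_predict(weight_samples, x):
    # Average the predictions over weight vectors sampled according to p(W_i | D);
    # the plain average approximates the weighted sum over the posterior.
    preds = [predict(w, x) for w in weight_samples]
    return np.mean(preds, axis=0)

# Illustrative shapes only: 3 inputs, 2 outputs, 10 sampled weight vectors.
weight_samples = [(rng.normal(size=(3, 2)), rng.normal(size=2)) for _ in range(10)]
x_test = rng.normal(size=(5, 3))
y_hat = monte_carlo_predict(weight_samples, x_test)  # shape (5, 2)
</code>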
+ | |||
In backpropagation, we keep moving the weights in the direction that decreases the cost.

With sampling: add some Gaussian noise to the weight vector after each update.

Markov Chain Monte Carlo:

If we use just the right amount of noise, and if we let the weight vector wander around for long enough before we take a sample, we will get an unbiased sample from the true posterior over weight vectors.
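A minimal sketch of this noise-injection idea on a toy linear model with a squared-error cost; ''grad_cost'', the data and all hyperparameters (step size, noise scale, burn-in, thinning) are illustrative assumptions, not from the notes.

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Toy data for a linear model y ~ X @ w
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

def grad_cost(w, X, y):
    # Gradient of the mean squared-error cost (stands in for backpropagation).
    return X.T @ (X @ w - y) / len(y)

eps = 0.05          # step size for the gradient update
noise_scale = 0.05  # scale of the Gaussian noise added after each update
burn_in = 2000      # let the weight vector wander before keeping samples
thin = 50           # keep every 50th vector to reduce correlation

w = np.zeros(3)
samples = []
for t in range(10000):
    # Gradient step plus Gaussian noise after each update.
    w = w - eps * grad_cost(w, X, y) + noise_scale * rng.normal(size=w.shape)
    if t >= burn_in and t % thin == 0:
        samples.append(w.copy())

samples = np.array(samples)  # approximate samples from the posterior over w
</code>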
+ | |||
+ | More complicated and effective methods than MCMC method: Don't need to wander the space long. | ||
+ | |||
+ | If we compute gradient of cost function on a **random mini-batch**, | ||
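Continuing the sketch above (same data, ''grad_cost'' and hyperparameters), the only change is estimating the gradient on a random mini-batch; the batch size is again an illustrative choice.

<code python>
batch_size = 10
w = np.zeros(3)
samples = []
for t in range(10000):
    idx = rng.choice(len(y), size=batch_size, replace=False)  # random mini-batch
    g = grad_cost(w, X[idx], y[idx])  # noisy but unbiased gradient estimate
    w = w - eps * g + noise_scale * rng.normal(size=w.shape)
    if t >= burn_in and t % thin == 0:
        samples.append(w.copy())
</code>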
+ | |||
+ | ====== Dropout ====== | ||
+ | See [[data_mining: | ||
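As a rough, self-contained illustration (details on the linked page): an inverted-dropout layer; the ''dropout'' function, keep probability and shapes are illustrative assumptions.

<code python>
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_keep, train=True):
    # Inverted dropout: randomly zero units during training and rescale by
    # 1 / p_keep so the expected activation matches the test-time pass.
    if not train:
        return h
    mask = rng.random(h.shape) < p_keep
    return h * mask / p_keep

h = np.ones((4, 8))                            # activations of some hidden layer
h_train = dropout(h, p_keep=0.5)               # a randomly "thinned" sub-network
h_test = dropout(h, p_keep=0.5, train=False)   # full network at test time
</code>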