Hamiltonian Monte Carlo

Written by David Mimno

There have been a number of papers at NIPS that use Hamiltonian Monte Carlo, and I thought I’d share a Javascript implementation of the algorithm that I wrote a couple years ago. It can be a fairly opaque algorithm when described mathematically, and I found it really useful to see it working. It turns out it’s kind of mesmerizing.

The idea of Monte Carlo is that you want to sample from a distribution. Imagine a probability distribution over an x and a y variable as a physical surface, where regions of high probability look like depressions and regions of low probability look like high plateaus. We want to sample points in low (likely) regions more often than high (unlikely) regions. Here the gray regions show what the likelihood surface looks like. As we sample points, we should get more in the darker regions, and fewer sampled points in the lighter regions.

Accepted: 0/0 = 0.0

HMC uses this physical analogy to sample points from the distribution. The algorithm works by shooting a particle in a random direction along a surface defined by the target likelihood function. As it goes “uphill” to regions of lower probability, the particle slows down, and as it goes “downhill” to regions of higher probability it speeds up. After computing the trajectory for 20 steps, we choose whether to accept the particles position as our new starting point, or reject the particle and start again. We reject if the location is too unlikely or if the particle is moving too fast.

In this simulator you can sample from a long, skinny Gaussian distribution (the default) or a round Cauchy distribution.

One of the things I learned from this simulator is that the sampler can get “stuck” in tails of heavy-tailed distributions. In the Cauchy example, it’s possible to get pretty far from the centroid, because the probability mass doesn’t decline quickly like the Gaussian. But then it becomes impossible to move to a new point, because any particle shot from that far off the probability center is moving so fast when we make the accept/reject decision that we will never accept. I fixed this by changing the balance of the two acceptance criteria.