Markov Chain Monte Carlo


Our Story Thus Far…

  • We have been using Maximum A Posteriori Approximations

  • Assumes Gaussian posterior (approximately quadratic)

  • Great for simple models

But…

  • We’ve noticed problems with models of moderate complexity

  • Many problems do not have easy analytical solution
    • Autocorrelation
    • State-Space Models
    • Mixed Models

  • Solution is simulated draws form the posterior…

King Markov and His Islands

King Markov and His Islands

How to move around Islands

  • Flip a coin. Heads, move. Tails, stay.
  • Figure out the number of people on each island.
  • Assign a probability, next island / (total population)
  • Choose a random number. If number < p, move.
  • Rinse and repeat
  • What Metropolis MCMC Looks Like

    mcmc_ex <- function(num_weeks = 1e5, current=10, positions = rep(0, num_weeks)){
      for ( i in 1:num_weeks ) {
        # record current position
        positions[i] <- current
      
        # flip coin to generate proposal
        proposal <- current + sample( c(-1,1) , size=1 )
        # now make sure he loops around the archipelago
        if ( proposal < 1 ) proposal <- 10
        if ( proposal > 10 ) proposal <- 1
      
        # move?
        prob_move <- proposal/current
        current <- ifelse( runif(1) < prob_move , proposal , current )
      }
    
      positions
    }

    Metropolis MCMC in Action: 10 Weeks

    Population = Island Number

    Metropolis MCMC in Action: 50 Weeks

    Population = Island Number

    Metropolis MCMC in Action: 1000 Weeks

    Population = Island Number

    Metropolis MCMC For Models

    MCMC In Practice for Models





    from http://www.cnblogs.com/Nietzsche/p/4255948.html

    MCMC is not a Panacea

    MCMC is not a Panacea

    How can MCMC Fail?

    • MCMC (particularly Metropolis) can get stuck

    • Start values can still be important

    • Particularly a problem with many parameters which are correlated

    • One way we try and assess is fit with many chains and make sure they converge

    MCMC Algorithms

    • Metropolis MCMC inefficient

    • Many algorithms to come up with clever proposed moves to speed up

    • Gibbs sampling used for BUGS, JAGS, etc.
      - Still has same problems as Metropilis

    • Or… Abandon search and use more deterministic sampling
      • Hamiltonian MCMC

    King Hamilton and His BatBoat

    King Hamilton and His BatBoat

    • Boat passes by all of the island, back and forth

    • Boat slows down to see people in porportion to how many folk

    • We sample position through time, more positions in areas where boat is slow

    Metropolis versus Hamiltonian




    Neal 2011, http://www.cs.utoronto.ca/~radford/ham-mcmc.abstract.html

    Metropolis versus Hamiltonian




    Neal 2011, http://www.cs.utoronto.ca/~radford/ham-mcmc.abstract.html

    Metropolis versus Hamiltonian




    Neal 2011, http://www.cs.utoronto.ca/~radford/ham-mcmc.abstract.html

    Implementing HMCMC via Stan

    • We use the map2stan function to call STAN
      • Compiles a model, so it can take a while

    • Can specify number of chains and other parameters

    • And now our samples are already part of our model!

    • Careful, models can get large (in size) depending on number of parameters and samples

    Data Prep for HMCMC

    data(rugged)
    rugged$log_gdp <- log(rugged$rgdppc_2000)
    
    #Prep the data
    d <- rugged[complete.cases(rugged$log_gdp),]
    
    # Need indices >0
    d$cont_africa <- d$cont_africa +1
    
    # Only want the data we are using
    # (otherwise slows STAN down)
    d.trim <- d[ , c("log_gdp","rugged","cont_africa") ]

    The Model…

    int_mod <- alist(
      #likelihood
      log_gdp ~ dnorm(mu, sigma),
      
      #Data generating process
      mu <- bR[cont_africa]*rugged + bA[cont_africa],
      
      #priors
      bR[cont_africa] ~ dnorm(0,1),
      bA[cont_africa] ~ dnorm(8,100),
      sigma ~ dcauchy(0,2)
    )
    Wait, Cauchy???

    Sidenote: the Cauchy Distribution

    • Pronounced Ko-she
    • A ratio of two normal distributions
    • Large thick tail
      • Extreme values regularly sampled
    • Uses half-cauchy, so, only positive

    Fitting with map2stan

    fit <- map2stan(int_mod, data=d.trim)
    warning: unknown warning option '-Wno-nneeded-internal-declaration'; did you mean '-Wno-unneeded-internal-declaration'? [-Wunknown-warning-option]
    In file included from file111646090ffe9.cpp:8:
    In file included from /Users/jearbear/Library/R/3.3/library/StanHeaders/include/src/stan/model/model_header.hpp:4:
    In file included from /Users/jearbear/Library/R/3.3/library/StanHeaders/include/stan/math.hpp:4:
    In file included from /Users/jearbear/Library/R/3.3/library/StanHeaders/include/stan/math/rev/mat.hpp:4:
    In file included from /Users/jearbear/Library/R/3.3/library/StanHeaders/include/stan/math/rev/core.hpp:12:
    In file included from /Users/jearbear/Library/R/3.3/library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5:
    In file included from /Users/jearbear/Library/R/3.3/library/StanHeaders/include/stan/math/rev/core/var.hpp:7:
    In file included from /Users/jearbear/Library/R/3.3/library/BH/include/boost/math/tools/config.hpp:13:
    In file included from /Users/jearbear/Library/R/3.3/library/BH/include/boost/config.hpp:39:
    /Users/jearbear/Library/R/3.3/library/BH/include/boost/config/compiler/clang.hpp:196:11: warning: 'BOOST_NO_CXX11_RVALUE_REFERENCES' macro redefined [-Wmacro-redefined]
    #  define BOOST_NO_CXX11_RVALUE_REFERENCES
              ^
    <command line>:6:9: note: previous definition is here
    #define BOOST_NO_CXX11_RVALUE_REFERENCES 1
            ^
    2 warnings generated.
    
    SAMPLING FOR MODEL 'log_gdp ~ dnorm(mu, sigma)' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)
     Elapsed Time: 0.121338 seconds (Warm-up)
                   0.103927 seconds (Sampling)
                   0.225265 seconds (Total)
    
    
    SAMPLING FOR MODEL 'log_gdp ~ dnorm(mu, sigma)' NOW (CHAIN 1).
    WARNING: No variance estimation is
             performed for num_warmup < 20
    
    
    Chain 1, Iteration: 1 / 1 [100%]  (Sampling)
     Elapsed Time: 3e-06 seconds (Warm-up)
                   4.7e-05 seconds (Sampling)
                   5e-05 seconds (Total)
    
    [ 100 / 1000 ]
    [ 200 / 1000 ]
    [ 300 / 1000 ]
    [ 400 / 1000 ]
    [ 500 / 1000 ]
    [ 600 / 1000 ]
    [ 700 / 1000 ]
    [ 800 / 1000 ]
    [ 900 / 1000 ]
    [ 1000 / 1000 ]
    • Note where errors occur
      • Warmup only?
      • How often in your chain?

    Inspect your Chains for convergence!

    • Note, grey area is “warmup”
      • Warmup is the BatBoat motoring around, tuning up
      • Not used for posterior

    Multiple Chains

    fit_chains <- map2stan(int_mod, data=d.trim,
                           chains = 4, cores=4)
    • Can fit with multiple chains to inspect convergence

    • Yay multicore computers!

    Multiple Chains

    Assessing Convergence

    precis(fit_chains, depth=2)
           Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
    bR[1] -0.20   0.08      -0.33      -0.08  2339    1
    bR[2]  0.19   0.11       0.02       0.36  2975    1
    bA[1]  9.22   0.14       9.00       9.44  2210    1
    bA[2]  7.28   0.18       6.99       7.57  3133    1
    sigma  0.95   0.05       0.86       1.03  4000    1
    • n_eff is effective number of samples in chain
      • Should be reasonably large

    • Rhat is a measure of convergence
      • Gelman-Rubin diagnostic
      • Should be 1 - even 1.01 is suspect

    • Treat as warnings - necessary but not sufficient

    What do bad chains look like?

    Lack of Convergence

    • Might be that model has not found good values
    • More likely bad model
      • Too many parameters
      • Redundant parameters
      • Poor fit to data

    General Workflow

    1. Fit one chain to make sure things look OK
      • warmup = 1000, iter=2000 OK

    2. Fit multiple chains to ensure convergence
      • Inspect n_eff and r_hat
      • Make sure posterior converges and is stationary
      • Tune HMC and model parameters

    3. Fit one chain with large everything tuned
      • Can use more chains if you have the processors

    Exercise

    • Refit your homework models

    • Do estimates differ from MAP?

    • Try uncentered model - does it perform better?