Briefly, we expect the Binomial, Poisson, and other such distributions to have a fixed relationship between their mean and their variance. For the Poisson, variance = mean. For the Binomial, the variance is size*p(1-p). In real data, this often isn’t the case. In GLMs, we used the quasi- family of distributions to essentially add a scaling factor to those mean-variance relationships. This was a bit of a hack, but it got the work done. What does this look like?
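As a minimal sketch (with made-up simulated data, not from any dataset in this lab): the quasipoisson family estimates a dispersion parameter that scales the variance.

set.seed(609)
#simulated overdispersed counts (arbitrary values, just for illustration)
x <- runif(100, 0, 10)
y <- rnbinom(100, mu = exp(0.5 + 0.2*x), size = 2)

#quasipoisson estimates a dispersion parameter phi so that
#variance = phi * mean rather than variance = mean
quasi_fit <- glm(y ~ x, family = quasipoisson)
summary(quasi_fit)$dispersion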
Thus far, we’ve solved overdispersion by including a scaling parameter. However, a more straightforward way is to use a **compound distribution**. This is a distribution where we assume that the parameter of interest in our likelihood function has a distribution itself, beyond just our predictors. So, for example, the estimated probability in our binomial distribution has some sort of distribution itself. Or our lambda value has a distribution around it.

This has the advantage that we can combine these distributions into a single likelihood function, rather than relying on the hack of a quasi- distribution, and, if we wanted, we could even model the parameters that lead to overdispersion. Not only that, but it better safeguards against ‘impossible’ values, which can become a problem with the quasi- approach.
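To make the compound idea concrete, here’s a quick simulation sketch (all values made up): draw the binomial’s probability from a beta distribution first, and the resulting counts have more variance than a plain binomial with the same mean.

set.seed(609)
#draw probabilities from a beta with mean 7/(7+3) = 0.7 (arbitrary shapes)
p <- rbeta(1e4, shape1 = 7, shape2 = 3)
compound <- rbinom(1e4, size = 10, prob = p) #probability varies per draw
plain <- rbinom(1e4, size = 10, prob = 0.7) #fixed probability
var(compound) #larger than...
var(plain) #...roughly size*p*(1-p) = 2.1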
Let’s see this in action with an example of reed frog survivorship under different treatments.
library(rethinking)
data(reedfrogs)
head(reedfrogs)
## density pred size surv propsurv
## 1 10 no big 9 0.9
## 2 10 no big 10 1.0
## 3 10 no big 7 0.7
## 4 10 no big 10 1.0
## 5 10 no small 9 0.9
## 6 10 no small 9 0.9
For the moment, we’re going to ignore all of the things done to these frogs and just model survivorship. Why? Well, sometimes overdispersion is caused by our lack of knowledge about other predictors in a system. In a perfect world, we’d have that information; when we don’t, we can at least model the heterogeneity that increases our variance.
So, let’s start with our model of survivorship.
frog_mod <- alist(
  #likelihood
  surv ~ dbinom(density, prob),
  #DGP
  logit(prob) <- a,
  #prior
  a ~ dnorm(0, 10)
)
frog_fit <- map(frog_mod, data=reedfrogs)
It’s nice, it’s simple. But does it work? Let’s look at the posterior. We don’t have multiple values coming out, so let’s just look at the samples.
frog_samp <- extract.samples(frog_fit)
plot(density(frog_samp$a))
Well that looks OK. But… let’s look at the QQ distribution of the observed values relative to their predictions. Note, I’m using the gap package here, as it has a qqunif function, so you don’t have to roll your own.
frog_sim <- sim(frog_fit)

#for each observation, what fraction of posterior simulations fall below
#the observed value? If the model fits, these should be ~uniform on (0,1)
u_frog <- sapply(1:nrow(reedfrogs), function(i) sum(frog_sim[,i] < reedfrogs$surv[i]))/1000

gap::qqunif(u_frog, logscale=FALSE)
Well that is… odd. And it indicates overdispersion! You can see this in postcheck as well (try it!), but the QQ plot is much nicer, in my humble opinion.
So, how do we model this overdispersion? First, which parameter of the binomial has additional variability? Well, it’s not the size. Instead, it’s the probability. So we have a parameter whose mean value we estimate, but which then varies, and it varies between 0 and 1. There are a few distributions this could take - like the uniform. But one of the more meaningful is the beta. The beta distribution takes two parameters and can look almost “normal”, flat, or with its mass piled up at 0 and 1. So, the uniform is a special case. This makes it ideal for our purposes, particularly as we’d assume that, when we estimate a mean probability, most of the resulting probabilities should be close to it. Let’s see what the beta looks like.
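Here’s a quick sketch of those shapes (the shape parameters below are arbitrary choices for illustration):

x <- seq(0, 1, length.out = 200)
#arbitrary shape parameters, just to show the range of shapes
plot(x, dbeta(x, 5, 5), type = "l", ylim = c(0, 3), ylab = "density") #almost "normal"
lines(x, dbeta(x, 1, 1), col = "blue") #flat - the uniform
lines(x, dbeta(x, 0.5, 0.5), col = "red") #mass piled up at 0 and 1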
dbetabinom takes a probability and then a scaling parameter and converts them into a and b values internally. So it looks like the following:
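(Sketched here with rethinking’s dbeta2, which uses the same probability-and-scale parameterization; the mean of 0.7 and the theta values are arbitrary.)

#mean probability of 0.7 with different scaling parameters (arbitrary values)
x <- seq(0, 1, length.out = 200)
plot(x, dbeta2(x, prob = 0.7, theta = 10), type = "l", ylim = c(0, 4), ylab = "density")
lines(x, dbeta2(x, prob = 0.7, theta = 3), col = "blue") #smaller theta, more spread
lines(x, dbeta2(x, prob = 0.7, theta = 1), col = "red") #mass piles up at the edges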
OK, so, let’s put this in action for our frog model!
Our model is pretty similar to our old model:
frog_mod_bb <- alist(
  #likelihood
  surv ~ dbetabinom(density, prob, theta),
  #DGP
  logit(prob) <- a,
  #priors
  a ~ dnorm(0, 10),
  theta ~ dexp(3)
)
frog_fit_bb <- map(frog_mod_bb, data=reedfrogs)
OK, let’s look at those QQs now!
frog_sim_bb <- sim(frog_fit_bb)
u_frog_bb <- sapply(1:nrow(reedfrogs), function(i) sum(frog_sim_bb[,i] < reedfrogs$surv[i]))/1000
gap::qqunif(u_frog_bb, logscale=FALSE)
Look at that improvement! We’ve done a much better job this time around. We can now use these results for prediction, etc.
One last note: if you want to see that overdispersion, we can plot how the distribution of probabilities plays out across our posterior draws using dbeta2.
library(dplyr)
library(tidyr)
library(ggplot2)

frog_coefs_bb <- extract.samples(frog_fit_bb, n=50)

#for each posterior draw, the implied beta distribution of probabilities
f_coef_dens <- crossing(x = seq(0, 1, length.out = 100), as.data.frame(frog_coefs_bb)) %>%
  mutate(dens = dbeta2(x, logistic(a), theta))

ggplot(f_coef_dens, aes(x = x, y = dens, group = paste(a, theta))) +
  geom_line(alpha = 0.2)
What’s great about this plot is that you can see that, for many values, the probability is piled up at the ends. This indicates a ton of overdispersion, and that you’re definitely missing something big here.
A. Plot model predictions with both forms of error. What does this show?
B. What does WAIC tell you about beta-binomial models with versus without one or more predictors?
C. Going back to the binomial model, does including one or more predictors alleviate the overdispersion problem in the original model?
One of the most useful distributions you will run into is the Gamma Poisson. Why? Because it’s just another name for the negative binomial. This is a distribution for count data whose variance increases faster than its mean! While we do have a specialized negative binomial distribution, it’s a bit more natural to refer to it by this name, as the negative binomial is a special case of the Gamma Poisson. How does the Gamma Poisson work? Essentially, the rate parameter of your standard Poisson is itself Gamma distributed. There are a few ways to model this, but we can just use a mean and a scaling factor with dgammapois. For example, let’s assume a mean of 40.
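A quick sketch (the scale values below are arbitrary; in some versions of rethinking this function is named dgampois):

#Gamma Poisson densities with a mean of 40 and arbitrary scale values
x <- 0:100
plot(x, dgammapois(x, 40, 1), type = "l", ylab = "density")
lines(x, dgammapois(x, 40, 10), col = "blue")
lines(x, dgammapois(x, 40, 30), col = "red")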
Bigger scale, more spread in the data. There’s also some movement of the mode, but the mean stays the same.
Let’s take a look at the damage caused by different hurricanes. One theory out there is that hurricanes with feminine names tend to cause more harm, as people take them less seriously.
data(Hurricanes)
head(Hurricanes)
## name year deaths category min_pressure damage_norm female femininity
## 1 Easy 1950 2 3 960 1590 1 6.77778
## 2 King 1950 4 3 955 5350 0 1.38889
## 3 Able 1952 3 1 985 150 0 3.83333
## 4 Barbara 1953 1 1 987 58 1 9.83333
## 5 Florence 1953 0 1 985 15 1 8.33333
## 6 Carol 1954 60 3 960 19321 1 8.11111
huric_mod <- alist(
  #likelihood
  deaths ~ dpois(lambda),
  #Data generating process
  log(lambda) <- a + b*femininity,
  #priors
  a ~ dnorm(0, 10),
  b ~ dnorm(0, 10)
)
huric_fit <- map(huric_mod, data=Hurricanes)
OK, let’s examine the QQ plot of the posterior to see if it’s overdispersed. Rather than do all of that sapply code, here’s a quickie function for future use:
qq_posterior <- function(fit, observations, n=1000, logscale=FALSE){
  #simulate from the posterior predictive distribution
  sim_output <- sim(fit, n=n)
  #fraction of simulations below each observation: ~uniform if the model fits
  u <- sapply(1:length(observations), function(i) sum(sim_output[,i] < observations[i]))/n
  gap::qqunif(u, logscale=logscale)
}
And we can use it here with
qq_posterior(huric_fit, Hurricanes$deaths)
Hello overdispersion my old friend.
Fortunately, we’re now familiar with overdispersion, and, given the overdispersion here, we can look at the Gamma Poisson as a nice possible solution.
huric_mod_gp <- alist(
  #likelihood
  deaths ~ dgampois(lambda, scale),
  #Data generating process
  log(lambda) <- a + b*femininity,
  #priors
  a ~ dnorm(0, 10),
  b ~ dnorm(0, 10),
  scale ~ dexp(2)
)
huric_fit_gp <- map(huric_mod_gp, data=Hurricanes)
OK, it fits, but how did it work out?
qq_posterior(huric_fit_gp, Hurricanes$deaths)
Much much better. We can also look again at the posterior distribution for select values of lambda. Let’s say, for example, femininity scores of 1 and 11, the range of our data.
huric_coefs_gp <- extract.samples(huric_fit_gp, n=50)

#posterior Gamma Poisson densities at the extremes of femininity
h_coef_dens <- crossing(femininity = c(1, 11), x = 0:20, data.frame(huric_coefs_gp)) %>%
  mutate(dens = dgammapois(x, exp(a + b*femininity), scale))

ggplot(h_coef_dens, aes(x = x, y = dens, color = factor(femininity),
                        group = factor(paste(scale, femininity, a, b)))) +
  geom_line(alpha = 0.2) +
  facet_wrap(~femininity, labeller = "label_both") +
  scale_color_manual(values = c("red", "blue"), guide = "none")
A. Plot model predictions! What does this show?
B. What does WAIC tell you about models with additional predictors? What about an offset for damage?
C. Going back to the Poisson model, does including one or more predictors alleviate the overdispersion problem in the original model? Particularly the offset of initial damage?