Generalized Linear Models

A Generalized Outline

  1. Why use GLMs? An Intro to Entropy

  2. Logistic Regression Versus Linear Regression

  3. Generalized Linear Models

  4. Poisson Regression (Poisson Error, Log Link)

What is Maximum Entropy?

Maximum Entropy Principle



The distribution that can happen the most ways is also the distribution with the biggest information entropy. The distribution with the biggest entropy is the most conservative distribution that obeys its constraints.


- McElreath 2017

Why are we thinking about MaxEnt?

  • MaxEnt distributions have the widest spread - conservative

  • Nature tends to favor maximum entropy distributions
    • It’s just natural probability

  • The foundation of Generalized Linear Model distributions

  • Leads to useful distributions once we impose constraints

(Figures from McElreath 2016)

Information Entropy

\[H(p) = -\sum_{i} p_i \log p_i\]

  • Measure of uncertainty

  • Increases as more events become possible

  • Nature finds the distribution with the largest entropy, given the constraints on the distribution

Maximum Entropy and Coin Flips

  • Let’s say you are flipping a fair (p=0.5) coin twice

  • What is the maximum entropy distribution of # Heads?

  • Possible Outcomes: TT, HT, TH, HH
    • which leads to 0, 1, 2 heads

  • The constraint: with p = 0.5, the average outcome is 1 head

The Binomial Possibilities

TT = \((1-p)^2\)
HT = \(p(1-p)\)
TH = \((1-p)p\)
HH = \(p^2\)

But other distributions are possible

Let’s compare other distributions meeting the constraint using entropy

Remember, we must average 1 head, so
sum(distribution * c(0, 1, 1, 2)) = 1

\[H = -\sum_{i} p_i \log p_i\]

Distribution  TT, HT, TH, HH      Entropy
Binomial      1/4, 1/4, 1/4, 1/4  1.386
Candidate 1   2/6, 1/6, 1/6, 2/6  1.330
Candidate 2   1/6, 2/6, 2/6, 1/6  1.330
Candidate 3   1/8, 4/8, 2/8, 1/8  1.213

Binomial wins!
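A quick way to verify these entropies is to compute them directly; a minimal R sketch (the `entropy` helper and candidate names are ours):

```r
# Entropy of a discrete distribution: H = -sum(p * log(p))
entropy <- function(p) -sum(p * log(p))

candidates <- list(
  Binomial      = c(1/4, 1/4, 1/4, 1/4),
  `Candidate 1` = c(2/6, 1/6, 1/6, 2/6),
  `Candidate 2` = c(1/6, 2/6, 2/6, 1/6),
  `Candidate 3` = c(1/8, 4/8, 2/8, 1/8)
)

round(sapply(candidates, entropy), 3)
#>    Binomial Candidate 1 Candidate 2 Candidate 3
#>       1.386       1.330       1.330       1.213
```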

What about other p’s and draws?

Assume 2 draws with p = 0.7, and make 1000 simulated distributions meeting the constraint
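One way to run that simulation, adapted from the approach in McElreath (2016); the function name and details here are illustrative. Each random distribution over (TT, HT, TH, HH) is forced to satisfy the constraint E[# heads] = 2 × 0.7 = 1.4, and its entropy is compared to the binomial’s:

```r
sim_p <- function(G = 1.4) {
  x123 <- runif(3)
  # solve for the 4th weight so the expected # of heads equals G
  x4 <- (G * sum(x123) - x123[2] - x123[3]) / (2 - G)
  p <- c(x123, x4) / sum(c(x123, x4))  # normalize to a distribution
  -sum(p * log(p))                     # return its entropy
}

H <- replicate(1000, sim_p())

# The binomial with p = 0.7 over (TT, HT, TH, HH)
p_binom <- c(0.3^2, 0.7 * 0.3, 0.3 * 0.7, 0.7^2)

max(H)                        # no simulated distribution exceeds...
-sum(p_binom * log(p_binom))  # ...the binomial's entropy (~1.222)
```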

OK, what about the Gaussian?

Maximum Entropy Distributions

Constraint                        MaxEnt distribution
Real value in an interval         Uniform
Real value, finite variance       Gaussian
Binary events, fixed probability  Binomial
Non-negative real, has mean       Exponential

How Distributions are Coupled

A Generalized Outline

  1. Why use GLMs? An Intro to Entropy

  2. Logistic Regression Versus Linear Regression

  3. Generalized Linear Models

  4. Poisson Regression (Poisson Error, Log Link)

Infection by Cryptosporidium

Cryptosporidium Infection Rates

This is not linear or Gaussian

Why?

The General Linear Model

\[\Large \boldsymbol{Y_i} = \boldsymbol{\beta X_i} + \boldsymbol{\epsilon} \]





\[\Large \epsilon \sim \mathcal{N}(0,\sigma^{2})\]

The General Linear Model

Likelihood:
\[\Large Y_i \sim \mathcal{N}(\hat{Y_i},\sigma^{2})\]



Data Generating Process:
\[\Large \boldsymbol{\hat{Y}_{i}} = \boldsymbol{\beta X_i} \]

The General(ized) Linear Model

Likelihood:
\[\Large Y_i \sim \mathcal{N}(\hat{Y_i},\sigma^{2})\]

Data Generating Process:
- Transformation (Identity Link):
\[\Large \hat{Y}_{i} = \eta_{i} \]
- Linear Equation:
\[\Large \boldsymbol{\eta_{i}} = \boldsymbol{\beta X_i} \]

A Generalized Linear Model with an Exponential Curve

Likelihood:
\[\Large Y_i \sim \mathcal{N}(\hat{Y_i},\sigma^{2})\]

Data Generating Process:
- Transformation (Log Link):
\[\Large \log(\hat{Y}_{i}) = \eta_{i} \]



- Linear Equation:
\[\Large \boldsymbol{\eta_{i}} = \boldsymbol{\beta X_i} \]

Isn’t this just a transformation?

Aren’t we just doing \[\Large \log(\boldsymbol{Y_{i}}) = \boldsymbol{\beta X_i} + \boldsymbol{\epsilon_i}\]
NO! That model implies

\[\Large \boldsymbol{Y_{i}} = e^{\boldsymbol{\beta X_i} + \boldsymbol{\epsilon_i}}\]

so the error is log-normal

Likelihood:
\[\Large Y_i \sim \mathcal{N}(\hat{Y_i},\sigma^{2})\]
Error is Normal

Data Generating Process:
- Transformation (Log Link):
\[\Large \log(\hat{Y}_{i}) = \eta_{i} \]



- Linear Equation:
\[\Large \boldsymbol{\eta_{i}} = \boldsymbol{\beta X_i} \]
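The difference is easy to see in R; a minimal sketch, assuming a data frame `dat` with columns `x` and `y` (names are illustrative):

```r
# Log-transforming the response: error is log-normal on the original scale
mod_trans <- lm(log(y) ~ x, data = dat)

# Log link with a Gaussian likelihood: error stays normal on the original scale
mod_link <- glm(y ~ x, family = gaussian(link = "log"), data = dat)
```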

But This is Not Normal

Binomial Distribution

\[ Y_i \sim B(prob, size) \]

  • Discrete Distribution
  • prob = probability of something happening (% Infected)
  • size = # of discrete trials
  • Used for frequency or probability data
  • We estimate coefficients that influence prob

So, Y is a Logistic Curve


\[Probability = \frac{1}{1+e^{-\beta X}}\]

\[\text{logit}(Probability) = \beta X\]

Logistic Regression

Outputs

      LR Chisq  Df  Pr(>Chisq)
Dose  233.8357   1           0

And logit coefficients

term         estimate    std.error  statistic  p.value
(Intercept)  -1.4077690  0.1484785  -9.481298        0
Dose          0.0134684  0.0010464  12.870912        0
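A sketch of the kind of fit behind these tables, assuming a data frame `crypto` with the number infected (`Y`) out of `N` exposed at each `Dose` (all names except `Dose` are illustrative):

```r
mod_logit <- glm(cbind(Y, N - Y) ~ Dose,
                 family = binomial(link = "logit"),
                 data = crypto)

car::Anova(mod_logit)   # likelihood ratio Chisq table
broom::tidy(mod_logit)  # coefficients on the logit scale
```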

The Odds

\[Odds = \frac{p}{1-p}\]

\[\text{Log-Odds} = \log\frac{p}{1-p} = \text{logit}(p)\]

The Meaning of a Logit Coefficient

Logit Coefficient: A 1 unit increase in a predictor = an increase of \(\beta\) in the log-odds of the response.

\[\beta = \text{logit}(p_2) - \text{logit}(p_1)\]

\[\beta = \log\frac{p_2}{1-p_2} - \log\frac{p_1}{1-p_1}\]


We need to know both \(p_1\) and \(\beta\) to interpret this.

If \(p_1\) = 0.5 and \(\beta\) = 0.01347, then \(p_2\) = 0.503

If \(p_1\) = 0.7 and \(\beta\) = 0.01347, then \(p_2\) = 0.703
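These values can be checked with R’s built-in logit (`qlogis`) and inverse logit (`plogis`) functions:

```r
beta <- 0.01347
plogis(qlogis(0.5) + beta)  # p2 when p1 = 0.5: ~0.503
plogis(qlogis(0.7) + beta)  # p2 when p1 = 0.7: ~0.703
```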

But how do we assess assumptions?

  • There should still be no fitted vs. residual relationship

  • But QQ plots lose meaning
    • Not a normal distribution
    • Mean scales with variance

  • Also many types of residuals
    • Deviance, Pearson, raw, etc.

Randomized quantile residuals

  • If model fits well, quantiles of residuals should be uniformly distributed

  • i.e., for any point, if we had its full distribution, there should be no bias in its quantile

  • We do this via simulation

  • Works for many models, and arises naturally from Bayesian simulation

Randomized quantile residuals: Steps

  1. Get ~1000 (or more) simulations of model coefficients

  2. For each response (y) value, create an empirical distribution from the simulations

  3. For each response, determine its quantile from that empirical distribution

  4. The quantiles of all y values should be uniformly distributed
    • QQ plot of a uniform distribution!
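One off-the-shelf implementation of these steps is the DHARMa package; a minimal sketch, assuming a fitted model object `mod`:

```r
library(DHARMa)

# Steps 1-3: simulate from the model and compute each point's quantile
res <- simulateResiduals(fittedModel = mod, n = 1000)

# Step 4: QQ plot of the quantiles against a uniform, plus
# quantile residual vs. predicted checks
plot(res)
```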

Randomized quantile residuals: Visualize


Quantile Residuals


A Generalized Outline

  1. Why use GLMs? An Intro to Entropy

  2. Logistic Regression Versus Linear Regression

  3. Generalized Linear Models

  4. Poisson Regression (Poisson Error, Log Link)

General Linear Models are a Special Case

Likelihood:
\[\Large Y_i \sim \mathcal{N}(\hat{Y_i},\sigma^{2})\]

Data Generating Process:
- Transformation (Identity Link):
\[\Large \hat{Y}_{i} = \eta_{i} \]
- Linear Equation:
\[\Large \boldsymbol{\eta_{i}} = \boldsymbol{\beta X_i} \]

But what if we don’t want a Normal distribution?

The Generalized Linear Model

Likelihood:
\[\boldsymbol{Y_i} \sim E(\boldsymbol{\hat{Y_i}}, \theta)\]
E is any distribution from the Exponential Family
\(\theta\) is an error parameter, and can be a function of Y

Data Generating Process:
- Link Function \[\boldsymbol{f(\hat{Y_i})} = \boldsymbol{\eta_i}\]

- Linear Predictor \[\boldsymbol{\eta_i} = \boldsymbol{\beta X}\]

Generalized Linear Models: Error

Basic Premise:

  1. The error distribution is from the exponential family

    • e.g., Normal, Poisson, Binomial, and more.



  2. For these distributions, the variance is a function of the fitted value on the curve: \(var(Y_i) = \theta V(\hat{Y_i})\)

    • For a normal distribution, \(var(Y_i) = \theta \cdot 1\) as \(V(\hat{Y_i}) = 1\)

    • For a Poisson distribution, \(var(Y_i) = 1 \cdot \hat{Y_i}\) as \(V(\hat{Y_i}) = \hat{Y_i}\)

The Generalized Linear Model

Likelihood:
\[\boldsymbol{Y_i} \sim E(\boldsymbol{\hat{Y_i}}, \theta)\]
E is any distribution from the Exponential Family
\(\theta\) is an error parameter, and can be a function of Y

Data Generating Process:
- Link Function \[\boldsymbol{f(\hat{Y_i})} = \boldsymbol{\eta_i}\]

- Linear Predictor \[\boldsymbol{\eta_i} = \boldsymbol{\beta X}\]

A Generalized Outline

  1. Why use GLMs? An Intro to Entropy

  2. Logistic Regression Versus Linear Regression

  3. Generalized Linear Models

  4. Poisson Regression (Poisson Error, Log Link)

What is the relationship between kelp holdfast size and number of fronds?

What About Kelp Holdfasts?

How ’bout dem residuals?

What is our data and error generating process?

  • Data generating process should be exponential - no values less than 1

  • Error generating process should be Poisson - count data (see the sketch below)
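A sketch of that model in R, using the column names from the output below (the data frame name `kelp` is an assumption):

```r
# Poisson error, log link: an exponential data generating process
kelp_glm <- glm(FRONDS ~ HLD_DIAM,
                family = poisson(link = "log"),
                data = kelp)
```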




Kelp GLM Results

LR Test

          LR Chisq  Df  Pr(>Chisq)
HLD_DIAM  456.6136   1           0


Coefficients:

term         estimate  std.error  statistic  p.value
(Intercept)  1.778059  0.0572585   31.05319        0
HLD_DIAM     0.023624  0.0010502   22.49521        0

Kelp GLM Results


Kelp GLM Quantile Residuals

Ruh roh! Overdispersion

  • Sometimes, your variance changes faster than predicted by your error distribution

  • This is called overdispersion

  • We will deal with it more formally next week, but…

  • Sometimes, a different error structure will do.

The Negative Binomial

  • Related to the binomial (coin flips!)

  • The number of failures before size successes are seen, given p as the probability of success

  • \(Y_i \sim NB(size, p)\)

  • Variance = \(\hat{Y_i} + \kappa\hat{Y_i}^2\)

  • Increases with the square, not linearly

The Negative Binomial

A Negative Binomial GLM

Likelihood:
\[\boldsymbol{Y_i} \sim NB(\boldsymbol{\hat{Y_i}}, \boldsymbol{\theta})\]

Data Generating Process: \[\log(\boldsymbol{\hat{Y_i}}) = \boldsymbol{\eta_i}\]
\[\boldsymbol{\eta_i} = \boldsymbol{\beta X_i}\]
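A sketch of this fit using `MASS::glm.nb`, which estimates \(\theta\) from the data (the data frame name `kelp` is an assumption):

```r
library(MASS)

# Negative binomial error, log link
kelp_nb <- glm.nb(FRONDS ~ HLD_DIAM, data = kelp)

car::Anova(kelp_nb)  # yields the deviance table below
```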

A Negative Binomial GLM

Analysis of Deviance Table (Type II tests)

Response: FRONDS
         LR Chisq Df Pr(>Chisq)    
HLD_DIAM   51.145  1  8.578e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kelp NB GLM Results

Kelp NB GLM Checks

You Try: Wolf Inbreeding and Litter Size