Spatial Autocorrelation with GLS

Much of our data is spatial

A note up front

  • Spatial data analysis is… a big topic.

  • Start by learning how to make maps (sf library)

  • Be prepared to dive in. Deep.

To Space and Beyond

  1. How do we assess spatial autocorrelation?

  2. What are the possible patterns of spatial autocorrelation?

  3. How do we assess point spatial data?

  4. How do we assess polygon spatial data?

How correlated is NDVI across space?

Spatial Autocorrelation and Moran’s I

\[I = \frac{N}{W}\frac{\sum_i\sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i(x_i - \bar{x})^2}\]

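  • \(w_{ij}\) are the entries of a weight matrix (e.g., 1/pairwise distance) with diagonals of 0; \(W\) is the sum of all \(w_{ij}\)
  • I = 1, perfect positive correlation; I = -1, perfect anti-correlation
  • Under no spatial autocorrelation, \(E(I) = \frac{-1}{N-1}\), so the observed I can be tested against it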
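
As a rough illustration of the formula above, here is a minimal from-scratch sketch in Python. It is not how spdep computes the statistic internally; the function names and the four-site example are hypothetical and exist only to make the arithmetic visible.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # W in the formula: the sum of every weight (the diagonal is assumed to be 0)
    total_w = sum(weights[i][j] for i in range(n) for j in range(n))
    # Numerator: weighted cross-products of deviations from the mean
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    # Denominator: sum of squared deviations
    den = sum(d * d for d in dev)
    return (n / total_w) * (num / den)


def expected_i(n):
    """Expected Moran's I when there is no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Hypothetical data: four sites on a line, each weighted 1 with its adjacent sites
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_stat = morans_i(values, weights)
print(i_stat, expected_i(len(values)))
```

A smooth gradient like this gives a positive I, well above the null expectation. With N = 533, `expected_i` gives about -0.00188, which matches the Expectation printed in the test output later in the deck.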

Calculating Moran’s I

  1. Make a distance matrix

  2. Convert it to a weight matrix

  3. Run spdep::moran.test
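
The first two steps can be sketched in plain Python, assuming planar coordinates in a projected system and inverse-distance weights. The helper names and the three points are hypothetical. Step 3 is an R call and is not reproduced here.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between (x, y) points in projected units."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def inverse_distance_weights(dist):
    """Convert distances to 1/distance weights, with 0 on the diagonal.

    Assumes no two points share a location (that would divide by zero).
    """
    n = len(dist)
    return [
        [0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical points, e.g. eastings and northings in metres
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist = distance_matrix(coords)
w = inverse_distance_weights(dist)
print(dist[0][1], w[0][1])
```

Nearby points get large weights and distant points small ones, so close neighbours dominate the statistic.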

Making a Weight Matrix

Yes, this is a PITA…

  • NB, use projections with true distances (e.g., UTM, not latlong)
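
To see why raw lat/long coordinates mislead, this small sketch uses the haversine great-circle formula with a mean Earth radius of 6371 km (a common approximation). The function name and the test locations are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# At 60 degrees north, compare one degree of latitude with one degree of longitude
one_degree_north = haversine_km(60.0, 0.0, 61.0, 0.0)
one_degree_east = haversine_km(60.0, 0.0, 60.0, 1.0)
ratio = one_degree_east / one_degree_north
print(one_degree_north, one_degree_east, ratio)
```

At this latitude a degree of longitude covers roughly half the ground distance of a degree of latitude. Euclidean distance on degrees would treat them as equal, which distorts the weights.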

Moran’s I


    Moran I test under randomisation

data:  boreal$NDVI  
weights: boreal_w    

Moran I statistic standard deviate = 45.157, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     1.834024e-01     -1.879699e-03      1.683501e-05 

What does this look like?

SAR versus CAR

SAR:
Simultaneous Autoregressive Process \(y_i = BX_i + \sum_j S_{ij} y_j + \epsilon_i\)

CAR:
Conditional Autoregressive Process \(y_i \mid y_{-i} \sim \mathcal{N}\left(BX_i + \sum_j S_{ij} y_j,\; m_{ii}\right)\)

SAR versus CAR

from https://stats.stackexchange.com/a/9983

Non-spatial model
My House Value is a function of my home Gardening Investment.

SAR model
My House Value is a function of the House Values of my neighbours.

CAR model
My House Value is a function of the Gardening Investment of my neighbours.

Variograms

  • (Semi)variograms show how half the average squared difference between values at pairs of points changes with the distance between them

  • Computationally intensive

  • Shape tells us what the correlation structure should be

  • With a fitted covariance function, we can estimate the whole surface
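
A minimal sketch of an empirical semivariogram in plain Python, assuming equal-width distance bins. The binning rule is an illustration and not necessarily what gstat does; the function name and transect data are hypothetical.

```python
import math


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Return (mean distance, semivariance, pair count) for each non-empty bin."""
    dist_sums = [0.0] * n_bins
    sq_sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    # Every pair of points is visited once, hence the O(n^2) cost
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            b = int(h // bin_width)
            if b < n_bins:
                dist_sums[b] += h
                sq_sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    # Semivariance is half the mean squared difference within a bin
    return [
        (dist_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]), counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Hypothetical transect: four evenly spaced points with increasing values
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_semivariogram(coords, values, bin_width=1.0, n_bins=4)
for mean_dist, gamma, pairs in variogram:
    print(mean_dist, gamma, pairs)
```

The double loop over pairs is why variograms get expensive on large data sets. Here semivariance rises with distance, the signature of positive spatial autocorrelation.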

Our Variogram

Anatomy of a Variogram

Spatial Correlation Matrix

\[ cor(\epsilon) = \begin{pmatrix} 1 & \rho_{ij} & \rho_{ik} & \cdots \\ \rho_{ji} & 1 & \rho_{jk} & \cdots \\ \rho_{ki} & \rho_{kj} & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}\]

\[\rho_{ij} = f(x_i, x_j)\] e.g., \[\rho_{ij} = \exp(-\text{Distance}/\text{range})\]
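
As a sketch of the exponential example, the following builds the correlation matrix from point coordinates in plain Python. The function name, the coordinates, and the range of 100 are hypothetical.

```python
import math


def exponential_correlation(coords, range_param):
    """Correlation matrix with rho_ij = exp(-distance_ij / range)."""
    return [
        [math.exp(-math.dist(p, q) / range_param) for q in coords]
        for p in coords
    ]


# Hypothetical points along a line, in the same units as the range
coords = [(0.0, 0.0), (0.0, 100.0), (0.0, 300.0)]
rho = exponential_correlation(coords, range_param=100.0)
for row in rho:
    print([round(r, 4) for r in row])
```

The diagonal is 1 because each point is at distance 0 from itself, and correlation decays smoothly toward 0 as points get farther apart.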

Different Shapes of Autocorrelation

How do you tell the difference?

  1. Biology

  2. Fit all structures and evaluate

  3. Visual examination of variogram shape

What model best describes our variogram?

  model        psill    range
1   Nug 0.0009415914    0.000
2   Sph 0.0072401874 1412.745
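
To read the fitted table, it can help to evaluate the model by hand. The sketch below uses the standard spherical variogram with a nugget and plugs in the psill and range values printed above. The function name is hypothetical, and the range is in whatever units the coordinates use.

```python
def spherical_variogram(h, nugget, psill, range_):
    """Semivariance at distance h for a nugget plus spherical model."""
    if h == 0:
        return 0.0  # by definition the semivariance at zero distance is 0
    if h >= range_:
        return nugget + psill  # beyond the range the curve is flat at the sill
    r = h / range_
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


# Fitted values from the table above
NUGGET = 0.0009415914
PSILL = 0.0072401874
RANGE = 1412.745

sill = spherical_variogram(RANGE, NUGGET, PSILL, RANGE)
half_way = spherical_variogram(RANGE / 2, NUGGET, PSILL, RANGE)
print(sill, half_way)
```

The total sill is the nugget plus the partial sill. Points farther apart than the range are modelled as spatially uncorrelated.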

See the Fit Variogram

What are the Implications of our Variogram: Kriging

  1. Determine a model of the data (variogram!)

  2. Build a grid covering the space you are interested in

  3. Use your model to predict points between those you have measured
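
The three steps can be sketched with ordinary kriging in plain Python. This is an illustration under stated assumptions, not the gstat implementation: the variogram parameters, the four sample points, and all function names are hypothetical, and the linear system is solved with a small hand-written Gaussian elimination.

```python
import math


def spherical(h, nugget, psill, range_):
    """Step 1: a variogram model (spherical with nugget)."""
    if h == 0:
        return 0.0
    if h >= range_:
        return nugget + psill
    r = h / range_
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_krige(coords, values, target, gamma):
    """Step 3: predict at target; returns (prediction, kriging weights)."""
    n = len(coords)
    # Semivariances between observed points, bordered by the sum-to-one constraint
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in coords] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


# Hypothetical observations at the corners of a 100 x 100 square
coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
values = [0.2, 0.4, 0.3, 0.5]


def gamma(h):
    return spherical(h, 0.0, 1.0, 500.0)


# Step 2: a coarse grid covering the area of interest
grid = [(x, y) for x in (0.0, 50.0, 100.0) for y in (0.0, 50.0, 100.0)]
surface = {pt: ordinary_krige(coords, values, pt, gamma)[0] for pt in grid}
centre_pred, centre_w = ordinary_krige(coords, values, (50.0, 50.0), gamma)
print(centre_pred, centre_w)
```

The weights always sum to 1, and with no nugget the prediction at a sampled location reproduces the observed value. At the centre of the square the four corners are equally informative, so each gets the same weight.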

What are the Implications of our Variogram: Kriging

[using ordinary kriging]

This is all very cool, but…

None of this involves biological processes

Model of Where Space Enters In

How do we translate this to correlation?

And…

Once we build a process-based understanding, we can krig even better!

Analysis of Point Pattern Data

Naive Analysis of Point Pattern Data

Plot of Residuals

Analysis of Residuals


    Moran I test under randomisation

data:  residuals(boreal_mod)  
weights: boreal_w    

Moran I statistic standard deviate = 30.255, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     1.217804e-01     -1.879699e-03      1.670611e-05 

What’s the Variogram of Residuals?

What model best describes our residuals?

  model        psill    range
1   Nug 0.0009415914    0.000
2   Sph 0.0072401874 1412.745

Compensating with CAR Model

Compensating with CAR Model: glmmTMB

Compensating with CAR Model: gls

Differences?

# A tibble: 2 x 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    0.327   0.00552      59.3 1.95e-236
2 Wet           -4.88    0.154       -31.6 3.05e-124
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    0.402     9.19     0.0437 9.65e- 1
2 Wet           -3.04      0.168  -18.1    1.62e-57

Predictions Across Landscape

  1. Requires predictions of wetness across landscape

  2. So, variogram and krig away!

  3. Then make a new grid.

  4. Use the wetness values and spatial coordinates to simulate a new landscape with predict and a brms or glmmTMB model

Analysis of Polygons

We need distances - So let’s Make a Mesh

Analysis of Autocorrelation Says…


    Moran I test under randomisation

data:  residuals(penn_mod)  
weights: penn_weights    

Moran I statistic standard deviate = 0.60877, p-value = 0.2713
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.030083778      -0.015151515       0.005521394 

Really?

Fitting A SAR Model Anyway…

Compare

             Estimate Std. Error  t value   Pr(>|t|)
(Intercept)  3.621208   2.135931 1.695377 0.09478954
smoking     18.345216   8.944603 2.050982 0.04430385
             Estimate Std. Error  z value   Pr(>|z|)
(Intercept)  3.715435   2.161831 1.718652 0.08567779
smoking     17.919386   9.048019 1.980476 0.04765004

Predictions?

Example

  1. Irish forest data

  2. Goal is to model lake pH

  3. Possible influences:
    • Sodium Dominance Index (SDI)
    • Status as forested or not
    • Altitude

  4. Spatial autocorrelation possible