\[Y_i = \beta X_i + \epsilon_i\] \[\epsilon_i \sim \mathcal{N}(0, \sigma^2_i)\]
\[ \epsilon_i \sim N(0, \large \sigma^2)\]
if n=3…
\[ \sigma^2 = \begin{pmatrix}
\sigma^2 & 0 &0 \\
0 & \sigma^2& 0\\
0 & 0 & \sigma^2
\end{pmatrix} \]
\[ \sigma^2 = \begin{pmatrix}
\sigma^2 & 0 &0 \\
0 & \sigma^2& 0\\
0 & 0 & \sigma^2
\end{pmatrix} \]
-We weight by 1/SD of a response variable when we know something about measurement precision. - E.g. in R lm(y ~ x,weights=1/sd(y))
- Other options include modeling \(\sigma^2\) explicitly as a response. - varFixed
, varIdent
, etc. with gls
\[ \sigma^2 = \begin{pmatrix} \sigma^2 & 0 &0 \\ 0 & \sigma^2& 0\\ 0 & 0 & \sigma^2 \end{pmatrix} \]
Data from Zuur et al. 2009
Note: Time could have had a nonlinear effect, but we’ll start simple
BUT - Need to examine cor(\(X, X_t-\tau\)) to be certain.
Correlation (\(\rho\)) Between T and T-1 = 0.699
birds_lm <- gls(Birds ~ Year, data=oahu_data)
acf(residuals(birds_lm))
\[ cor(\epsilon) = \begin{pmatrix}
1 & \rho &\rho \\
\rho & 1& \rho\\
\rho & \rho & 1
\end{pmatrix} \]
Compound Symmetric Structure: All points are equally correlated
\[ \epsilon_{t} = \rho \epsilon_{t-1} + \zeta_{t}\]
which produces
\[ cor(\epsilon) = \begin{pmatrix}
1 & \rho &\rho^{2} \\
\rho & 1& \rho\\
\rho^{2} & \rho & 1
\end{pmatrix}\]
for n=3 time steps
- Dropoff of correlation with time
#Correlation Structure
birds_corAR <- corAR1 (form = ~ Year)
#Model
birds_ar1 <- gls(Birds ~ Year, data=oahu_data,
correlation = birds_corAR)
Generalized least squares fit by REML
Model: Birds ~ Year
Data: oahu_data
Log-restricted-likelihood: -285.7658
Coefficients:
(Intercept) Year
-14947.690329 7.763806
Correlation Structure: ARMA(1,0)
Formula: ~Year
Parameter estimate(s):
Phi1
0.7356643
Degrees of freedom: 43 total; 41 residual
Residual standard error: 332.2233
Model df AIC BIC logLik Test L.Ratio p-value
birds_ar1 1 4 579.5315 586.3858 -285.7658
birds_lm 2 3 599.5628 604.7035 -296.7814 1 vs 2 22.03122 <.0001
Without correction
Value | Std.Error | t-value | p-value | |
---|---|---|---|---|
(Intercept) | -18806.203 | 6472.241 | -2.906 | 0.006 |
Year | 9.693 | 3.268 | 2.966 | 0.005 |
With correction
Value | Std.Error | t-value | p-value | |
---|---|---|---|---|
(Intercept) | -14947.690 | 14943.052 | -1.000 | 0.323 |
Year | 7.764 | 7.549 | 1.029 | 0.310 |
birds_corCAR <- corCAR1(form = ~ Year)
corAR1
assumes 1 step between each value of covariateTwo Year AR Correlation
arma_birds_2_ar <- corARMA(form = ~ Year, p = 2, q=0)
Two Year MA
arma_birds_2_ma <- corARMA(form = ~ Year, p = 0, q=2)
Two Year ARMA
arma_birds_2_arma <- corARMA(form = ~ Year, p = 2, q=2)
Model df AIC BIC logLik Test L.Ratio p-value
birds_ar2 1 5 581.3772 589.9451 -285.6886
birds_arma2 2 7 585.3404 597.3354 -285.6702 1 vs 2 0.03681297 0.9818
Model df AIC BIC logLik Test L.Ratio p-value
birds_ma2 1 5 584.6792 593.2470 -287.3396
birds_arma2 2 7 585.3404 597.3354 -285.6702 1 vs 2 3.338772 0.1884
Model df AIC BIC logLik
birds_ma2 1 5 584.6792 593.2470 -287.3396
birds_ar2 2 5 581.3772 589.9451 -285.6886
anova(birds_ar1, birds_ar2)
Model df AIC BIC logLik Test L.Ratio p-value
birds_ar1 1 4 579.5315 586.3858 -285.7658
birds_ar2 2 5 581.3772 589.9451 -285.6886 1 vs 2 0.1543411 0.6944
pacf(residuals(birds_lm))
birds_rain <- gls(Birds ~ Rainfall+Year, data=oahu_data)
acf(residuals(birds_rain))
Still autocorrleation
birds_rain_ar <- gls(Birds ~ Rainfall+Year,
data=oahu_data,
correlation = birds_corAR)
No Correlation
Denom. DF: 40
numDF F-value p-value
(Intercept) 1 8.132235 0.0068
Rainfall 1 1.813967 0.1856
Year 1 8.623939 0.0055
Correlation
Denom. DF: 40
numDF F-value p-value
(Intercept) 1 1.0344877 0.3152
Rainfall 1 0.4183154 0.5215
Year 1 1.1009574 0.3004
Each site needs it’s own \(\sigma^2\)
birds_var <- varIdent(form = ~ 1 | Site)
all_rain <- gls(Birds ~ Rainfall*Site + Year,
data=allbirds,
weights = birds_var)
acf(residuals(all_rain))
pacf(residuals(all_rain))
|
to denote ‘varies by’birds_corAR_site <- corAR1(form = ~ Year | Site)
all_rain_cor <- gls(Birds ~ Rainfall*Site + Year,
data=allbirds,
weights = birds_var,
correlation = birds_corAR_site)
Df | Chisq | Pr(>Chisq) | |
---|---|---|---|
Rainfall | 1 | 4.249472 | 0.0392625 |
Site | 4 | 75.580590 | 0.0000000 |
Year | 1 | 19.326067 | 0.0000110 |
Rainfall:Site | 4 | 12.404620 | 0.0145829 |
Model df AIC BIC logLik Test L.Ratio p-value
all_rain 1 16 2676.617 2730.171 -1322.308
all_rain_cor 2 17 2613.632 2670.533 -1289.816 1 vs 2 64.9849 <.0001
allbirds$fit <- predict(all_rain_cor)
allSites +
geom_line(data=allbirds,
mapping=aes(y=fit),
color="red")
\(R^2\) = 1- RSS/TSS = 0
How well can you model the time series with the measurements at hand?
Data extrapolated from Zuur et al. 2009
Suggested Order: