\[y \sim \mathcal{N}(\mu, \sigma^{2})\]
\[g(\mu) = f(x)\]
\[f(x) = \sum_{j=1}^{d}\gamma_j B_j(x)\]
Linear Regression as a Basis Function:
\[d = 1\]
\[B_1(x) = x\]
so
\[f(x) = \gamma_1 x\]
Polynomial Regression as a Basis Function:
\[B_j(x) = x^{j}\]
\[f(x) = \gamma_0 + \gamma_1 x + \cdots + \gamma_d x^{d}\]
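To see the basis-function view in code, a minimal R sketch (the data here are simulated for illustration): poly() builds the basis columns B_j(x), and lm() estimates the gammas.

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
B <- poly(x, degree = 3, raw = TRUE)  # columns x, x^2, x^3
fit_poly <- lm(y ~ B)                 # intercept is gamma_0
coef(fit_poly)                        # the estimated gamma_j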
bs = "tp"
bs = "cr"
bs = "re"
bs = "ps"
bs = "ts"
bs = "ti"
bs = "t2"
> gam.check(fit, k.rep = 1000)
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 4 iterations.
The RMS GCV score gradient at convergence was 2.959763e-07 .
The Hessian was positive definite.
Model rank = 10 / 10
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Temp) 9.00 6.89 0.93 0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Effective DF = degrees of freedom adjusted for the smoothing penalty
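In penalized regression terms, the effective degrees of freedom are the trace of the influence matrix; for a single smoothing parameter \(\lambda\) and penalty matrix \(S\) (a standard result, added here for reference):

\[\mathrm{edf} = \mathrm{tr}\!\left[\left(X^{\top}X + \lambda S\right)^{-1}X^{\top}X\right]\]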
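The full model summary, presumably produced by summary(fit):

> summary(fit)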
Family: gaussian
Link function: identity
Formula:
Oxy ~ s(Temp, bs = "cr")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.646192 0.004348 148.6 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Temp) 6.888 7.726 1233 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.619 Deviance explained = 62%
GCV = 0.11108 Scale est. = 0.11093 n = 5868
# A tibble: 5,868 x 10
`(Intercept)` `s(Temp).1` `s(Temp).2` `s(Temp).3` `s(Temp).4`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 -0.0196 -0.0116 -0.0122 -0.0349
2 1 -0.0196 -0.0116 -0.0122 -0.0349
3 1 -0.165 0.0358 0.847 -0.217
4 1 -0.178 -0.0933 -0.155 -0.148
5 1 -0.130 -0.0651 -0.127 -0.0579
6 1 -0.169 -0.0867 -0.154 -0.113
7 1 -0.169 -0.0871 -0.154 -0.115
8 1 -0.511 -0.264 -0.462 -0.361
9 1 -0.348 -0.181 -0.311 -0.263
10 1 -0.315 -0.164 -0.280 -0.242
# … with 5,858 more rows, and 5 more variables: `s(Temp).5` <dbl>,
# `s(Temp).6` <dbl>, `s(Temp).7` <dbl>, `s(Temp).8` <dbl>,
# `s(Temp).9` <dbl>
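A matrix of basis-function values like the tibble above can be pulled from the linear predictor matrix; a sketch (the original call isn't shown):

library(dplyr)
# each column is one basis function evaluated at the data;
# lp %*% coef(fit) reproduces the fitted values
lp <- predict(fit, type = "lpmatrix")
as_tibble(lp)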
bs = "gp"
bs = 'mrf'
: Markov Random Field<ScaleContinuous>
Range:
Limits: 0 -- 1
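An MRF smooth uses a discrete areal unit as the covariate; the spatial structure enters through a neighborhood object passed via xt. A minimal sketch with hypothetical names (df, nb):

library(mgcv)
# nb must be a named list whose names match the levels of the factor
# covariate; nb[["adams"]] holds the neighbours of area "adams"
mod <- gam(rate ~ s(county, bs = "mrf", xt = list(nb = nb)),
           data = df, method = "REML")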
#fit several candidate variogram models to the residual variogram
#and keep the best-fitting one
best_resid_vario <- fit.variogram(v_bor_resid,
                                  model = vgm(c("Gau", "Mat", "Sph")))
best_resid_vario
model psill range
1 Nug 0.0011640961 0.000
2 Sph 0.0005326058 1066.352
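fit.variogram() needs an empirical variogram to work on; v_bor_resid was presumably built from model residuals with gstat::variogram(). A sketch, assuming boreal holds x/y coordinates and boreal_mod is the non-spatial model (both names are assumptions):

library(gstat)
boreal$resid <- residuals(boreal_mod)
v_bor_resid <- variogram(resid ~ 1, locations = ~ x + y, data = boreal)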
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 11 iterations.
The RMS GCV score gradient at convergence was 8.119027e-08 .
The Hessian was positive definite.
Model rank = 51 / 51
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Wet) 9.00 2.87 1.02 0.64
s(T61) 9.00 8.54 0.89 <2e-16 ***
s(x,y) 32.00 22.83 0.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 13 iterations.
The RMS GCV score gradient at convergence was 1.047426e-07 .
The Hessian was positive definite.
Model rank = 118 / 118
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Wet) 9.00 2.73 1.00 0.49
s(T61) 9.00 2.45 0.98 0.26
s(x,y) 99.00 70.30 1.01 0.59
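The jump in model rank (51 to 118) comes from raising k for the spatial smooth; a sketch of a refit consistent with the k' = 99 above (NDVI as the response is inferred from the prediction code below):

library(mgcv)
boreal_mod_spatial <- gam(NDVI ~ s(Wet) + s(T61) + s(x, y, k = 100),
                          data = boreal)
gam.check(boreal_mod_spatial)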
model psill range
1 Nug 5.328932e-05 0.000
2 Sph 1.772570e-04 1461.766
model psill range kappa
1 Nug 0.0000000 0.0000 0.0
2 Mat 0.7272499 723.8632 0.5
#prediction grid spanning the spatial extent of the data
grid <- crossing(x = seq(min(boreal$x), max(boreal$x), length.out = 100),
                 y = seq(min(boreal$y), max(boreal$y), length.out = 100)) %>%
  #predictors: interpolate Wet and T61 onto the grid
  mutate(Wet = predict(wet_mod, newdata = .),
         T61 = predict(t_mod, newdata = .)) %>%
  #response: predict NDVI from the spatial GAM
  mutate(`Predicted NDVI` = predict(boreal_mod_spatial, newdata = .))
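To visualize the predicted surface, one option (geom_raster and the viridis scale are choices here, not from the source):

library(ggplot2)
ggplot(grid, aes(x = x, y = y, fill = `Predicted NDVI`)) +
  geom_raster() +
  scale_fill_viridis_c()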
# A tibble: 1,071 x 7
county cases population race gender age rate
<fct> <int> <int> <fct> <fct> <fct> <dbl>
1 adams 0 1492 o f Under.40 0
2 adams 0 365 o f 40.59 0
3 adams 1 68 o f 60.69 0.0147
4 adams 0 73 o f 70+ 0
5 adams 0 23351 w f Under.40 0
6 adams 5 12136 w f 40.59 0.000412
7 adams 5 3609 w f 60.69 0.00139
8 adams 18 5411 w f 70+ 0.00333
9 adams 0 1697 o m Under.40 0
10 adams 0 387 o m 40.59 0
# … with 1,061 more rows
We use the aggregated data for mapping, as there are 67 counties, not 1,071 stratified rows!
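The fitting call for gam_mrf is not shown in this section; following the MRF specification sketched earlier, it would look something like this (penn_nb is an assumed neighborhood object and the covariates are guesses; the fit must be row-aligned with penn_sf for the fitted() calls below):

library(mgcv)
gam_mrf <- gam(rate ~ race + gender + age +
                 s(county, bs = "mrf", xt = list(nb = penn_nb)),
               data = penn_sf, method = "REML")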
penn_sf <- penn_sf %>%
  mutate(pred_cases = fitted(gam_mrf) * population,  # predicted rate * population
         pred_rate = fitted(gam_mrf))

#aggregate to the county level
penn_predict_agg <- penn_sf %>%
  group_by(county) %>%
  summarise(population = sum(population),
            pred_cases = sum(pred_cases),
            pred_rate = pred_cases / population * 1000,
            observed_rate = sum(cases) / population * 1000)
#plot predicted cases side by side with the observed map
pred_predict_plot <- ggplot(penn_predict_agg) +
  geom_sf(aes(fill = pred_cases)) +
  scale_fill_viridis_c(option = "D", guide = guide_colorbar("Cancer Cases")) +
  theme_minimal() +
  theme(legend.position = "bottom")

#patchwork combines this with the observed-cases map
penn_canc_poly + pred_predict_plot
It looks like the model underpredicts the number of cases, although the spatial pattern is reasonable.