\(\widehat{Y} = \beta_0 + \beta_1 X + \epsilon\) where \(\beta_0\) = intercept, \(\beta_1\) = slope
Minimize Residuals defined as \(SS_{residuals} = \sum(Y_{i} - \widehat{Y})^2\)
Variance-Mean Relationship
Different precision for different data points
sq_sum <- lm(MEAN_Testisweight ~ MEAN_DML, data=squid_summary,
weights=1/VAR_DML)
No weights
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -4.5177727 | 2.1081505 | -2.143003 | 0.0445948 |
MEAN_DML | 0.0376229 | 0.0077701 | 4.842035 | 0.0000989 |
Precision Weighted
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -4.518775 | 2.2041869 | -2.050087 | 0.0544130 |
MEAN_DML | 0.037140 | 0.0074578 | 4.980010 | 0.0000831 |
Sample Size Weighted
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -5.2808150 | 2.8213670 | -1.871722 | 0.0759446 |
MEAN_DML | 0.0417104 | 0.0110373 | 3.779050 | 0.0011787 |
library(lmtest)
squid_lm <- lm(Testisweight ~ DML, data=squid)
bptest(squid_lm)
studentized Breusch-Pagan test
data: squid_lm
BP = 155.31, df = 1, p-value < 2.2e-16
squid_lm2 <- lm(Testisweight ~ DML + MONTH, data=squid)
bptest(squid_lm2)
studentized Breusch-Pagan test
data: squid_lm2
BP = 144.22, df = 2, p-value < 2.2e-16
bptest(squid_lm2, varformula = ~ DML, data=squid)
studentized Breusch-Pagan test
data: squid_lm2
BP = 140.74, df = 1, p-value < 2.2e-16
bptest(squid_lm2, varformula = ~ MONTH, data=squid)
studentized Breusch-Pagan test
data: squid_lm2
BP = 0.086946, df = 1, p-value = 0.7681
squid_wls <- lm(Testisweight ~ DML, data=squid,
weights = 1/squid$DML)
Variance increases with DML, so weight decreases
squid_gls <- gls(Testisweight ~ DML,
data=squid,
weights = varFixed(~ DML))
Note: higher value = higher variance - opposite of lm
WLS with lm
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.6239370 0.3382932 -16.624 < 2.2e-16 ***
DML 0.0430654 0.0014061 30.627 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
GLS with gls
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.6239370 0.3382932 -16.624 < 2.2e-16 ***
DML 0.0430654 0.0014061 30.627 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
VarFixed
- Linear continuous varianceVarPower
- Variance increases by a powerVarExp
- Variance exponentiatedVarConstPower
- Variance is constant + powerVarIdent
- Variance differs by groupsVarComb
- Combines different variance functionsLet’s add month! If it was another driver…
squid_gls_month <- gls(Testisweight ~ DML, data=squid,
weights = varComb(varFixed(~DML),
varFixed(~MONTH)))
squid <- mutate(squid, fMONTH = factor(MONTH))
MONTH | var_weight |
---|---|
1 | 8.216093 |
2 | 9.491376 |
3 | 9.456260 |
4 | 6.663934 |
5 | 5.513464 |
6 | 10.884092 |
7 | 6.635753 |
8 | 2.274724 |
9 | 41.502657 |
10 | 46.016877 |
11 | 17.488115 |
12 | 17.734229 |
bptest(squid_month)
studentized Breusch-Pagan test
data: squid_month
BP = 68.375, df = 11, p-value = 2.485e-10
vMonth <- varIdent(form = ~ 1 | fMONTH)
1 | x
form, different variance for different stratasquid_month_gls <- gls(Testisweight ~ fMONTH,
data=squid,
weights=vMonth)
Generalized least squares fit by REML
Model: Testisweight ~ fMONTH
Data: squid
AIC BIC logLik
4307.961 4419.034 -2129.98
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | fMONTH
Parameter estimates:
2 9 12 11 8 10 5
1.0000000 2.0910010 1.3669006 1.3573650 0.4895507 2.2018372 0.7621603
7 6 4 1 3
0.8361282 1.0708259 0.8379144 0.9303731 0.9981328
Coefficients:
Value Std.Error t-value p-value
(Intercept) 10.200311 0.4272917 23.872011 0
fMONTH2 -5.391282 0.6795202 -7.933953 0
fMONTH3 -4.094684 0.5555744 -7.370183 0
fMONTH4 -5.609615 0.5722338 -9.803013 0
fMONTH5 -6.514285 0.5724285 -11.380085 0
fMONTH6 -7.577601 0.6848334 -11.064882 0
fMONTH7 -8.847744 0.6016011 -14.706993 0
fMONTH8 -9.093561 0.4757355 -19.114743 0
fMONTH9 -3.975147 0.7016302 -5.665587 0
fMONTH10 -4.109998 0.7252503 -5.667006 0
fMONTH11 -4.596277 0.6174994 -7.443371 0
fMONTH12 -4.373843 0.7482711 -5.845265 0
Correlation:
(Intr) fMONTH2 fMONTH3 fMONTH4 fMONTH5 fMONTH6 fMONTH7 fMONTH8
fMONTH2 -0.629
fMONTH3 -0.769 0.484
fMONTH4 -0.747 0.470 0.574
fMONTH5 -0.746 0.469 0.574 0.557
fMONTH6 -0.624 0.392 0.480 0.466 0.466
fMONTH7 -0.710 0.447 0.546 0.530 0.530 0.443
fMONTH8 -0.898 0.565 0.691 0.671 0.670 0.560 0.638
fMONTH9 -0.609 0.383 0.468 0.455 0.455 0.380 0.433 0.547
fMONTH10 -0.589 0.370 0.453 0.440 0.440 0.368 0.418 0.529
fMONTH11 -0.692 0.435 0.532 0.517 0.517 0.432 0.491 0.622
fMONTH12 -0.571 0.359 0.439 0.426 0.426 0.356 0.406 0.513
fMONTH9 fMONTH10 fMONTH11
fMONTH2
fMONTH3
fMONTH4
fMONTH5
fMONTH6
fMONTH7
fMONTH8
fMONTH9
fMONTH10 0.359
fMONTH11 0.421 0.408
fMONTH12 0.348 0.336 0.395
Standardized residuals:
Min Q1 Med Q3 Max
-3.5056004 -0.6728254 -0.3252821 0.4762647 4.9030307
Residual standard error: 3.080871
Degrees of freedom: 768 total; 756 residual
squid_lm <- gls(Testisweight ~ fMONTH, data=squid)
anova(squid_lm, squid_month_gls)
Model df AIC BIC logLik Test L.Ratio
squid_lm 1 13 4555.853 4616.017 -2264.926
squid_month_gls 2 24 4307.961 4419.034 -2129.980 1 vs 2 269.892
p-value
squid_lm
squid_month_gls <.0001
read.delim
to read it invarComb