
A6 R Exercise

Load the horseshoe crab data, which we already investigated when discussing logistic
regression:

crabs <- read.table("../Datasets/crabs.txt", header = TRUE)  # use your own path here
attach(crabs)
head(crabs)
##   color spine width satell weight
## 1     3     3  28.3      8   3050
## 2     4     3  22.5      0   1550
## 3     2     1  26.0      9   2300
## 4     4     3  24.8      0   2100
## 5     4     3  26.0      4   2600
## 6     3     3  23.8      0   2100

The data file crabs.txt is available on myCourses under Week 9; the description
of the data is given in RClass8-Crabs-DoItYourself. The response satell, the
number of satellites, takes values in the non-negative integers. Instead of turning it
into a binary response as we have done previously, we will now analyze these data
with GLMs for count data.

(a) Divide the predictor width into 8 intervals using the following cut-off points,
1 cm apart: 23.25, 24.25, …, 28.25, 29.25, and group the response satell into the
corresponding 8 categories. For each category, calculate the sample mean and sample
variance of satell. Plot the (sample mean, sample variance) pairs, one point per
category. What do you observe? What does this indicate about the suitability
of a Poisson GLM?
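A minimal sketch of the binning in part (a), assuming the outer two intervals are open-ended, i.e. (-Inf, 23.25] and (29.25, Inf). The data frame `sim` is simulated (coefficients -3 and 0.16 chosen arbitrarily) and only stands in for `crabs`; the helper name `mean_var_by_bin` is hypothetical.

```r
# SIMULATED stand-in for crabs.txt; replace with the real `crabs` data frame
set.seed(1)
sim <- data.frame(width = runif(173, 21, 33.5))
sim$satell <- rpois(173, lambda = exp(-3 + 0.16 * sim$width))

# Hypothetical helper: sample mean and variance of y within bins of x
mean_var_by_bin <- function(x, y, cuts) {
  # Open-ended outer bins: (-Inf, 23.25], (23.25, 24.25], ..., (29.25, Inf)
  # dig.lab = 4 so the labels show 23.25 rather than a rounded 23.2
  bins <- cut(x, breaks = c(-Inf, cuts, Inf), dig.lab = 4)
  data.frame(
    bin  = levels(bins),
    n    = as.vector(table(bins)),
    mean = as.vector(tapply(y, bins, mean)),
    var  = as.vector(tapply(y, bins, var))
  )
}

cuts <- seq(23.25, 29.25, by = 1)   # 7 cut-off points give 8 intervals
tab <- mean_var_by_bin(sim$width, sim$satell, cuts)
print(tab)

# Under a Poisson model the points should scatter around the line var = mean
plot(tab$mean, tab$var,
     xlab = "Sample mean of satell", ylab = "Sample variance of satell")
abline(0, 1, lty = 2)
```

With the real data the call would be `mean_var_by_bin(crabs$width, crabs$satell, cuts)`; points lying well above the dashed line would suggest variance exceeding the mean.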

(b) Analyze these data using a Poisson GLM with the log link. Select the best-fitting
model using analysis of deviance at the 5% level, starting with the model that includes
the three-way interaction among width, spine and color (the latter two treated
as factors). The variable weight is discarded to avoid (near) multicollinearity.
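One possible way to run the analysis of deviance in part (b) is backward elimination with likelihood-ratio (deviance) tests, sketched below. This is one procedure among several (a sequential `anova()` table is another), and the helper `backward_lrt` is hypothetical. The data are simulated stand-ins for `crabs`, with spine and color entered as factors.

```r
# SIMULATED stand-in data; spine and color are factors as in part (b)
set.seed(2)
n <- 173
sim <- data.frame(
  width = runif(n, 21, 33.5),
  spine = factor(sample(1:3, n, replace = TRUE)),
  color = factor(sample(1:4, n, replace = TRUE))
)
sim$satell <- rpois(n, exp(-3 + 0.16 * sim$width))

# Hypothetical helper: drop the least significant term until all p <= alpha.
# drop1() only offers terms whose removal respects marginality, so a main
# effect is not tested while an interaction containing it is still present.
backward_lrt <- function(model, alpha = 0.05) {
  repeat {
    tests <- drop1(model, test = "Chisq")[-1, , drop = FALSE]  # drop "<none>"
    pvals <- tests[["Pr(>Chi)"]]
    if (all(pvals <= alpha, na.rm = TRUE)) return(model)
    worst <- rownames(tests)[which.max(pvals)]
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
}

full <- glm(satell ~ width * spine * color,
            family = poisson(link = "log"), data = sim)
mod <- backward_lrt(full)
formula(mod)
```

Each step compares nested models through the drop in deviance, which is referred to a chi-squared distribution with degrees of freedom equal to the number of removed parameters.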

(c) Let mod be the model you selected in part (b). Is there evidence of overdispersion?
Test this with a suitable statistical test at the 5% level.
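A common check for part (c) is a goodness-of-fit test: if the Poisson model holds, both the Pearson statistic and the residual deviance are approximately chi-squared on the residual degrees of freedom. The sketch below uses simulated, deliberately overdispersed counts and a width-only model as stand-ins for the real `mod`. The chi-squared reference is only approximate, and rough when many fitted means are small.

```r
# SIMULATED overdispersed counts (negative binomial, size = 1) as a stand-in
set.seed(3)
n <- 173
width <- runif(n, 21, 33.5)
satell <- rnbinom(n, mu = exp(-3 + 0.16 * width), size = 1)
mod <- glm(satell ~ width, family = poisson(link = "log"))

rdf <- df.residual(mod)                               # n - number of coefficients
X2 <- sum(residuals(mod, type = "pearson")^2)         # Pearson statistic
p_pearson <- pchisq(X2, rdf, lower.tail = FALSE)
p_dev <- pchisq(deviance(mod), rdf, lower.tail = FALSE)  # deviance version

c(X2_over_df = X2 / rdf, p_pearson = p_pearson, p_deviance = p_dev)
```

A ratio X2/df well above 1 together with a small p-value points to overdispersion relative to the Poisson model.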

(d) If you were to fit a quasi-Poisson model to these data, by which factor would the
standard errors be multiplied? Calculate this factor by hand using R. What does
its value indicate about the significance of the predictors in your model (only a
qualitative statement is required here, no calculations)?
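In R's `summary.glm`, the quasi-Poisson dispersion is estimated by the Pearson statistic divided by the residual degrees of freedom, and the standard errors are the Poisson ones multiplied by the square root of that estimate. The sketch below computes the factor by hand and checks it against a `quasipoisson` fit; the data and the width-only model are simulated stand-ins for `mod`.

```r
# SIMULATED overdispersed counts as a stand-in for the crab data
set.seed(4)
n <- 173
width <- runif(n, 21, 33.5)
satell <- rnbinom(n, mu = exp(-3 + 0.16 * width), size = 1)
mod <- glm(satell ~ width, family = poisson(link = "log"))

# By hand: phi_hat = Pearson X^2 / residual df, SE factor = sqrt(phi_hat)
phi_hat <- sum(residuals(mod, type = "pearson")^2) / df.residual(mod)
se_factor <- sqrt(phi_hat)

# Check against the quasi-Poisson fit (same coefficients, rescaled SEs)
qmod <- glm(satell ~ width, family = quasipoisson(link = "log"))
se_pois  <- summary(mod)$coefficients[, "Std. Error"]
se_quasi <- summary(qmod)$coefficients[, "Std. Error"]
all.equal(unname(se_quasi / se_pois), rep(se_factor, 2))
```

A factor above 1 inflates every standard error, so z-statistics shrink and some predictors that looked significant under the Poisson model may no longer be.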

(e) Fit a negative binomial model to these data using the same predictors as mod
and the log link. Can you simplify it? Use appropriate statistical tests at the 5%
level. Does the final model change if spine and color are treated as continuous
predictors?
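A sketch of part (e) using `glm.nb` from the MASS package (shipped with R as a recommended package). Nested negative binomial fits are compared with a likelihood-ratio test computed from `logLik`, which re-estimates theta in each model; this is one reasonable choice of test, not necessarily the intended one. The data, the candidate models and the helper `lr_test` are hypothetical; color is entered once as a factor and once as a numeric score.

```r
library(MASS)  # provides glm.nb

# SIMULATED stand-in data; color stored as an integer code 1..4
set.seed(5)
n <- 173
sim <- data.frame(width = runif(n, 21, 33.5),
                  color = sample(1:4, n, replace = TRUE))
sim$satell <- rnbinom(n, mu = exp(-3 + 0.16 * sim$width), size = 1)

nb_big   <- glm.nb(satell ~ width + factor(color), data = sim, link = log)
nb_small <- glm.nb(satell ~ width, data = sim, link = log)

# Hypothetical helper: LR test of nested models; logLik's "df" counts theta too
lr_test <- function(big, small) {
  ll_big <- logLik(big)
  ll_small <- logLik(small)
  stat <- 2 * (as.numeric(ll_big) - as.numeric(ll_small))
  ddf <- attr(ll_big, "df") - attr(ll_small, "df")
  c(LR = stat, df = ddf, p = pchisq(stat, ddf, lower.tail = FALSE))
}
lr_test(nb_big, nb_small)

# Same comparison with color treated as a continuous predictor
nb_num <- glm.nb(satell ~ width + color, data = sim, link = log)
lr_test(nb_num, nb_small)
```

Treating color as continuous spends one degree of freedom instead of three, so the two versions of the test can lead to different final models.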

(f) Formulate, in two or three sentences, the main take-away message(s) from parts
(a)–(e) above.