This assignment covers statistics problems related to linear regression.
Stats413 Homework 4
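Question 1
Let Q be a categorical variable taking the values 1, 2, and 3. Let $Q_i$ denote a dummy variable that identifies
observations with Q = i. Consider the following model:
$$E(Y \mid Q) = \beta_0 + \beta_1 Q_3$$
a) Provide an interpretation of $\beta_1$. (Note that this is not asking for an interpretation of estimated
coefficients, so you may frame your answer in terms of slopes and/or intercepts.)
b) What does this model assume about the relationship between groups 1 and 2? (Your answer
should not include any of the model's coefficients.)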
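As a concrete illustration of the dummy variables $Q_i$ in Question 1, the following minimal Python sketch builds the indicators for a small made-up sample. The sample values and the names `q_values`, `dummy`, `dummies`, and `design` are assumptions introduced here for illustration; they are not part of the assignment.

```python
# Hypothetical sample of the categorical variable Q (values 1, 2, 3).
q_values = [1, 3, 2, 3, 1, 2]


def dummy(q, level):
    """Return 1 if the observation belongs to the given level of Q, else 0."""
    return 1 if q == level else 0


# One indicator column per level: Q1, Q2, Q3.
dummies = {f"Q{i}": [dummy(q, i) for q in q_values] for i in (1, 2, 3)}

# The model in Question 1 uses only Q3, so each design-matrix row holds
# an intercept entry (1) followed by the Q3 indicator.
design = [[1, q3] for q3 in dummies["Q3"]]

for q, row in zip(q_values, design):
    print(q, row)
```

Each observation has exactly one indicator equal to 1, which is what makes the $Q_i$ a coding of the single variable Q.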
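Question 2
Another way to show that linear regression is scale invariant is to rewrite the regression equation. For
example, starting with the model
$$E(Y \mid X) = \beta_0 + \beta_1 X,$$
if we replace X with X − c, we can rewrite the equation: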
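$$
\begin{aligned}
\beta_0^{(c)} + \beta_1^{(c)} (X - c) &= \beta_0^{(c)} + \beta_1^{(c)} X - \beta_1^{(c)} c \\
&= \left(\beta_0^{(c)} - \beta_1^{(c)} c\right) + \beta_1^{(c)} X \\
&= \beta_0 + \beta_1 X,
\end{aligned}
$$
so that $\beta_1^{(c)} = \beta_1$ and $\beta_0^{(c)} = \beta_0 + \beta_1^{(c)} c = \beta_0 + \beta_1 c$, which is what we derived in class.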
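The algebra above can be checked numerically. The sketch below, written in plain Python with made-up data and a hypothetical helper `fit_line`, fits the simple model once to X and once to X − c and compares the coefficients. It is only a sanity check on one data set, not a proof.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data and shift constant, chosen only for illustration.
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 3.9, 8.2, 9.8, 14.1]
c = 3.0

b0, b1 = fit_line(x, y)                         # fit using X
b0_c, b1_c = fit_line([xi - c for xi in x], y)  # fit using X - c

# Slope comparison: is the slope the same after the shift?
print(abs(b1_c - b1) < 1e-9)
# Intercept comparison: does the shifted intercept equal b0 + b1 * c?
print(abs(b0_c - (b0 + b1 * c)) < 1e-9)
```

The comparisons use a small tolerance because the two fits are computed in floating-point arithmetic.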
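a) Using a similar approach, show that this model is still scale invariant.
$$E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2$$
b) Show that this model is not scale invariant in X.
$$E(Y \mid X) = \beta_0 + \beta_1 X^2$$
c) We claimed in class that linear regression is scale invariant in all cases. Explain how the result in
b) does not violate that claim. (Hint: What did we do to show that a model with a quadratic term
is still a linear regression?)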
Question 3
Load the data “Mroz” from the package carData. We’ll focus on the variables inc, which represents the household
income excluding the wife’s income, and k5, which is the number of children under 5 in the household.
a) Consider fitting a model predicting inc based upon k5. Without actually fitting any models (you
can explore the data), would you recommend including k5 as a continuous variable or a categorical
variable? Justify your recommendation briefly.
b) Regardless of your answer to a), fit the model predicting inc based upon a categorical k5. Note that
this does not imply that including k5 as categorical is the right approach or the correct answer to part
a).
i) What is the reference category for k5?
ii) Interpret the results to briefly tell the full story regarding all the levels of k5.
iii) Having fit the model, provide evidence that is either for or against including k5 as a categorical
variable. (Your answer here and for part a) may be contradictory – it’s perfectly fine to adjust
your recommendation when you receive new data!)
(Hint: The emmip function from the emmeans package may be very helpful for parts ii) and iii).)
Question 4
a) Consider the model
E(Y |X) = β + βX.
Note that here the intercept and slope are forced to be equivalent. Derive the least squares estimate
of β for this model.
(Hints: Be careful with signs. Your final answer should resemble other least squares estimates of β’s.
You may use any results we have previously derived.)
b) Verify that your estimate of β is unbiased. (Hint: It may make things more clear to simplify the
model.)
Question 5
You are asked to carry out a regression analysis predicting a respondent’s opinion of Fischer’s Shampoo
(the response), for which their local newspaper recently carried an advertisement. The sample size is 2,901
respondents. The predictor variables you have are:
• age – Age (continuous)
• ses – Socio-economic status (Low income, middle income, high income)
• subscriber – Respondent subscribes to their local newspaper (No, Yes)
• primarypurchaser – Response to “I am the primary purchaser of goods in my household” (continuous,
1-5 scale, 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree)
Your boss tells you that they suspect the relationship between opinion on Fischer’s Shampoo and
primarypurchaser to be quadratic.
a) Design a linear regression model for this data that uses all available predictors and information above.
b) For your model, what are the dimensions of X (the data) and X (the design matrix)?
c) Your boss asks you to also include a quadratic relationship between subscriber and the respondent’s
opinion. Either modify your model to include this, or explain why you shouldn’t/can’t do that.
d) What would the predicted value from your model be for a 37-year-old middle-class respondent who
agrees that they are the primary purchaser for their household? (Your answer should be a formula
involving predicted coefficients and scalars.)
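
For Question 5, the 1-5 coding of primarypurchaser and a squared term (one common way to express a quadratic relationship within a linear regression) can be sketched as follows. This is a minimal illustration only: the names `LIKERT` and `purchaser_terms` and the sample responses are hypothetical, and the sketch does not specify the full model.

```python
# Numeric coding of the 1-5 agreement scale, as listed in the problem.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def purchaser_terms(response):
    """Return the linear and squared terms for a primarypurchaser response."""
    score = LIKERT[response]
    return score, score ** 2


# Hypothetical responses, for illustration only.
responses = ["agree", "neutral", "strongly disagree"]
terms = [purchaser_terms(r) for r in responses]
print(terms)
```

Keeping the label-to-number mapping in one dictionary makes the coding explicit and easy to audit against the survey wording.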