Stats413 Homework 10
Due Date: Apr 21, 5pm.
Answer all questions. Show all work.
Question 1
- a) Consider fitting the following logistic regression model:
$\operatorname{logit}(P(Y = 1 \mid X)) = \beta_0 + \beta_1 X_1$
In class we derived an interpretation of $\hat\beta_1$ in terms of increasing $X_1$ by 1. Generalize this. That is, finish this interpretation by deriving the associated change in $Y$ when $X_1$ changes by an arbitrary constant $c$:
If $X_1$ were to change by $c$, we would predict that the odds of a positive outcome would . . .
- b) Consider fitting a null logistic regression model:
$\operatorname{logit}(P(Y = 1 \mid X)) = \beta_0$
It turns out that we do have a closed-form solution for the maximum likelihood estimate of $\beta_0$. Derive and provide an interpretation for $\hat\beta_0$.
(Hints: Be careful with signs. What is $\sum_{i=1}^{n} y_i$, and how does it relate to the estimated probability of a positive outcome (and conversely, to the estimated probability of a negative outcome)? $\exp(\hat\beta_0)$ should have a nice and obvious interpretation.)
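As a reminder of the notation both parts rely on, the logit is the log of the odds. The block below only restates that standard definition for the model in part a). The symbol $p$ is introduced here for illustration and does not appear in the assignment.

```latex
\[
p = P(Y = 1 \mid X), \qquad
\operatorname{logit}(p) = \log\frac{p}{1 - p}, \qquad
\frac{p}{1 - p} = \exp(\beta_0 + \beta_1 X_1).
\]
```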
Question 2
The data set “chickwts” records an experiment comparing the effect of different feeds on chicken weight. Let’s reverse that and see whether we can predict which feed a particular chicken received. We’ll focus on only two feeds, linseed and sunflower, and the following code will help you create a data set with only those two feeds and with a proper dummy variable.
data(chickwts)
chickwtsub <- chickwts[chickwts$feed == "linseed" | chickwts$feed == "sunflower",]
chickwtsub$sunflower <- as.numeric(chickwtsub$feed == "sunflower")
Fit a model predicting feed type based upon weight. Generate the predicted probability for each observation. (You do not have to include this output in your submission.) From this output, address the questions below.
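One way the fit might look in R is sketched below. It assumes a logistic regression via `glm` with `family = binomial`, which matches Question 1 but is not prescribed by the assignment. The names `fit` and `phat` are illustrative, and the subsetting step is repeated so the block runs on its own.

```r
# Load the built-in data set and keep only the two feeds of interest.
data(chickwts)
chickwtsub <- chickwts[chickwts$feed == "linseed" | chickwts$feed == "sunflower", ]
chickwtsub$sunflower <- as.numeric(chickwtsub$feed == "sunflower")

# Logistic regression of the sunflower indicator on weight
# (assumed model; `fit` is an illustrative name).
fit <- glm(sunflower ~ weight, data = chickwtsub, family = binomial)

# Predicted probability of "sunflower" for each chicken
# (`phat` is an illustrative column name).
chickwtsub$phat <- predict(fit, type = "response")

# Sorted view, convenient for tabulating TPR and TNR by hand.
print(chickwtsub[order(chickwtsub$phat), c("weight", "sunflower", "phat")])
```

Sorting by predicted probability makes it easy to see, for any threshold, which observations fall on each side of it.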
- a) Manually (not in R), calculate the true positive rate and true negative rate for at least 5 thresholds well spaced between 0 and 1 (not including 0 or 1, which provide identical TPR and TNR for every model).
- b) Manually (not in R), draw the ROC curve using your 5 thresholds, as well as thresholds 0 and 1.
- c) Which threshold (of the 5 you chose) provides the best classification?
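The assignment asks for these rates by hand, so the sketch below only illustrates the definitions of TPR and TNR on a small made-up example. The vectors `y` and `phat` are hypothetical and not taken from `chickwts`, the helper `rates` is illustrative, and classifying as positive when the probability is at least the threshold is an assumed convention.

```r
# Hypothetical toy data (NOT from chickwts): true labels and predicted probabilities.
y    <- c(0, 0, 1, 0, 1, 1)
phat <- c(0.10, 0.40, 0.35, 0.60, 0.80, 0.90)

# TPR and TNR when an observation is classified as positive if phat >= threshold
# (the ">=" convention is an assumption; ">" is equally common).
rates <- function(y, phat, threshold) {
  pred <- as.numeric(phat >= threshold)
  c(TPR = sum(pred == 1 & y == 1) / sum(y == 1),  # true positives / all positives
    TNR = sum(pred == 0 & y == 0) / sum(y == 0))  # true negatives / all negatives
}

print(rates(y, phat, 0.5))
```

Each threshold gives one point on the ROC curve, conventionally plotted as TPR against 1 − TNR.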
Question 3
Let’s investigate how transformations of weights affect the estimated coefficients in weighted least squares. Let $\hat\beta_w$ be the vector of coefficient estimates from weighted least squares with weights $W$.
- a) Derive $\hat\beta_w^{(d)}$ from weighted least squares with weights $dW$, where $d$ is some non-zero constant. Show the relationship between $\hat\beta_w^{(d)}$ and $\hat\beta_w$.
- b) Derive $\hat\beta_w^{(c)}$ from weighted least squares with weights $W + c$, where $c$ is some non-zero constant. Show that $\hat\beta_w^{(c)} \neq \hat\beta_w$.
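For reference, a sketch of the standard weighted least squares setup that both parts start from. It assumes a design matrix $X$ with rows $x_i^\top$, a response vector $y$, a diagonal weight matrix $W$ with entries $w_i$, and an invertible $X^\top W X$. Apart from $W$, none of these symbols are defined in the assignment, so treat them as assumptions.

```latex
\[
\hat\beta_w
= \arg\min_{\beta} \sum_{i=1}^{n} w_i \,(y_i - x_i^\top \beta)^2
= (X^\top W X)^{-1} X^\top W y .
\]
```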
Question 4
Consider the following plot. X and Y are continuous variables, and G is binary.
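[Figure: scatter plot with x on the horizontal axis (ticks from −2 to 6) and a vertical axis with ticks from −5 to 10; legend G with levels 0 and 1; three points labeled 1, 2, and 3.]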
- a) Design a model to predict Y. Write down the conditional expectation of Y.
- b) There are three labeled points (the triangles). The color of each matches group membership. For each point, is it a problematic outlier, an outlier, or neither? Defend your decision.