Stats413 Homework 9
Due Date: Apr 14, 5pm.
Answer all questions. Show all work.
Question 1
- a) There are two projection matrices of size 1 × 1. What are they, and what spaces do they project onto?
- b) Given the following data, use projections to obtain $\hat{Y}$ and $e$.
$$Y = \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix}, \qquad X = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$$
Assume we fit the model with an intercept. You must solve these manually, but you can use R to double-check your work.
i) What is the projection matrix for this data?
ii) Use the projection matrix to obtain $\hat{Y}$.
iii) Use the projection matrix to obtain $e$.
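The assignment allows R for double-checking Question 1(b). Below is a minimal base-R sketch that does this. It assumes the data read top to bottom as $Y = (1, 4, 7)^T$ and $X = (0, -1, 1)^T$. It also builds the design matrix with an intercept column. The object names are illustrative only.

```r
# Double-check for Question 1(b) using base R only.
# Assumed data (read top to bottom): Y = (1, 4, 7), X = (0, -1, 1).
y <- c(1, 4, 7)
x <- c(0, -1, 1)

# Design matrix: intercept column plus the predictor
X <- cbind(intercept = 1, x = x)

# (i) Projection onto the column space of X: H = X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)

# (ii) Fitted values: the projection of y onto col(X)
y_hat <- as.vector(H %*% y)

# (iii) Residuals: the projection of y onto the orthogonal complement, (I - H) y
e <- as.vector((diag(nrow(X)) - H) %*% y)

# A projection matrix should be symmetric and idempotent
stopifnot(isTRUE(all.equal(H, t(H))), isTRUE(all.equal(H %*% H, H)))

print(round(H, 4))
print(cbind(y, y_hat, e))

# Cross-check against lm(), which fits the same intercept model
fit <- lm(y ~ x)
stopifnot(isTRUE(all.equal(y_hat, unname(fitted(fit)))))
```

The hand-derived matrix should agree with `H` up to rounding. The `lm()` call is only a consistency check: it does not use projections explicitly.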
Question 2
Assume Q and R are as defined in class, derived from X.
- a) Derive an expression for $\operatorname{var}(\hat{\beta} \mid X)$ in terms of the $Q$ and $R$ matrices. (Hint: The final form doesn’t involve $Q$ or $R$.)
- b) In lecture, it is claimed that $H = Q_1 Q_1^T$ and $(I - H) = Q_2 Q_2^T$. Show that this is true. (Hint: $H$ is straightforward. For $(I - H)$, recall that if $Q^T Q = I$, then it also implies that $Q Q^T = I$. Note the summations carefully.)
- c) Given the following $Y$, $Q$, and $R$ matrices, manually estimate $\beta$. (Note that these are $Q$ and $R$, not $Q_1$ and $R_1$.) You may do the matrix multiplication in R if you want; if you do, give the code (not the RMarkdown output) that you used. You may, of course, do it manually as well.
$$Y = \begin{bmatrix} 5 \\ 2 \\ -3 \\ 2 \\ 4 \end{bmatrix}, \quad Q = \begin{bmatrix} 3 & 1 & -4 & 3 & -3 \\ -2 & 0 & 2 & -5 & 1 \\ -1 & 3 & -1 & 0 & -1 \\ -6 & 4 & 6 & 7 & 2 \\ 2 & -2 & -1 & -1 & 4 \end{bmatrix}, \quad R = \begin{bmatrix} 5 & -2 & 3 \\ 0 & 4 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
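The Question 2 identities can be spot-checked numerically in base R, and part (c) reduces to a single triangular solve. The sketch below is illustrative only.

- For (a) and (b), it uses a hypothetical full-rank design (an intercept plus the Question 1 predictor) with `qr()`, `qr.Q()`, and `qr.R()`. With `complete = TRUE`, these return the full $n \times n$ $Q$ and $n \times p$ $R$.
- For (c), it assumes the class estimator $\hat\beta = R_1^{-1} Q_1^T Y$. It takes $Q_1$ as the first three columns of $Q$ and $R_1$ as the top $3 \times 3$ block of $R$, then solves with `backsolve()` instead of forming an inverse.

A numerical check supports the derivations but does not replace them.

```r
# Numerical checks for Question 2 (a sketch; not a substitute for the derivations).

# Hypothetical full-rank design: intercept plus the Question 1 predictor
X <- cbind(1, c(0, -1, 1))
n <- nrow(X)
p <- ncol(X)

qx <- qr(X)
Q  <- qr.Q(qx, complete = TRUE)  # n x n orthogonal matrix
R  <- qr.R(qx, complete = TRUE)  # n x p, zeros below row p
Q1 <- Q[, 1:p, drop = FALSE]
Q2 <- Q[, -(1:p), drop = FALSE]
R1 <- R[1:p, 1:p, drop = FALSE]

H <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix X (X'X)^{-1} X'

# (a) R1'R1 = X'X, so sigma^2 (R1'R1)^{-1} equals sigma^2 (X'X)^{-1}
check_a  <- isTRUE(all.equal(crossprod(R1), crossprod(X)))
# (b) H = Q1 Q1' and I - H = Q2 Q2'
check_b1 <- isTRUE(all.equal(H, Q1 %*% t(Q1)))
check_b2 <- isTRUE(all.equal(diag(n) - H, Q2 %*% t(Q2)))
print(c(a = check_a, b_H = check_b1, b_IminusH = check_b2))

# (c) Y, Q, R as transcribed from part (c), rows read top to bottom
Y_c <- c(5, 2, -3, 2, 4)
Q_c <- matrix(c( 3,  1, -4,  3, -3,
                -2,  0,  2, -5,  1,
                -1,  3, -1,  0, -1,
                -6,  4,  6,  7,  2,
                 2, -2, -1, -1,  4), nrow = 5, byrow = TRUE)
R_c <- matrix(c(5, -2,  3,
                0,  4, -2,
                0,  0,  2,
                0,  0,  0,
                0,  0,  0), nrow = 5, byrow = TRUE)

k    <- ncol(R_c)      # number of coefficients
Q1_c <- Q_c[, 1:k]     # first k columns of Q
R1_c <- R_c[1:k, ]     # top k x k (upper-triangular) block of R

# Solve R1 beta = Q1' Y by back substitution
beta_hat <- backsolve(R1_c, crossprod(Q1_c, Y_c))
print(beta_hat)
```

Only the first three columns of $Q$ enter $\hat\beta$. This is why the problem's reminder that these are the full $Q$ and $R$ matters.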
Question 3
One of the major challenges instructors faced upon transitioning to online-only courses was dealing with technical difficulties in holding online lectures. Consider the following variables and probabilities about whether the first online lecture has technical difficulties.
C: 0 = First online lecture has no technical problems. 1 = First online lecture has a technical problem.
A: Age of instructor
Define the following probabilities:
$P(C = 1) = .10$
$P(C = 1 \mid A = 25) = .02$
$P(C = 1 \mid A = 50) = .20$
$P(C = 1 \mid A = 75) = .40$
Compute the following quantities:
- a) Odds of a lecture having technical problems.
- b) Odds of a lecture not having technical problems.
- c) Odds of a lecture having technical problems for 50-year-old instructors.
- d) How much higher or lower are the odds of a class having a technical problem among 75-year-old instructors versus 25-year-old instructors?
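Each part of Question 3 is a conversion $\text{odds} = p / (1 - p)$. One natural way to compare the two age groups in (d) is the ratio of their odds. Below is a small base-R sketch of that arithmetic using the probabilities defined above. The helper `odds()` is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper: odds from a probability, p / (1 - p)
odds <- function(p) p / (1 - p)

p_any <- 0.10  # P(C = 1)
p_a25 <- 0.02  # P(C = 1 | A = 25)
p_a50 <- 0.20  # P(C = 1 | A = 50)
p_a75 <- 0.40  # P(C = 1 | A = 75)

odds_problem     <- odds(p_any)                 # (a) odds of a technical problem
odds_no_problem  <- odds(1 - p_any)             # (b) odds of no technical problem
odds_age50       <- odds(p_a50)                 # (c) odds for 50-year-old instructors
odds_ratio_75_25 <- odds(p_a75) / odds(p_a25)   # (d) odds ratio, 75 vs. 25

print(c(a = odds_problem, b = odds_no_problem,
        c = odds_age50, d = odds_ratio_75_25))
```

Note that (b) is simply the reciprocal of (a). This is a quick consistency check on the hand calculations.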
Question 4
Let $Y$ be a continuous variable. Let $X$ be an ordinal variable taking on values 1, 2, 3, and 4. Let $X_i$ be a dummy variable that is 1 if $X = i$ and 0 otherwise. Consider two different models:
$$E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$
$$E(Y \mid X) = \alpha_0 + \alpha_1 X$$
- a) Describe a research question where the first model would be preferred.
- b) Describe a research question where the second model would be preferred.
- c) You don’t want to assume a linear relationship between $X$ and $Y$. What characteristic of the data might force your hand into using the second model?
- d) Can you determine which model would have a lower RMSE? If so, which one? If not, why not?
- e) Can you determine which model would have a lower BIC? If so, which one? If not, why not?
- f) Assume the truth is a strong negative linear relationship between $X$ and $Y$. Which of $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$ would you expect to be largest? Which would you expect to be smallest? (You may assume the model fits well and captures the true relationship.)
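In R, the two models correspond to entering $X$ either as a factor or as a numeric predictor. The sketch below uses simulated, purely hypothetical data to show the mechanics. The seed, sample size, and true line are arbitrary choices. A few details are worth noting:

- `relevel(..., ref = "4")` makes $X = 4$ the baseline, so the dummies match $X_1, X_2, X_3$ above. R's default baseline would be the first level, $X = 1$.
- RMSE is computed here as $\sqrt{\text{mean}(e^2)}$. This may differ from the definition used in class.

```r
# Hypothetical simulated data for illustration only
set.seed(413)
x <- rep(1:4, each = 25)
y <- 10 - 2 * x + rnorm(length(x), sd = 1)

# Model 1: dummy variables X1, X2, X3 with X = 4 as the baseline
x_factor  <- relevel(factor(x), ref = "4")
fit_dummy <- lm(y ~ x_factor)

# Model 2: X entered as a single numeric (linear) term
fit_linear <- lm(y ~ x)

# In-sample RMSE, here defined as sqrt(mean(residual^2))
rmse <- function(fit) sqrt(mean(residuals(fit)^2))

comparison <- data.frame(
  model = c("dummy", "linear"),
  rmse  = c(rmse(fit_dummy), rmse(fit_linear)),
  bic   = c(BIC(fit_dummy), BIC(fit_linear))
)
print(comparison)

# Intercept estimates the mean of y at X = 4; the other coefficients
# estimate differences from that baseline mean
print(coef(fit_dummy))
```

Because the data are simulated, the printed numbers illustrate the calculation only. They are not evidence about any real data set.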
Stats413 Homework 9 Rmd Questions
Question 5
The data set “jsp” in the package faraway contains testing information on students in London elementary schools. The raven variable contains the results of an intelligence test; a score of 33 or above indicates that a student scored in the top 10% of students.
a) Fit a regression model predicting raven based upon gender, year, and social (look at the help for jsp to make sure these variables enter the model correctly). Provide a formal interpretation of the coefficient on gender, and comment upon whether it is statistically significant.
b) Generate a binary variable indicating whether a student was in the top 10%.
jsp$toppercentile <- jsp$raven >= 33
Fit a logistic regression predicting this binary variable based upon the same predictors as in a). Provide a formal interpretation of the coefficient on gender, and comment upon whether it is statistically significant.
c) The statistical significance of the coefficient on gender differs in the two models. Explain, in the context of the problem, what this means. Specifically, address the difference between the two models’ goals.
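Below is a minimal sketch of the two fits in parts (a) and (b). They are written as functions of a data frame, so the same code applies to `faraway::jsp`. The sketch assumes that `social` and `year` should enter as categorical predictors and wraps both in `factor()`. That coding is exactly what the `?jsp` help page should settle, so adjust the formula if needed. The function names are hypothetical.

```r
# Sketch for Question 5: the same predictors in a linear and a logistic model.
# Assumption: social and year are treated as categorical via factor();
# check ?jsp (package faraway) before relying on this coding.

fit_raven_lm <- function(df) {
  # (a) Linear regression for the raw raven score
  lm(raven ~ gender + factor(year) + factor(social), data = df)
}

fit_top10_glm <- function(df) {
  # (b) Binary outcome: TRUE when raven >= 33 (top 10%)
  df$toppercentile <- df$raven >= 33
  glm(toppercentile ~ gender + factor(year) + factor(social),
      family = binomial, data = df)
}

gender_rows <- function(fit) {
  # Rows of the summary coefficient table that belong to gender
  tab <- summary(fit)$coefficients
  tab[grepl("^gender", rownames(tab)), , drop = FALSE]
}
```

With the package installed, `library(faraway)` followed by `gender_rows(fit_raven_lm(jsp))` and `gender_rows(fit_top10_glm(jsp))` shows the estimate, standard error, and p-value for gender in each model. For the logistic fit, `exp()` of the estimate is the multiplicative change in the odds of being in the top 10%.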
Submit the output of your RMarkdown. Your output should be no longer than 2 pages.