This homework uses R to complete coding exercises on linear regression and design matrices.

STAT 3701 Homework 5

Show all work. Submit your solutions in a pdf document on Canvas. Include your R code (which must be commented and properly indented) in the pdf file. Also submit one text file with all your R code (comments and all) clearly labeled with the problem it goes with. This must be properly indented. Before every solution with random sampling use set.seed(3701).

Question 1 (10 points)

Consider a linear regression with two explanatory variables, {age, treatmentType}, where age is numerical and treatmentType is categorical with three levels {A, B, C}. The response is an exam score. The design matrix is generated in the same way as in Section 2.2 of the notes Regression Part 2, except that there are no interaction terms in this case. More specifically, we have

• n = 30 subjects.

• The first third received treatment A, the second third treatment B, and the last third treatment C.

• Age is integer-valued and is uniformly distributed over 18 to 35.

• The true regression coefficient vector is β = (50, 0, 10, 0), so age is not relevant.

• The random errors are iid N(0, 5²).

The model can be written as

Y = Xβ + ϵ,

where X is an n × 4 design matrix whose first column is all 1s, whose second column holds the observations of age, whose third column is the dummy variable for level B of treatmentType, and whose last column is the dummy variable for level C of treatmentType.
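
A minimal R sketch of this setup, assuming that `model.matrix` with its default treatment contrasts reproduces the dummy coding described above (the construction in Section 2.2 of the notes may differ in detail). The variable names are illustrative.

```r
set.seed(3701)

n <- 30
# First third gets treatment A, second third B, last third C
treatmentType <- factor(rep(c("A", "B", "C"), each = n / 3))
# Integer ages drawn uniformly from 18, ..., 35
age <- sample(18:35, size = n, replace = TRUE)

# Columns: intercept, age, dummy for level B, dummy for level C
X <- model.matrix(~ age + treatmentType)

beta <- c(50, 0, 10, 0)
# Errors are iid N(0, 5^2), i.e. standard deviation 5
y <- as.vector(X %*% beta) + rnorm(n, mean = 0, sd = 5)
```

Because level A is the baseline, the intercept of 50 is the expected score under treatment A, and the coefficient 10 is the shift for treatment B.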

We are interested in testing whether age is related to the response; that is, letting β2 denote the regression coefficient of age, we want to test the hypothesis

H0 : β2 = 0

Ha : β2 ≠ 0.

We’ve talked about two ways to conduct this test: the t-test and the F-test. In this question, we are interested in comparing those two tests.

(a) (5 points) Describe how you will test the hypotheses above using the t-test and the F-test. For each test, write down the test statistic, the distribution of the test statistic under H0, and how the p-value is calculated.
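
One way to see both tests side by side on a single simulated data set is sketched below. It assumes the F-test compares the full model with the reduced model that drops age; the object names are illustrative and not taken from the notes.

```r
set.seed(3701)

n <- 30
treatmentType <- factor(rep(c("A", "B", "C"), each = n / 3))
age <- sample(18:35, size = n, replace = TRUE)
# True model: intercept 50, no age effect, +10 for treatment B, errors N(0, 5^2)
y <- 50 + 10 * (treatmentType == "B") + rnorm(n, sd = 5)

full <- lm(y ~ age + treatmentType)
reduced <- lm(y ~ treatmentType)

# t-test: estimate divided by its standard error, compared with t on n - 4 df
t.stat <- coef(summary(full))["age", "t value"]
p.t <- 2 * pt(abs(t.stat), df = full$df.residual, lower.tail = FALSE)

# F-test: drop in residual sum of squares per dropped parameter,
# divided by the full-model residual mean square, compared with F(1, n - 4)
rss.full <- sum(resid(full)^2)
rss.reduced <- sum(resid(reduced)^2)
f.stat <- ((rss.reduced - rss.full) / 1) / (rss.full / full$df.residual)
p.f <- pf(f.stat, df1 = 1, df2 = full$df.residual, lower.tail = FALSE)
```

For a single coefficient, the F statistic equals the square of the t statistic, so the two p-values coincide.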

(b) (5 points) Now we will use simulation to compare those two tests. Set reps = 1000 and significance level α = 0.05. We will generate reps realizations of data. For each realization, we will test the above hypotheses using both the t-test and the F-test and record whether H0 is rejected by each test. In how many realizations do the two tests give different conclusions, i.e., only one of the tests rejects H0? What do you conclude about the two tests?
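
The simulation could be organized along the following lines. This sketch assumes the design matrix is generated once and held fixed across replications (the assignment does not say whether it should be regenerated), and it takes the p-values from `summary` and `anova` rather than computing them by hand.

```r
set.seed(3701)

reps <- 1000
alpha <- 0.05
n <- 30

# Design held fixed across replications (an assumption of this sketch)
treatmentType <- factor(rep(c("A", "B", "C"), each = n / 3))
age <- sample(18:35, size = n, replace = TRUE)
X <- model.matrix(~ age + treatmentType)
beta <- c(50, 0, 10, 0)

reject.t <- logical(reps)
reject.f <- logical(reps)
for (r in seq_len(reps)) {
  y <- as.vector(X %*% beta) + rnorm(n, sd = 5)
  full <- lm(y ~ age + treatmentType)
  reduced <- lm(y ~ treatmentType)
  # Two-sided t-test p-value for the age coefficient
  p.t <- coef(summary(full))["age", "Pr(>|t|)"]
  # Partial F-test p-value comparing the reduced and full models
  p.f <- anova(reduced, full)[2, "Pr(>F)"]
  reject.t[r] <- p.t < alpha
  reject.f[r] <- p.f < alpha
}

# Number of realizations in which exactly one test rejects H0
n.disagree <- sum(reject.t != reject.f)
```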

Question 2 (15 points)

We may be interested in testing if linear combinations of the regression coefficients are equal to zero. The code currently in the notes only accounts for cases when multiple regression coefficients are equal to zero. For example, we may be interested in the two sided hypothesis test

H0 : β2 + β3 = 0

Ha : β2 + β3 ≠ 0

In this problem you will write code to handle such a hypothesis test.

(a) (5 points) Using the formulas from Section 1.2 of the notes Regression Part 2, write a function called gen.pvals.linear.combination that simulates hypothesis tests and outputs the list of observed p-values. Let the errors be distributed N(0, σ²). The function should take as inputs:

• X, the design matrix

• beta, the true regression coefficients

• sigma, the true standard deviation

• C, the matrix defining the linear combinations

• reps, the number of independent replications

The function should output pval.list, a list of realized p-values.
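
A possible shape for such a function is sketched below. It assumes that the formulas in Section 1.2 of the notes correspond to the standard general linear hypothesis F-test of H0: Cβ = 0, with statistic (Cβ̂)′[C(X′X)⁻¹C′]⁻¹(Cβ̂) / (q s²) on (q, n − p) degrees of freedom, where q is the number of rows of C. It returns a numeric vector rather than an R `list`. The demo design at the end is a hypothetical stand-in, not the output of generate.X.

```r
gen.pvals.linear.combination <- function(X, beta, sigma, C, reps) {
  n <- nrow(X)
  p <- ncol(X)
  q <- nrow(C)
  # Quantities that do not depend on the simulated response
  XtX.inv <- solve(crossprod(X))
  middle <- solve(C %*% XtX.inv %*% t(C))
  pval.list <- numeric(reps)
  for (r in seq_len(reps)) {
    y <- X %*% beta + rnorm(n, mean = 0, sd = sigma)
    beta.hat <- XtX.inv %*% crossprod(X, y)
    res <- y - X %*% beta.hat
    s2 <- sum(res^2) / (n - p)            # unbiased estimate of sigma^2
    Cb <- C %*% beta.hat
    f.stat <- as.numeric(t(Cb) %*% middle %*% Cb) / (q * s2)
    pval.list[r] <- pf(f.stat, df1 = q, df2 = n - p, lower.tail = FALSE)
  }
  pval.list
}

# Demo with a hypothetical design matrix (intercept plus three predictors)
set.seed(3701)
X.demo <- cbind(1, matrix(rnorm(20 * 3, mean = 10), nrow = 20))
C.demo <- matrix(c(0, 1, 1, 0), nrow = 1)
pvals.demo <- gen.pvals.linear.combination(X.demo, beta = c(10, 1, -1, 0),
                                           sigma = 0.5, C = C.demo, reps = 2000)
```

Inverting X′X once outside the loop avoids repeating the same work in every replication, since only the response changes.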

(b) (5 points) Generate a design matrix using the generate.X function defined on page 10 of the notes of Regression Part 2. Use n = 20, mu = 10, σX = 1 and ρ = 0.8. Use your function from part (a) to simulate p-values for the hypothesis test

H0 : β2 + β3 = 0

Ha : β2 + β3 ≠ 0

Use β = (10, 1, −1, 0), σ = 0.5 and reps = 5000. Your C matrix should have one row and four columns. Use these realizations of p-values to give a 95% score CI for the Type I error probability of the test when α = 0.05.
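
The interval itself could be computed as below, assuming that "score CI" refers to the Wilson score interval for a binomial proportion. The helper name `score.ci` is hypothetical, and the p-values in the example are uniform placeholders rather than output from the simulation.

```r
# Wilson score interval for a proportion, given x successes out of n trials
score.ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p.hat <- x / n
  denom <- 1 + z^2 / n
  center <- (p.hat + z^2 / (2 * n)) / denom
  half <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / denom
  c(lower = center - half, upper = center + half)
}

set.seed(3701)
pvals <- runif(5000)   # placeholder p-values, NOT results from the test
alpha <- 0.05
ci <- score.ci(sum(pvals < alpha), length(pvals))
```

The same interval is returned by `prop.test(x, n, correct = FALSE)$conf.int`, which can serve as a check.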

(c) (5 points) Using the same design matrix from part (b), use your function to simulate p-values for the hypothesis test

H0 : β2 = β3 = β4

Ha : β2, β3, β4 are not all equal

Use β = (10, 1, 1, 0), σ = 0.5 and reps = 5000. Your C matrix should have two rows and four columns. Use these realizations of p-values to give a 95% score CI for the power of the test when α = 0.05.
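
The C matrices for parts (b) and (c) might look as follows, with each row encoding one linear constraint Cβ = 0 on β = (β1, β2, β3, β4). The choice for part (c) is not unique: any two independent contrasts among β2, β3 and β4 describe the same null hypothesis. The object names are illustrative.

```r
# Part (b): H0: beta2 + beta3 = 0 (one constraint, so one row)
C.b <- matrix(c(0, 1, 1, 0), nrow = 1)

# Part (c): H0: beta2 = beta3 = beta4, written as
# beta2 - beta3 = 0 and beta3 - beta4 = 0 (two constraints, so two rows)
C.c <- rbind(c(0, 1, -1, 0),
             c(0, 0, 1, -1))

beta.b <- c(10, 1, -1, 0)
beta.c <- c(10, 1, 1, 0)

# C %*% beta is zero exactly when H0 holds for the true coefficients
h0.true.b <- all(C.b %*% beta.b == 0)
h0.true.c <- all(C.c %*% beta.c == 0)
```

With these coefficients H0 holds in part (b), so the rejection rate estimates the Type I error probability, while H0 fails in part (c), so the rejection rate estimates power.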


Question 3 (25 points)

In this question, we will compare AIC and BIC under multicollinearity, a changing standard deviation of the random errors, and a changing sample size.

We will use the generate.X function defined on page 10 of the notes of Regression Part 2 to generate the design matrix X.

(a) (5 points) generate.X returns a matrix whose first column corresponds to the intercept, the second to X1, the third to X2 and the last to X3. List all the eligible subset models.
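
The candidate models can be enumerated programmatically. This sketch assumes that "eligible" means every subset of {X1, X2, X3} with the intercept always included, which gives 2³ = 8 models; the notes may define eligibility differently.

```r
predictors <- c("X1", "X2", "X3")

# One row per subset: TRUE means the predictor is included
include <- expand.grid(X1 = c(FALSE, TRUE),
                       X2 = c(FALSE, TRUE),
                       X3 = c(FALSE, TRUE))

# Turn each row into a model formula string; the intercept is always kept
formulas <- unname(apply(include, 1, function(row) {
  vars <- predictors[row]
  if (length(vars) == 0) "y ~ 1" else paste("y ~", paste(vars, collapse = " + "))
}))

print(formulas)
```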

(b) (6 points) Now let ρ ∈ {0.25, 0.5, 0.98}. Let σX = 1, n = 50, µ = 10. For each ρ, generate the design matrix with generate.X. Then use reps = 1000 realizations of data to estimate (1) the probability that AIC chooses the true model and (2) the probability that BIC chooses the true model. In each realization, use β = (1, 1, 0, 1)′ and use N(0, 4) to generate the random errors. Create a 95% score CI for each of those two probabilities.
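
A rough sketch of this simulation follows. Several pieces are assumptions: `generate.X.sketch` is a hypothetical stand-in for the notes' generate.X (taken here to produce equicorrelated normal predictors), N(0, 4) is read as variance 4 (standard deviation 2), all 8 subsets with an intercept are treated as candidates, and reps is reduced to 200 to keep the example quick.

```r
# Hypothetical stand-in for generate.X: intercept plus three normal predictors
# with mean mu, sd sigma.X and common pairwise correlation rho
generate.X.sketch <- function(n, mu, sigma.X, rho) {
  Sigma <- sigma.X^2 * (rho * matrix(1, 3, 3) + (1 - rho) * diag(3))
  Z <- matrix(rnorm(n * 3), nrow = n)
  cbind(1, mu + Z %*% chol(Sigma))
}

# All 8 subsets of {X1, X2, X3}; the intercept is always kept
subsets <- as.matrix(expand.grid(X1 = c(FALSE, TRUE), X2 = c(FALSE, TRUE),
                                 X3 = c(FALSE, TRUE)))
# beta = (1, 1, 0, 1)' means the true model contains X1 and X3 only
true.idx <- which(subsets[, "X1"] & !subsets[, "X2"] & subsets[, "X3"])

# Count how often AIC and BIC each pick the true subset
selection.counts <- function(X, beta, sigma, reps) {
  hits <- c(AIC = 0, BIC = 0)
  for (r in seq_len(reps)) {
    y <- as.vector(X %*% beta) + rnorm(nrow(X), sd = sigma)
    aic <- bic <- numeric(nrow(subsets))
    for (i in seq_len(nrow(subsets))) {
      Xs <- X[, c(TRUE, subsets[i, ]), drop = FALSE]
      fit <- lm(y ~ Xs - 1)   # Xs already contains the column of ones
      aic[i] <- AIC(fit)
      bic[i] <- BIC(fit)
    }
    hits["AIC"] <- hits["AIC"] + (which.min(aic) == true.idx)
    hits["BIC"] <- hits["BIC"] + (which.min(bic) == true.idx)
  }
  hits
}

set.seed(3701)
reps <- 200   # the assignment asks for 1000
rhos <- c(0.25, 0.5, 0.98)
counts <- matrix(NA, nrow = length(rhos), ncol = 2,
                 dimnames = list(paste0("rho=", rhos), c("AIC", "BIC")))
for (k in seq_along(rhos)) {
  X <- generate.X.sketch(n = 50, mu = 10, sigma.X = 1, rho = rhos[k])
  counts[k, ] <- selection.counts(X, beta = c(1, 1, 0, 1), sigma = 2, reps = reps)
}

# Wilson score interval for one cell, via prop.test without continuity correction
ci.aic.first <- prop.test(counts[1, "AIC"], reps, correct = FALSE)$conf.int
```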

(c) (7 points) Now let n ∈ {10, 20, …, 100}. Let σX = 1, ρ = 0.5, µ = 10. For each n, generate the design matrix with generate.X. Then use reps = 1000 realizations of data to estimate (1) the probability that AIC chooses the true model and (2) the probability that BIC chooses the true model. In each realization, use β = (1, 1, 0, 1)′ and use N(0, 4) to generate the random errors. Create a 95% score CI for each of those two probabilities. Create a plot of n against the estimated probability for each information criterion and include the CI in the plot.
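
One way to draw such a plot in base R is sketched below, with the CIs shown as vertical error bars. The counts `hits.aic` and `hits.bic` are random placeholders standing in for the simulation output, so the picture carries no information about AIC or BIC; only the plotting pattern is the point. The same pattern would apply to part (d) with σ on the horizontal axis.

```r
n.grid <- seq(10, 100, by = 10)
reps <- 1000

set.seed(3701)
# Placeholder counts of "true model selected", NOT simulation results
hits.aic <- rbinom(length(n.grid), size = reps, prob = 0.5)
hits.bic <- rbinom(length(n.grid), size = reps, prob = 0.5)

# Wilson score intervals: a 2 x length(n.grid) matrix (row 1 lower, row 2 upper)
score.limits <- function(hits) {
  sapply(hits, function(x) prop.test(x, reps, correct = FALSE)$conf.int)
}
ci.aic <- score.limits(hits.aic)
ci.bic <- score.limits(hits.bic)

plot(n.grid, hits.aic / reps, type = "b", col = "blue", ylim = c(0, 1),
     xlab = "n", ylab = "Estimated P(true model chosen)")
# arrows() with angle = 90 and code = 3 draws flat-capped error bars
arrows(n.grid, ci.aic[1, ], n.grid, ci.aic[2, ],
       angle = 90, code = 3, length = 0.03, col = "blue")
lines(n.grid, hits.bic / reps, type = "b", col = "red")
arrows(n.grid, ci.bic[1, ], n.grid, ci.bic[2, ],
       angle = 90, code = 3, length = 0.03, col = "red")
legend("bottomright", legend = c("AIC", "BIC"),
       col = c("blue", "red"), lty = 1, pch = 1)
```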

(d) (7 points) Now let σ ∈ {1, 1.2, 1.4, …, 4}. Let σX = 1, ρ = 0.5, n = 50, µ = 10. Generate the design matrix with generate.X. For each σ, use reps = 1000 realizations of data to estimate (1) the probability that AIC chooses the true model and (2) the probability that BIC chooses the true model. In each realization, use β = (1, 1, 0, 1)′ and use N(0, σ²) to generate the random errors. Create a 95% score CI for each of those two probabilities. Create a plot of σ against the estimated probability for each information criterion and include the CI in the plot.

