
Question 1

You obtain data from a clinical trial of the efficacy of a COVID-19 vaccine. The trial was designed to test whether the vaccine worked for two different groups: (a) those without underlying medical conditions (let’s call this the ‘healthy’ group) and (b) those with underlying medical conditions (let’s call this the ‘unhealthy’ group).

The size of the trial was large, with 220,000 people in the first group (the ‘healthy’ group) and 220,000 people in the second group (the ‘unhealthy’ group). In other words, there were 440,000 people in total.

The first step of the trial was to recruit volunteers (as with all clinical trials, this was done by asking for volunteers, not by random sampling). Once somebody came forward as a volunteer, their health was assessed. If they were unhealthy, they were put into the unhealthy group. If they were healthy, they were put into the healthy group.

The second step of the trial was to randomize people within each health group to one of the two treatment groups. Within each health group, 110,000 people were randomly assigned to the control group (i.e. they got the placebo) and 110,000 people were randomly assigned to the treatment group (i.e. they got the vaccine).

Let us leave health aside for now. You begin by considering the following two variables:

infection

Did the individual contract COVID-19? (0) No, (1) Yes

treatment

Was the individual in the control group or the treatment group? (0) Control, (1) Vaccine

Infection is the outcome variable (did the individual contract COVID-19 or not?) and treatment is the explanatory variable (did the individual receive the placebo or the vaccine?).

You begin by inspecting the data as a whole. You obtain the following contingency table for these two variables:

1(a)  Using appropriate conditional probabilities (that you must calculate yourself), explain what this contingency table tells you about the association between infection and treatment. Does the vaccine seem to work? [4 Marks]

1(b)  Calculate the odds-ratio that describes the association between infection and treatment. You must show your working. [2 Marks]
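As a rough illustration of the arithmetic that 1(a) and 1(b) ask for, the sketch below computes the conditional probabilities of infection and the odds ratio from a 2x2 table. The pooled counts are an assumption: they are obtained by adding the Healthy and Unhealthy rows of the three-way table given later in the question, and the names `counts`, `infection_risk` and `infection_odds` are hypothetical.

```python
# Pooled 2x2 counts of infection by treatment.
# ASSUMPTION: these are the Healthy + Unhealthy rows of the three-way table
# that appears later in the question, added together.
counts = {
    "control": {"not_infected": 109030 + 109030, "infected": 970 + 970},
    "vaccine": {"not_infected": 109860 + 109990, "infected": 140 + 10},
}


def infection_risk(row):
    """Conditional probability P(infection = 1 | treatment group)."""
    return row["infected"] / (row["infected"] + row["not_infected"])


def infection_odds(row):
    """Odds of infection within one treatment group."""
    return row["infected"] / row["not_infected"]


risk_control = infection_risk(counts["control"])
risk_vaccine = infection_risk(counts["vaccine"])

# Odds ratio: odds of infection under the vaccine relative to the placebo.
odds_ratio = infection_odds(counts["vaccine"]) / infection_odds(counts["control"])

print(f"P(infection | control) = {risk_control:.5f}")
print(f"P(infection | vaccine) = {risk_vaccine:.5f}")
print(f"odds ratio (vaccine vs control) = {odds_ratio:.4f}")
```

An odds ratio below 1 would indicate lower odds of infection in the vaccine group than in the control group.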

A binary logistic model is fitted to these data, with infection as the outcome (dependent) variable and treatment as the predictor (independent) variable. Output for this model is shown below on the logit scale:

    Call:
    glm(formula = infection ~ treatment, family = "binomial", data = trial)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -0.1331  -0.1331  -0.0369  -0.0369   3.8186

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -4.7221     0.0228 -207.07   <2e-16 ***
    treatment    -2.5680     0.0848  -30.28   <2e-16 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 26531  on 439999  degrees of freedom
    Residual deviance: 24706  on 439998  degrees of freedom
    AIC: 24710

    Number of Fisher Scoring iterations: 10

1(c)  Write out the fitted regression equation. [2 Marks]

1(d) Calculate and interpret the estimated regression coefficient of treatment in terms of an odds ratio (as opposed to interpreting it on the logit scale). You must show your working. [4 Marks]
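One way to move from the logit scale to the odds and probability scales is sketched below, using the estimates printed in the model output above. The approximate 95% Wald interval is an addition that the question does not ask for, and the helper `inv_logit` and all variable names are hypothetical.

```python
import math

# Estimates copied from the glm output for infection ~ treatment.
b0 = -4.7221      # intercept: log-odds of infection in the control group
b1 = -2.5680      # treatment: change in log-odds for vaccine vs control
se_b1 = 0.0848    # standard error of the treatment coefficient


def inv_logit(x):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Exponentiating a logit coefficient gives an odds ratio.
odds_ratio = math.exp(b1)

# Approximate 95% Wald interval, built on the logit scale and then exponentiated.
ci_low = math.exp(b1 - 1.96 * se_b1)
ci_high = math.exp(b1 + 1.96 * se_b1)

# Fitted probabilities implied by logit(p) = b0 + b1 * treatment.
p_control = inv_logit(b0)
p_vaccine = inv_logit(b0 + b1)

print(f"odds ratio = {odds_ratio:.4f} (approx. 95% CI {ci_low:.4f} to {ci_high:.4f})")
print(f"fitted P(infection | control) = {p_control:.5f}")
print(f"fitted P(infection | vaccine) = {p_vaccine:.5f}")
```

Because the model has a single binary predictor, the fitted probabilities should agree with the observed proportions in the corresponding contingency table, up to rounding of the printed coefficients.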

You next consider the following additional variable:

unhealthy

Was the individual healthy (did not have pre-existing health conditions) or unhealthy (did have pre-existing health conditions)? (0) Healthy, (1) Unhealthy

You begin by testing whether the vaccine works for the unhealthy group. This is important because people with underlying health conditions are at greater risk of dying of COVID-19 than people without underlying health conditions.

You obtain the following contingency table for infection and treatment, where you only include in the analysis those who are in the unhealthy group (i.e. you only include in the analysis those for whom unhealthy=1):

A binary logistic model is fitted to these data, with infection as the outcome variable and treatment as the explanatory variable. Output for this model is shown below on the logit scale.

    Call:
    glm(formula = infection ~ treatment, family = "binomial", subset = unhealthy > 0, data = trial)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -0.1331  -0.1331  -0.0135  -0.0135   4.3141

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -4.72208    0.03225 -146.42   <2e-16 ***
    treatment   -4.58348    0.31761  -14.43   <2e-16 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 12567  on 219999  degrees of freedom
    Residual deviance: 11316  on 219998  degrees of freedom
    AIC: 11320

    Number of Fisher Scoring iterations: 11

1(e) Recall that you are analyzing data only for those for whom unhealthy=1, i.e. you are excluding those who do not have underlying health conditions (unhealthy=0). What is the null hypothesis being tested here? What do you conclude from the test here, and why? [4 Marks]
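The z value and p-value in the subset output above come from a Wald test of the treatment coefficient. The sketch below reproduces that arithmetic, assuming the usual null hypothesis that the coefficient is zero in the unhealthy group (equivalently, that the odds ratio is 1) and a two-sided test against the standard normal distribution. The variable names are hypothetical.

```python
import math

# Treatment row of the glm output for the unhealthy subset (unhealthy = 1).
estimate = -4.58348
std_error = 0.31761

# Wald statistic under H0: the treatment coefficient equals 0.
z = estimate / std_error

# Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}")
print(f"two-sided p-value = {p_value:.3g}")
print("reject H0 at the 5% level" if p_value < 0.05 else "do not reject H0 at the 5% level")
```

R prints very small p-values as `<2e-16`, so the exact figure computed here is only a check that the p-value falls below that reporting threshold.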

You next focus on whether the efficacy of the vaccine is different for the unhealthy group compared to the healthy group. Analysing the full set of data (i.e. bringing back in the people who do not have an underlying health condition), you obtain the following contingency table for infection and treatment, broken down by unhealthy.

    "Three-way tables of treatment by infection, given health"

    $`Three-way`$frequencies
                        infection
    unhealthy treatment       0    1    Sum
    Healthy   Control    109030  970 110000
              Vaccine    109860  140 110000
              Sum        218890 1110 220000
    Unhealthy Control    109030  970 110000
              Vaccine    109990   10 110000
              Sum        219020  980 220000

You fit a binary logistic model, with infection as the outcome variable and treatment, unhealthy and the interaction effect for treatment and unhealthy as the explanatory variables. Output for this model is shown below on the logit scale.

    Call:
    glm(formula = infection ~ treatment + unhealthy + unhealthy_treatment, family = "binomial", data = trial)

    Deviance Residuals:
            Min          1Q      Median          3Q         Max
    -0.13309615 -0.13309615 -0.05046856 -0.01348431  4.31408137

    Coefficients:
                         Estimate    Std. Error    z value Pr(>|z|)
    (Intercept)            -4.722 3.2250576e-02 -146.41854  < 2e-16 ***
    treatment              -1.943 9.0509995e-02  -21.46987  < 2e-16 ***
    unhealthy           2.846e-12 4.5609201e-02    0.00000        1
    unhealthy_treatment    -2.640 3.3025299e-01   -7.99459  1.3e-15 ***

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 26531.4293  on 439999  degrees of freedom
    Residual deviance: 24571.4677  on 439996  degrees of freedom
    AIC: 24579.4677

    Number of Fisher Scoring iterations: 11
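To see how the interaction term might be read, the sketch below combines the coefficients from the interaction model into a separate vaccine odds ratio for each health group and compares them with odds ratios computed directly from the three-way table. It assumes that unhealthy_treatment is the product of the two 0/1 indicators, which the question implies but does not state. All function and variable names are hypothetical.

```python
import math

# Coefficients copied from the interaction model output (logit scale).
b_treatment = -1.943      # vaccine effect when unhealthy = 0
b_interaction = -2.640    # additional vaccine effect when unhealthy = 1

# Vaccine-vs-control odds ratio in each health group.
# ASSUMPTION: unhealthy_treatment = unhealthy * treatment.
or_healthy_model = math.exp(b_treatment)
or_unhealthy_model = math.exp(b_treatment + b_interaction)

# Ratio of the two odds ratios: how much stronger the vaccine effect is
# in the unhealthy group than in the healthy group.
ratio_of_odds_ratios = math.exp(b_interaction)


def odds_ratio(infected_vaccine, clear_vaccine, infected_control, clear_control):
    """Odds ratio of infection, vaccine relative to control, from raw counts."""
    return (infected_vaccine / clear_vaccine) / (infected_control / clear_control)


# Same quantities from the three-way table of counts.
or_healthy_table = odds_ratio(140, 109860, 970, 109030)
or_unhealthy_table = odds_ratio(10, 109990, 970, 109030)

print(f"healthy:   model OR = {or_healthy_model:.4f}, table OR = {or_healthy_table:.4f}")
print(f"unhealthy: model OR = {or_unhealthy_model:.4f}, table OR = {or_unhealthy_table:.4f}")
print(f"ratio of odds ratios = {ratio_of_odds_ratios:.4f}")
```

Because this model has one parameter for each of the four treatment-by-health cells, its fitted odds ratios should match the table-based ones up to rounding of the printed coefficients.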