

STAC51 (Winter 2020): Final Exam

Note: In any question, if you are using R, all R code and R output must be included in your

[Figure: Venn diagram with circles labelled Crash Day, Before, and After; extracted region counts: 23, 21, 55, 28, 57]

1. The numbers in the Figure above indicate the weather (overcast or not) of 434 location-matched triplets of days: one day on which a traffic accident took place, and two control days
without an accident (the day before the accident and the day after the accident). This dataset
could be analyzed as a 1:2 matched case-control study. The Venn diagram presentation of the data
is rather unconventional. In matched case-control studies the data could alternatively be
presented in 2 × 2 tables

          exposed   unexposed
case      $a_i$     $b_i$
control   $c_i$     $d_i$

for each matched set i = 1, . . . , 434. We denote $n_i = a_i + b_i + c_i + d_i$.

(a) [10 Marks] We note that there are six types of location-specific 2 × 2 tables with the
same exposure-case configuration. List these tables (i.e. different combinations of the
numbers $a_i$, $b_i$, $c_i$ and $d_i$) and their counts.

(b) [6 Marks] The null hypothesis assumes that there is no relationship between being a
case and being exposed. Under the null hypothesis the distribution of the cell count $a_i$
conditional on the row and column marginals is hypergeometric. Find $E(a_i \mid a_i + c_i)$
and $\mathrm{Var}(a_i \mid a_i + c_i)$ under the null.

(c) [6 Marks] Test the null hypothesis of no association between weather and accidents
using the Cochran-Mantel-Haenszel (CMH) test statistic, given by
$$\mathrm{CMH} = \frac{\left[\sum_{i=1}^{434} a_i - \sum_{i=1}^{434} E(a_i \mid a_i + c_i)\right]^2}{\sum_{i=1}^{434} \mathrm{Var}(a_i \mid a_i + c_i)}$$
which is asymptotically distributed as $\chi^2$ with one degree of freedom.
Note: $\chi^2_{0.95}(1) = 3.84$.
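
As a rough illustration of how a statistic of this form can be computed, the Python sketch below sums the observed cell count, its hypergeometric mean and its hypergeometric variance over stratum types, each weighted by the number of strata of that type. It uses the standard moments of a 2 × 2 table with fixed margins. The function names and the table counts are hypothetical and are not the exam data.

```python
def hypergeom_moments(a, b, c, d):
    """Mean and variance of cell a given all margins of one 2 x 2 table."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = row1 * col1 / n
    variance = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return expected, variance


def cmh_statistic(tables):
    """tables: iterable of ((a, b, c, d), number_of_strata_with_this_table)."""
    sum_a = sum_e = sum_v = 0.0
    for (a, b, c, d), count in tables:
        expected, variance = hypergeom_moments(a, b, c, d)
        sum_a += count * a
        sum_e += count * expected
        sum_v += count * variance
    return (sum_a - sum_e) ** 2 / sum_v


# Hypothetical 1:2 matched data (NOT the counts from the exam figure).
example_tables = [
    ((1, 0, 0, 2), 10),
    ((0, 1, 1, 1), 8),
    ((1, 0, 1, 1), 5),
    ((0, 1, 2, 0), 4),
    ((1, 0, 2, 0), 20),  # all exposed: contributes no variance
    ((0, 1, 0, 2), 30),  # none exposed: contributes no variance
]
statistic = cmh_statistic(example_tables)
print(statistic, "reject" if statistic > 3.84 else "do not reject")
```

With 1:1 matching the same function reduces to McNemar's statistic, which gives a quick sanity check on the moment formulas.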

(d) [4 Marks] Recall that for 1:1 matching there exist 4 unique types of CMH tables. For
1:2 matching there exist 6 unique types of tables. If we have a 1:k matched case-control
study, how many unique types of tables exist? Here k < ∞.
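
One way to explore the pattern is to enumerate the tables by brute force. The Python sketch below assumes that a table type is determined by whether the single case is exposed and by how many of the k controls are exposed; the function name is hypothetical.

```python
def table_types(k):
    """Enumerate distinct (a, b, c, d) tables for one case and k controls."""
    types = []
    for a in (0, 1):            # case exposed (1) or unexposed (0)
        for c in range(k + 1):  # number of exposed controls, 0..k
            types.append((a, 1 - a, c, k - c))
    return types


for k in (1, 2, 3):
    print(k, len(table_types(k)))
```

The counts for k = 1 and k = 2 agree with the 4 and 6 table types quoted in the question.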

(e) [10 Marks] Assume that the following is a triplet-specific contingency
table from a 1:2 matched case-control study.

          exposed   unexposed   Total
case      a         b           1
control   c         d           2
Total     a + c     b + d       3

The odds of being exposed in the case group are θ times the odds of being exposed in
the control group. Also, assume that $P(a = 1) = \frac{\theta\Omega}{1 + \theta\Omega}$ and $P(c = 1) = \frac{\Omega}{1 + \Omega}$.
Show that $P(a = 1 \mid a + c = 1) = \frac{\theta}{2 + \theta}$.
(Hint: The 2 in the denominator comes from the 2 controls.)
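
A numerical check is not a proof, but it can confirm the target identity before attempting the algebra. The Python sketch below assumes that the three subjects in a triplet are exposed independently, and it reads Ω/(1 + Ω) as the exposure probability of each single control; both are assumptions made for this illustration, and the function name is hypothetical.

```python
def conditional_prob(theta, omega):
    """P(case exposed | exactly one of the three subjects is exposed)."""
    p_case = theta * omega / (1 + theta * omega)
    p_ctrl = omega / (1 + omega)  # assumed to apply to each control separately
    # Case exposed, both controls unexposed.
    case_only = p_case * (1 - p_ctrl) ** 2
    # Case unexposed, exactly one of the two controls exposed.
    one_control = (1 - p_case) * 2 * p_ctrl * (1 - p_ctrl)
    return case_only / (case_only + one_control)


for theta, omega in [(0.5, 0.2), (2.5, 0.3), (4.0, 1.7)]:
    print(conditional_prob(theta, omega), theta / (2 + theta))
```

The two printed columns should agree for every pair, and the result does not depend on Ω, which is why conditioning on a + c removes the nuisance parameter.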

2. For this question you need to use the warpbreaks dataset from the datasets package. That
is, you need to run the following code,
## Run this code to get the warpbreaks dataset ##

You can find the details about the dataset by running ‘?warpbreaks’. We are interested
in the count of warp breaks per loom (i.e., variable = ‘breaks’) by wool and tension level.

(a) [8 Marks] Execute a Poisson regression to estimate the mean number of breaks by wool
type and tension level.

(b) [8 Marks] Execute a negative binomial regression to estimate the mean number of
breaks by wool type and tension level.

(c) [6 Marks] Compare the models using the AIC values. Interpret the dispersion parameter
of the negative binomial regression. Which model performed better?
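
The ideas behind this comparison can be sketched without fitting the full models. The Python example below computes AIC = 2k − 2 log L for an intercept-only Poisson fit and a crude dispersion ratio (sample variance over sample mean) on hypothetical counts, not the warpbreaks data. A ratio well above 1 suggests overdispersion, which is the situation where a negative binomial model, with Var(Y) = μ + μ²/θ in the parameterisation used by MASS::glm.nb, tends to achieve the lower AIC. All names here are illustrative.

```python
import math
from statistics import mean, variance


def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


# Hypothetical break counts (NOT the warpbreaks data).
counts = [12, 30, 18, 55, 21, 9, 40, 26]

mu_hat = mean(counts)                    # Poisson MLE of the mean
dispersion = variance(counts) / mu_hat   # about 1 if the Poisson model fits
poisson_aic = aic(poisson_loglik(counts, mu_hat), n_params=1)

print(round(dispersion, 2), round(poisson_aic, 2))
```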

3. For this question you have to simulate a dataset.

(a) [5 Marks] Perform the following simulations.

• Generate 500 random values from X1 ∼ Uniform[0, 1], X2 ∼ Uniform[0, 1], X3 ∼
Uniform[0, 1], X4 ∼ Uniform[0, 1], X5 ∼ Uniform[0, 1]
• Generate $f(X) = 4\left[\sin(\pi x_1 x_2) + 8(x_3 - 0.5)^3 + 1.5 x_4 - x_5 - 0.77\right]$. Here, π = 3.14…
• Generate $Y \sim \mathrm{Bernoulli}(p(X))$, where $p(X) = \frac{\exp(f(X))}{1 + \exp(f(X))}$.
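
The three steps above can be sketched in Python using only the standard library. The sketch reads the fourth term of f(X) as 1.5·x4, which is an interpretation of the printed formula; the seed, the function names and the row layout are arbitrary choices for illustration.

```python
import math
import random


def f(x1, x2, x3, x4, x5):
    """Non-linear predictor; the 1.5 * x4 term is an assumed reading."""
    return 4 * (math.sin(math.pi * x1 * x2)
                + 8 * (x3 - 0.5) ** 3
                + 1.5 * x4 - x5 - 0.77)


def simulate(n=500, seed=2020):
    """Return a list of (x, p, y) with x uniform on [0, 1) and y ~ Bernoulli(p)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        p = 1 / (1 + math.exp(-f(*x)))   # same as exp(f) / (1 + exp(f))
        y = 1 if rng.random() < p else 0
        rows.append((x, p, y))
    return rows


data = simulate()
print(len(data), sum(y for _, _, y in data))
```

Fixing the seed makes the simulated dataset reproducible, so that later model fits can be compared on exactly the same draws.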

(b) [10 Marks] Fit a logistic regression where Y is the outcome and X1, X2, …, X5 are the
predictors. Show the coefficients table. Produce the ROC curve. State the AUC value
and interpret.
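
For the interpretation, it may help to recall that the AUC equals the probability that a randomly chosen observation with Y = 1 receives a higher fitted score than a randomly chosen observation with Y = 0, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive-negative pairs are ordered correctly.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to scores that rank cases no better than chance, and 1.0 to perfect separation.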

(c) [10 Marks] Now, instead of using the original X1, X2, …, X5 as predictors, transform the
variables in such a way that they resemble the individual terms in f(X). That is, create
new variables from X1, X2, …, X5 in such a way that f(X) becomes a linear predictor in them. Now run a logistic regression using the new variables. Show the coefficients
table. Produce the ROC curve. State the AUC value.
(Hint: You have to create 4 new variables from X1, X2, …, X5.)
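
One possible reading of the hint is sketched below in Python: the four variables are sin(π·x1·x2), (x3 − 0.5)³, x4 and x5, and f(X) is then an exact linear combination of them. This choice of variables, the reading of the fourth term as 1.5·x4, and all function names are assumptions made for illustration.

```python
import math


def f(x1, x2, x3, x4, x5):
    """Original non-linear predictor (1.5 * x4 is an assumed reading)."""
    return 4 * (math.sin(math.pi * x1 * x2)
                + 8 * (x3 - 0.5) ** 3
                + 1.5 * x4 - x5 - 0.77)


def transform(x1, x2, x3, x4, x5):
    """Map the five raw predictors to four candidate linear terms."""
    return (math.sin(math.pi * x1 * x2), (x3 - 0.5) ** 3, x4, x5)


def linear_predictor(z1, z2, z3, z4):
    """f(X) rewritten as intercept plus a linear combination of the new terms."""
    return -3.08 + 4 * z1 + 32 * z2 + 6 * z3 - 4 * z4


x = (0.2, 0.7, 0.9, 0.4, 0.1)
print(f(*x), linear_predictor(*transform(*x)))
```

If the logistic model is fitted on these terms, its estimated coefficients can be compared against the intercept and slopes in linear_predictor, since the model is then correctly specified.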

(d) [7 Marks] Compare your results in (b) and (c): how did your coefficients and AUC
change from (b) to (c)? Explain why you think this happened.
