STAT2008 / STAT2014 / STAT4038 / STAT6014 / STAT6038

REGRESSION MODELLING
Assignment 2 (Total Marks: 50)

Data Description
The primary objective of the Study on the Efficacy of Nosocomial Infection Control (SENIC
Project) was to determine whether infection surveillance and control programs have reduced
the rates of nosocomial (hospital-acquired) infection in United States hospitals. This data
set consists of a random sample of 113 hospitals selected from the original 338 hospitals
surveyed.
Each line of the data set has an identification number and provides information on 11 other
variables for a single hospital. The data presented here are for the 1975-76 study period.
The 12 variables are:
Variable Name                                        Description
Identification number (ID)                           1 − 113
Length of stay (LengthStay)                          Average length of stay of all patients in hospital
                                                     (in days)
Age (Age)                                            Average age of patients (in years)
Infection risk (InfectionRisk)                       Average estimated probability of acquiring infection
                                                     in hospital (in percent)
Routine culturing ratio (CulturingRatio)             Ratio of number of cultures performed to number of
                                                     patients without signs or symptoms of
                                                     hospital-acquired infection, times 100
Routine chest X-ray ratio (XRayRatio)                Ratio of number of X-rays performed to number of
                                                     patients without signs or symptoms of pneumonia,
                                                     times 100
Number of beds (NumBeds)                             Average number of beds in hospital during study period
Medical school affiliation (MedicalSchool)           1 = Yes, 2 = No
Region (Region)                                      Geographic region, where: 1 = NE, 2 = NC, 3 = S, 4 = W
Average daily census (DailyCencus)                   Average number of patients in hospital per day during
                                                     study period
Number of nurses (NumNurses)                         Average number of full-time equivalent registered and
                                                     licensed practical nurses during study period
                                                     (number full time plus one half the number part time)
Available facilities and services (FacilityService)  Percent of 35 potential facilities and services that
                                                     are provided by the hospital
Note that the identification number (ID) will not be used in the following questions. It is
just an observation index.
Question 1 [50 Marks]
(a) [4 marks] Based on the data description, which variables are qualitative variables?
Read the whole data set into R. Are these qualitative variables shown as factor
objects in R? If not, manually convert them to factor objects. For each qualitative
variable, how many observations does each group have?
(b) [3 marks] For each qualitative variable in part (a), provide boxplots for the length
of stay in hospital (LengthStay) across different groups. Compare the group differences.
(c) [7 marks] You believe that the length of stay in hospital (LengthStay) is related
to the average age of patients (Age), infection risk (InfectionRisk) and all the
qualitative variables in part (a). Fit a first-order regression model by regressing
LengthStay on these predictors. Show the summary table of the fitted results. Is
the model significant? Interpret the estimated coefficients except for the intercept.
(d) [3 marks] Test if all the qualitative variables can be removed from the model in
part (c).
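The R workflow behind parts (a) to (d) can be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer. The data frame senic is a small synthetic stand-in whose values are made up; only its column names and codings follow the data description. The file name in the commented read.csv() call is hypothetical. The sketch also assumes that MedicalSchool and Region are the variables to treat as qualitative.

```r
# Synthetic stand-in for the SENIC data (NOT the real data): only the column
# names and the numeric codings follow the data description.
set.seed(1)
n <- 60
senic <- data.frame(
  Age           = rnorm(n, mean = 53, sd = 4),
  InfectionRisk = rnorm(n, mean = 4.4, sd = 1.3),
  MedicalSchool = rep_len(1:2, n),        # 1 = Yes, 2 = No
  Region        = rep(1:4, each = n / 4)  # 1 = NE, 2 = NC, 3 = S, 4 = W
)
senic$LengthStay <- 2 + 0.1 * senic$Age + 0.5 * senic$InfectionRisk + rnorm(n)
# With the real file one would read it instead, for example:
# senic <- read.csv("senic.csv")  # hypothetical file name and format

# (a) Numeric codes are stored as integers, so they are not factors yet
sapply(senic[c("MedicalSchool", "Region")], is.factor)
senic$MedicalSchool <- factor(senic$MedicalSchool, levels = 1:2,
                              labels = c("Yes", "No"))
senic$Region <- factor(senic$Region, levels = 1:4,
                       labels = c("NE", "NC", "S", "W"))
table(senic$MedicalSchool)  # group sizes
table(senic$Region)

# (b) Boxplots of LengthStay across the groups of each factor
boxplot(LengthStay ~ MedicalSchool, data = senic)
boxplot(LengthStay ~ Region, data = senic)

# (c) First-order model with both quantitative and qualitative predictors
full <- lm(LengthStay ~ Age + InfectionRisk + MedicalSchool + Region,
           data = senic)
summary(full)

# (d) Partial F-test: compare the model without the factors to the full model
reduced <- lm(LengthStay ~ Age + InfectionRisk, data = senic)
ftest <- anova(reduced, full)
ftest
```

With this coding, R uses the first level of each factor (Yes and NE) as the baseline, so each dummy coefficient in summary(full) estimates the difference in mean LengthStay from that baseline with the other predictors held fixed. The comparison in anova(reduced, full) tests the four dummy coefficients jointly (one for MedicalSchool, three for Region).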

