Data Description
The primary objective of the Study on the Efficacy of Nosocomial Infection Control (SENIC
Project) was to determine whether infection surveillance and control programs have reduced
the rates of nosocomial (hospital-acquired) infection in United States hospitals. This data
set consists of a random sample of 113 hospitals selected from the original 338 hospitals
surveyed.
Each line of the data set has an identification number and provides information on 11 other
variables for a single hospital. The data presented here are for the 1975-76 study period.
The 12 variables are:
Variable (Name)                                       Description
Identification number (ID)                            1-113
Length of stay (LengthStay)                           Average length of stay of all patients in hospital
                                                      (in days)
Age (Age)                                             Average age of patients (in years)
Infection risk (InfectionRisk)                        Average estimated probability of acquiring infection
                                                      in hospital (in percent)
Routine culturing ratio (CulturingRatio)              Ratio of number of cultures performed to number of
                                                      patients without signs or symptoms of
                                                      hospital-acquired infection, times 100
Routine chest X-ray ratio (XRayRatio)                 Ratio of number of X-rays performed to number of
                                                      patients without signs or symptoms of pneumonia,
                                                      times 100
Number of beds (NumBeds)                              Average number of beds in hospital during study period
Medical school affiliation (MedicalSchool)            1 = Yes, 2 = No
Region (Region)                                       Geographic region, where: 1 = NE, 2 = NC, 3 = S, 4 = W
Average daily census (DailyCencus)                    Average number of patients in hospital per day during
                                                      study period
Number of nurses (NumNurses)                          Average number of full-time equivalent registered and
                                                      licensed practical nurses during study period
                                                      (number full time plus one half the number part time)
Available facilities and services (FacilityService)   Percent of 35 potential facilities and services that
                                                      are provided by the hospital
Note: the identification number variable (ID) will not be used in the following questions. It is
just an observation index.
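
As a minimal sketch of how the coding above maps onto R, the snippet below builds a tiny stand-in data frame, drops ID, and converts the two coded variables to labeled factors. The eight rows are invented purely for illustration and are not SENIC values. The name and format of the real data file are not given in this handout, so the loading step appears only as a commented placeholder.

```r
# Stand-in rows for illustration only; these are NOT real SENIC values.
# With the real data, replace this with a call such as
# read.csv("<your file>") (hypothetical placeholder, file name not given).
senic <- data.frame(
  ID            = 1:8,
  LengthStay    = c(7.1, 8.8, 8.3, 9.0, 11.2, 9.8, 9.7, 8.7),
  MedicalSchool = c(2, 2, 2, 2, 1, 2, 2, 1),
  Region        = c(4, 2, 3, 4, 1, 2, 3, 1)
)

# ID is only an observation index, so drop it before any analysis.
senic$ID <- NULL

# Integer codes are stored as numeric, so convert them to factors explicitly.
senic$MedicalSchool <- factor(senic$MedicalSchool,
                              levels = c(1, 2), labels = c("Yes", "No"))
senic$Region <- factor(senic$Region,
                       levels = 1:4, labels = c("NE", "NC", "S", "W"))

# Number of observations in each group.
school_counts <- table(senic$MedicalSchool)
region_counts <- table(senic$Region)
print(school_counts)
print(region_counts)
```

Giving explicit `levels` and `labels` keeps the 1/2 and 1-4 codes tied to the meanings listed in the table, rather than relying on whatever order R infers.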

Question 1 [50 Marks]
(a) [4 marks] Based on the data description, which variables are qualitative variables?
Read the whole data set into R. Are these qualitative variables shown as factor
objects in R? If not, manually convert them to factor objects. For each qualitative
variable, how many observations does each group have?
(b) [3 marks] For each qualitative variable in part (a), provide boxplots for the length
of stay in hospital (LengthStay) across different groups. Compare the group
differences and summarize your findings.
(c) [7 marks] You believe that the length of stay in hospital (LengthStay) is related
to the average age of patients (Age), infection risk (InfectionRisk) and all the
qualitative variables in part (a). Fit a first-order regression model by regressing
LengthStay on these predictors. Show the summary table of the fitted results. Is
the model significant? Interpret the estimated coefficients except for the intercept.
(d) [3 marks] Test if all the qualitative variables can be removed from the model in
part (c).
(e) [3 marks] It has been suggested that you include the interaction between the average age of
patients (Age) and infection risk (InfectionRisk) in the model of part (c). What
do you think of this suggestion? Conduct an appropriate analysis to support your
opinion.
(f) [2 marks] After consulting some experts, you decide to fit a first-order regression
model by regressing the length of stay in hospital (LengthStay) on the average
age of patients (Age), infection risk (InfectionRisk), routine culturing ratio
(CulturingRatio), routine chest X-ray ratio (XRayRatio), number of beds (NumBeds),
medical school affiliation (MedicalSchool), and geographic region (Region). Show the
summary table of the fitted results. Which variables appear to be insignificant
from the summary table?
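
Parts (c) to (e) share one pattern: fit a first-order model with `lm()`, then compare nested models with a partial F-test via `anova()`. The sketch below illustrates that workflow on simulated stand-in data. The simulation settings (means, ranges, and the coefficients used to generate LengthStay) are arbitrary assumptions and not SENIC values, so the printed estimates and p-values say nothing about the real hospitals.

```r
# Simulated stand-in data (NOT the SENIC sample); all settings are arbitrary.
set.seed(1)
n <- 113
senic <- data.frame(
  Age           = rnorm(n, mean = 53, sd = 4),
  InfectionRisk = runif(n, min = 1, max = 8),
  MedicalSchool = factor(sample(c("Yes", "No"), n, replace = TRUE)),
  Region        = factor(sample(c("NE", "NC", "S", "W"), n, replace = TRUE))
)
senic$LengthStay <- 2 + 0.1 * senic$Age + 0.5 * senic$InfectionRisk + rnorm(n)

# Part (c) pattern: first-order model with both quantitative and
# qualitative predictors. summary() reports coefficient t-tests and
# the overall F-test for model significance.
fit_full <- lm(LengthStay ~ Age + InfectionRisk + MedicalSchool + Region,
               data = senic)
print(summary(fit_full))

# Part (d) pattern: partial F-test of the model without the factors
# against the full model (tests all factor coefficients jointly).
fit_reduced <- lm(LengthStay ~ Age + InfectionRisk, data = senic)
drop_factors_test <- anova(fit_reduced, fit_full)
print(drop_factors_test)

# Part (e) pattern: add the Age x InfectionRisk interaction and test
# whether it improves on the first-order model.
fit_inter <- update(fit_full, . ~ . + Age:InfectionRisk)
interaction_test <- anova(fit_full, fit_inter)
print(interaction_test)
```

The same formula interface also covers the plots in part (b), for example `boxplot(LengthStay ~ Region, data = senic)`, and the larger model in part (f) only needs the extra predictors added to the formula.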