STA303H1S/STA1002HS Final Project

1 Background

2 Dataset

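The encounters included in the dataset satisfy the following criteria:
• It is an inpatient encounter (a hospital admission).
• It is a diabetic encounter, that is, one during which any kind of diabetes was entered into the system as a diagnosis.
• The length of stay was at least 1 day and at most 14 days.
• Laboratory tests were performed during the encounter.
• Medications were administered during the encounter.

The patient ID is stored in the variable ‘patient nbr’.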
2.1 Response

1.不重新录取；
2.不到30天的再次入院（这种情况不好，因为可能是您的治疗

3.超过30天的重新录取（此录取不如最后一次录取好，但是，
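A minimal Python sketch of how the three response levels above could be coded and summarised. It assumes, hypothetically, that the raw response column stores the labels "NO", "<30" and ">30". Check the actual coding in the project data first. The function names are illustrative only. Collapsing the levels into a binary "any readmission" outcome is shown as one possible analysis choice. The project text does not prescribe it.

```python
from collections import Counter

# ASSUMPTION: hypothetical raw labels in the response column.
# Verify the actual coding in the project data before using this mapping.
RESPONSE_LEVELS = {"NO": 1, "<30": 2, ">30": 3}


def code_response(raw_labels):
    """Map raw labels to levels 1 (no readmission),
    2 (readmitted in less than 30 days) and 3 (readmitted after more than 30 days)."""
    return [RESPONSE_LEVELS[label] for label in raw_labels]


def level_proportions(codes):
    """Share of encounters in each response level (a basic EDA summary)."""
    counts = Counter(codes)
    n = len(codes)
    return {level: counts[level] / n for level in sorted(RESPONSE_LEVELS.values())}


def any_readmission(codes):
    """One possible binary outcome: 1 if readmitted at all (level 2 or 3), else 0."""
    return [int(code in (2, 3)) for code in codes]


raw = ["NO", "<30", ">30", "NO", "NO", ">30"]
codes = code_response(raw)
print(codes)                   # [1, 2, 3, 1, 1, 3]
print(any_readmission(codes))  # [0, 1, 1, 0, 0, 1]
print(level_proportions(codes))
```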

2.2 Predictors/Covariates

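… “admission source id”, “payer code”, “readmitted” and “number of encounters”. The first four variables are identification variables. “readmitted” is the response, and “number of encounters” …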

Using patient characteristics available from the hospital, identify groups of patients who are at different
risk of readmission. To answer this question you can use any statistical technique that you learned
in the course. However, you need to explain your choice. You should focus on the following
aspects:
1. There are many covariates in the dataset. For predicting the probability of a patient being
readmitted, select at most 9-10 covariates. You need to explain why and how you
chose the 9-10 covariates for prediction.
2. Since this is a prediction problem, you should create one test dataset that you will never
use for modelling. The test dataset should contain the encounters of 20,000 randomly selected patients.
Do not sample from the encounters; instead, randomly choose the patients using the
‘patient nbr’ variable. The %in% operator in R is very useful for this purpose. You
should use your student ID as the seed for the sampling.
3. You can fit a GLMM, GLM or GAM (or any other method). However, since this is a longitudinal
dataset, you need to explain what assumptions you must make to fit a GLM or any other
model that assumes independence. With a GLM, variable selection and prediction are
straightforward, which is not the case for a GLMM. A GLMM is, however, the most appropriate
analysis technique for these data, but because of the size of the data, GLMMs may take
a long time to fit and may not converge. Thus, you need to explain properly how you chose the
modelling technique, and if you are unable to perform certain analyses, state that clearly in
the limitations section.
4. Make sure to perform exploratory data analysis (basic summary statistics, plots, etc.) before
moving on to the final modelling.
5. You may review the relevant literature if that helps.
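The patient-level test split in item 2 can be sketched as follows. The handout suggests R. This sketch uses Python only to illustrate the logic: sample unique patient IDs, then keep every encounter whose patient ID is in the sampled set. Set membership plays the role of R's %in%. The dictionary key "patient_nbr", the toy records and the seed 1001234567 are hypothetical stand-ins, not project data. With the real data the call would be `split_by_patient(encounters, 20000, <your student ID>)`.

```python
import random


def split_by_patient(encounters, n_test_patients, seed):
    """Patient-level train/test split: every encounter of a sampled patient
    goes to the test set, so no patient appears in both sets.

    encounters: list of dicts with a "patient_nbr" key (hypothetical name
    standing in for the project's 'patient nbr' variable).
    """
    # Unique patient IDs, sorted so the draw depends only on the seed.
    patient_ids = sorted({row["patient_nbr"] for row in encounters})
    rng = random.Random(seed)  # seed = your student ID
    test_ids = set(rng.sample(patient_ids, n_test_patients))
    # Set membership does the job of R's %in% here.
    test = [row for row in encounters if row["patient_nbr"] in test_ids]
    train = [row for row in encounters if row["patient_nbr"] not in test_ids]
    return train, test


# Toy data: 5 patients, some with several encounters.
encounters = [
    {"encounter_id": 1, "patient_nbr": 101},
    {"encounter_id": 2, "patient_nbr": 102},
    {"encounter_id": 3, "patient_nbr": 101},
    {"encounter_id": 4, "patient_nbr": 103},
    {"encounter_id": 5, "patient_nbr": 104},
    {"encounter_id": 6, "patient_nbr": 105},
    {"encounter_id": 7, "patient_nbr": 103},
]
train, test = split_by_patient(encounters, n_test_patients=2, seed=1001234567)
print(len(train), len(test))
```

Python's random number generator differs from R's. The same seed therefore will not reproduce an R split made with `set.seed()` and `sample()` on `unique()` patient IDs. Do the actual split in whichever language you use for the project.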
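To make the independence assumption in item 3 concrete, the two models can be written as below. For illustration only, this assumes a binary outcome: $Y_{ij} = 1$ if encounter $j$ of patient $i$ is followed by a readmission, with covariate vector $\mathbf{x}_{ij}$. The random-intercept GLMM is one common choice, not the only one.

```latex
\begin{aligned}
\text{GLM:}\quad & \operatorname{logit} \Pr(Y_{ij} = 1) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta},
  \qquad Y_{ij} \text{ independent across all } i, j,\\
\text{GLMM:}\quad & \operatorname{logit} \Pr(Y_{ij} = 1 \mid b_i) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i,
  \qquad b_i \sim N(0, \sigma_b^2),\ Y_{ij} \text{ independent given } b_i.
\end{aligned}
```

The GLM is the special case $\sigma_b^2 = 0$. Fitting it to all encounters therefore amounts to assuming that a patient's repeated encounters share no patient-level effect.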
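If a GLM is used, one common way to make the independence assumption more defensible is to keep a single encounter per patient. The sketch below keeps each patient's encounter with the smallest `encounter_id`. It assumes, hypothetically, that IDs increase over time, so this is the earliest encounter. This is an illustrative choice, not a requirement of the project. It also discards repeated encounters that a GLMM could use.

```python
def one_encounter_per_patient(encounters):
    """Keep, for each patient, the encounter with the smallest encounter_id.

    ASSUMPTION: encounter_id increases over time, so the kept row is the
    patient's earliest encounter. Field names are hypothetical.
    """
    kept = {}
    for row in encounters:
        pid = row["patient_nbr"]
        if pid not in kept or row["encounter_id"] < kept[pid]["encounter_id"]:
            kept[pid] = row
    return list(kept.values())


encounters = [
    {"encounter_id": 3, "patient_nbr": 101},
    {"encounter_id": 2, "patient_nbr": 102},
    {"encounter_id": 1, "patient_nbr": 101},
]
subset = one_encounter_per_patient(encounters)
print(sorted(row["encounter_id"] for row in subset))  # [1, 2]
```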

