Project Report Specifications:
These problems are deliberately open-ended. In some cases there may be several “correct” answers. Take care
to report on your methods and present your findings clearly and concisely, with the goal of informing and
convincing me of your results. Include the necessary plots, tests, diagnostics, and model probabilities to illustrate
and support your conclusions. Annotate your results with prose. Plots, code, and output without explanation
in well-organized plain English will not be marked. For each problem be sure to include the following sections:
1. Introduction
The introduction must briefly outline the problem context, questions of interest, the source or sources of the data, the
highlights of your analysis, and a brief summary of the final conclusions (e.g., forecasting accuracy).
2. Data Analysis (not just computer output)
The data analysis section summarizes how you have applied the methodologies taught in this course to
the data in order to address the basic questions of interest. You need to postulate and to justify your
modeling choices. You must provide a thorough analysis of your data and model, including a discussion
of whether transformations are needed, your choice of modeling approach, assessment of
goodness-of-fit/regression diagnostics, etc. You should also compare models corresponding to different hypotheses
about your system and assess the predictive ability of your model. If you choose to use the step
function, you must also build at least one model that corresponds to an a priori hypothesis and explain
how the model corresponds to the hypothesis. You should present computer output and/or appropriate
plots as tables and/or figures if the narrative justifies such inclusions. Points will be deducted if all
necessary plots are placed at the end of the report rather than within the narrative.
3. Conclusions
The conclusions section takes the results of the data analysis section and applies them to the basic
questions of interest (the research questions). Typically, this section is less than a page, but there are occasions
that warrant a more thorough discussion.
Students should complete ONE of the following questions.
1 Dengue Incidence
The data for this question consist of time series of dengue case counts for two cities (Iquitos, Peru and San Juan,
Puerto Rico) together with environmental and other covariate information for each location across a number of
transmission seasons. There are two data files, combined iquitos.csv and combined sanjuan.csv,
one for each city. Detailed descriptions of the data are available on the course website and at:
To operate effectively, health departments must be able to predict weekly cases, as this will correspond to
hospital demand and resources. Your task is to provide a fitted model for forecasting weekly total dengue cases
in ONE of the two cities using tools from this class (you may not use additional packages without express
consent from Dr. Johnson). You should divide your data into training and testing sets. Specifically, reserve the
final season as a testing set, and all other seasons for training. Comment on the accuracy of your forecaster
with particular focus on peak incidence (both when the peak occurs and its height) within the training data
set. You may try more than one model and compare their performance. Then use your fitted model(s) to create a
forecast for the first week of the last dengue season, i.e., for the week following the last week in the training
data. Be sure to include uncertainty estimates (e.g., a CI). Finally, describe a method for obtaining forecasts (and
uncertainties) for season week 4 of the testing data set (remember you have not yet observed the weather, etc.
in the future!). Be sure to include how you would estimate any predictors and quantify the uncertainty in your
forecasts.
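As a rough illustration of the split and the peak comparison described above, here is a minimal sketch in Python. It is not the course's prescribed tooling; the record layout, season labels, case counts, and the helper names split_by_final_season, peak, and peak_errors are all hypothetical and chosen only for this example.

```python
# Hypothetical toy records: (season label, season week, weekly total cases).
# Real data would come from the city's data file; these values are made up.
records = [
    ("2005/2006", 1, 3), ("2005/2006", 2, 9), ("2005/2006", 3, 4),
    ("2006/2007", 1, 2), ("2006/2007", 2, 5), ("2006/2007", 3, 11),
    ("2007/2008", 1, 6), ("2007/2008", 2, 8), ("2007/2008", 3, 1),
]


def split_by_final_season(rows):
    """Reserve the last season (in order of appearance) for testing."""
    seasons = []
    for season, _, _ in rows:
        if season not in seasons:
            seasons.append(season)
    final = seasons[-1]
    train = [r for r in rows if r[0] != final]
    test = [r for r in rows if r[0] == final]
    return train, test


def peak(weeks_and_cases):
    """Return (week, cases) for the week with the most cases."""
    return max(weeks_and_cases, key=lambda wc: wc[1])


def peak_errors(observed, fitted):
    """Peak timing error (weeks) and height error (cases), fitted minus observed."""
    obs_week, obs_height = peak(observed)
    fit_week, fit_height = peak(fitted)
    return fit_week - obs_week, fit_height - obs_height


train, test = split_by_final_season(records)
```

Splitting on the season label rather than on a fixed row count keeps every week of the final season out of the training set, which is what the task asks for.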
Extra Credit: Implement your method to provide estimates and accompanying uncertainties for season week
4 of your testing data. Compare your predictions to the real values. How did you do? Then create predictions
(with uncertainties) for every 4th week within the season, i.e., add predictions for weeks 8, 12, 16, 20, 24, 28,
32, 36, 40, 44, and 48 in the testing set using only data through week 4. What is your prediction for the week
with the most cases? How does it compare to the real peak week?
HINT: Remember, you can only use data from the past to predict the future! This means that if you use an
environmental (or other) covariate to predict cases, you will also have to predict that covariate into the future!
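One possible way to act on this hint is to simulate: draw plausible future values of the covariate, push each draw through the fitted case model, and read the forecast interval off the simulated cases. The sketch below is in Python and assumes, purely for illustration, a model of the form log(cases) = b0 + b1 * temp + error with normal errors, and a normal forecast distribution for the future temperature. The function name simulate_forecast and every numeric value are hypothetical, and the sketch ignores uncertainty in the estimated coefficients, so a fuller treatment would give a wider interval.

```python
import math
import random
import statistics


def simulate_forecast(b0, b1, sigma, temp_mean, temp_sd, n_draws=5000, seed=1):
    """Return (median, lower, upper) of simulated weekly cases.

    Assumed model (illustrative only): log(cases) = b0 + b1 * temp + eps,
    with eps ~ Normal(0, sigma) and future temp ~ Normal(temp_mean, temp_sd).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        temp = rng.gauss(temp_mean, temp_sd)  # uncertainty in the predicted covariate
        eps = rng.gauss(0.0, sigma)           # residual uncertainty of the case model
        draws.append(math.exp(b0 + b1 * temp + eps))
    draws.sort()
    lower = draws[int(0.025 * n_draws)]       # approximate 2.5th percentile
    upper = draws[int(0.975 * n_draws) - 1]   # approximate 97.5th percentile
    return statistics.median(draws), lower, upper


# Hypothetical coefficients and covariate forecast, not estimates from the data.
median, lower, upper = simulate_forecast(
    b0=1.0, b1=0.1, sigma=0.3, temp_mean=27.0, temp_sd=2.0
)
```

Setting temp_sd to zero in this sketch treats the future covariate as known, so comparing the two intervals shows how much of the forecast uncertainty comes from having to predict the covariate.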