This assignment is a data-modeling project in R that requires a written report.

**Project Report Specifications:**

These problems are deliberately open-ended. In some cases there may be several “correct” answers. Take care to report on your methods and present your findings in a clear and concise manner, with the goal of informing and convincing me of your results. Include the necessary plots, tests, diagnostics, and model probabilities to illustrate and support your conclusions. Annotate your results with prose. Plots, code, and output that are not explained in well-organized plain English will not be marked. For each problem, be sure to include the following sections:

**1. Summary/Introduction**

You must briefly outline the problem context, the questions of interest, the source or sources of the data, the highlights of your analysis, and a brief summary of the final conclusions (e.g., forecasting accuracy).

**2. Data Analysis (not just computer output)**

The data analysis section summarizes how you have applied the methodologies taught in this course to the data in order to address the basic questions of interest. You need to postulate and justify your modeling choices. You must provide a thorough analysis of your data and model, including a discussion of whether transformations are needed, your choice of modeling approach, assessment of goodness-of-fit/regression diagnostics, etc. You should also compare models corresponding to different hypotheses about your system and assess the predictive ability of your model. If you choose to use the step function, you must also build at least one model that corresponds to an a priori hypothesis and explain how the model corresponds to the hypothesis. You should present computer output and/or appropriate plots as tables and/or figures if the narrative justifies such inclusions. Points will be deducted if all necessary plots are placed at the end of the report rather than within the narrative.
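The comparison between an a priori model and a `step`-selected model might be set up as in the following minimal sketch. It runs on synthetic data, and every column name (`temp`, `precip`, `humid`, `cases`) and the Poisson family are assumptions made for illustration only; they are not taken from the course data or required by the assignment.

```r
# Synthetic weekly data; all column names here are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  temp   = rnorm(n, mean = 27, sd = 2),          # weekly mean temperature
  precip = rgamma(n, shape = 2, rate = 0.05),    # weekly precipitation
  humid  = rnorm(n, mean = 80, sd = 5)           # relative humidity (unrelated to cases here)
)
dat$cases <- rpois(n, lambda = exp(0.5 + 0.08 * dat$temp + 0.002 * dat$precip))

# A priori hypothesis (assumed): incidence depends on temperature and precipitation only.
fit_prior <- glm(cases ~ temp + precip, family = poisson, data = dat)

# Data-driven alternative: backward stepwise search starting from the full model.
fit_full <- glm(cases ~ temp + precip + humid, family = poisson, data = dat)
fit_step <- step(fit_full, trace = 0)

# Compare the candidates on a common criterion.
aic_table <- AIC(fit_prior, fit_step)
print(aic_table)
```

AIC is only one possible criterion; out-of-sample error on held-out data is another, and the report would still need to explain in prose why the a priori model encodes the stated hypothesis.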

**3. Conclusions**

The conclusions section takes the results of the data analysis section and applies them to the basic questions of interest (the research questions). Typically, this section is less than a page, but there are occasions that warrant a more thorough discussion.

Students should complete ONE of the following questions.

**1 Dengue Incidence**

The data for this question consist of time series of dengue case counts for two cities (Iquitos, Peru and San Juan, Puerto Rico), together with environmental and other covariate information for each location across a number of transmission seasons. There are two data files, combined iquitos.csv and combined sanjuan.csv, one for each city. Detailed descriptions of the data are available on the course website and at: http://dengueforecasting.noaa.gov/.

To operate effectively, health departments must be able to predict weekly cases, as these correspond to hospital demand and resources. Your task is to provide a fitted model for forecasting weekly total dengue cases in ONE of the two cities using tools from this class (you may not use additional packages without express consent from Dr. Johnson). You should divide your data into training and testing sets. Specifically, reserve the final season as a testing set and use all other seasons for training. Comment on the accuracy of your forecaster, with particular focus on peak incidence (both when the peak occurs and its height) within the training data set. You may try more than one model and compare their performance. Then use your fitted model(s) to create a forecast for the first week of the last dengue season, i.e., for the week following the last week in the training data. Be sure to include uncertainty estimates (e.g., a CI). Finally, describe a method for obtaining forecasts (and uncertainties) for season week 4 of the testing data set (remember that you have not yet observed the weather, etc., in the future!). Be sure to include how you would estimate any predictors and quantify the uncertainty in your prediction.
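The split, the peak check, and the one-week-ahead forecast described above could look roughly like the sketch below. It uses synthetic data, and the column names (`season`, `season_week`, `cases`, `lag1`), the quasi-Poisson family, and the harmonic terms are all assumptions chosen for illustration; the real files may be organized differently and another model class from the course may suit them better.

```r
# Synthetic stand-in for one city's weekly series; column names are hypothetical.
set.seed(42)
n_seasons <- 6
weeks <- 52
dat <- data.frame(
  season      = rep(seq_len(n_seasons), each = weeks),
  season_week = rep(seq_len(weeks), times = n_seasons)
)
mu <- exp(2 + 1.2 * sin(2 * pi * (dat$season_week - 10) / weeks))  # one peak per season
dat$cases <- rpois(nrow(dat), mu)
dat$lag1 <- c(NA, head(dat$cases, -1))   # cases observed in the previous week

# Reserve the final season for testing; train on all earlier seasons.
last_season <- max(dat$season)
train <- subset(dat, season < last_season & !is.na(lag1))
test  <- subset(dat, season == last_season)

# Assumed model: last week's cases plus an annual harmonic.
fit <- glm(cases ~ log(lag1 + 1) +
             sin(2 * pi * season_week / 52) + cos(2 * pi * season_week / 52),
           family = quasipoisson, data = train)

# Peak timing and height per training season, observed vs. one-step fitted.
train$fitted <- fitted(fit)
peaks <- do.call(rbind, lapply(split(train, train$season), function(d) {
  data.frame(season        = d$season[1],
             obs_peak_week = d$season_week[which.max(d$cases)],
             fit_peak_week = d$season_week[which.max(d$fitted)],
             obs_peak      = max(d$cases),
             fit_peak      = max(d$fitted))
}))

# Forecast for the first test week; its lag1 is the last training week, so it is known.
pr <- predict(fit, newdata = test[1, ], type = "link", se.fit = TRUE)
forecast_mean <- exp(pr$fit)
ci <- exp(pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit)  # approximate 95% CI for the mean
```

Note that `ci` is an interval for the expected count, built on the link scale and back-transformed. An interval for the observed count would be wider, since it also has to account for the count variability around that mean.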

Extra Credit: Implement your method to provide estimates and accompanying uncertainties for season week 4 of your testing data. Compare your predictions to the real values. How did you do? Then create predictions (with uncertainties) for every 4th week within the season, i.e., add predictions for weeks 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, and 48 in the testing set using only data through week 4. What is your prediction for the week with the most cases? How does it compare to the real peak week?

HINT: Remember, you can only use data from the past to predict the future! This means that if you use an environmental (or other) covariate to predict cases, you will also have to predict that covariate into the future!
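One way to respect this constraint is to forecast the covariate with its own model and then simulate case trajectories forward, so that covariate uncertainty carries into the case forecast. The sketch below illustrates the idea on synthetic data. The variable names (`temp`, `lag1`, `season_week`), the harmonic temperature model, and the Poisson case model are all assumptions, and the simulation deliberately ignores uncertainty in the fitted coefficients, which a fuller treatment would also propagate.

```r
# Synthetic training history; names and model forms are hypothetical.
set.seed(7)
weeks <- 52
n_seasons <- 5
dat <- data.frame(season_week = rep(seq_len(weeks), n_seasons))
dat$temp <- 27 + 2 * sin(2 * pi * dat$season_week / weeks) + rnorm(nrow(dat), sd = 0.8)
cases <- numeric(nrow(dat))
cases[1] <- 10
for (t in 2:nrow(dat)) {
  cases[t] <- rpois(1, exp(0.3 + 0.7 * log(cases[t - 1] + 1) + 0.02 * dat$temp[t]))
}
dat$cases <- cases
dat$lag1 <- c(NA, head(cases, -1))
train <- dat[-1, ]   # drop the first row, which has no lagged value

# Step 1: a model for the covariate that uses only the calendar, so it can be
# predicted for weeks that have not been observed yet.
temp_fit <- lm(temp ~ sin(2 * pi * season_week / 52) + cos(2 * pi * season_week / 52),
               data = train)

# Step 2: a model for cases given last week's cases and the covariate.
case_fit <- glm(cases ~ log(lag1 + 1) + temp, family = poisson, data = train)
b <- unname(coef(case_fit))   # order: intercept, log(lag1 + 1), temp

# Step 3: simulate forward h weeks, drawing the covariate and the cases each week.
h <- 4
n_sim <- 2000
temp_mean <- predict(temp_fit, newdata = data.frame(season_week = seq_len(h)))
temp_sd <- summary(temp_fit)$sigma
sims <- matrix(NA_real_, nrow = n_sim, ncol = h)
prev <- rep(tail(train$cases, 1), n_sim)   # last observed week starts every path
for (k in seq_len(h)) {
  temp_k <- rnorm(n_sim, mean = temp_mean[k], sd = temp_sd)   # simulated future covariate
  lam <- exp(b[1] + b[2] * log(prev + 1) + b[3] * temp_k)
  prev <- rpois(n_sim, lam)                                   # simulated cases feed the next lag
  sims[, k] <- prev
}

# Point forecast and 95% interval for season week 4 from the simulated paths.
week4_point <- median(sims[, h])
week4_interval <- quantile(sims[, h], c(0.025, 0.975))
```

Extending `h` and reading off columns 8, 12, and so on would give forecasts at later weeks, and the week with the largest simulated median is one candidate estimate of the peak week.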