Possible Topics of Your Project

The objective of the class project is to help you gain experience with research and to relate what you learn to real-life problems, which may require you to learn new techniques (or develop new methods yourself). You are expected to submit a summary report at the end of the semester. Below are two types of possible projects (you only need to choose one of them).

  1. Solving a real-life problem. A typical report includes problem formulation, data analysis, proposed solutions, and interpretation of results. The data set can be from your own research or the public domain.
  2. Numerical study of statistical methods/models using existing data sets in the literature. Ideally your approach is substantially different from those in the literature, but it is all right to repeat an analysis as long as you carry it out independently. Some examples are
  • Compare the performance of competing statistical (or data mining) techniques;
  • Ask different questions or investigate new ideas for statistical methods or models;
  • Identify optimal parameters of specific statistical methods or models.
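
As a rough illustration of the last bullet, the sketch below selects a tuning parameter by K-fold cross-validation. Everything in it is an assumption made for illustration and is not prescribed by the course: the synthetic data, the choice of k-nearest-neighbour regression, the candidate values, and the helper names `make_data`, `knn_predict`, `cv_mse`, and `select_k`. It uses only the Python standard library.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic data (an assumption for illustration): y = sin(x) + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x0, k):
    """Average the responses of the k training points nearest to x0."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_mse(xs, ys, k, folds=5):
    """Estimate the test MSE of k-NN regression by K-fold cross-validation."""
    n = len(xs)
    sq_errors = []
    for f in range(folds):
        # Every folds-th observation, starting at f, is held out in this fold.
        test_idx = set(range(f, n, folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, xs[i], k)
            sq_errors.append((ys[i] - pred) ** 2)
    return sum(sq_errors) / len(sq_errors)


def select_k(xs, ys, candidates, folds=5):
    """Return the candidate k with the smallest CV error, plus all scores."""
    scores = {k: cv_mse(xs, ys, k, folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    xs, ys = make_data(200)
    best_k, scores = select_k(xs, ys, [1, 5, 15, 50, 150])
    for k, mse in scores.items():
        print(f"k={k:3d}  CV MSE={mse:.3f}")
    print("selected k:", best_k)
```

The same loop can serve the first bullet: replacing `knn_predict` with a different fitting procedure and comparing the resulting cross-validated errors on identical folds gives a like-for-like comparison of two techniques.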

Note that the crucial aspect of your project is analyzing data sets, not applying particular statistical methods or models that we discussed in class.

Datasets: You can collect the data yourself, or use a data set from your own research or the public domain. The following are some examples of online datasets (you can use Google or another search engine to find more):

  1. http://www.quandl.com/ (financial and economic time-series datasets)
  2. https://datamarket.com/topic/list/ (a privately held Icelandic company that specializes in providing access to data from public and, to a lesser extent, private institutions and companies)
  3. http://kdd.ics.uci.edu/ or http://archive.ics.uci.edu/ml/

One example is the KDD Cup 1999 data at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. More KDD Cup data can be found at http://www.sigkdd.org/kddcup/index.php.

  4. http://lib.stat.cmu.edu/DASL/
  5. http://www.kdnuggets.com/datasets/index.html (links to more data repositories)
  6. Data from the literature, in journals such as the Journal of the American Statistical Association, Journal of the Royal Statistical Society: Series B, Statistica Sinica, Annals of Applied Statistics, Journal of Computational and Graphical Statistics, etc.

To inspire your project, here are some concrete examples:

  • analyze data sets from competitions; see http://www.kaggle.com/competitions
  • model data from government websites such as http://www.cdc.gov/biosense/correlate/ or http://www.ngdc.noaa.gov/stp/satellite/goes/dataaccess.html
  • explore aspects of Chicago through the Chicago Data Portal, https://data.cityofchicago.org/

Guidelines on the Final Report

In your final summary report, we expect clear explanations of the models chosen, the hypotheses tested, and the findings, analogous to what you would produce for a consulting project. The most important advice is to use common sense to make your final report understandable to an intelligent scientist who might not be familiar with your project.

The main body of your final summary report (i.e., without the appendix and figures/tables) is generally 5-10 pages, and the total length of the final report shall not exceed 20 pages. Only highly relevant plots and tables should be included in the body of the report; the rest should go in the Appendix. When writing up your summary report, it is useful to ask yourself the following questions: What is the work? Why is it important? What background is needed? How will the work be presented?

Here is a suggested format for your summary report:

  1. Title Page: project title, author(s) (names and email addresses), the submission date, course name/number. You must state each author's percentage contribution to the project. The percentages must be agreed upon by all group members.
  2. Abstract: informative summary of the whole report (100-500 words).
  3. Introduction: problem description, motivation and challenge(s), problem-solving strategies, lessons learned from the application, and an outline of the report.
  4. Problem Statement or Data Sources: cite the data sources, and provide a simple presentation of the data to help readers understand the problem or challenge(s).
  5. Proposed Methodology: explain (and justify) your proposed methods or models.
  6. Analysis and Results: present the key findings from executing the proposed methods or models. For readability, detailed results should be placed in the Appendix. References to the computer software used to implement your proposed methods or models (even if it is a web page) should be given.
  7. Conclusions: draw conclusions from your data analysis. Unfinished or possible future work could be included (with proper explanation or justification).
  8. Appendix: this section includes only the documents needed to support the presentation in the report. Feel free to divide it into several subsections if necessary. Do NOT dump all computer output here unorganized.
  9. Bibliography and Credits.

Please make your writing clear. The technical presentation must be organized. No code should be included in any part of the report. Code should be submitted as files separate from the report.