This final project uses R to analyze and process data obtained through an API.

QMSS G5072 Final Project
Goals of the Project
The goal of the final project for the course is to make use of some of the technical
abilities you acquired throughout the course. These may include the following:
Data Acquisition
Import data from different file formats.
Use APIs to obtain data.
Write an API client for an API and/or functions associated with API interaction.
Handle, parse, and transform JSON and XML.
Web scrape data from public websites.
Use SQL queries to obtain data.
Data Cleaning, Transformation, and Organization
2019/11/5 course_materials/final_project.md at master · QMSS-G5072-2019/course_materials
https://github.com/QMSS-G5072-2019/course_materials/blob/master/Exercises/final_project/final_project.md 2/5
Use data wrangling (including the tidyverse ) to transform your data into a
dataset or R object ready for analysis.
Use loops and other iterative processes.
Use functions and functional programming to encapsulate repetitive or difficult tasks.
Handle and process strings and use regular expressions.
Documentation and Presentation
Provide and document useful functions and/or data as part of an R package.
Make your package and associated material available on GitHub.
Use R Markdown to generate documentation, vignettes, and write-ups.
Of course, you are welcome to go beyond the tools offered in the course as well.
However, the focus of the project (and grading) should be on the tools conveyed in
this course.
Options
There are two options to choose from for the final project. Please choose one option
and make sure to indicate your choice on the proposal and final project submission.
Option A: Data Project
The focus is on the acquisition, cleaning, transformation, organization, and
presentation of data.
A1. Substantial Data Collection (General)
The first type of data project would be the collection of a substantial amount of data
from at least two sources and combining them into an overall data set. Here the focus
would be on the technical aspects of the collection (API, web scraping, SQL), data
wrangling, cleaning, organization, presentation etc. of data.
A2. Substantial Data Collection (Text)
For students dealing with potentially messy data (e.g. social media text data), I am
prepared to allow the use of a single data source. The additional work of cleaning and
transforming the data then replaces the effort of obtaining and merging multiple
data sources.
Presentation
For either data project, the evaluation will not be mainly based on the amount of data
collected but rather on the technical difficulties involved and the breadth of skills
displayed.
Some amount of summary statistics to provide an overview of the data should be
included. Graphical visualizations are welcome but since the course does not teach
visualization techniques, nothing beyond simple graphics is required.
Output
The output includes the data and the code used to obtain the data. Both should be
wrapped in an R package (see below). To reiterate, the package needs to include the
code with which you collected the data rather than just the final dataset and analysis
code. Make the coding documentation clear enough so that someone else could
replicate your effort and re-collect the data. Consider including a brief write-up on
the collection as well to ease possible replication efforts.
Option B: Functional / API Project
The second type of project will not focus on acquiring the data per se, but rather the
methods to acquire data and the functions associated with it.
This project will entail writing multiple functions for an API that currently has no R
package associated with it and packaging it into an API client R package (with the
potential for public release).
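A minimal sketch of what one function of such an API client might look like, assuming a hypothetical JSON API at `https://api.example.com/v1`. The names `example_base_url`, `build_url`, and `api_get` are illustrative only and do not belong to any existing package; the request and parsing steps assume the httr and jsonlite packages are installed.

```r
# Hypothetical base URL; replace it with the API you are wrapping.
example_base_url <- "https://api.example.com/v1"

# Build a request URL from an endpoint and a named list of query parameters.
build_url <- function(endpoint, query = list(), base_url = example_base_url) {
  url <- paste0(base_url, "/", endpoint)
  if (length(query) == 0) {
    return(url)
  }
  # Percent-encode each value so spaces and reserved characters are safe.
  values <- vapply(
    query,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  pairs <- paste0(names(query), "=", values)
  paste0(url, "?", paste(pairs, collapse = "&"))
}

# Send a GET request and return the parsed JSON body.
# Stops with an informative error if the server returns an HTTP error status.
api_get <- function(endpoint, query = list(), base_url = example_base_url) {
  response <- httr::GET(build_url(endpoint, query, base_url))
  httr::stop_for_status(response)
  body <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(body)
}
```

For example, `build_url("search", list(q = "new york", limit = 10))` returns `"https://api.example.com/v1/search?q=new%20york&limit=10"`. Keeping URL construction separate from the network call is one possible design choice; it lets the URL logic be tested without contacting the API.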
Presentation
For the API project, the evaluation will be based on technical difficulties involved, the
breadth of skills displayed, and the variety of functions and options included for a
user of the API.
A significant part of the package, beyond the documented code and functions
themselves, is the use of vignettes. Using one or more vignettes, you should aim to
show all functionalities of the package to a user of your package.
Output
The output for evaluation will be an API client wrapped in an R package. The focus here
lies on the quality, usability, and documentation of the functions provided in the
package.
The R Package
For both options, the documentation of the work will be in the form of an R package.
There are lots of good examples for guidance on CRAN, e.g.
Data R Packages
HistData
USAboundaries
nasaweather
acs
API Client R Packages
rnoaa
WikipediR
ZillowR
rtweet
I highly recommend following the advice and guidelines presented in Hadley
Wickham’s book R Packages.
The following parts should be included:
All functions, data, and the package itself need to be documented and exported.
A license is specified.
Your package should pass check() without errors (warnings and notes are OK,
though it would be great if there were none; try to address the issues pointed out
by check()).
A readme file.
The data (if data package).
One or more vignettes to describe and discuss the data and/or functions.
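As an illustration of the first requirement in the list above, here is a sketch of a documented and exported function using roxygen2 comments. The function `count_words` is a made-up example, not something the assignment prescribes. Running `devtools::document()` in the package directory turns such comments into help files and `NAMESPACE` entries, and `devtools::check()` then runs the package checks.

```r
#' Count the words in each string
#'
#' Splits each element of a character vector on whitespace and counts the
#' resulting pieces.
#'
#' @param x A character vector.
#' @return An integer vector with one word count per element of `x`.
#' @examples
#' count_words(c("tidy data", "one"))
#' @export
count_words <- function(x) {
  stopifnot(is.character(x))
  # Trim leading/trailing whitespace so it does not create empty pieces.
  pieces <- strsplit(trimws(x), "\\s+")
  lengths(pieces)
}
```

The `@export` tag is what makes the function available to users of the package; functions without it stay internal.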
Choice of Topic and Proposal
You are free to choose the type of project (see above) and which kind of data or API
to use. To make sure you are on the right track, we ask you to submit a proposal by
November 29 at the latest; feel free to submit earlier to receive feedback.
The proposal should include the following information:
– Name of project
– Type of project: Data (A1/A2) or API Client (B)
– Brief description of the purpose
– Links to data sources / API etc.
– Outline the technical steps / challenges you plan to address and include in your
submission.
– Are there any significant hurdles that you have doubts about? Would not solving
them render the project incomplete?
Bonus
With your own future in mind, it may be a good idea to show off your skills. For a
bonus part, consider publishing a website for your completed package. See Hadley
Wickham’s pkgdown for instructions on how to do that.
http://hadley.github.io/pkgdown/
Submission
Please follow the instructions to submit your project. For the final project, please use
your submission issue to tag the TAs and the instructor (using ‘@tbrambor’). On
GitHub please use a folder entitled Final_Project to submit all materials. In
addition, you are also welcome to use a personal folder on your own GitHub account
to publish your package (and allow installation directly via the
devtools::install_github() command).
The proposal is due on Friday, November 29 at 5pm. The final project is due on
December 13 at 5pm.