This final project uses R to analyze and process data obtained through APIs.

QMSS G5072 Final Project
Goals of the Project
The goal of the final project for the course is to make use of some of the technical abilities you acquired throughout the course. These may include the following:
Data Acquisition
Import data from different file formats.
Use APIs to obtain data.
Write an API client for an API and/or functions associated with API interaction.
Handle, parse, and transform JSON and XML.
Web scrape data from public websites.
Use SQL queries to obtain data.
Data Cleaning, Transformation, and Organization
2019/11/5 course_materials/final_project.md at master · QMSS-G5072-2019/course_materials
https://github.com/QMSS-G5072-2019/course_materials/blob/master/Exercises/final_project/final_project.md 2/5
Use data wrangling (including the tidyverse) to transform your data into a dataset or R object ready for analysis.
Use loops and other iterative processes.
Use functions and functional programming to encapsulate repetitive or difficult tasks.
Handle and process strings and use regular expressions.
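As a rough illustration of how functions, iteration, and regular expressions can work together, here is a minimal base R sketch. The input values and the helper name `parse_price` are made up for this example and are not part of the course materials.

```r
# Hypothetical raw values, as they might arrive from a scraped page.
raw_prices <- c(" $1,200 ", "USD 85.50", "n/a", "$3,000.00")

# Extract the first number found in a single string; return NA if there is none.
parse_price <- function(x) {
  # Trim whitespace and drop thousands separators before matching.
  cleaned <- gsub(",", "", trimws(x), fixed = TRUE)
  match <- regmatches(cleaned, regexpr("[0-9]+(\\.[0-9]+)?", cleaned))
  if (length(match) == 0) NA_real_ else as.numeric(match)
}

# Apply the function to every element; vapply() enforces one numeric per input.
prices <- vapply(raw_prices, parse_price, numeric(1), USE.NAMES = FALSE)
print(prices)
```

Putting the cleaning logic in one small function keeps it easy to test and reuse, and `vapply()` fails loudly if the function ever returns something other than a single number.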
Documentation and Presentation
Provide and document useful functions and/or data as part of an R package.
Make your package and associated material available on Github.
Use R Markdown to generate documentation, vignettes, and write-ups.
Of course, you are welcome to go beyond the tools offered in the course as well.
However, the focus of the project (and grading) should be on the tools conveyed in this course.
Options
There are two options to choose from for the final project. Please choose one option and make sure to indicate your choice on the proposal and final project submission.
Option A: Data Project
The focus is on the acquisition, cleaning, transformation, organization, and presentation of data.
A1. Substantial Data Collection (General)
The first type of data project would be the collection of a substantial amount of data from at least two sources, combined into one overall data set. Here the focus would be on the technical aspects of collecting the data (API, web scraping, SQL) and on wrangling, cleaning, organizing, and presenting it.
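Combining two sources usually comes down to joining them on a shared key. The sketch below uses base R `merge()` on two small, invented data frames; the column names and values are assumptions for illustration only.

```r
# Hypothetical source 1: station metadata, e.g. obtained from an API.
stations <- data.frame(
  station_id = c("A1", "B2", "C3"),
  city = c("Albany", "Buffalo", "Corning"),
  stringsAsFactors = FALSE
)

# Hypothetical source 2: temperature readings, e.g. scraped from a website.
readings <- data.frame(
  station_id = c("A1", "A1", "B2", "D4"),
  temp_c = c(3.5, 4.1, -1.0, 7.2),
  stringsAsFactors = FALSE
)

# Left join: keep every station, even those without any reading.
combined <- merge(stations, readings, by = "station_id", all.x = TRUE)
print(combined)
```

With `all.x = TRUE`, station C3 is kept with a missing temperature, while the reading for the unknown station D4 is dropped. Whether unmatched rows should be kept or dropped is a design decision worth documenting in the write-up.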
A2. Substantial Data Collection (Text)
For students dealing with potentially messy data (e.g. social media text data), I am prepared to allow a single data source. The additional work of cleaning and transforming the data then replaces the effort of obtaining and merging multiple data sources.
Presentation
For either data project, the evaluation will not be mainly based on the amount of data collected but rather on the technical difficulties involved and the breadth of skills displayed.
Some amount of summary statistics to provide an overview of the data should be included. Graphical visualizations are welcome but since the course does not teach visualization techniques, nothing beyond simple graphics is required.
Output
The output includes the data and the code used to obtain the data. Both should be wrapped in an R package (see below). To reiterate, the package needs to include the code with which you collected the data rather than just the final dataset and analysis code. Make the code documentation clear enough that someone else could replicate your effort and re-collect the data. Consider including a brief write-up on the collection as well to ease possible replication efforts.
Option B: Functional / API Project
The second type of project will not focus on acquiring the data per se, but rather on the methods used to acquire data and the functions associated with them.
This project will entail writing multiple functions for an API that currently has no R package associated with it and packaging it into an API client R package (with the potential for public release).
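One building block of an API client is a helper that assembles request URLs. The following base R sketch shows the idea; the function name `build_request_url`, the host `api.example.com`, and the query parameters are all hypothetical, and a real client would add the HTTP request, authentication, and error handling on top.

```r
# Hypothetical helper: assemble a request URL for an imaginary API.
build_request_url <- function(base_url, endpoint, query = list()) {
  stopifnot(is.character(base_url), length(base_url) == 1,
            is.character(endpoint), length(endpoint) == 1)
  # Join base and endpoint with exactly one slash between them.
  url <- paste0(sub("/+$", "", base_url), "/", sub("^/+", "", endpoint))
  if (length(query) == 0) {
    return(url)
  }
  # Percent-encode each value so spaces and reserved characters are safe.
  pairs <- vapply(
    names(query),
    function(key) {
      value <- utils::URLencode(as.character(query[[key]]), reserved = TRUE)
      paste0(key, "=", value)
    },
    character(1)
  )
  paste0(url, "?", paste(pairs, collapse = "&"))
}

example_url <- build_request_url(
  "https://api.example.com/v1/", "/stations",
  list(city = "New York", limit = 10)
)
print(example_url)
```

Keeping URL construction separate from the request itself makes each user-facing function shorter and lets you test the URL logic without network access.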
Presentation
For the API project, the evaluation will be based on technical difficulties involved, the breadth of skills displayed, and the variety of functions and options included for a user of the API.
A significant part of the package, beyond the documented code and functions themselves, is the use of vignettes. Using one or more vignettes, you should aim to show all functionalities of the package to a user of your package.
Output
The output for evaluation will be an API client wrapped in an R package. The focus here lies on the quality, usability, and documentation of the functions provided in the package.
The R Package
For both options, the documentation of the work will be in the form of an R package.
There are lots of good examples for guidance on CRAN, e.g.
Data R Packages
HistData
USAboundaries
nasaweather
acs
API Client R Packages
rnoaa
WikipediR
ZillowR
rtweet
I highly recommend following the advice and guidelines presented in Hadley Wickham’s book R Packages.
The following parts should be included:
All functions, data, and the package itself need to be documented and exported.
A license is specified.
Your package should pass check() without errors (warnings and notes are OK, though it would be great if there were none; try to address the issues pointed out by check()).
A readme file.
The data (if data package).
One or more vignettes to describe and discuss the data and/or functions.
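To make the documentation requirement concrete, here is a minimal sketch of a function documented with roxygen2-style comments, as it might appear in a file under the package's R/ directory. The function `count_missing` is invented for illustration; the commented calls at the end assume the devtools package is installed and are meant to be run from the package root.

```r
#' Count missing values per column
#'
#' @param data A data frame.
#' @return A named integer vector with one entry per column of `data`.
#' @export
#' @examples
#' count_missing(data.frame(a = c(1, NA), b = c("x", "y")))
count_missing <- function(data) {
  stopifnot(is.data.frame(data))
  # sum() over a logical vector counts the TRUE values, i.e. the NAs.
  vapply(data, function(column) sum(is.na(column)), integer(1))
}

missing_counts <- count_missing(data.frame(a = c(1, NA), b = c("x", "y")))
print(missing_counts)

# From the package root (assuming devtools is installed):
# devtools::document()  # turn the roxygen comments into help files and NAMESPACE
# devtools::check()     # run the package checks mentioned above
```

The `@export` tag is what makes the function available to users of the package; without it, the function stays internal even if it is documented.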
Choice of Topic and Proposal
You are free to choose the type of project (see above) and which kind of data or API to use. To make sure you are on the right track, we ask you to submit a proposal by November 29 at the latest; feel free to submit earlier and receive feedback.
The proposal should include the following information:
– Name of project
– Type of project: Data (A1/A2) or API Client (B)
– Brief description of the purpose
– Links to data sources / API etc.
– Outline the technical steps / challenges you plan to address and include in your submission.
– Are there any significant hurdles that you have doubts about? Would not solving them render the project incomplete?
Bonus
With your own future in mind, it may be a good idea to show off your skills. For a bonus part, consider publishing a website for your completed package. See Hadley Wickham’s pkgdown for instructions on how to do that.
http://hadley.github.io/pkgdown/
Submission
Please follow the instructions to submit your project. For the final project, please use your submission issue to tag the TAs and the instructor (using ‘@tbrambor’). On GitHub, please use a folder entitled Final_Project to submit all materials. In addition, you are welcome to use a personal repository on your own GitHub account to publish your package (and allow installation directly via the devtools::install_github() command).

