The project should be written in R Markdown. There is no required page length. The project should incorporate information from the three assignments already turned in including descriptions of the data set and variables, the questions motivating the work, and descriptions of the data preparation process. This final version of the project will extend the work already done by using graphics and tables, where appropriate, to visualize the data to answer the motivating questions. The most successful projects will use graphics that follow Tufte’s guidelines and incorporate multiple and varied aspects of the coding techniques we cover in class. Finally, note that the R Markdown file for the final project should be well-organized (i.e., with sections and headers) and readable (i.e., it should be free from typographical errors and the writing should flow well).
Be sure to use the tidyverse, in particular dplyr, to wrangle and clean the data, and ggplot2 to create visualizations that describe the relationships between variables.
Things to do
- Wrangle the dataset. Make sure that missing values are dealt with and that rows and columns are all properly formatted. Each cell should contain only one value. I would recommend using the pivot functions (pivot_longer(), pivot_wider()) and related functions, along with separate() and unite() where columns need to be split or combined. Make sure any remaining missing values are represented consistently.
- Visualization: Show the relationships between variables. Use ggplot2 for this. Create histograms, bar graphs, line graphs, and scatter plots to map the relationships in the data.
- When using dplyr, make sure the code includes verbs such as filter(), arrange(), and select().
- You can also demonstrate skills by creating your own tibble from the dataset and subsetting it accordingly.
- Exploratory questions about the dataset to guide the analysis:
- What is the relationship between a country’s total population and its total COVID-19 deaths?
- What is the relationship between the number of tests and total deaths? Does more testing mean more recorded deaths simply because more people are tested? If so, countries that test less may not have comprehensive numbers.
- We may download an additional dataset, such as US GDP. We would filter for the highest COVID death and case rates and correlate them with the country’s GDP. Does having a high GDP mean people are more likely to catch it?
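The wrangling items above can be sketched roughly as follows. This is only an illustration: the `raw` tibble, its column names (`location`, `cases_2020`, and so on), and every number in it are invented for the example and are not taken from the project dataset, so the column selections would need to be adapted to the real file.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy data invented for illustration only (not real COVID figures).
# Column names are assumptions; replace them with those in the real dataset.
raw <- tribble(
  ~location,      ~population, ~cases_2020, ~cases_2021, ~deaths_2020, ~deaths_2021,
  "Europe/Alpha", 1000000,     5000,        8000,        100,          150,
  "Europe/Beta",  2000000,     7000,        NA,          90,           NA,
  "Asia/Gamma",   5000000,     20000,       30000,       400,          500
)

tidy <- raw %>%
  # Go from one column per measure/year to one row per country, measure, year.
  pivot_longer(
    cols      = c(starts_with("cases_"), starts_with("deaths_")),
    names_to  = c("measure", "year"),
    names_sep = "_",
    values_to = "count"
  ) %>%
  # Split the combined "continent/country" cell so each cell holds one value.
  separate(location, into = c("continent", "country"), sep = "/") %>%
  # Give cases and deaths their own columns again.
  pivot_wider(names_from = measure, values_from = count) %>%
  mutate(year = as.integer(year))

# unite() does the reverse of separate(): build a single label column.
labelled <- tidy %>%
  unite("country_year", country, year, sep = "_", remove = FALSE)

# Typical dplyr verbs: filter rows, derive a rate, pick columns, sort.
summary_tbl <- tidy %>%
  filter(!is.na(deaths)) %>%                          # drop rows with no reported deaths
  mutate(deaths_per_100k = deaths / population * 1e5) %>%
  select(country, year, population, deaths, deaths_per_100k) %>%
  arrange(desc(deaths_per_100k))

print(summary_tbl)
```

Whether rows with missing values should be dropped, as `filter(!is.na(deaths))` does here, or kept and flagged depends on the question being asked; the sketch drops them only to keep the example short.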
The whole point of this project is to demonstrate the skills I “learned”, so there isn’t a concrete set of questions for us to answer. Some graphs below are drawn from different datasets, but we can run code on our dataset that does similar things.
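As a rough sketch of the kind of plots and checks the exploratory questions call for, the code below builds a small tibble, joins a GDP table onto it, and draws a few ggplot2 charts. The tibbles `covid` and `gdp`, their column names, and all values are made up for illustration; the real dataset will have different names and numbers. The minimal theme is one possible nod to Tufte’s advice about reducing non-data ink, not the only way to follow it.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy values invented for illustration only; they are not real figures.
covid <- tibble(
  country      = c("A", "B", "C", "D", "E"),
  population   = c(1e6, 5e6, 2e7, 5e7, 1e8),
  total_tests  = c(2e5, 4e5, 6e6, 1e7, 9e6),
  total_deaths = c(300, 900, 8000, 15000, 12000)
)
gdp <- tibble(
  country        = c("A", "B", "C", "D"),
  gdp_per_capita = c(4000, 12000, 30000, 45000)
)

covid <- covid %>%
  mutate(deaths_per_100k = total_deaths / population * 1e5) %>%
  left_join(gdp, by = "country")   # country E has no GDP row, so it gets NA

# Population vs. total deaths; log scales because both span orders of magnitude.
p_pop <- ggplot(covid, aes(x = population, y = total_deaths)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Population", y = "Total deaths") +
  theme_minimal()

# Tests vs. total deaths.
p_tests <- ggplot(covid, aes(x = total_tests, y = total_deaths)) +
  geom_point() +
  labs(x = "Total tests", y = "Total deaths") +
  theme_minimal()

# Death rate by country as a sorted horizontal bar chart.
p_rate <- ggplot(covid, aes(x = reorder(country, deaths_per_100k), y = deaths_per_100k)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Deaths per 100,000 people") +
  theme_minimal()

# GDP vs. death rate, using only countries that have a GDP value.
p_gdp <- covid %>%
  filter(!is.na(gdp_per_capita)) %>%
  ggplot(aes(x = gdp_per_capita, y = deaths_per_100k)) +
  geom_point() +
  labs(x = "GDP per capita", y = "Deaths per 100,000 people") +
  theme_minimal()

# Rank correlations as a numeric companion to the plots.
rho_tests <- cor(covid$total_tests, covid$total_deaths, method = "spearman")
rho_gdp   <- cor(covid$gdp_per_capita, covid$deaths_per_100k,
                 method = "spearman", use = "complete.obs")

print(p_pop)
```

A correlation here only describes how the two columns move together. It would not by itself show that testing causes deaths or that a high GDP makes infection more likely, which is worth stating in the write-up.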