Project grading

There are three parts to this project. You must complete all three parts to be considered for the full 30%. For instance, if you do NOT submit a Final Report you will not receive the completion points from the rough draft and peer review process.

As mentioned above, this project will be marked based on the output in the PDF submission. You must submit both the Rmd and PDF files to receive full marks for reproducibility. This is an individual project: you are expected to work alone. Because the workload is heavier than that of an assignment, we recommend that you start early.

This project will be graded using the rubric available on the Assignment Quercus page. TAs will look over each section of the submitted PDF and select the appropriate grade for that section based on a single, coarse read-through. Your project should be understandable to an average university-level student after one reading, so make sure your PDF looks clean and aesthetically pleasing, and has been proofread. Since this is a final project, grade reviews for this assessment are handled by the Department of Statistical Sciences. To see the graded rubric and potentially request a regrade, you will need to apply through a process that will be announced at a later date. TAs may provide some comments if the same issue arises in multiple sections, but because of the size of the class, you will likely receive no comments or feedback.


In this project you will write a report on a data analysis whose main methodology consists of a collection of techniques taught in STA238 Winter 2021. The methodology must include the following:

  • at least one simple linear regression;
  • at least one confidence interval (either through a bootstrap or the Z/t approach);
  • at least one maximum likelihood estimator derivation (I would recommend putting the mathematics in the Appendix);
  • at least one hypothesis test of the mean;
  • at least one goodness of fit test;
  • at least one Bayesian credible interval (put the derivation of the posterior in the Appendix).

Keep in mind that this analysis is for our course, so it should answer a question about an underlying random process from which we have data. You will find some data, form an interesting question, and answer that question through your analysis. State your question clearly in the Introduction so that the reader can identify it quickly; you may restate it more formally as a hypothesis test in the Methods section.

The report will consist of 8 sections: Abstract, Introduction, Data, Methods, Results, Conclusions, Bibliography and Appendix.

There should be no evidence that this is a class project: I should be able to take a screenshot of it and paste it into a newspaper or blog. There should be no raw code, and all output, tables, figures, etc. should be nicely formatted.

Working with a new data set will allow you to explore some of its interesting aspects. Find some open-source data through any R package that has not been used on a previous assignment in this course. Examples of R packages with data that we have used in this course include dplyr and nycflights13. A list of available R packages is here: https://cran.r-project.org/web/packages/available_packages_by_name.html. If you prefer to use data from a website (e.g., Kaggle, GitHub), that is also an option, as long as the data is open, free, and ethically viable for you to analyze. If you are unsure whether your data is appropriate, please visit one of our office hours and we will be happy to discuss it.

Based on the above criteria, the following three sources are OFF LIMITS. You CANNOT use data from any of them: the Toronto Open Data Portal, survey data from the 2019 Canadian Election Study (CES), or Statistics Canada.

If you use data from the CES, the Toronto Open Data Portal, or Statistics Canada, you will receive a 0 on this project. (That is, do NOT use data from the Toronto Open Data Portal, the CES, or Statistics Canada on this project.)

The material and text of this project should be different from those of your previous assignments in this course, so you should NOT directly copy your previous assignment work. We strongly encourage you to use feedback from previous assignments to amend, proofread, and update your Final Project. Submitting a direct copy of a previous submission, or of another person's submission, is considered an academic offense.