This statistics assignment is an open-topic data analysis project in R.

STA238 – Winter 2021

Final Project Instructions


Project grading

There are three parts to this project. You must complete all three parts to be considered for the full 30%. For instance, if you do NOT submit a Final Report you will not receive the completion points from the rough draft and peer review process.

As mentioned above, this project will be marked based on the output in the pdf submission. You must submit both the Rmd and pdf files for this project to receive full marks in terms of reproducibility. Furthermore, this is an individual project. You are expected to work individually. The workload level is higher than that of an assignment, since this is a project. Thus, it is recommended that you start early.

This project will be graded using the rubric available on the Assignment Quercus page. TAs will read each section of the submitted pdf once and select the grade for that section based on that single read-through. An average university-level student should understand your project after reading it once. I suggest you make sure your pdf document looks clean, is aesthetically pleasing, and has been proofread. Since this is a final project, grade reviews for this assessment are handled by the Department of Statistical Sciences. Thus, you will need to apply through a process (TBA at a later date) to see the graded rubric and potentially request a regrade. The TAs may provide some comments/feedback if the same issue arises in multiple sections, but you will likely receive none (due to the size of the class and the marking load).

Description:

In this project you will write a report on a data analysis in which your main methodology will comprise a collection of techniques taught in STA238 Winter 2021. The methodology must include the following:

  • at least one simple linear regression;
  • at least one confidence interval (either through a bootstrap or the Z/t approach);
  • at least one maximum likelihood estimator derivation (I would recommend putting the mathematics in the Appendix);

  • at least one hypothesis test of the mean;
  • at least one goodness of fit test;
  • at least one Bayesian credible interval (put the derivation of the posterior in the Appendix).
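The required methods above can be sketched in a few lines of base R. This is only an illustrative sketch on simulated data: every variable name, distribution, and parameter value below is a made-up assumption, and a real report would use an actual open data set and justify each modelling choice.

```r
# Minimal sketch on simulated data; every name and number here is hypothetical.
set.seed(238)
n <- 200
x <- runif(n, 0, 10)                  # made-up predictor
y <- 2 + 0.5 * x + rnorm(n, sd = 1)   # made-up response

# 1. Simple linear regression of y on x
fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])

# 2. 95% confidence intervals for the mean of y: t-based and percentile bootstrap
t_ci <- t.test(y, conf.level = 0.95)$conf.int
boot_means <- replicate(2000, mean(sample(y, replace = TRUE)))
boot_ci <- quantile(boot_means, c(0.025, 0.975))

# 3. Maximum likelihood: for Exponential(rate) data the MLE of the rate is 1 / sample mean
waits <- rexp(n, rate = 0.4)
rate_mle <- 1 / mean(waits)

# 4. Hypothesis test of the mean: H0: mu = 4 against a two-sided alternative
mean_test <- t.test(y, mu = 4)

# 5. Goodness of fit: chi-squared test of whether simulated die rolls are uniform
rolls <- sample(1:6, 300, replace = TRUE)
gof <- chisq.test(table(factor(rolls, levels = 1:6)), p = rep(1 / 6, 6))

# 6. Bayesian credible interval: a Beta(1, 1) prior with binomial data
#    gives a Beta(1 + successes, 1 + failures) posterior
trials <- 50
successes <- sum(rbinom(trials, 1, 0.3))
cred_int <- qbeta(c(0.025, 0.975), 1 + successes, 1 + trials - successes)
```

In a report, the fitted objects (`fit`, `mean_test`, `gof`) would be summarized in formatted tables or figures rather than printed directly.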

    Please keep in mind that this analysis is for our course. Thus, the analysis should answer a question about the underlying random process that generated your data. You will find some data, form an interesting question, and answer the question through your analysis. Your question should be stated clearly so that the reader can quickly identify it in the introduction (and it can be repeated more formally as a hypothesis test in the methods section).
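As an illustration of restating a question formally, suppose the question concerns a population mean. The symbols below ($\mu$, $\mu_0$, $\bar{X}$, $S$, $n$) are assumptions chosen for this sketch, and the $t$ reference distribution presumes approximately normal data or a large sample:

```latex
H_0 : \mu = \mu_0 \quad \text{versus} \quad H_a : \mu \neq \mu_0,
\qquad
T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1} \ \text{under } H_0 .
```

Here $\mu_0$ is the value implied by the research question, $\bar{X}$ and $S$ are the sample mean and standard deviation, and $n$ is the sample size.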

    The report will consist of 8 sections: Abstract, Introduction, Data, Methods, Results, Conclusions, Bibliography and Appendix.

    There should be no evidence that this is a class project; I should be able to take a screenshot of it and paste it into a newspaper/blog. There should be no raw code. All output, tables, figures, etc. should be nicely formatted.
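One common way to keep raw code out of a knitted pdf is to set chunk options once in a setup chunk and to render tables through a formatting function. The following is a sketch only: it assumes the knitr package is installed, and the table contents are made-up numbers for illustration.

```r
# Hypothetical setup chunk for an R Markdown report; assumes knitr is installed.
library(knitr)

# Hide code, messages, and warnings in every later chunk so only output is knitted.
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Made-up summary table, rendered as a formatted table instead of raw console output.
results <- data.frame(
  Quantity = c("Sample mean", "Lower 95% limit", "Upper 95% limit"),
  Value = c(4.52, 4.31, 4.73)
)
results_table <- kable(results, digits = 2,
                       caption = "Illustrative summary (made-up numbers)")
```

Printing `results_table` inside a chunk places the formatted table in the knitted document.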

    This will allow you to look at some interesting aspects of the data. Please find some open source data through any R package that has not been used on a previous assignment in this course. Some examples of R packages with data that we have used in this course are dplyr, nycflights13, etc. Here is a list of available R packages: https://cran.r-project.org/web/packages/available_packages_by_name.html. Additionally, if you prefer to use other data available through a website (e.g., kaggle, github, etc.), that is also an option, so long as the data is open, free, and ethically viable for you to analyze. If you are unsure whether your data is appropriate, please visit one of our office hours and we will be happy to discuss it.

    Regardless of the above criteria, the following three sources are OFF LIMITS. You CANNOT use data from any of them: the Toronto Open Data Portal, survey data from the 2019 Canadian Election Study (CES), and Stats Canada.

    If you use data from the CES or the Toronto Open Data Portal, you will receive a 0 on this project. (I.e., do NOT use Toronto Open Data on this project; do NOT use CES data on this project; do NOT use Stats Canada data on this project.)

    The material and text on this project should be different from that of your previous assignments in this course. Thus, you should NOT directly copy your previous assignment work. We highly encourage you to use feedback from previous assignments to amend/proofread/update your Final Project. If your work is a direct copy of a previous submission or of another person’s submission, this is considered an academic offense.