

Goal of the Assessment:

The purpose of this assessment is to give you a head start on your final project. You will find an area of interest to study, locate real-world data to work with, and do some preliminary research to see what has already been accomplished surrounding your question. This is the general process for proposing a research question, and it will form the basis of a solid introduction section for your final project report. It will also give you the chance to think about the appropriateness of linear regression as a tool for statisticians. Lastly, it provides an opportunity to get feedback on your writing and research question that you can use to improve your final report.


1. Decide on one (or a few possible) areas of interest that you may want to explore. This can be anything that matters or is of interest to you. Some examples could be (but are certainly not limited to) sports, medicine, public health, economics, video games, literature, etc. Pick something that you really care about.

2. Next, think about possible research questions you may want to study in these areas. What do you want to know in this area? Make sure that your question can be answered using linear regression models. This means framing your question around modelling a relationship between variables or predicting a value based on that relationship.

3. After coming up with a research question, you will need to find some openly available data that you can use in your data analysis. Make sure that the data you find contains your response variable of interest (or variables that could be used to create it), as well as any other variables you may want to use as predictors. While looking for data online, you may realize that you need to modify your research question slightly, or pick another one, if you can’t quite find the data you’re looking for. Alternatively, you can stick with your research question, but be sure to mention that you expect the dataset to have limitations because it doesn’t quite meet your needs. Step 4 can also help you decide which predictors you might need to answer your question.

4. Once you’ve found your dataset and have decided on your research question (or you can work on steps 2-4 simultaneously and use what you find in all of them to finalize your research question), you need to look at what others have studied in relation to your research question. Do a quick search on the University of Toronto library website to learn about anything related to your area of interest and research question. Look for papers that studied the same question, or something related, and that tell you a bit more about what you may need to consider in your analysis and why your research question is important.

· Focus on giving your reader a rough idea of how many papers have studied this topic (or closely related topics) – this tells us how popular the area of research is and how much research has been done.

· Give examples from a few important papers about what has been found/discovered to be important in relation to your question (this can be important variables, important results, surprising results, etc.) – this tells us that you are aware of prior results and that you will be using these to plan your analysis.

· Think about how your research question fits into the area of research. Is it different or new (e.g. nobody has studied this, or maybe it hasn’t been done in this way or this population, etc.)? – this tells us that you see the importance of what you are researching and can frame it against what has already been done.

5. Lastly, perform a short exploratory data analysis of your chosen dataset. Focus on identifying anything that you may need to consider moving forward: skewed distributions, statistical outliers, variables with high spread, observations that don’t make sense, missing data, or signs that the dataset doesn’t quite have what you need. Present numerical and/or graphical summaries describing the variables. Choose the summaries that highlight the features of the data you want to point out while still letting your reader clearly understand the data that you’ll be working with.

· You want to make sure you specifically mention the presence of any of the above characteristics (or lack thereof) and what this means for the analysis you will eventually perform (i.e. how this might cause problems with the results of linear regression).
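
The checks described in step 5 can be sketched in a few lines of code. The example below is an illustration only, not a required method. It is written in Python using just the standard library (the same checks can be done in R), the `commute_minutes` values are made up, and `summarize` is a hypothetical helper. It counts missing values, computes a few numerical summaries, estimates skewness from the third central moment, and flags outliers with the 1.5 × IQR rule, which is one common convention rather than something this assignment prescribes.

```python
import statistics


def summarize(values):
    """Summarize one numeric variable; None marks a missing value."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    mean = statistics.mean(observed)
    # Quartiles by the "inclusive" method; other conventions shift the cut points slightly.
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Moment-based skewness: third central moment over the population SD cubed.
    pop_sd = statistics.pstdev(observed)
    third_moment = sum((v - mean) ** 3 for v in observed) / n
    skewness = third_moment / pop_sd ** 3 if pop_sd > 0 else 0.0
    return {
        "n_observed": n,
        "n_missing": len(values) - n,
        "mean": mean,
        "median": median,
        "sd": statistics.stdev(observed),
        "skewness": skewness,
        "outliers": [v for v in observed if v < lower or v > upper],
    }


# Made-up commute times in minutes; None stands in for a missing entry.
commute_minutes = [12, 15, 14, None, 13, 16, 15, 14, 90, 13]
summary = summarize(commute_minutes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample, the single value of 90 is flagged as an outlier and pulls the mean well above the median. That is the kind of feature you would mention in your proposal, along with how it might affect the results of a linear regression.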

Submission Requirements:

Content Specific:

Your proposal should be written to satisfy the following requirements:

* Should be organized clearly (consider using headings or sections) and include the following information:

o Your research question, why you chose it (i.e. why it’s of interest to you), and some background information related to your question/area

o Justification for why regression is appropriate to use to answer your question and whether you will be aiming for a simpler model that has fewer predictors but is more easily interpretable, or whether you will be aiming for a model with prediction in mind and thus should have strong statistical properties at the expense of interpretability.

o Details and summaries on your chosen dataset including the variables collected, the number of observations and anything that stands out in the data that would need to be addressed/investigated further in your analysis.

o References for where you located the data, and your background research on your topic

* Should be written for an audience that has some statistics background but is not necessarily familiar with the area of your research question or linear regression models.

* Should contain figures and/or tables with captions and proper labels/titles as appropriate in your Exploratory Data Analysis section

* Should have references listed in proper APA format

* Should not contain R code in the proposal itself

Technical Requirements:

Your submission should include the following:

1) Your proposal should be either a Word document or a PDF (both can be created using RMarkdown)

o This document should be no longer than 3 pages in length (excluding references)

§ About 1 page for the data and research question proposal, and up to 2 pages for your exploratory data analysis and figures/tables

o Should be typed in a font size of at least 10 points and be single-spaced.

2) You will also upload a separate Rmd file (or file containing all your R code) in addition to your proposal

3) You will also upload the dataset you are proposing to use in your final project (as a csv or excel file)


Examples of open data sources:

* open data from Toronto

* open data from Ontario

* data collected by Statistics Canada

* various sports-related datasets

* data on various country-level variables

* links to many other data portals through the UofT library

Library resources:

* more details about searching for articles related to your question

* details about why and how to cite your references

* help getting the correct citation format

