Overview of ICA 3
A group of five social media personalities runs an online streaming channel, where they live stream videos of themselves playing various computer games. Any one of the group members can produce this content, and can choose what they want it to involve.
They are interested in understanding more about how the number of viewers of their live streams is influenced by certain variables. Over the last three years, the group has collected information about a random sample of their live streams.
One member of the group knows a lot about statistical methods, but has no time to carry out the analysis themselves, while the other members are only familiar with very basic statistical techniques. They would therefore like to enlist your help to analyse their data. The variables they can provide you with are as follows:
viewers: the number of people who watched the video live (anyone who catches up with the recording later is not included here);
genre: the type of game being played;
host: the main host of the live stream (either Player1, Player2, Player3, Player4 or Player5);
subscribers: the total number of subscribers to their channel at the start of the previous week;
day: the day of the week on which the live stream was filmed;
season: the season in which the live stream was filmed (Spring, Summer, Autumn or Winter);
guests: the number of guest players included in the live stream;
ads_now: the number of adverts in the live stream;
ads_last: the number of adverts in the previous live stream.
The dataset is contained within the file live_streams.csv. The observations in the dataset are given in the order they were collected.
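As a rough illustration of the data layout, the sketch below reads the file into a list of records and summarises one numeric variable. It is written in Python purely for illustration (the report itself must be written in R Markdown). The function names load_streams and summarise are hypothetical, and it assumes that the CSV header uses exactly the variable names listed above and that the count variables are whole numbers.

```python
import csv

# Column names follow the variable list above. That the CSV header matches
# them exactly, and that these columns hold whole numbers, is an assumption.
COUNT_COLUMNS = ("viewers", "subscribers", "guests", "ads_now", "ads_last")


def load_streams(path):
    """Read the live stream records, converting count columns to integers."""
    rows = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name in COUNT_COLUMNS:
                row[name] = int(row[name])
            rows.append(row)
    return rows


def summarise(rows, name):
    """Return (minimum, mean, maximum) of one numeric column."""
    values = [row[name] for row in rows]
    return min(values), sum(values) / len(values), max(values)
```

For example, summarise(load_streams("live_streams.csv"), "viewers") would give a first feel for the range of the outcome variable, provided the assumptions above hold for the real file.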
There are two tasks associated with this ICA. You must complete both. More details are given below, but in summary you need to:
- analyse the given dataset culminating in a model for the number of viewers, writing your findings as a report;
- record a presentation of your results, suitable for a lay audience.
Some general points on groupwork
- You can work in groups of up to four students, but you MUST register your group on Moodle before starting this project.
- Students choose their own groups, so each group is responsible for ensuring that its members work well together and that everyone contributes equally. Make sure that you choose a group you know you can work well with.
- Groups may contact me by email if there is a problem, but please note that I will only be able to help if you tell me early, i.e., well in advance of the coursework deadline.
- The ICA has been designed so that all group members contribute to all aspects of the ICA. You should not, therefore, try to split up the tasks in any way.
Task 1 – Analysis of the live streams data [40 marks]
You are required to develop a model for the number of viewers, taking into account the points listed below. Please note that you should not treat these as a list of questions to be answered; they are just pointers about the sorts of things you should include in your report.
Write a report on your analysis. This should be suitable for someone who is familiar with the linear regression techniques covered in STAT0006 (e.g., the member of the social media group who has statistical expertise).
- [7 marks] Provide an exploratory analysis of the dataset. The aim of this is to give someone who doesn’t have access to the dataset (or hasn’t had time to look at it in detail) an overview of what the data are and a feel for the variables available (e.g., summaries of individual variables or of simple relationships). This part should be non-technical.
- [15 marks] Give a description of how you approached the model-building phase. Don’t just show your chosen final model. How did you choose your particular model? What processes did you go through? You do not need to list every single model you tried here; just explain what you consider to be the main highlights of the modelling process.
- [8 marks] Report on the fit of the final model. You should include plots (but not an excessive number – choose wisely) to show the reader that there are no obvious departures from the assumptions OR to show any concerning patterns if you find any (this is OK, but you should make sure you discuss how you’ve tried to alleviate the problem).
- Note that steps 2 and 3 above are likely to be an iterative process.
- You don’t need to show every single model you tried in your report, otherwise your report may be unnecessarily long. Try to think of it as telling the reader a story about the model building phase where you only need to explain the main developments of the plot!
- [5 marks] A conclusion which summarises what your model tells you about the main factors that influence the number of viewers, and how they influence it.
- Note that this doesn’t mean you need to mechanically interpret every regression coefficient.
- You should think about how each covariate impacts the outcome.
- You may want to comment on any particular issues you have with the model.
- [5 marks] After building your model, the social media group ask you to carry out another task. They have been considering starting a new channel focusing on a single genre of game, but are struggling to decide between two genres, one of which is Racing. To help them with this, you should:
- extract the relevant observations from the dataset (those where genre is one of the two genres under consideration), stating in your report how many observations you’re left with;
- construct a simple linear regression model with viewers as the outcome and genre as a categorical covariate with two levels;
- carry out an appropriate hypothesis test to assess whether there is a difference between the average number of viewers for the two genres;
- explain your result and provide a recommendation about which genre they should choose for their new channel;
- discuss whether you have any concerns about the assumptions of this test.
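The steps in this last part can be sketched as follows. This is only an illustration in Python under stated assumptions, not a model answer, and the report itself must be written in R Markdown. The function name two_level_regression is hypothetical, the two genre names are passed in as arguments rather than assumed, and each record is assumed to be a mapping with "genre" and "viewers" entries. With a single two-level covariate coded 0/1, the slope is the difference between the two group means, and the t statistic for a zero slope matches the equal-variance two-sample t statistic.

```python
import math


def two_level_regression(rows, genre_a, genre_b):
    """Fit viewers = b0 + b1 * [genre == genre_b] by least squares.

    Returns the number of rows kept, the estimates, and the t statistic
    for H0: b1 = 0 (no difference in mean viewers between the genres).
    """
    # Keep only the observations belonging to the two genres of interest.
    kept = [r for r in rows if r["genre"] in (genre_a, genre_b)]
    x = [1.0 if r["genre"] == genre_b else 0.0 for r in kept]
    y = [float(r["viewers"]) for r in kept]
    n = len(kept)
    if n < 3 or len(set(x)) < 2:
        raise ValueError("need both genres and at least three observations")

    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # mean(genre_b) - mean(genre_a)
    intercept = y_bar - slope * x_bar  # mean of the baseline genre_a
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return {"n": n, "intercept": intercept, "slope": slope,
            "t": slope / se_slope, "df": n - 2}
```

The sketch stops at the t statistic and its degrees of freedom. The p-value would come from a t distribution with n - 2 degrees of freedom, and the test relies on the usual linear model assumptions, including equal variance in the two groups.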
Writing up your report
- The report should be written in R Markdown.
- The maximum word count for this report is 2,000 words.
- This is a hard limit and not a target word count.
- Captions for any tables/plots do not contribute to the word limit; however, excessively long captions will be penalised under ‘clarity’.
- If you need to use references, these don’t count toward your word count.
- You should submit a pdf copy of the completed report, as well as the original .Rmd file.
- You can either use the associated .Rmd template, or create your own.
- If you choose to create your own .Rmd file, the following need to be present at the start of your document (see the template for an example):
- Group number;
- Student numbers of all students in the group.
- Make sure that you think about the structure of your report and that this is clear (it is a good idea to use suitable headings in your report).
- You should also make sure that each section follows on from the previous one in a logical way.
- Marks allocated to each section are given in the description of the task.
- For each section, this will be further split by content and clarity.
- The ‘content’ marks look at what you write/include in your report, including its relevance and accuracy;
- The ‘clarity’ marks look at how you present your written report, including its structure and ease of reading.
- A group mark will be given for the report, so everyone in the group will get the same mark.