1 Introduction

There are a total of three questions in this assignment, worth 10 + 10 + 8 = 28 marks. This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters (e.g., late penalties, special consideration, etc.) as specified in the FIT2086 Unit Guide or elsewhere in the FIT2086 Moodle site (including Faculty of I.T. and Monash University policies).

Submission Instructions: Please follow these instructions:

1. No files are to be submitted via e-mail. Submissions are to be made via Moodle.

2. Please provide a single file containing your report, i.e., your answers to these questions. Include code or code fragments in your report as required, grouped with the question the code answers. Code should be written in a fixed-width font such as Courier New (or similar), or inserted as a screenshot; either way, please make sure it is neat and readable. You can submit hand-written answers, but if you do, please make sure they are clear and legible.

Do not submit multiple files; all your material should be combined into a single PDF file. Please ensure that your assignment answers the questions in the order specified in the assignment. Multiple files and out-of-order answers make marking your assignment much more difficult for the tutors than it needs to be, and may attract penalties, so please ensure your assignment follows these requirements.

Question 1 (10 marks)

In this question we will revisit our analysis of the COVID-19 recovery data that we began in Assignment 1. The file covid.19.ass2.csv contains a subset of the New South Wales days-to-recovery data we examined previously; this time, patients with recovery times over four weeks (28 days) were removed, as these recovery times are unusual and likely represent a sub-population of people more susceptible to the virus. We know from Assignment 1 that the Poisson distribution is not a good fit to the recovery data. Instead, for this question we will use a normal distribution, as its increased flexibility provides an improved fit to the data, while accepting that this assumption is not necessarily correct either. To quote the famous statistician G. E. P. Box: "all models are wrong, but some are more useful than others".

Important: you may use R to compute the means and variances of the data, as required, and you may use the R functions qt() and pnorm(), but you must perform all the remaining steps by hand. Please provide the appropriate R code fragments and all working out.
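As a quick reference for the R calls permitted above, the sketch below shows what mean(), var(), qt() and pnorm() return. The numeric vector is made up purely for illustration and is not the assignment data; note that var() divides by n - 1, i.e. it gives the unbiased sample variance.

```r
# Made-up days-to-recovery values, for illustration only (NOT the assignment data)
x <- c(10, 12, 14, 16, 18)

mean(x)                   # sample mean: 14
var(x)                    # sample variance with n - 1 denominator: 40 / 4 = 10

# qt(p, df) returns the p-quantile of a t-distribution with df degrees of freedom.
# Two-sided 95% critical value with 10 degrees of freedom:
qt(0.975, df = 10)        # about 2.228

# With very many degrees of freedom the t critical value approaches the normal one
qt(0.975, df = 1e6)       # about 1.96

# pnorm(q) returns P(Z <= q) for a standard normal Z, so the two-sided
# tail probability beyond |z| = 1.96 is:
2 * pnorm(-1.96)          # about 0.05
```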

1. Calculate an estimate of the average number of days to recovery using the provided data. Calculate a 95% confidence interval for this estimate using the t-distribution, and summarise/describe your results appropriately. Show working as required. [4 marks]

2. Similar data was collected in 2020 by the Israeli Ministry of Health. While the specific data was not available, the summary statistics were provided, and from these I have simulated a dataset of n = 494 individuals from the Israeli study. The days to recovery in this group are provided in the file israeli.covid.19.ass2.csv. Using the provided data and the approximate method for difference in means with (different) unknown variances presented in Lecture 4, calculate the estimated mean difference in recovery times between the Israeli patients and the patients from NSW, and provide an approximate 95% confidence interval. Summarise/describe your results appropriately. Show working as required. [3 marks]

3. It is of interest to determine if there are any differences, at a population level, in recovery times for patients in different countries. Test the hypothesis that the population average time taken to recover for the Israeli cohort is the same as in the NSW cohort. Write down explicitly the hypothesis you are testing, and then calculate a p-value using the approximate hypothesis test for differences in means with (different) unknown variances presented in Lecture 5. What does this p-value suggest about the difference in mean recovery time between the two cohorts of patients?
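The three parts above can be checked against your hand calculations with a short R sketch. It rests on assumptions that are not stated in the assignment: the days-to-recovery values are assumed to be already loaded as numeric vectors (e.g. with read.csv(); the column names in the CSV files are not given), and the "approximate" two-sample methods from Lectures 4 and 5 are assumed to use a standard normal critical value of 1.96 and a z statistic built from separate sample variances. The function names are hypothetical helpers, and the vectors at the end are made up, not the COVID-19 data.

```r
# Part 1: 95% t-based confidence interval for a single population mean
#   xbar +/- qt(0.975, n - 1) * s / sqrt(n)
t_ci_mean <- function(x, level = 0.95) {
  n     <- length(x)
  xbar  <- mean(x)                               # sample mean
  se    <- sqrt(var(x) / n)                      # estimated standard error (var() uses n - 1)
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)   # two-sided t critical value
  c(estimate = xbar, lower = xbar - tcrit * se, upper = xbar + tcrit * se)
}

# Standard error of a difference in means with separate (unequal) variances
#   sqrt(s1^2 / n1 + s2^2 / n2)
se_diff <- function(x1, x2) {
  sqrt(var(x1) / length(x1) + var(x2) / length(x2))
}

# Part 2: approximate 95% CI for mu1 - mu2 (here x1 = Israeli, x2 = NSW)
# Assumption: normal critical value of 1.96 for the approximate method
diff_ci_approx <- function(x1, x2, zcrit = 1.96) {
  d  <- mean(x1) - mean(x2)
  se <- se_diff(x1, x2)
  c(estimate = d, lower = d - zcrit * se, upper = d + zcrit * se)
}

# Part 3: approximate test of H0: mu1 = mu2 against HA: mu1 != mu2
diff_test_approx <- function(x1, x2) {
  z <- (mean(x1) - mean(x2)) / se_diff(x1, x2)  # approximate z statistic
  c(z = z, p.value = 2 * pnorm(-abs(z)))         # two-sided p-value
}

# Made-up vectors for illustration only (NOT the assignment data)
x_isr <- c(1, 2, 3, 4, 5)
x_nsw <- c(2, 4, 6)
t_ci_mean(x_nsw)
diff_ci_approx(x_isr, x_nsw)
diff_test_approx(x_isr, x_nsw)
```

The difference is taken as Israeli minus NSW to match the wording of part 2; swapping the arguments flips the sign of the estimate and the z statistic but leaves the p-value unchanged.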