This Australian ghostwriting order is an R data analysis assignment.

**1 Introduction**

There are a total of three questions worth 10+10+8 = 28 marks in this assignment. This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters (e.g., late penalties, special consideration, etc.) as specified in the FIT2086 Unit Guide or elsewhere in the FIT2086 Moodle site (including Faculty of I.T. and Monash University policies).

Submission Instructions: Please follow these submission instructions:

1. No files are to be submitted via e-mail. Submissions are to be made via Moodle.

2. Please provide a single file containing your report, i.e., your answers to these questions. Provide code/code fragments as required in your report, and make sure the code is written in a fixed-width font such as Courier New or similar (or inserted as a screenshot, provided it is neat and readable), and is grouped with the question the code is answering. You can submit hand-written answers, but if you do, please make sure they are clear and legible.

Do not submit multiple files: all your files should be combined into a single PDF file as required. Please ensure that your assignment answers the questions in the order specified in the assignment. Multiple files and questions out of order make the life of the tutors marking your assignment much more difficult than it needs to be, and may attract penalties, so please ensure your assignment follows these requirements.

**Question 1 (10 marks)**

In this question we will revisit our analysis of the COVID-19 recovery data that we began in Assignment 1. The file covid.19.ass2.csv contains a subset of the New South Wales days-to-recovery data we examined previously; this time, patients with recovery times over four weeks (28 days) were removed, as these recovery times are unusual and likely represent a sub-population of people more susceptible to the virus. We know from Assignment 1 that the Poisson distribution is not a good fit to the recovery data. Instead, for this question we will use a normal distribution, as it provides an improved fit to the data due to its increased flexibility, while accepting that this assumption is also not necessarily correct. To quote the famous statistician G. E. P. Box: “all models are wrong – but some are more useful than others”.

Important: you may use R to determine the means and variances of the data, as required, and you may use the R functions qt() and pnorm(), but you must perform all the remaining steps by hand. Please provide appropriate R code fragments and all working out.
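As a rough illustration of the R calls permitted above, the following sketch computes a sample mean and variance and looks up a t quantile and a normal tail probability. The vector `days` is made-up data standing in for the contents of covid.19.ass2.csv, whose column layout is not given here; it is an assumption for illustration only.

```r
# Made-up recovery times (days). In the assignment these values would come
# from covid.19.ass2.csv; the numbers below are purely illustrative.
days <- c(12, 15, 9, 21, 18, 14, 11, 25, 16, 13)

n    <- length(days)   # sample size
xbar <- mean(days)     # sample mean
s2   <- var(days)      # unbiased sample variance (divides by n - 1)

# Quantile of the t-distribution with n - 1 degrees of freedom,
# as needed for a 95% confidence interval (2.5% in each tail).
t_crit <- qt(0.975, df = n - 1)

# Lower-tail probability of a standard normal, as needed for a p-value.
p_lower <- pnorm(-1.96)

print(c(n = n, mean = xbar, variance = s2, t_crit = t_crit, p_lower = p_lower))
```

Everything after these lookups (standard errors, interval endpoints, test statistics) is the part the question asks to be worked by hand.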

1. Calculate an estimate of the average number of days to recovery using the provided data. Calculate a 95% confidence interval for this estimate using the t-distribution, and summarise/describe your results appropriately. Show working as required. [4 marks]

2. Similar data was collected in 2020 by the Israeli Ministry of Health. While the specific data was not available, the summary statistics were provided, and from these I have simulated a dataset of n = 494 individuals from the Israeli study. The days to recovery in this group are provided in the file israeli.covid.19.ass2.csv. Using the provided data and the approximate method for difference in means with (different) unknown variances presented in Lecture 4, calculate the estimated mean difference in recovery times between the Israeli patients and the patients from NSW, and provide an approximate 95% confidence interval. Summarise/describe your results appropriately. Show working as required. [3 marks]

3. It is of interest to determine if there are any differences, at a population level, in recovery times for patients in different countries. Test the hypothesis that the population average time taken to recover for the Israeli cohort is the same as in the NSW cohort. Write down explicitly the hypothesis you are testing, and then calculate a p-value using the approximate hypothesis test for differences in means with (different) unknown variances presented in Lecture 5. What does this p-value suggest about the difference in mean recovery time between the two cohorts of patients?
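The three calculations above can be sketched in R as a way of checking hand working. This is a minimal sketch under stated assumptions, not the unit's reference solution: the exact formulas from Lectures 4 and 5 are not reproduced here, so the code assumes the usual large-sample form, with standard error sqrt(s1^2/n1 + s2^2/n2) and normal quantiles. The function names `mean_ci_t` and `diff_means_approx` and the example vectors are hypothetical, and `qnorm()` is used only for convenience (for a 95% interval it gives roughly 1.96).

```r
# 95% confidence interval for a mean using the t-distribution.
mean_ci_t <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sqrt(var(x) / n)                       # standard error of the mean
  tq <- qt(1 - (1 - level) / 2, df = n - 1)    # t quantile, n - 1 d.o.f.
  c(estimate = mean(x), lower = mean(x) - tq * se, upper = mean(x) + tq * se)
}

# Approximate interval and two-sided test for a difference in means with
# different, unknown variances (ASSUMED large-sample normal approximation).
# Tests H0: mu_x = mu_y against HA: mu_x != mu_y.
diff_means_approx <- function(x, y, level = 0.95) {
  d  <- mean(x) - mean(y)                                  # estimated difference
  se <- sqrt(var(x) / length(x) + var(y) / length(y))      # its standard error
  zq <- qnorm(1 - (1 - level) / 2)                         # about 1.96 for 95%
  z  <- d / se                                             # test statistic
  c(estimate = d, lower = d - zq * se, upper = d + zq * se,
    z = z, p_value = 2 * pnorm(-abs(z)))
}

# Made-up vectors standing in for the Israeli and NSW recovery times.
israel <- c(14, 17, 12, 20, 16, 15, 19, 13)
nsw    <- c(12, 15, 9, 21, 18, 14, 11, 25, 16, 13)

print(mean_ci_t(nsw))
print(diff_means_approx(israel, nsw))   # difference taken as Israel minus NSW
```

The difference is taken as Israeli minus NSW to match the wording of part 2; swapping the arguments flips the sign of the estimate and the interval but leaves the two-sided p-value unchanged.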