This assignment uses R to carry out analysis of variance and survival analysis.

CMP-5017B/7008B Applied Statistics
Description
Answer the following three questions.
1. The Yield (pods per year) of some cocoa plants is measured over a growing season.
The Height (m) of each tree is also known, and the trees are additionally grouped by a HeightGroup variable (small or large). The Genotype of a gene (thought to be related to yield) is also known for each tree (AA, Aa or aa). The data are available in the file CocoaYield.csv. You may presume that the assumptions for ANOVAs and ANCOVAs are satisfied in the following questions.
(a) Visualize the data and interpret what you see. [marks 10]
(b) Run a two-way ANOVA to look for significant effects, using HeightGroup and Genotype as factors, and provide an interpretation of what you discover. [marks 10]
(c) Run an ANCOVA using Genotype as a factor and Height as a covariate to look for significant effects, and provide an interpretation of what you discover. [marks 10]
(d) Comment with reasons on whether Height is a suitable covariate for the ANCOVA. [marks 5]
(e) Comment with reasons on which of (b) or (c) is the better approach. [marks 5]
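As a rough illustration of the R workflow behind Question 1 (a sketch, not a model answer), the two models could be fitted along the following lines. The column names Yield, Height, HeightGroup and Genotype come from the brief; the data frame `cocoa` and every value in it are simulated stand-ins invented here so that the code runs without CocoaYield.csv, and the object names `fit_anova` and `fit_ancova` are likewise hypothetical.

```r
# Stand-in data with the column names from the brief. The values are simulated
# purely so the sketch runs; they are NOT the CocoaYield.csv data.
set.seed(1)
n <- 60
cocoa <- data.frame(
  Height   = runif(n, min = 2, max = 8),
  Genotype = factor(rep(c("AA", "Aa", "aa"), each = n / 3),
                    levels = c("AA", "Aa", "aa"))
)
cocoa$HeightGroup <- factor(ifelse(cocoa$Height < median(cocoa$Height),
                                   "small", "large"),
                            levels = c("small", "large"))
cocoa$Yield <- 20 + 3 * cocoa$Height + rnorm(n, sd = 4)
# With the real file this would instead be something like:
# cocoa <- read.csv("CocoaYield.csv", stringsAsFactors = TRUE)

# (a) One possible plot: yield for each HeightGroup/Genotype combination.
boxplot(Yield ~ HeightGroup + Genotype, data = cocoa,
        xlab = "HeightGroup.Genotype", ylab = "Yield (pods per year)")

# (b) Two-way ANOVA with both main effects and their interaction.
fit_anova <- aov(Yield ~ HeightGroup * Genotype, data = cocoa)
print(summary(fit_anova))

# (c) ANCOVA: the covariate is entered before the factor.
fit_ancova <- aov(Yield ~ Height + Genotype, data = cocoa)
print(summary(fit_ancova))
```

Note that `aov` reports sequential sums of squares, so the order of the terms in the formula can change the table when the design is unbalanced.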
2. The hazard function for the lifetime (t months) of a lightbulb is given by:
$$h(t) = \begin{cases} \alpha t, & 0 \le t \le 1, \\ \alpha, & \text{otherwise,} \end{cases}$$
where α is a constant.
(a) Determine the survival function S(t) and the failure probability density function f(t) for the lightbulb. [marks 10]
(b) Determine the bulb's median lifetime, as a function of α. [marks 10]
(c) Calculate the probability that the lightbulb will last longer than two months, given that the bulb is working after one month. [marks 5]
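For Question 2, the standard identities linking the hazard, survival and density functions may be a useful starting point. They are stated here in general form only and are not worked through for this particular h(t); the symbols H(t), t_med, a and b are notation introduced here for illustration.

$$
\begin{aligned}
H(t) &= \int_0^t h(u)\,du, \\
S(t) &= \exp\bigl(-H(t)\bigr), \\
f(t) &= -\frac{dS(t)}{dt} = h(t)\,S(t), \\
S(t_{\text{med}}) &= \tfrac{1}{2}, \\
P(T > b \mid T > a) &= \frac{S(b)}{S(a)}, \qquad 0 \le a \le b.
\end{aligned}
$$

Because the hazard is defined piecewise, the integral for H(t) has to be split at t = 1 whenever t > 1.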

School of Computing Sciences
3. The survival times (weeks) of patients with a form of cancer are recorded in a clinical trial comparing two treatment protocols over a period of 24 weeks.
Censoring information is also available (0=censored, 1=cured). The data can be found in CancSurv.csv.
(a) Break the 24 weeks into six equally sized (i.e. approximately monthly) windows of survival times and construct life tables for the two groups and produce corresponding estimated survival plot(s). [marks 15]
(b) Use Kaplan-Meier estimation to produce survival plot(s) for the data, using the original weekly data. [marks 10]
(c) Describe the data presented in (a) and (b), and comment on which of the two approaches you think is better. [marks 5]
(d) Carry out a log-rank test to investigate the significance of differences between the two treatments, providing an interpretation of the result. [marks 5]
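The steps in Question 3 could be sketched in R roughly as follows (an illustration under stated assumptions, not a model answer). The brief does not give the column names in CancSurv.csv, so `Time`, `Status` and `Treatment` are assumed names, the data frame `surv_data` is simulated so that the code runs, and `life_table` is a hypothetical helper that applies the usual actuarial adjustment (half of the censored cases are removed from the number at risk). The `survival` package supplies `Surv`, `survfit` and `survdiff`.

```r
library(survival)

# Simulated stand-in data; NOT the CancSurv.csv data. Column names are assumed.
set.seed(2)
n <- 40
surv_data <- data.frame(
  Treatment = factor(rep(c("A", "B"), each = n / 2)),
  Time = pmin(ceiling(rexp(n, rate = rep(c(0.10, 0.05), each = n / 2))), 24)
)
# Status: 1 = event observed, 0 = censored (anyone reaching week 24 is censored).
surv_data$Status <- ifelse(surv_data$Time < 24, rbinom(n, 1, 0.8), 0)

# (a) Actuarial life table over six 4-week windows (hypothetical helper).
life_table <- function(time, status, breaks = seq(0, 24, by = 4)) {
  k <- length(breaks) - 1
  window <- cut(time, breaks = breaks)      # intervals (0,4], (4,8], ...
  events <- as.vector(table(window[status == 1]))
  censored <- as.vector(table(window[status == 0]))
  # Number entering each window = total minus everyone leaving earlier.
  at_risk <- length(time) - c(0, cumsum(events + censored)[-k])
  effective <- at_risk - censored / 2       # actuarial adjustment
  q <- ifelse(effective > 0, events / effective, 0)
  data.frame(window = levels(window), at_risk, events, censored,
             effective, surv = cumprod(1 - q))
}
tables <- lapply(split(surv_data, surv_data$Treatment),
                 function(d) life_table(d$Time, d$Status))
print(tables)

# (b) Kaplan-Meier curves from the weekly data.
km <- survfit(Surv(Time, Status) ~ Treatment, data = surv_data)
plot(km, lty = 1:2, xlab = "Weeks", ylab = "Estimated survival probability")
legend("topright", legend = levels(surv_data$Treatment), lty = 1:2)

# (d) Log-rank test; the p-value is taken from the chi-squared statistic.
lr <- survdiff(Surv(Time, Status) ~ Treatment, data = surv_data)
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
print(lr)
```

The `surv` column of each life table gives the estimated probability of surviving to the end of that window, which is what a life-table survival plot for part (a) would display.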
Relationship to formative assessment
We hand out exercises that give you an opportunity to test your understanding and exercise your skills; the questions in this assignment are similar.
Deliverables
Please submit your piece of coursework in the following way:
1. Submit your assignment to Blackboard. MAKE SURE YOUR STUDENT NUMBER IS VISIBLE ON THE FRONT.
2. Give full working that demonstrates the steps required to obtain the correct solution and the important principles used. The logic of the solution is important.
3. Any R commands used should be included. They can be incorporated into the main text or in an Appendix.