CMP-5017B/7008B Applied Statistics
Assignment: Coursework 4
Set by : Dr Chris Greenman e-mail: C.Greenman@uea.ac.uk
Date set : 5 May 2019
Value : 25%
Date due : 14 May 2019 by 15:00
Returned by : 22 June 2019
Submission : Submit to Blackboard
Learning outcomes
ANOVAs and Survival Analyses
Specification
Overview
To improve understanding of the material by working individually on problems based on the
lectures.
Description
Answer the following three questions.
1. The Yield (pods per year) of some cocoa plants is measured over a growing season.
The Height (m) of each tree is also known, and the trees are also grouped by a HeightGroup
variable (small or large). The Genotype of a gene (thought to be related to yield) is also
known for each tree (AA, Aa or aa). The data are available in the file CocoaYield.csv. You
may presume the assumptions for ANOVAs and ANCOVAs are satisfied in the following
questions.
(a) Visualize the data and interpret what you see. [marks 10]
(b) Run a two-way ANOVA to look for significant effects, using HeightGroup and Genotype
as factors, and provide an interpretation of what you discover. [marks 10]
(c) Run an ANCOVA using Genotype as a factor and Height as a covariate to look for significant
effects, and provide an interpretation of what you discover. [marks 10]
(d) Comment with reasons on whether Height is a suitable covariate for the ANCOVA. [marks 5]
(e) Comment with reasons on which of (b) or (c) is the better approach. [marks 5]
2. The hazard function for the lifetime (t months) of a lightbulb is given by:
\[
h(t) = \begin{cases} \alpha t, & 0 \le t \le 1, \\ \alpha, & \text{otherwise}, \end{cases}
\]
where α is a constant.
(a) Determine the survival function S(t) and the failure probability density function f(t)
for the lightbulb. [marks 10]
(b) Determine the bulb's median lifetime, as a function of α. [marks 10]
(c) Calculate the probability that the lightbulb will last longer than two months, given that
the bulb is working after one month. [marks 5]
School of Computing Sciences
3. The survival times (weeks) of patients with a form of cancer are recorded in a clinical trial
comparing two treatment protocols over a period of 24 weeks for a number of individuals.
Censoring information is also available (0=censored, 1=cured). The data can be found in
CancSurv.csv.
(a) Break the 24 weeks into six equally sized (i.e. approximately monthly) windows of
survival times and construct life tables for the two groups and produce corresponding
estimated survival plot(s). [marks 15]
(b) Use Kaplan-Meier estimation to produce survival plot(s) for the data using the original
weekly data. [marks 10]
(c) Describe the data presented in (a) and (b), and comment on which of the two approaches you think is better. [marks 5]
(d) Do a log-rank test to investigate the significance of differences between the two treatments, providing an interpretation of the result. [marks 5]
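As a rough illustration of the product-limit idea behind the Kaplan-Meier estimate in 3(b), the following is a minimal sketch in Python (the coursework itself expects R commands). The function name kaplan_meier and the toy data are hypothetical and are not taken from CancSurv.csv; the indicator is assumed to follow the coding above (0 = censored, 1 = event observed).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (time, estimated survival) at each distinct event time.

    times  -- observed follow-up times (e.g. in weeks)
    events -- 1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each time point
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    # Number of individuals leaving the risk set (event or censored) at each time
    exit_counts = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone observed at time t is removed before the next time point
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data (assumed for illustration only)
toy_times = [2, 3, 3, 5, 8, 8, 12]
toy_events = [1, 1, 0, 1, 0, 1, 0]

for week, s in kaplan_meier(toy_times, toy_events):
    print(f"week {week:>2}: S = {s:.3f}")
```

At each distinct event time the running estimate is multiplied by (1 - d/n), where d is the number of events at that time and n is the number still at risk. Censored individuals leave the risk set without changing the estimate, and individuals censored at an event time are counted as at risk at that time.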
Relationship to formative assessment
We hand out exercises for you to do, the aim being to give you an opportunity to test your understanding and exercise your skills; the assignment questions are similar.
Deliverables
Please submit your piece of coursework in the following way:
1. Submit your assignment to Blackboard. MAKE SURE YOUR STUDENT NUMBER IS
VISIBLE ON THE FRONT.
2. Give full working that demonstrates the steps required to obtain the correct solution
and the important principles used. The logic of the solution is important.
3. Any R commands used should be included. They can be incorporated into the main text
or in an Appendix.
Resources
1. Blackboard notes.
2. If you have any questions e-mail me at C.Greenman@uea.ac.uk.
Marking scheme
1. Accuracy of answers.
2. Understanding of subject displayed.
3. Clarity of explanations of working.
4. Quality of reporting.
5. As always, credit is given for a persuasive argument.

