
Assignment 4 – 5CCM242A

This document details the content of the Data Analysis Report 4 that you will need to submit on the KEATS page of the module by 4pm on April 8th, 2021, if you are taking the level 5 version of the module. Please submit up to 3 (three) A4 pages in .pdf, .docx or an equivalent format, with a minimum font size of 12pt. This should include figures and R code when necessary. If your submission includes more than three pages, only the first three pages will be marked.

Your submission must be anonymous.
Your submission will be marked on a scale up to 100 with the marking scheme below, and it will contribute 40% of the final mark for the module.

Marking scheme

  • Exercise 1: 30 Marks
  • Exercise 2: 20 Marks
  • Exercise 3: 30 Marks
  • Presentation:
    10 Marks: A well-written and well-organised report.
    10 Marks: Completeness and readability of the plots (title, labels, position, etc.).

Exercise 1

Download the dataset Darts.csv from the KEATS page of the module and import it into R. The dataset contains data on 91 Archaic dart points recovered during surface surveys at Fort Hood, Texas. These data have been extracted from the R package archdata. The dataset contains the following variables:

Name. Dart point type: Darl, Ensor, Pedernales, Travis, Wells
Length. Maximum Length (mm)
Width. Maximum Width (mm)
Thickness. Maximum Thickness (mm)

(a) Using an appropriate model selection strategy and variable transformations if necessary, choose and fit the best linear regression model for Width. This includes checking the model assumptions and fixing obvious issues. Comment on potential issues that you were not able to fix (if any). [25 Marks]

(b) What is the predicted weight for a dart of type Travis, Length=50, Width=23 and Thickness=8? [5 Marks]
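One possible workflow for this kind of task is sketched below. Darts.csv is not reproduced here, so the code simulates a stand-in data frame that reuses the column names listed above. The simulated values, the log transformations, the choice of Width as the response (as stated in part (a)) and the use of AIC-based backward selection via step() are all illustrative assumptions, not the required approach.

```r
# Hypothetical stand-in for Darts.csv: same column names, simulated values.
# With the real file one would instead use:
#   darts <- read.csv("Darts.csv", stringsAsFactors = TRUE)
set.seed(1)
n <- 91
darts <- data.frame(
  Name = factor(sample(c("Darl", "Ensor", "Pedernales", "Travis", "Wells"),
                       n, replace = TRUE)),
  Length = runif(n, 30, 110),
  Thickness = runif(n, 4, 11)
)
darts$Width <- exp(1.5 + 0.3 * log(darts$Length) +
                   0.2 * log(darts$Thickness) + rnorm(n, sd = 0.1))

# Start from a full model on the log scale (an assumed transformation).
full <- lm(log(Width) ~ Name + log(Length) + log(Thickness), data = darts)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
best <- step(full, direction = "backward", trace = 0)
summary(best)

# A simple numerical check of residual normality.
shapiro.test(residuals(best))

# Prediction for a new dart, back-transformed from the log scale.
new_dart <- data.frame(Name = "Travis", Length = 50, Thickness = 8)
pred <- exp(predict(best, newdata = new_dart))
pred
```

Calling plot(best) on the selected model produces the four standard diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage), which are the usual starting point for checking the model assumptions.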

Exercise 2

Download the dataset optimal.csv from the KEATS page of the module and import it into R. This file contains observations of a response variable y and 2 predictors, x1 and x2.


(a) By estimating the appropriate model, find out the values of x1 and x2 that minimise the expected response y. [15 Marks]

(b) Provide a 95% confidence interval for the expected response at the values of x1 and x2 chosen in part (a). [5 Marks]
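One way to approach a problem of this kind is to fit a second-order (quadratic) response surface and solve for its stationary point. The sketch below assumes that such a surface is adequate. It uses simulated data in place of optimal.csv, and the true minimum at (1, -0.5) is a hypothetical choice made only so that the result can be checked.

```r
# Hypothetical stand-in for optimal.csv, with a known minimum at (1, -0.5).
set.seed(2)
n <- 100
opt <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
opt$y <- 3 + (opt$x1 - 1)^2 + 2 * (opt$x2 + 0.5)^2 + rnorm(n, sd = 0.2)

# Second-order model: linear, quadratic and interaction terms.
fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = opt)
b <- coef(fit)

# Setting the gradient of the fitted surface to zero gives H x = -g, where
# g holds the linear coefficients and H is the matrix of second derivatives.
H <- matrix(c(2 * b["I(x1^2)"], b["x1:x2"],
              b["x1:x2"],       2 * b["I(x2^2)"]), nrow = 2)
g <- c(b["x1"], b["x2"])
x_star <- as.numeric(solve(H, -g))
x_star

# The stationary point is a minimum only if H is positive definite.
is_min <- all(eigen(H)$values > 0)
is_min

# 95% confidence interval for the expected response at the stationary point.
ci <- predict(fit, newdata = data.frame(x1 = x_star[1], x2 = x_star[2]),
              interval = "confidence", level = 0.95)
ci
```

Note that this interval treats the estimated location as fixed, so it does not account for the uncertainty in where the minimum lies.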

Exercise 3

The dataset warpbreaks in R contains data about the number of warp breaks (breaks) for 2 different types of wool (wool, coded A and B) and 3 different levels of tension (tension, coded L, M and H). You can load the data in R with the following instructions:

library(datasets)
data("warpbreaks")

Now fit the following generalised linear model:

glm1 <- glm(breaks ~ wool*tension, family = poisson, data = warpbreaks)

(a) Explain what model has been fitted, writing down explicitly the model assumptions and the relationship between the expected value of the number of breaks and the predictors. What are the estimates for the parameters of the model? [25 Marks]

(b) What is the expected number of breaks when wool is of type A and the tension is at level M? [5 Marks]
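The fitted object can be inspected as in the sketch below. The code relies on R's defaults: a log link for the Poisson family and treatment contrasts, so that wool A and tension L form the baseline cell. The variable names by_hand and by_predict are introduced here for illustration only.

```r
library(datasets)
data("warpbreaks")

# Poisson regression with log link (the default for family = poisson).
glm1 <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

# Coefficient estimates, standard errors, z values and p-values.
coef(summary(glm1))

# With treatment contrasts, the linear predictor for wool A and tension M
# is the intercept plus the tensionM main effect; exponentiating inverts
# the log link and gives the expected count.
b <- coef(glm1)
by_hand <- exp(b["(Intercept)"] + b["tensionM"])

# The same quantity obtained directly on the response scale.
by_predict <- predict(glm1,
                      newdata = data.frame(wool = "A", tension = "M"),
                      type = "response")

c(by_hand = unname(by_hand), by_predict = unname(by_predict))
```

Because the model includes the full wool-by-tension interaction, it has one parameter per cell, so each fitted mean coincides with the sample mean of breaks in the corresponding cell.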