Question 1 (40 marks) MAXIMUM WORD LENGTH 1200 WORDS

Researchers are interested in measuring the impact of early special educational needs and disabilities (SEND) provision in schools on early academic and health outcomes for vulnerable children. They have 11 cohorts of children. The oldest cohort started school in 2001/02 and completed primary school and sat age 11 exams in 2007/08. The youngest cohort started school in 2011/12 and completed primary school and sat age 11 exams in 2017/18.

The researchers decide to start their analysis by focussing on children born with a cleft palate (a split in the upper lip and/or roof of the mouth detected at birth). They examine the impact of receiving SEND provision by the beginning of the second year of school (the treatment) on:

  1. School test results at age 11 (outcome 1); and
  2. The number of unplanned hospital admissions between the second year and final year of primary school (outcome 2).

They are using a linked administrative data set that contains children’s complete hospital records from birth linked to school administrative data. The health data include information on characteristics of the child at birth (birthweight, gestation, an indicator of the mother’s health) as well as records of all unplanned hospital admissions. The school data include information for every year the child spends at school, including ethnicity, eligibility for Free School Meals, whether English is their first language, SEND provision and test results at age 11.

While a significant proportion of children with a cleft palate receive SEND provision by the start of the second year of primary school, a significant proportion do not. The researchers observe that SEND provision is significantly more likely the younger the child is within their school year: September-born children (the oldest) are the least likely to receive SEND provision and August-born children (the youngest) are the most likely. Each cohort has approximately 500 children born with a cleft palate.

The researchers are considering a number of evaluation strategies:

  • Regression methods that control for observed background characteristics;
  • A propensity-score matching (PSM) approach using appropriate matching variables;
  • An instrumental variable approach that uses month of birth as an instrument for SEND provision; and
  • A regression discontinuity design approach that exploits the discontinuity in SEND provision for those born on 31 August compared to those born on 1 September.

Discuss in detail how you would carry out each approach (e.g. which variables you would include in your regressions or match on, and how you would implement the IV and the RDD) and give your views on the appropriateness, strengths and weaknesses of each method for the two outcomes of interest. (32 marks – 8 marks per method)

What is your preferred approach for each outcome and why? (8 marks – 4 per outcome)

Question 2 (20 marks) MAXIMUM WORD LENGTH 600 WORDS

A randomised controlled trial of a breastfeeding encouragement program was introduced in Italy in the first part of this century. 20,000 women volunteered for the trial and it was agreed that half would be randomised into the treatment group. Baseline and follow-up data were collected from 17,044 of the participants, with 8,667 given access to the treatment. The trial aimed to estimate the impact of the program on infant weight at 3 months old (wgt3). The mean weight at 3 months was 6,064 grams with a standard deviation of 594 grams.

The data set contains measures of the mother’s education, smoking status, region and age. The researcher estimates the impact of the program using a simple regression with just the treatment dummy (Model 1), a similar regression that also includes the other control variables (Model 2), and a fully interacted regression model in which treatment is interacted with all the observed covariates (Model 3). The following results are found (standard errors in brackets):

Variable        Model 1     Model 2     Model 3
treat           94.177      91.231      91.405
                (9.074)     (7.431)     (7.431)


In a separate regression of treatment on the other observed covariates, only the three region dummy variables are significant; all other covariates are insignificant.

  1. Interpret the coefficients above and explain which model you prefer and why. Has the randomisation worked? Do you think the program is effective? (7 marks)
  2. You are told that the 3,000 women for whom no data could be collected came disproportionately from the lowest education group. Does this worry you? Do you have any other concerns about losing 3,000 women from the study?  (3 marks)
  3. Not all women who were randomised into treatment actually took part in the program; that is, there was contamination (see table below):
                participated
treat           no          yes         Total
no              8,377       0           8,377
yes             3,083       5,584       8,667
Total           11,460      5,584       17,044


The researcher estimates four models to deal with this contamination: a treatment-received analysis (Model 4), a per-protocol analysis (Model 5), a contamination-adjusted intention-to-treat (CA-ITT) analysis with no controls (Model 6) and a CA-ITT analysis with controls (Model 7). Standard errors are in brackets.

Variable        Model 4     Model 5     Model 6     Model 7
participated    195.965     179.652     146.173     147.450
                (9.578)     (10.012)    (13.967)    (13.647)


  3(a). Explain each model and the advantages and disadvantages of using each approach. (7 marks)
  3(b). Show how the estimate from Model 6 is calculated using other information given in this question. (3 marks)

