FIT3154 Assignment 2
Due Date: 11:59PM, Sunday, 27/10/2019
Introduction
There are a total of two questions worth 24 + 12 = 36 marks in this assignment.
This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters
(e.g., late penalties, special consideration, etc.) as specified in the FIT3154 Unit Guide or elsewhere
in the FIT3154 Moodle site (including Faculty of I.T. and Monash University policies).
Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in particular, of Monash University’s policies on academic integrity. In submitting this assignment, you
acknowledge your awareness of Monash University’s policies on academic integrity and that work is
done and submitted in accordance with these policies.
Submission: No files are to be submitted via e-mail. Correct files are to be submitted to Moodle,
as given above. You must submit your files in a single ZIP archive. Your ZIP file should contain the
following:
1. One PDF file containing non-code answers to all the questions that require written answers. This
file should also include all your plots, and must be clearly written and formatted.
2. The required R and Stan files containing R and Stan code answers.
Please read these submission instructions carefully and take care to submit the correct files in the
correct places.
1
Question 1 (24 marks)
In this question we are going to use Poisson regression to analyse the daily dog bite admission data
that we examined in Assignment 1. Let y = (y1, . . . , yn) be n count observations that we would like
to model, and let xi = (xi,1, . . . , xi,p) be a vector of p predictors associated with the i-th data point.
Poisson regression model extends the usual Poisson distribution in the following way:
yi
| xi,1, . . . , xi,p ∼ Poi(λi) (1)
λi = exp

β0 +
Xp
j=1
βjxi,j

where β = (β1, . . . , βp) are the coefficients relating the predictors to the target and β0 is the intercept.
In this way the Poisson regression model extends the regular linear regression model to modelling
count data. The effects of the predictors are additive on the log-scale, and therefore multiplicative on
the original rate-scale. Download the file daily.dogbites.csv. It contains n = 378 days worth of
observations on four variables:
• All.Cases, the number of admissions to hospitals for dog bites on a given day;
• Public.Holiday, whether or not the day was a public holiday;
• Moon.Phase, which day of the lunar month the day fell on (Day00.01 denotes the day of the full
moon or 1 day after, Day02.03 denotes 2 or 3 days after the full moon, and so on);
• Day.Of.Week, which day of the week the day in question was.
We are treating the Moon.Phase variable as categorical so that there is one effect associated with each
pair of days of the lunar month. You will analyse the dog bites data using Bayesian Poisson lasso
regression:
yi
| xi,1, . . . , xi,p ∼ Poi(λi)
λi = exp

β0 +
Xp
j=1
βjxi,j

β0 ∼ C(0, 1) (2)
βj ∼ La(0, s), j = 1, . . . , p
s ∼ C
+(0, 1)
where La(0, s) is the Laplace (double-exponential) distribution with scale s, C(m, s) denotes a Cauchy
distribution with location m and scale s and C
+(0, s) denotes a half-Cauchy distribution with scale
s. Download the script df2matrix.R. It contains a function called df2matrix() that takes a data
frame and a formula and returns an appropriate matrix of predictors X and vector of targets y. Please
answer the following questions providing code as necessary.
1. When there are no predictors, the above hierarchy reduces to a simple Poisson distribution with
a rate of λ = exp β0. In this case, find the prior implied on λ by placing a Cauchy prior on β0
using the transformation of random variables technique from Lecture 5. [2 marks]
2. Using the glm() function fit a Poisson regression using maximum likelihood to All.Cases using
Public.Holiday, Moon.Phase and Day.Of.Week as predictors. Record the maximum likelihood
estimates in a table. [1 mark]
2
3. Implement the Bayesian version of Poisson regression described by the hierarchy (2). This should
be implemented in Stan, and should take an X matrix (of predictors) and a y vector (of targets).
You can use the df2matrix() function provided to transform the dog bites data frame to an
appropriate X matrix and y vector. (hint: the Laplace distribution is also known as the double
exponential). [2 marks]
4. Once you have coded up the Bayesian lasso, you should use it to analyse the dog bites data.
Produce a table showing the posterior means, appropriate credible intervals and Bayesian tstatistics for each of the regression coefficients along with the maximum likelihood estimates
found in Q1.2. How do the Bayesian estimates compare to the maximum likelihood estimates?
[3 marks]
5. What do the Bayesian estimates say about the effect of the predictors (Public.Holiday, Moon.Phase,
Day.Of.Week) on the rate at which dogs bite? Which effects seem the strongest? Interpret and
dicuss the results. [4 marks]
6. Which predictors are potentially unassociated with the rate of dog bites, and why? [2 marks]
7. According to your model, what effect does a public holiday have on the rate with which dogs
appear to bite? What is the range of the effect that you could plausibly expect a public holiday
to have on the rate of dog bites? [2 marks]
8. Provide a prediction of the rate of dog bites for each observation for the first full lunar month
(i.e., from day 7 to day 36) using the posterior mean estimates. Provide a 95% credible interval
for these predictions, and produce a plot showing the interval and the posterior mean predictions
for these days. [2 marks]
9. Imagine we want to calculate the probability of seeing 9 or more dog bite admissions on a single
day with the following characteristics: not a public holiday, day 15 of the lunar cycle, and is a
Sunday. Calculate the 95% credible interval for this probability. [2 marks]
10. Plot the rate of dog bites as predicted by the posterior mean estimates from your model for all
n = 378 days together with the observed number of dog bites. Comment on the plot. [2 marks]
11. Finally, using the fact that if Y ∼ Poi(λ), then E [Y ] = λ and V [Y ] = λ, can you think of a way
to assess how well your model predictions fit the actual dog bite data you have observed? Do
you think your Poisson regression model is compatible with the data, and if not, why not? [2
marks]
3
Question 2 (12 marks)
We have previously derived estimates for models by minimising some sort of objective function, such
as the negative log-likelihood, for which closed form expressions exist. This is the exception rather
than the rule; in general a numeric search of some kind is required. This question will require you to
use gradient descent to perform maximum likelihood estimation for a problem in which a simple closed
form solution does not exist, but for which it is known that there exists only a single (and therefore,
global) minima: Poisson regression.
You will write code to fit a Poisson regression using maximum likelihood and gradient descent.
To get you started I have provided a script called poissreg.gd.R which contains two functions:
poisson.gd() and my.standardise().
The main function poisson.gd() is a skeleton that takes a y and X, standardises the predictors
(as per Lecture 3) using my.standardise() and then has space for you to fill in with your gradient
descent code. Once your code has finished running, the function unstandardises the coefficients back
to the original scale. Answer the following questions:
1. Write down the expression for the negative log-likelihood of data y = (y1, . . . , yn) under the
Poisson regression model (1) with predictor matrix X of size n × p, coefficient vector β and
intercept β0. [2 marks]
2. Find the expressions for the gradient of the negative log-likelihood with respect to the intercept
β0 and the coefficients β1, . . . , βp. [2 marks]
3. Write an R function that takes a vector of data y, a matrix of predictors, X, an intercept beta0
and a vector of coefficients beta and returns: (i) the negative log-likelihood and (ii) the gradients
with respect to β0 and the coefficients β1, . . . , βp [2 marks]
4. Implement gradient descent to find the maximum likelihood estimates of the intercept β0 and
coefficients β1, . . . , βp. Use the function poisson.gd.R as basis to get started. The function
takes as arguments
• An X matrix of predictors and y vector of targets
• An initial value for the learning rate κ.
• The size of the largest absolute value of the gradient used for termination (max.grad).
The function does some initialisation of the estimates of β0 and β for you by using least-squares
onto the log-transformed targets. This are good starting points for a search. You will need to
write the code to find the maximum likelihood estimates βˆ
0 and βˆ
1, . . . , βˆ
p for the X and y that
were passed using the gradient descent algorithm. Once it has finished, it should return a list
containing:
• The ML estimate of the intercept β0.
• The ML estimates of the coefficients β1, . . . , βp
• The negative log-likelihood recorded at every iteration of your gradient descent loop.
4
Some further details
• Your learning rate, κ, must be adaptive. That is, if you make a step that results in the new
model having a greater negative log-likelihood than the previous model then you should
adjust your learning rate by scaling it by some constant c < 1, (for example, c = 0.9) and
try again.
• Your algorithm should terminate when the largest absolute value of the gradients is less