FIT3154 Assignment 2

Due Date: 11:59PM, Sunday, 27/10/2019

Introduction

There are a total of two questions worth 24 + 12 = 36 marks in this assignment.

This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters

(e.g., late penalties, special consideration, etc.) as specified in the FIT3154 Unit Guide or elsewhere

in the FIT3154 Moodle site (including Faculty of I.T. and Monash University policies).

Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in particular, of Monash University’s policies on academic integrity. In submitting this assignment, you

acknowledge your awareness of Monash University’s policies on academic integrity and that work is

done and submitted in accordance with these policies.

Submission: No files are to be submitted via e-mail. Correct files are to be submitted to Moodle,

as given above. You must submit your files in a single ZIP archive. Your ZIP file should contain the

following:

1. One PDF file containing non-code answers to all the questions that require written answers. This

file should also include all your plots, and must be clearly written and formatted.

2. The required R and Stan files containing R and Stan code answers.

Please read these submission instructions carefully and take care to submit the correct files in the

correct places.

1

Question 1 (24 marks)

In this question we are going to use Poisson regression to analyse the daily dog bite admission data

that we examined in Assignment 1. Let y = (y1, . . . , yn) be n count observations that we would like

to model, and let xi = (xi,1, . . . , xi,p) be a vector of p predictors associated with the i-th data point.

Poisson regression model extends the usual Poisson distribution in the following way:

yi

| xi,1, . . . , xi,p ∼ Poi(λi) (1)

λi = exp

β0 +

Xp

j=1

βjxi,j

where β = (β1, . . . , βp) are the coefficients relating the predictors to the target and β0 is the intercept.

In this way the Poisson regression model extends the regular linear regression model to modelling

count data. The effects of the predictors are additive on the log-scale, and therefore multiplicative on

the original rate-scale. Download the file daily.dogbites.csv. It contains n = 378 days worth of

observations on four variables:

• All.Cases, the number of admissions to hospitals for dog bites on a given day;

• Public.Holiday, whether or not the day was a public holiday;

• Moon.Phase, which day of the lunar month the day fell on (Day00.01 denotes the day of the full

moon or 1 day after, Day02.03 denotes 2 or 3 days after the full moon, and so on);

• Day.Of.Week, which day of the week the day in question was.

We are treating the Moon.Phase variable as categorical so that there is one effect associated with each

pair of days of the lunar month. You will analyse the dog bites data using Bayesian Poisson lasso

regression:

yi

| xi,1, . . . , xi,p ∼ Poi(λi)

λi = exp

β0 +

Xp

j=1

βjxi,j

β0 ∼ C(0, 1) (2)

βj ∼ La(0, s), j = 1, . . . , p

s ∼ C

+(0, 1)

where La(0, s) is the Laplace (double-exponential) distribution with scale s, C(m, s) denotes a Cauchy

distribution with location m and scale s and C

+(0, s) denotes a half-Cauchy distribution with scale

s. Download the script df2matrix.R. It contains a function called df2matrix() that takes a data

frame and a formula and returns an appropriate matrix of predictors X and vector of targets y. Please

answer the following questions providing code as necessary.

1. When there are no predictors, the above hierarchy reduces to a simple Poisson distribution with

a rate of λ = exp β0. In this case, find the prior implied on λ by placing a Cauchy prior on β0

using the transformation of random variables technique from Lecture 5. [2 marks]

2. Using the glm() function fit a Poisson regression using maximum likelihood to All.Cases using

Public.Holiday, Moon.Phase and Day.Of.Week as predictors. Record the maximum likelihood

estimates in a table. [1 mark]

2

3. Implement the Bayesian version of Poisson regression described by the hierarchy (2). This should

be implemented in Stan, and should take an X matrix (of predictors) and a y vector (of targets).

You can use the df2matrix() function provided to transform the dog bites data frame to an

appropriate X matrix and y vector. (hint: the Laplace distribution is also known as the double

exponential). [2 marks]

4. Once you have coded up the Bayesian lasso, you should use it to analyse the dog bites data.

Produce a table showing the posterior means, appropriate credible intervals and Bayesian tstatistics for each of the regression coefficients along with the maximum likelihood estimates

found in Q1.2. How do the Bayesian estimates compare to the maximum likelihood estimates?

[3 marks]

5. What do the Bayesian estimates say about the effect of the predictors (Public.Holiday, Moon.Phase,

Day.Of.Week) on the rate at which dogs bite? Which effects seem the strongest? Interpret and

dicuss the results. [4 marks]

6. Which predictors are potentially unassociated with the rate of dog bites, and why? [2 marks]

7. According to your model, what effect does a public holiday have on the rate with which dogs

appear to bite? What is the range of the effect that you could plausibly expect a public holiday

to have on the rate of dog bites? [2 marks]

8. Provide a prediction of the rate of dog bites for each observation for the first full lunar month

(i.e., from day 7 to day 36) using the posterior mean estimates. Provide a 95% credible interval

for these predictions, and produce a plot showing the interval and the posterior mean predictions

for these days. [2 marks]

9. Imagine we want to calculate the probability of seeing 9 or more dog bite admissions on a single

day with the following characteristics: not a public holiday, day 15 of the lunar cycle, and is a

Sunday. Calculate the 95% credible interval for this probability. [2 marks]

10. Plot the rate of dog bites as predicted by the posterior mean estimates from your model for all

n = 378 days together with the observed number of dog bites. Comment on the plot. [2 marks]

11. Finally, using the fact that if Y ∼ Poi(λ), then E [Y ] = λ and V [Y ] = λ, can you think of a way

to assess how well your model predictions fit the actual dog bite data you have observed? Do

you think your Poisson regression model is compatible with the data, and if not, why not? [2

marks]

3

Question 2 (12 marks)

We have previously derived estimates for models by minimising some sort of objective function, such

as the negative log-likelihood, for which closed form expressions exist. This is the exception rather

than the rule; in general a numeric search of some kind is required. This question will require you to

use gradient descent to perform maximum likelihood estimation for a problem in which a simple closed

form solution does not exist, but for which it is known that there exists only a single (and therefore,

global) minima: Poisson regression.

You will write code to fit a Poisson regression using maximum likelihood and gradient descent.

To get you started I have provided a script called poissreg.gd.R which contains two functions:

poisson.gd() and my.standardise().

The main function poisson.gd() is a skeleton that takes a y and X, standardises the predictors

(as per Lecture 3) using my.standardise() and then has space for you to fill in with your gradient

descent code. Once your code has finished running, the function unstandardises the coefficients back

to the original scale. Answer the following questions:

1. Write down the expression for the negative log-likelihood of data y = (y1, . . . , yn) under the

Poisson regression model (1) with predictor matrix X of size n × p, coefficient vector β and

intercept β0. [2 marks]

2. Find the expressions for the gradient of the negative log-likelihood with respect to the intercept

β0 and the coefficients β1, . . . , βp. [2 marks]

3. Write an R function that takes a vector of data y, a matrix of predictors, X, an intercept beta0

and a vector of coefficients beta and returns: (i) the negative log-likelihood and (ii) the gradients

with respect to β0 and the coefficients β1, . . . , βp [2 marks]

4. Implement gradient descent to find the maximum likelihood estimates of the intercept β0 and

coefficients β1, . . . , βp. Use the function poisson.gd.R as basis to get started. The function

takes as arguments

• An X matrix of predictors and y vector of targets

• An initial value for the learning rate κ.

• The size of the largest absolute value of the gradient used for termination (max.grad).

The function does some initialisation of the estimates of β0 and β for you by using least-squares

onto the log-transformed targets. This are good starting points for a search. You will need to

write the code to find the maximum likelihood estimates βˆ

0 and βˆ

1, . . . , βˆ

p for the X and y that

were passed using the gradient descent algorithm. Once it has finished, it should return a list

containing:

• The ML estimate of the intercept β0.

• The ML estimates of the coefficients β1, . . . , βp

• The negative log-likelihood recorded at every iteration of your gradient descent loop.

4

Some further details

• Your learning rate, κ, must be adaptive. That is, if you make a step that results in the new

model having a greater negative log-likelihood than the previous model then you should

adjust your learning rate by scaling it by some constant c < 1, (for example, c = 0.9) and

try again.

• Your algorithm should terminate when the largest absolute value of the gradients is less

than the max.grad argument passed to your function.

• When writing your code, you should consider computational efficiency and ensure that it is

cleanly written and commented.

[5 marks]

5. Using the gradient descent code you have written, fit a Poisson regression to the dog bites

data you analysed previously. You should compare your estimates to those obtained using the

built-in glm() function to make sure your code is working properly. Plot the negative loglikelihoods recorded for every iteration of your descent loop (i.e., plot how the negative loglikelihood decreased as the algorithm ran until it reached the minima). [1 mark]

At the end of this assignment, you should reflect on how much you have learned since you started

this unit. You now have ideas on how to analyse count data using regression models and have managed

to write a program to fit such a model using maximum likelihood.

5

###### All

#### 统计代写 | STAT2011 Probability and Estimation

这个作业是统计赌博游戏中出现的可能性概率问题 STAT2011 Pro 阅读更多…