This is a statistics assignment from Australia on modern applied statistics.


1. The file assignment3_prob1.txt contains 300 observations. We can read the observations and make a histogram as follows.

> X = scan(file="assignment3_prob1.txt", what=double())

Read 300 items

> length(X)

[1] 300

> hist(X)

We will model the observed data using a mixture of three binomial distributions. Specifically, we assume the observations X1, . . . , X300 are independent of each other, and each Xi follows this mixture model:

Zi ~ Categorical(π1, π2, 1 − π1 − π2),

Xi | Zi = 1 ~ Binomial(20, p1),

Xi | Zi = 2 ~ Binomial(20, p2),

Xi | Zi = 3 ~ Binomial(20, p3).

The binomial distribution has probability mass function

f(x; m, p) = (m choose x) p^x (1 − p)^(m − x).

We aim to obtain the MLE of the parameters θ = (π1, π2, p1, p2, p3) using the EM algorithm.
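Before deriving anything, it can help to see the observed-data (incomplete) likelihood as code. Below is a minimal sketch, assuming the model above; the function name `incomplete.loglik` and the packing of the parameters into a single vector `theta` are our own conventions, not part of the assignment.

```r
# Sketch: incomplete (observed-data) log-likelihood of the three-component
# binomial mixture. theta = c(pi1, pi2, p1, p2, p3); the third mixing
# weight is 1 - pi1 - pi2. (Function/argument names are our own.)
incomplete.loglik <- function(x, theta) {
  pi3 <- 1 - theta[1] - theta[2]
  dens <- theta[1] * dbinom(x, 20, theta[3]) +
          theta[2] * dbinom(x, 20, theta[4]) +
          pi3      * dbinom(x, 20, theta[5])
  sum(log(dens))
}
```

This is the quantity whose convergence is monitored in part (d).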

(a) (5 marks) Let X = (X1, . . . , X300) and Z = (Z1, . . . , Z300). Derive the expectation of the complete log-likelihood, Q(θ, θ0) = E_{Z|X,θ0}[log P(X, Z | θ)].

(b) (3 marks) Derive the E-step of the EM algorithm.

(c) (5 marks) Derive the M-step of the EM algorithm.

(d) (5 marks) Note: Your answer for this problem should be typed. Answers including screen-captured R code or figures won't be marked.

Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented algorithm to the observed data, X1, . . . , X300. Set the EM iterations to stop when either the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood has changed by less than 0.00001 (ε = 0.00001). Run the EM algorithm twice with the following two different initial values and report the estimators with the highest incomplete log-likelihood.

                    π1    π2    p1    p2    p3
1st initial values  0.3   0.3   0.2   0.5   0.7
2nd initial values  0.1   0.2   0.1   0.3   0.7

For each EM run, check that the incomplete log-likelihood increases at each EM step by plotting the values.
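One possible shape for the implementation is sketched below. This is not the required solution, only an illustration of the E-step/M-step loop and the stopping rule described above; the function name `em.binom.mix` and internal variable names are our own.

```r
# Sketch of an EM loop for the three-component binomial mixture.
# theta = c(pi1, pi2, p1, p2, p3); stops at max.iter iterations or when the
# incomplete log-likelihood changes by less than eps. (Names are our own.)
em.binom.mix <- function(x, theta, max.iter = 100, eps = 1e-5) {
  ll <- numeric(0)
  for (iter in 1:max.iter) {
    # E-step: responsibilities gamma[i, k] = P(Zi = k | xi, current theta)
    w <- c(theta[1], theta[2], 1 - theta[1] - theta[2])
    comp <- sapply(1:3, function(k) w[k] * dbinom(x, 20, theta[2 + k]))
    gamma <- comp / rowSums(comp)
    # Record the incomplete log-likelihood at the current theta
    ll <- c(ll, sum(log(rowSums(comp))))
    # M-step: update mixing weights and success probabilities
    theta[1:2] <- colMeans(gamma)[1:2]
    theta[3:5] <- colSums(gamma * x) / (20 * colSums(gamma))
    if (iter > 1 && abs(ll[iter] - ll[iter - 1]) < eps) break
  }
  list(theta = theta, loglik = ll)
}
```

Plotting `res$loglik` against the iteration number gives the monotonicity check requested above.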

2. The file assignment3_prob2.txt contains 100 observations. We can read the 300 observations from Problem 1 together with the new 100 observations and make histograms as follows.

> X = scan(file="assignment3_prob1.txt", what=double())

Read 300 items

> X.more = scan(file="assignment3_prob2.txt", what=double())

Read 100 items

> length(X)

[1] 300

> length(X.more)

[1] 100

> par(mfrow=c(2,2))

> hist(X, xlim=c(0,20), ylim=c(0,80))

> hist(X.more, xlim=c(0,20), ylim=c(0,80))

> hist(c(X,X.more), xlim=c(0,20), ylim=c(0,80), xlab="X + X.more", main = "Histogram of X + X.more")

Let X1, . . . , X300 and X301, . . . , X400 denote the 300 observations from assignment3_prob1.txt and the 100 observations from assignment3_prob2.txt, respectively. We assume the observations X1, . . . , X400 are independent of each other. We model X1, . . . , X300 (from assignment3_prob1.txt) using the mixture of three binomial distributions (as in Problem 1), but we model X301, . . . , X400 (from assignment3_prob2.txt) using one of the three binomial distributions. Specifically, for i = 1, . . . , 300, Xi follows this mixture model:

Zi ~ Categorical(π1, π2, 1 − π1 − π2),

Xi | Zi = 1 ~ Binomial(20, p1),

Xi | Zi = 2 ~ Binomial(20, p2),

Xi | Zi = 3 ~ Binomial(20, p3),

and for i = 301, . . . , 400,

Xi ~ Binomial(20, p1).

We aim to obtain the MLE of the parameters θ = (π1, π2, p1, p2, p3) using the EM algorithm.
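The only change from Problem 1 is that the new 100 observations contribute directly through p1: the observed-data likelihood factors into the Problem 1 mixture term over X1, . . . , X300 and a plain Binomial(20, p1) term over X301, . . . , X400. As a sketch (function name `incomplete.loglik2` and the `theta` packing are our own conventions):

```r
# Sketch: incomplete log-likelihood for Problem 2. The first 300 observations
# (x) follow the mixture; the extra 100 (x.more) follow Binomial(20, p1),
# i.e. theta[3]. theta = c(pi1, pi2, p1, p2, p3). (Names are our own.)
incomplete.loglik2 <- function(x, x.more, theta) {
  pi3 <- 1 - theta[1] - theta[2]
  mix <- theta[1] * dbinom(x, 20, theta[3]) +
         theta[2] * dbinom(x, 20, theta[4]) +
         pi3      * dbinom(x, 20, theta[5])
  sum(log(mix)) + sum(dbinom(x.more, 20, theta[3], log = TRUE))
}
```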

(a) (5 marks) Let X = (X1, . . . , X400) and Z = (Z1, . . . , Z300). Derive the expectation of the complete log-likelihood, Q(θ, θ0) = E_{Z|X,θ0}[log P(X, Z | θ)].

(b) (5 marks) Derive the E-step and M-step of the EM algorithm.

(c) (5 marks) Note: Your answer for this problem should be typed. Answers including screen-captured R code or figures won't be marked.

Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented algorithm to the observed data, X1, . . . , X400. Set the EM iterations to stop when either the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood has changed by less than 0.00001 (ε = 0.00001). Run the EM algorithm twice with the following two different initial values and report the estimators with the highest incomplete log-likelihood.

                    π1    π2    p1    p2    p3
1st initial values  0.3   0.3   0.2   0.5   0.7
2nd initial values  0.1   0.2   0.1   0.3   0.7

For each EM run, check that the incomplete log-likelihood increases at each EM step by plotting the values.
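As a hint toward how the M-step changes relative to Problem 1: only the update for p1 is affected, since X301, . . . , X400 are known to come from component 1 and therefore enter that update with responsibility 1. A sketch of just that update (the function name `update.p1` and the argument names are our own, hypothetical conventions):

```r
# Sketch: M-step update for p1 in Problem 2. gamma1 holds the E-step
# responsibilities P(Zi = 1 | xi) for i = 1, ..., 300; the extra
# observations x.more contribute with responsibility 1. (Names are our own.)
update.p1 <- function(x, x.more, gamma1) {
  (sum(gamma1 * x) + sum(x.more)) /
    (20 * (sum(gamma1) + length(x.more)))
}
```

The updates for π1, π2, p2, and p3 keep their Problem 1 form, using only the first 300 observations.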