STAT 5361 Take-Home Exam, Due 11:59pm, Friday, Nov. 15, 2019
Note. Theoretical derivations should be presented in sufficient and necessary detail. The
RMarkdown source should be portable such that it runs smoothly on another computer.
The pdf output should also be submitted to GitHub. Whenever possible, implement any
algorithm as a general function so that the code can be called easily with different parameters.
1. A zero-inflated Poisson (ZIP) random variable is zero with probability ξ and a Poisson
variable with mean λ with probability 1 − ξ. For i = 0, 1, 2, . . ., let ni be the observed
frequency of i. We will find the MLE of (ξ, λ) by the EM algorithm. Let n0 = zb + zp,
where zb is the count of zeros from the Bernoulli distribution and zp is the count of
zeros from the Poisson distribution.
(a) Think of (zb, zp) as missing data. Then (zb, zp, n1, n2, . . .) is the complete data.
Write down the complete-data loglikelihood.
(b) Find closed-form expressions for the E-step and the M-step.
(c) Write an R function to implement the EM algorithm, which takes arguments data
for the observed frequency table, init for the initial value, and control for a list
of control parameters (e.g., tolerance, max iteration, etc.).
(d) The observed data (Thisted, 1988) are
dat <- data.frame(i = 0:6,
ni = c(3062, 587, 284, 103, 33, 4, 2))
Use initial value (ξ, λ) = (0.75, 0.40) to find the MLE.
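One possible shape for the function in part (c) is sketched below; it is an illustration under stated assumptions, not a reference solution. It assumes the E-step replaces zb by its conditional expectation n0 ξ / (ξ + (1 − ξ) e^{−λ}), and that the M-step sets ξ = zb / N and λ = (Σ i ni) / (N − zb), where N is the total frequency. The function name zip_em and the control entries tol and maxit are hypothetical choices.

```r
# Sketch of an EM fit for the zero-inflated Poisson model from a frequency table.
# Assumed layout: `data` has columns i (count value) and ni (observed frequency);
# `init` is c(xi, lambda); `control` may hold tol and maxit (hypothetical names).
zip_em <- function(data, init, control = list()) {
  ctrl <- modifyList(list(tol = 1e-8, maxit = 1000L), control)
  n0 <- sum(data$ni[data$i == 0])   # observed number of zeros
  N <- sum(data$ni)                 # total number of observations
  total <- sum(data$i * data$ni)    # sum of all observed counts
  xi <- init[1]
  lambda <- init[2]
  converged <- FALSE
  for (iter in seq_len(ctrl$maxit)) {
    # E-step: expected number of zeros from the Bernoulli component
    zb <- n0 * xi / (xi + (1 - xi) * exp(-lambda))
    # M-step: closed-form maximizers of the expected complete-data loglikelihood
    xi_new <- zb / N
    lambda_new <- total / (N - zb)
    delta <- max(abs(c(xi_new - xi, lambda_new - lambda)))
    xi <- xi_new
    lambda <- lambda_new
    if (delta < ctrl$tol) {
      converged <- TRUE
      break
    }
  }
  list(xi = xi, lambda = lambda, iterations = iter, converged = converged)
}

# Usage with the frequency table and initial value given in part (d)
dat <- data.frame(i = 0:6,
                  ni = c(3062, 587, 284, 103, 33, 4, 2))
fit <- zip_em(dat, init = c(0.75, 0.40))
```

The stopping rule here is the largest absolute change in (ξ, λ) between iterations; a relative change or a loglikelihood-based rule would be an equally reasonable choice.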
2. The log-series distribution or logarithmic distribution has probability mass function
f(x) ∝ p^x / x, x = 1, 2, . . . ,
where p ∈ (0, 1) is a parameter. This distribution, which is defined based on the
Maclaurin series expansion of − log(1 − p), has been applied to modeling species diversity in ecology.
(a) Show that the truncated geometric distribution with probability mass function
g(x) ∝ (1 − p) p^x, x = 1, 2, . . ., provides an envelope for the log-series distribution.
(b) Write a function rlogseries to generate random numbers from the distribution
using the rejection sampling algorithm. The two arguments of the function are
sample size n and parameter p.
(c) Generate a sample of size n = 1000 for each p ∈ {0.25, 0.50, 0.75} and compare
the histogram with the true probability mass function.
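A minimal sketch of the rejection sampler in part (b) follows, assuming the envelope from part (a). Since f(x)/g(x) ∝ 1/x is largest at x = 1, a geometric proposal X on {1, 2, . . .} can be accepted with probability 1/X. The helper dlogseries is a hypothetical name for the normalized probability mass function, using the normalizing constant − log(1 − p).

```r
# Sketch: rejection sampling for the log-series distribution.
# Proposal: geometric on {1, 2, ...} with pmf (1 - p) * p^(x - 1);
# a proposed x is accepted with probability 1 / x.
rlogseries <- function(n, p) {
  stopifnot(n >= 0, p > 0, p < 1)
  out <- integer(0)
  while (length(out) < n) {
    m <- n - length(out)
    x <- rgeom(m, prob = 1 - p) + 1L   # rgeom counts failures, so shift to start at 1
    keep <- runif(m) < 1 / x           # accept/reject step
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}

# Hypothetical helper: true pmf p^x / (-x * log(1 - p)) for comparison
dlogseries <- function(x, p) {
  p^x / (-x * log1p(-p))
}
```

For part (c), the empirical proportions from table(rlogseries(1000, p)) can be plotted against dlogseries evaluated at the same support points.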
3. Random vector (X, Y ) has a joint probability density
f(x, y) ∝ e^{−x} (1 − y)^{τ−1} / (x + θy), x > 0, 0 < y < 1,
where τ > 0 and θ > 0 are parameters. Let k(x, y | x0, y0) be a transition kernel, such
that regardless of the value of (x0, y0),
k(x, y | x0, y0) ∝ e^{−x} (1 − y)^{τ−1}.
In other words, regardless of (x0, y0), the proposed X and Y are independent Exp(1)
and Beta(1, τ ) variables, respectively.
(a) Prove that in a Metropolis–Hastings (MH) algorithm for f(x, y) with k as the
proposal distribution, the MH ratio is
R(x, y | x0, y0) = (x0 + θy0) / (x + θy).
(b) Write an R function to generate an MCMC sample of (X, Y) of size n, with additional arguments nburn for a burn-in period, tau for τ, and theta for θ.
(c) Sample n = 30,000 observations of (X, Y) with nburn = 5000, τ ∈ {0.2, 0.5, 1, 2}
and θ ∈ {0.5, 1, 2}. For each parameter combination, plot the sample contours
given X ≤ 1.
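The sampler in part (b) could be sketched as follows, assuming the MH ratio stated in part (a) and an independence proposal of X ~ Exp(1) and Y ~ Beta(1, τ). The function name mh_xy and the choice to start the chain from one draw of the proposal are assumptions made for illustration.

```r
# Sketch: independence Metropolis-Hastings sampler for (X, Y).
# Proposal: X ~ Exp(1) and Y ~ Beta(1, tau), drawn independently of the current state.
# Returns an n x 2 matrix of draws after discarding the first nburn iterations.
mh_xy <- function(n, nburn, tau, theta) {
  total <- n + nburn
  out <- matrix(NA_real_, nrow = total, ncol = 2,
                dimnames = list(NULL, c("x", "y")))
  # Starting state (assumption: a single draw from the proposal)
  x0 <- rexp(1)
  y0 <- rbeta(1, 1, tau)
  for (t in seq_len(total)) {
    x <- rexp(1)
    y <- rbeta(1, 1, tau)
    ratio <- (x0 + theta * y0) / (x + theta * y)   # MH ratio from part (a)
    if (runif(1) < ratio) {                        # accept with probability min(1, ratio)
      x0 <- x
      y0 <- y
    }
    out[t, ] <- c(x0, y0)
  }
  out[nburn + seq_len(n), , drop = FALSE]
}
```

For part (c), the rows with x ≤ 1 can be passed to a two-dimensional density estimator and then to contour(); MASS::kde2d is one option if the MASS package is available.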
4. Consider a geometric Brownian motion
dS(t) / S(t) = r dt + σ dW(t).
Suppose that we want to use Monte Carlo methods to estimate the value of a call option
which is well “out of the money”, one with a strike price K far above the current price
of the stock S0. The target is E[e^{−rT} (S0 e^{Z_T} − K)^+], where Z_T follows N((r − σ^2/2)T, σ^2 T).
With crude Monte Carlo, the majority of the simulated values for S(T) would fall
below K and contribute zero to the option price.
(a) Consider importance sampling with sampler distribution N((µ − σ^2/2)T, σ^2 T) in
place of the distribution of Z_T. Write an R function to approximate the option value with Monte
Carlo sample size n and additional arguments µ, T, r, S0, K, and σ.
(b) Let T = 0.25, S0 = 10, K = 15, σ = 0.2, and r = 0.05. Approximate the option
price with n = 10,000 using both the crude method with µ = r and an importance
sampling method with µ = log(K/S0). Repeat the approximation
1,000 times and compare the mean and standard error of the approximated value
from the two methods.
(c) Repeat with K ∈ {12.5, 17.5} and comment on the results.
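A hedged sketch of the estimator in part (a) is given below. It assumes the usual importance-sampling identity: draw Z from the sampler N((µ − σ^2/2)T, σ^2 T), then average the discounted payoff multiplied by the density ratio of the target N((r − σ^2/2)T, σ^2 T) to the sampler. The function name option_is is hypothetical; setting mu = r makes every weight equal to 1 and recovers the crude estimator.

```r
# Sketch: importance-sampling estimate of the discounted call payoff.
# Note: the argument T follows the exam's notation and masks R's shorthand for TRUE
# inside the function, so TRUE is always spelled out below.
option_is <- function(n, mu, T, r, S0, K, sigma) {
  sd_z <- sigma * sqrt(T)
  mean_target <- (r - sigma^2 / 2) * T     # mean of Z_T under the target
  mean_sampler <- (mu - sigma^2 / 2) * T   # mean under the sampler
  z <- rnorm(n, mean = mean_sampler, sd = sd_z)
  # Importance weights: target density over sampler density, computed on the log scale
  w <- exp(dnorm(z, mean_target, sd_z, log = TRUE) -
           dnorm(z, mean_sampler, sd_z, log = TRUE))
  payoff <- exp(-r * T) * pmax(S0 * exp(z) - K, 0)
  mean(payoff * w)
}
```

For parts (b) and (c), replicate(1000, option_is(...)) gives the repeated approximations, whose mean() and sd() can then be compared across the two choices of mu.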