STAT 5361 Take-Home Exam

Note. Theoretical derivations should be presented in sufficient and necessary detail. The RMarkdown source should be portable such that it runs smoothly on another computer.
The pdf output should also be submitted to GitHub. Whenever possible, implement any algorithm as a general function so that the code can be called easily with different parameters.
1. A zero-inflated Poisson (ZIP) random variable is zero with probability $\xi$ and a Poisson variable with mean $\lambda$ with probability $1 - \xi$. For $i = 0, 1, 2, \ldots$, let $n_i$ be the observed frequency of $i$. We will find the MLE of $(\xi, \lambda)$ by the EM algorithm. Let $n_0 = z_b + z_p$, where $z_b$ is the count of zeros from the Bernoulli distribution and $z_p$ is the count of zeros from the Poisson distribution.
(a) Think of $(z_b, z_p)$ as missing data. Then $(z_b, z_p, n_1, n_2, \ldots)$ is the complete data. Write down the complete-data loglikelihood.
(b) Find closed-form expressions for the E-step and the M-step.
(c) Write an R function to implement the EM algorithm, which takes arguments data for the observed frequency table, init for the initial value, and control for a list of control parameters (e.g., tolerance, maximum number of iterations).
(d) The observed data (Thisted, 1988) are

    dat <- data.frame(i = 0:6, ni = c(3062, 587, 284, 103, 33, 4, 2))

Use initial value $(\xi, \lambda) = (0.75, 0.40)$ to find the MLE.
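One possible shape for the function in (c) is sketched below; it is not a model answer. It assumes the E-step replaces $z_b$ by its conditional expectation $n_0 \xi / \{\xi + (1 - \xi) e^{-\lambda}\}$ and the M-step sets $\xi = z_b / N$ and $\lambda = \sum_i i\, n_i / (N - z_b)$, where $N = \sum_i n_i$. These updates should be checked against your own derivation in (b). The name zip_em, the control fields tol and maxit, and the returned list are illustrative choices that do not come from the exam.

```r
# Sketch of an EM routine for the ZIP model (names and defaults are assumptions).
zip_em <- function(data, init, control = list()) {
  ctrl <- modifyList(list(tol = 1e-8, maxit = 1000), control)
  i <- data$i
  ni <- data$ni
  N <- sum(ni)            # total number of observations
  n0 <- sum(ni[i == 0])   # observed number of zeros
  total <- sum(i * ni)    # sum of all observed counts
  xi <- init[1]
  lambda <- init[2]
  iter <- 0
  converged <- FALSE
  for (iter in seq_len(ctrl$maxit)) {
    # E-step: expected number of zeros from the Bernoulli component
    zb <- n0 * xi / (xi + (1 - xi) * exp(-lambda))
    # M-step: closed-form updates given zb
    xi_new <- zb / N
    lambda_new <- total / (N - zb)
    converged <- max(abs(c(xi_new - xi, lambda_new - lambda))) < ctrl$tol
    xi <- xi_new
    lambda <- lambda_new
    if (converged) break
  }
  list(xi = xi, lambda = lambda, iter = iter, converged = converged)
}

# Usage with the data and initial value from part (d)
dat <- data.frame(i = 0:6, ni = c(3062, 587, 284, 103, 33, 4, 2))
fit <- zip_em(dat, init = c(0.75, 0.40))
print(fit)
```

Stopping on the largest absolute parameter change is only one convention; a relative change or a change in the observed-data loglikelihood would serve equally well.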
2. The log-series distribution or logarithmic distribution has probability mass function $f(x) \propto p^x / x$, $x = 1, 2, \ldots$, where $p \in (0, 1)$ is a parameter. This distribution, which is defined based on the Maclaurin series expansion of $-\log(1 - p)$, has been applied to modeling species diversity in ecology.
(a) Show that the truncated geometric distribution with probability mass function $g(x) \propto (1 - p) p^x$, $x = 1, 2, \ldots$, provides an envelope to the log-series distribution.
(b) Write a function rlogseries to generate random numbers from the distribution using the rejection sampling algorithm. The two arguments of the function are sample size n and parameter p.
(c) Generate a sample of size n = 1000 for each p ∈ {0.25, 0.50, 0.75} and compare the histogram with the true probability mass function.
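A minimal sketch of parts (b) and (c), under the assumption that the proposal is the geometric distribution on $\{1, 2, \ldots\}$ with mass $(1 - p) p^{x - 1}$, so that $f(x)/g(x) \propto 1/x$ attains its maximum at $x = 1$ and a proposed $x$ is accepted with probability $1/x$. The helper dlogseries and the batch size are illustrative additions, not part of the exam.

```r
# Normalized log-series pmf: p^x / (-x * log(1 - p)); helper name is an assumption
dlogseries <- function(x, p) -p^x / (x * log(1 - p))

# Rejection sampler with a shifted geometric proposal
rlogseries <- function(n, p) {
  stopifnot(n >= 0, p > 0, p < 1)
  out <- integer(0)
  while (length(out) < n) {
    m <- n - length(out)
    # rgeom counts failures before the first success, so add 1 for support 1, 2, ...
    x <- rgeom(2 * m + 10, prob = 1 - p) + 1L
    keep <- runif(length(x)) <= 1 / x   # accept with probability 1 / x
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}

# Compare empirical frequencies of 1..5 with the pmf for each p
set.seed(5361)
for (p in c(0.25, 0.50, 0.75)) {
  x <- rlogseries(1000, p)
  print(round(rbind(empirical = tabulate(x, nbins = 5) / 1000,
                    pmf = dlogseries(1:5, p)), 3))
}
```

Proposing in batches avoids a per-draw loop; because accepted draws are independent and identically distributed, keeping the first n of them does not bias the sample.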
3. Random vector $(X, Y)$ has joint probability density $f(x, y) \propto \frac{e^{-x} (1 - y)^{\tau - 1}}{x + \theta y}$, $x > 0$, $0 < y < 1$, where $\tau > 0$ and $\theta > 0$ are parameters. Let $k(x, y \mid x_0, y_0)$ be a transition kernel such that, regardless of the value of $(x_0, y_0)$, $k(x, y \mid x_0, y_0) \propto e^{-x} (1 - y)^{\tau - 1}$. In other words, regardless of $(x_0, y_0)$, the proposed $X$ and $Y$ are independent Exp(1) and Beta$(1, \tau)$ variables, respectively.
(a) Prove that in a Metropolis–Hastings (MH) algorithm for $f(x, y)$ with $k$ as the proposal distribution, the MH ratio is $R(x, y \mid x_0, y_0) = \frac{x_0 + \theta y_0}{x + \theta y}$.
(b) Write an R function to generate an MCMC sample of $(X, Y)$ of size n, with additional arguments nburn for a burn-in period, tau for $\tau$, and theta for $\theta$.
(c) Sample n = 30,000 observations of $(X, Y)$ with nburn = 5000, $\tau \in \{0.2, 0.5, 1, 2\}$, and $\theta \in \{0.5, 1, 2\}$. For each parameter combination, plot the sample contours given $X \le 1$.
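The sampler in (b) might look like the following sketch, which assumes the MH ratio stated in (a) and draws the starting state from the proposal itself. The function name mh_xy and the matrix output with columns x and y are illustrative choices, not part of the exam.

```r
# Independence Metropolis-Hastings sampler (function name is an assumption)
mh_xy <- function(n, nburn, tau, theta) {
  total <- n + nburn
  out <- matrix(NA_real_, nrow = total, ncol = 2,
                dimnames = list(NULL, c("x", "y")))
  # Start from a draw of the proposal distribution
  x0 <- rexp(1)
  y0 <- rbeta(1, 1, tau)
  for (t in seq_len(total)) {
    # Proposal: independent Exp(1) and Beta(1, tau), regardless of the current state
    x <- rexp(1)
    y <- rbeta(1, 1, tau)
    ratio <- (x0 + theta * y0) / (x + theta * y)
    if (runif(1) < ratio) {
      x0 <- x
      y0 <- y
    }
    out[t, ] <- c(x0, y0)
  }
  # Discard the burn-in draws and keep the last n
  out[nburn + seq_len(n), , drop = FALSE]
}

# Usage for one parameter combination from part (c)
set.seed(5361)
s <- mh_xy(n = 30000, nburn = 5000, tau = 0.5, theta = 1)
sub <- s[s[, "x"] <= 1, , drop = FALSE]   # condition on X <= 1
print(dim(sub))
```

For the contour plots, one option is to pass a two-dimensional kernel density estimate of sub, such as the output of MASS::kde2d, to contour(); other density estimators would work as well.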
4. Consider a geometric Brownian motion $\frac{dS(t)}{S(t)} = r\,dt + \sigma\,dW(t)$.
Suppose that we want to use Monte Carlo methods to estimate the value of a call option which is well “out of the money”, one with a strike price $K$ far above the current price of the stock $S_0$. The target is $E[e^{-rT} (S_0 e^{Z_T} - K)^+]$, where $Z_T$ is $N((r - \sigma^2/2) T, \sigma^2 T)$.
With crude Monte Carlo, the majority of the simulated values for S(T) would fall below K and contribute zero to the option price.
(a) Consider importance sampling with sampler distribution $N((\mu - \sigma^2/2) T, \sigma^2 T)$ in place of the distribution of $Z_T$. Write an R function to approximate the option value with Monte Carlo sample size n and additional arguments $\mu$, $T$, $r$, $S_0$, $K$, and $\sigma$.
(b) Let $T = 0.25$, $S_0 = 10$, $K = 15$, $\sigma = 0.2$, and $r = 0.05$. Approximate the option price with n = 10,000 for both the crude method with $\mu = r$ and an importance sampling method with $\mu = \log(K/S_0)$. Repeat the approximation 1,000 times and compare the mean and standard error of the approximated value from the two methods.
(c) Repeat with K ∈ {12.5, 17.5} and comment on the results.
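A sketch of parts (a) and (b), assuming the estimator averages the discounted payoff times the ratio of the target normal density to the sampler normal density. The names call_is and bs_call are illustrative; bs_call evaluates the standard Black–Scholes closed form and is included only as a sanity check, which the exam does not ask for.

```r
# Importance sampling estimate of the call value (function name is an assumption).
# The argument T follows the exam's notation; it masks R's T shorthand for TRUE
# inside the function, so TRUE is always spelled out.
call_is <- function(n, mu, T, r, S0, K, sigma) {
  s <- sigma * sqrt(T)
  m_target <- (r - sigma^2 / 2) * T     # mean of Z_T under the target
  m_sampler <- (mu - sigma^2 / 2) * T   # mean under the sampler
  z <- rnorm(n, m_sampler, s)
  # Weight = target density / sampler density, computed on the log scale
  w <- exp(dnorm(z, m_target, s, log = TRUE) - dnorm(z, m_sampler, s, log = TRUE))
  mean(exp(-r * T) * pmax(S0 * exp(z) - K, 0) * w)
}

# Black-Scholes closed form, used only to sanity-check the simulation
bs_call <- function(T, r, S0, K, sigma) {
  d1 <- (log(S0 / K) + (r + sigma^2 / 2) * T) / (sigma * sqrt(T))
  d2 <- d1 - sigma * sqrt(T)
  S0 * pnorm(d1) - K * exp(-r * T) * pnorm(d2)
}

# Part (b): 1,000 replicates of each method with n = 10,000
set.seed(5361)
crude <- replicate(1000, call_is(10000, mu = 0.05, T = 0.25, r = 0.05,
                                 S0 = 10, K = 15, sigma = 0.2))
imp <- replicate(1000, call_is(10000, mu = log(15 / 10), T = 0.25, r = 0.05,
                               S0 = 10, K = 15, sigma = 0.2))
print(rbind(crude = c(mean = mean(crude), se = sd(crude)),
            importance = c(mean = mean(imp), se = sd(imp)),
            exact = c(bs_call(0.25, 0.05, 10, 15, 0.2), NA)))
```

With $\mu = r$ the two log densities cancel and every weight equals 1, so the same function covers the crude method.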
