This assignment uses R to compute statistical quantities.

MATH4007 COMPUTATIONAL STATISTICS

Assessed Coursework 1 — 2019/2020

The deadline for this work is 3pm, Wednesday 11 December 2019, to be submitted via the “Coursework 1 Submission” link on Moodle. Unauthorised late submission will be penalised by 5% of the full mark per day. Work submitted more than one week late will receive zero marks. All components (see below) must be submitted by the deadline for the work to be considered on time. You are reminded to familiarise yourself with the guidelines concerning plagiarism in assessed coursework (see the student handbook), and note that this applies equally to computer code as it does to written work. The submission should contain:

1. A PDF file containing any computational results (plots/relevant output) and discussion. This can be produced using e.g. R Markdown, or by copying output into a Word document. Please convert any documents to PDF for uploading.

2. A PDF of your theoretical working. A scan of handwritten work is fine, but you could also typeset using LaTeX if you prefer. If it’s more convenient, you can combine this and the above part into one, e.g. if you wish to put everything in one LaTeX document, but this is not required.

3. An R script file, i.e. with a .r extension, containing your R code. This should be clearly formatted, and include brief comments so that a reader can understand what it is doing. The code should also be ready to run without any further modification by the user, and should reproduce your results (approximately, for simulation-based results).

Please make sure that all required working, results, details of implementation and discussion are contained in components 1 and 2 of the above list and not in the script file. The script file will only be used for verification of results. The exception is for the R code itself, whereby it is sufficient to say “refer to script file” where a question asks you to write R code.

1. Data $y = (y_1, \ldots, y_n)$ are assumed to come from a $N(\mu, \sigma^2)$ distribution. A Bayesian analysis is to be performed for the parameters $\mu$ and $\sigma$, which are assumed to have independent prior distributions with $\mu \sim N(\mu_0, \tau_0^2)$ and $p(\sigma) \propto \frac{1}{\sigma}$, where $\mu_0$ and $\tau_0^2$ are known constants. (The prior on $\sigma$ corresponds to a “uniform” prior on $\log(\sigma)$, which is a standard way to specify a noninformative prior on $\sigma$.)

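(a) Verify that the posterior distribution is

$$p(\mu, \sigma \mid y) = \frac{1}{K}\, p_1(\mu, \sigma \mid y),$$

where

$$p_1(\mu, \sigma \mid y) = \sigma^{-(n+1)} \exp\left\{ -\frac{1}{2\sigma^2} \sum_i (y_i - \mu)^2 - \frac{1}{2\tau_0^2} (\mu - \mu_0)^2 \right\},$$

and

$$K = \int_{-\infty}^{\infty} \int_0^{\infty} p_1(\mu, \sigma \mid y)\, d\sigma\, d\mu$$

is the normalising constant.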
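
As a starting point for part (a), the posterior is proportional to the likelihood times the two independent priors. The display below is only a sketch of that bookkeeping, with every factor that does not depend on $(\mu, \sigma)$ absorbed into the proportionality sign; it is not intended as a model answer.

$$
p(\mu, \sigma \mid y) \;\propto\;
\underbrace{\sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_i (y_i - \mu)^2 \right\}}_{\text{likelihood}}
\;\times\;
\underbrace{\exp\left\{ -\frac{1}{2\tau_0^2} (\mu - \mu_0)^2 \right\}}_{\text{prior on } \mu}
\;\times\;
\underbrace{\sigma^{-1}}_{\text{prior on } \sigma}
$$

Collecting the powers of $\sigma$ gives the factor $\sigma^{-(n+1)}$, and $K$ is whatever constant makes the result integrate to one.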

(b) The observed data are

-1.97, 0.46, 1.14, -1.63, 2.95, -3.23, -3.18, 0.37, 0.45, -2.80.

The values $\mu_0 = 0$ and $\tau_0 = 100$ are chosen to reflect vague prior information about $\mu$. Use the 2-d mid-ordinate rule to calculate $K$.
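
To illustrate the 2-d mid-ordinate rule named in part (b), here is a minimal sketch in R, checked on an integrand with a known answer rather than on the coursework posterior. The function name `mid_ordinate_2d`, its arguments and the grid sizes are illustrative assumptions, not part of the coursework.

```r
# 2-d mid-ordinate (midpoint) rule on the rectangle [ax, bx] x [ay, by].
# f must accept two numeric vectors of equal length and return a vector.
# (Function name, arguments and default grid sizes are illustrative choices.)
mid_ordinate_2d <- function(f, ax, bx, ay, by, nx = 200, ny = 200) {
  hx <- (bx - ax) / nx                    # cell width in x
  hy <- (by - ay) / ny                    # cell width in y
  x_mid <- ax + (seq_len(nx) - 0.5) * hx  # cell midpoints in x
  y_mid <- ay + (seq_len(ny) - 0.5) * hy  # cell midpoints in y
  vals <- outer(x_mid, y_mid, f)          # f evaluated on the midpoint grid
  hx * hy * sum(vals)                     # cell area times sum of ordinates
}

# Sanity check: the integral of exp(-(x^2 + y^2) / 2) over the plane is 2 * pi.
# The infinite range is truncated to [-8, 8] in each direction.
gauss <- function(x, y) exp(-(x^2 + y^2) / 2)
approx_2pi <- mid_ordinate_2d(gauss, -8, 8, -8, 8)
```

For the integral in part (b), both infinite ranges would likewise have to be truncated to a finite rectangle on which the integrand is non-negligible. Choosing that rectangle, and checking that the result is stable as the grid is refined, is a judgement left to the reader.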

(c) The marginal posterior distribution of $\mu$ is

$$p(\mu \mid y) = \int_0^{\infty} p(\mu, \sigma \mid y)\, d\sigma,$$

which is not available in closed form. Give full details of Laplace’s method to compute $p(\mu \mid y)$ at a particular point $\mu$.
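
For orientation, the generic one-dimensional form of Laplace’s method is sketched below. Here $h$ is an assumed notation for the log of the integrand as a function of $\sigma$ (with $\mu$ held fixed), and $\hat\sigma$ denotes its maximiser; neither symbol appears in the coursework itself.

$$
\int_0^{\infty} e^{h(\sigma)}\, d\sigma \;\approx\; e^{h(\hat\sigma)} \sqrt{\frac{2\pi}{-h''(\hat\sigma)}},
\qquad h'(\hat\sigma) = 0, \quad h''(\hat\sigma) < 0 .
$$

Taking $h(\sigma) = -(n+1)\log\sigma - \frac{A}{2\sigma^2}$ with $A = \sum_i (y_i - \mu)^2$, which is the $\sigma$-dependent part of $\log p_1$, a short calculation that is worth checking by hand suggests

$$
\hat\sigma^2 = \frac{A}{n+1}, \qquad h''(\hat\sigma) = -\frac{2(n+1)}{\hat\sigma^2} .
$$

The approximation treats the integrand as a Gaussian centred at $\hat\sigma$ and integrates over the whole real line, so it ignores the boundary at $\sigma = 0$.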

(d) Write a function in R to compute $p(\mu \mid y)$ at a particular point $\mu$ using Laplace’s method derived in (c).
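
As a generic illustration of Laplace’s method in R, the sketch below approximates an integral of the form $\int e^{h(x)}\,dx$. It locates the maximum with `optimize()` and estimates the second derivative by finite differences, which is an assumption made for brevity; part (d) asks for the analytic version derived in (c). The name `laplace_integral` and the step size `eps` are hypothetical.

```r
# Laplace approximation to the integral of exp(h(x)) over (lower, upper),
# assuming h has a single interior maximum.
# (laplace_integral and eps are illustrative, not part of the coursework.)
laplace_integral <- function(h, lower, upper, eps = 1e-4) {
  opt <- optimize(h, interval = c(lower, upper), maximum = TRUE)
  x_hat <- opt$maximum
  # Central finite-difference estimate of h''(x_hat); negative at a maximum.
  h2 <- (h(x_hat + eps) - 2 * h(x_hat) + h(x_hat - eps)) / eps^2
  exp(opt$objective) * sqrt(2 * pi / (-h2))
}

# Sanity check: for h(x) = -x^2 / 2 the approximation is exact,
# and the integral equals sqrt(2 * pi).
h_gauss <- function(x) -x^2 / 2
approx_root_2pi <- laplace_integral(h_gauss, -10, 10)
```

Working with $h$ on the log scale, as here, helps avoid underflow when the integrand itself is very small.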

(e) Write a function in R to perform the Golden-ratio method to find the mode of $p(\mu \mid y)$, using your R function from part (d) as the function to optimize.
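
The Golden-ratio (golden-section) method can be sketched as follows for a generic unimodal function. The function name `golden_max`, the tolerance and the test function are illustrative assumptions; in the coursework the objective would be the function written for part (d), and the starting interval must bracket the mode.

```r
# Golden-section search for the maximiser of a unimodal function f on [a, b].
# (golden_max and its default tolerance are illustrative choices.)
golden_max <- function(f, a, b, tol = 1e-6) {
  rho <- (sqrt(5) - 1) / 2          # inverse golden ratio, about 0.618
  x1 <- b - rho * (b - a)           # lower interior point
  x2 <- a + rho * (b - a)           # upper interior point
  f1 <- f(x1)
  f2 <- f(x2)
  while ((b - a) > tol) {
    if (f1 < f2) {                  # maximum lies in [x1, b]
      a <- x1
      x1 <- x2; f1 <- f2            # reuse the old upper point
      x2 <- a + rho * (b - a)
      f2 <- f(x2)
    } else {                        # maximum lies in [a, x2]
      b <- x2
      x2 <- x1; f2 <- f1            # reuse the old lower point
      x1 <- b - rho * (b - a)
      f1 <- f(x1)
    }
  }
  (a + b) / 2                       # midpoint of the final bracket
}

# Example on a function whose maximiser is known to be 2.
mode_est <- golden_max(function(x) -(x - 2)^2, 0, 5)
```

Each iteration shrinks the bracket by the factor `rho` and needs only one new function evaluation, because one interior point is reused. For an accuracy of one decimal place, a tolerance well below 0.1 is sufficient.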

(f) Hence, find the mode of p(µ|y) to an accuracy of 1 decimal place.

[20]

2. A random variable $Z$ is said to follow a log-normal distribution with parameter $\beta$ if $Z = \exp(X)$, where $X \sim N(0, \beta)$. The density of $Z$ is

$$p(z) \propto \frac{1}{z\sqrt{\beta}} \exp\left\{ -\frac{1}{2\beta} (\log z)^2 \right\}, \quad z > 0.$$

The density of a random variable $Z$ which follows a Gamma distribution with parameters $a$ and $b$ is

$$p(z) \propto z^{a-1} \exp\{-bz\}.$$

(a) The joint density of a pair of random variables $Y$ and $\lambda$ is given by

$$p(y, \lambda) \propto \frac{\lambda^{19/2}}{y} \exp\left\{ -\frac{\lambda}{2} (\log y)^2 - 10\lambda \right\}, \quad y > 0,\ \lambda > 0.$$

Prove that the marginal distribution of $\lambda$ is the Gamma distribution with parameters $a = 10$ and $b = 10$.

(b) Prove that the conditional distribution of $Y$ given $\lambda$, $p(y \mid \lambda)$, is log-normal with parameter $\frac{1}{\lambda}$.

(c) Describe how these results can be used to simulate from the marginal distribution $p(y)$.

(d) Hence, simulate 10000 samples from the marginal distribution $p(y)$. Use your samples to estimate the mean and variance of the distribution, and $P(Y > 10)$.
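
One way to read parts (c) and (d) is as the method of composition: draw $\lambda$ from its marginal, then draw $Y$ from its conditional given that $\lambda$, and keep only $Y$. The sketch below assumes the results of parts (a) and (b), and that the parameter of the log-normal distribution is the variance of $\log Y$, as in the definition $X \sim N(0, \beta)$. The seed and variable names are arbitrary.

```r
set.seed(1)      # arbitrary seed, so that the run is reproducible
n_sim <- 10000

# Step 1: lambda from its marginal, Gamma with shape a = 10 and rate b = 10.
lambda <- rgamma(n_sim, shape = 10, rate = 10)

# Step 2: Y given lambda is log-normal with parameter 1 / lambda,
# i.e. log(Y) ~ N(0, 1 / lambda), so the standard deviation is sqrt(1 / lambda).
y <- exp(rnorm(n_sim, mean = 0, sd = sqrt(1 / lambda)))

# Monte Carlo estimates from the simulated sample.
mean_hat <- mean(y)
var_hat  <- var(y)
p_hat    <- mean(y > 10)   # proportion of draws exceeding 10
```

Because the distribution is heavy-tailed, the sample variance in particular can change noticeably between seeds, so it may be worth repeating the run or reporting a Monte Carlo standard error.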

(e) The true marginal pdf of $Y$ is

$$p(y) = \frac{k}{y} \left( 1 + \frac{(\log y)^2}{10} \right)^{-5.5},$$

where $k = 0.389$. Plot a histogram of your samples, scaled to have area 1. Overlay the true pdf and comment on the agreement.

(f) The marginal distribution of $Y$ is of a type commonly used in reliability analysis, where “unusually large” observations have a non-negligible probability content. With reference to your histogram, explain why this distribution might be useful for modelling such data.

[20]
