This assignment consists of probability-theory problems involving money and trading.

STAT 131: Take-Home Test 2

1. [70 total points] ("the exchange paradox") You are playing the following game against an opponent, with a referee also taking part; imagine playing this game with two friends, which will provide a context for the amounts of money it would be reasonable to play the game with. The referee has two envelopes (numbered 1 and 2 for the purposes of this problem, although when the game is played the envelopes carry no markings), and (without you or your opponent seeing what she does) she (the referee) puts $m in envelope 1 and $2m in envelope 2 for some m > 0 (let's treat m as continuous in this problem, even though in practice it would be rounded to the nearest dollar). You and your opponent each receive one of the envelopes at random. You open your envelope secretly and find $x (your opponent also peeks secretly into her envelope), and the referee then asks you whether you want to trade envelopes with your opponent. You reason that if you trade, you will get either $x/2 or $2x, each with probability 1/2. This makes the expected value of the amount of money you will have if you trade equal to

(1/2)($x/2) + (1/2)($2x) = $5x/4 ,

which is greater than the $x you currently have, so you offer to trade. The paradox is that your opponent can make exactly the same calculation. How can the trade be advantageous for both of you? (It can't be.)

The point of this problem is to demonstrate that the reasoning above is flawed from a Bayesian point of view; the conclusion that trading envelopes is always optimal is based on the assumption that no information is obtained by observing the contents of the envelope you receive, and this assumption can be seen to be false when you reason in a Bayesian way. At a moment before the game begins, let p(m) be your (continuous) prior density (PDF) on the amount of money M that the referee will put in envelope 1, and let X be the amount of money you will find in your envelope when you open it (when the game is actually played, the observed x is, of course, data that can be used to decrease your uncertainty about M).
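Before formalizing anything, it may help to watch the paradox dissolve in a simulation. The sketch below is purely illustrative and not part of the problem: the Exponential prior with mean $50 is an arbitrary assumption, chosen only so the referee has some way of drawing m.

```python
# Sketch: simulate many plays under an assumed Exponential(mean $50) prior
# on M and compare "never trade" with "always trade".
import random

random.seed(1)
N = 200_000
never = always = 0.0
for _ in range(N):
    m = random.expovariate(1 / 50)                          # referee's draw of M
    mine, theirs = random.choice([(m, 2 * m), (2 * m, m)])  # random envelope
    never += mine                                           # keep your envelope
    always += theirs                                        # trade unconditionally
print(f"never trade : {never / N:6.2f}")                    # both ≈ 1.5 E(M) = 75
print(f"always trade: {always / N:6.2f}")
```

Unconditionally, each strategy receives (m + 2m)/2 = 1.5m on average, so trading by itself cannot help; any genuine advantage must come from conditioning on the observed x, which is exactly what parts (a) and (b) work out.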
(a) [20 points] Explain why the setup of this problem implies that P(X = m | M = m) = P(X = 2m | M = m) = 1/2, and use this to show that

P(M = x | X = x) = p(x) / [p(x) + p(x/2)]   and   P(M = x/2 | X = x) = p(x/2) / [p(x) + p(x/2)] .  (1)

Use this to show that the expected value of the amount of money Y in your opponent's envelope, given that you have found $x in the envelope you opened, is

E(Y | X = x) = { p(x) / [p(x) + p(x/2)] } (2x) + { p(x/2) / [p(x) + p(x/2)] } (x/2) .  (2)
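Once you have derived (2), one hedged way to sanity-check it is a small simulation. To keep the conditioning exact, the sketch below uses an assumed, purely illustrative discrete prior (M = 2^k dollars with probability (1/2)^(k+1)), reading p in (1) and (2) as a PMF; under that reading the formulas hold exactly.

```python
# Sketch (illustrative, not part of the test): check equation (2) by Monte
# Carlo, using an assumed discrete prior so that conditioning on X = x is
# exact: M = 2^k dollars with P(M = 2^k) = (1/2)^(k+1), k = 0, 1, 2, ...
import random

random.seed(2)

def draw_m():
    k = 0
    while random.random() < 0.5:          # geometric: P(k) = (1/2)^(k+1)
        k += 1
    return 2 ** k

def prior(m):                             # PMF of M at the point m
    k = m.bit_length() - 1
    return 0.5 ** (k + 1) if m == 2 ** k else 0.0

x = 8                                     # condition on finding $8 (arbitrary)
total, count = 0.0, 0
for _ in range(1_000_000):
    m = draw_m()
    mine, theirs = random.choice([(m, 2 * m), (2 * m, m)])
    if mine == x:                         # you found exactly $x
        total += theirs
        count += 1

rhs = (prior(x) * (2 * x) + prior(x // 2) * (x / 2)) / (prior(x) + prior(x // 2))
print("simulated E(Y | X = x):", round(total / count, 3))
print("equation (2) value    :", round(rhs, 3))
```

For this particular prior p(x/2) = 2 p(x), so (2) reduces to E(Y | X = x) = x and trading is a matter of indifference; that equality sits exactly on the boundary of the condition derived in part (b).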
(b) [40 points] Suppose that in this game money and utility coincide for you (or at least that your utility is linear in money with positive slope). Use Bayesian decision theory, through the principle of maximizing expected utility, to show that you should offer to trade envelopes only if

p(x/2) < 2 p(x) .  (3)

If you and two friends (one of whom would serve as the referee) were to actually play this game with real money, it would probably be true that small amounts of money are more likely to be chosen by the referee than large amounts, which makes it interesting to explore condition
(3) for prior distributions that are decreasing (that is, for which p(m2) < p(m1) for m2 > m1). Make
a sketch of what condition (3) implies for a decreasing p. One possible example of a continuous
decreasing family of priors on M is the exponential distribution indexed by the parameter λ, which
represents the reciprocal of the mean of the distribution. Identify the set of conditions in this family
of priors, as a function of x and λ, under which it’s optimal for you to trade. Does the inequality
you obtain in this way make good intuitive sense (in terms of both x and λ)? Explain briefly.
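As a hedged numerical companion to this part, the sketch below evaluates condition (3) on a grid of x values for one member of the exponential family of priors; the rate λ = 0.02 (mean $50) and the grid are arbitrary choices, and the printed switch point is what your closed-form inequality should reproduce.

```python
# Sketch: probe condition (3) for an assumed Exponential(λ) prior,
# p(m) = λ exp(−λ m); λ = 0.02 (mean $50) and the x grid are illustrative.
import math

lam = 0.02
p = lambda m: lam * math.exp(-lam * m)    # the assumed prior density

for x in (10, 25, 50, 69, 70, 100, 200):
    trade = p(x / 2) < 2 * p(x)           # condition (3)
    print(f"x = {x:3d}:  p(x/2) = {p(x / 2):.6f}   2 p(x) = {2 * p(x):.6f}"
          f"   trade: {trade}")
```

The verdict flips between x = 69 and x = 70, that is, at a threshold proportional to the prior mean 1/λ; checking that your inequality in x and λ predicts exactly this switch point is a good self-test.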
(c) [10 points] Looking carefully at the correct argument in paragraph 2 of this problem, identify the
precise point at which the argument in the first paragraph breaks down, and specify what someone
who believes the argument in paragraph 1 is implicitly assuming about the prior distribution p(m).
2. [210 total points] (practice with joint, marginal and conditional densities) This is a toy problem
designed to give you practice in working with a number of the concepts we’ve examined; in a course
like this, every now and then you have to stop looking at real-world problems and just work on
technique (it’s similar to classical musicians needing to practice scales, in addition to actual pieces
of symphonic or chamber music).
Suppose that the continuous random vector X = (X1, X2) has PDF given by
fX(x) = { 4 x1 x2   for 0 < x1 < 1, 0 < x2 < 1
        { 0         otherwise ,   (4)
in which x = (x1, x2), and define the random vector Y = (Y1, Y2) with the transformation (Y1 =
X1, Y2 = X1 X2).
(a) Are X1 and X2 independent? Present any relevant calculations to support your answer. [10
points]
(b) Either work out the correlation ρ(X1, X2) between X1 and X2 or explain why no calculation
is necessary in correctly identifying the value of ρ. [10 points]
(c) Sketch the set S of possible X values and the image T of S under the transformation from
X to Y , and show that the joint distribution of Y = (Y1, Y2) is
fY (y) = { 4 y2 / y1   for 0 < y2 < y1 < 1
         { 0           otherwise ,   (5)
in which y = (y1, y2). Verify your calculation by demonstrating that ∫∫_T fY (y) dy = 1. [50 points]
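If you want an independent, hedged cross-check of (5) after deriving it, the sketch below compares a Riemann sum of fY over T with 1, and a simulated rectangle probability with the corresponding integral of fY; the grid sizes and the test rectangle are arbitrary, and the sampler draws from (4) by rejection.

```python
# Sketch: two numerical cross-checks of the claimed density (5).
import random

random.seed(3)

def f_Y(y1, y2):                           # the density claimed in (5)
    return 4 * y2 / y1 if 0 < y2 < y1 < 1 else 0.0

# Check 1: midpoint Riemann sum of f_Y over the unit square (hence over T).
n = 1000
h = 1.0 / n
total = h * h * sum(f_Y((i + 0.5) * h, (j + 0.5) * h)
                    for i in range(n) for j in range(n))

# Check 2: simulate X from (4) by rejection, transform to Y = (X1, X1 X2),
# and compare the empirical probability of an arbitrary rectangle
# (0.5 < y1 < 0.9, 0.2 < y2 < 0.4) with the integral of f_Y over it.
def draw_X():
    while True:
        x1, x2 = random.random(), random.random()
        if random.random() < x1 * x2:      # accept with prob f_X / 4
            return x1, x2

N, hits = 200_000, 0
for _ in range(N):
    x1, x2 = draw_X()
    y1, y2 = x1, x1 * x2
    if 0.5 < y1 < 0.9 and 0.2 < y2 < 0.4:
        hits += 1

m = 400
box = sum(f_Y(0.5 + (i + 0.5) * 0.4 / m, 0.2 + (j + 0.5) * 0.2 / m)
          for i in range(m) for j in range(m)) * (0.4 / m) * (0.2 / m)
print("integral of f_Y over T :", round(total, 4))      # should be ≈ 1
print("P(Y in box), simulated :", round(hits / N, 4))
print("P(Y in box), from (5)  :", round(box, 4))        # ≈ 0.141
```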
(d) Work out
(i) the marginal distributions for Y1 and Y2, sketching both distributions and checking that
they both integrate to 1;
(ii) the conditional distributions fY1|Y2(y1 | y2) and fY2|Y1(y2 | y1), checking that they each integrate to 1;
(iii) the conditional expectations E(Y1 | Y2) and E(Y2 | Y1); and
(iv) the conditional variances V (Y1 | Y2) and V (Y2 | Y1). (Hint: recall that the variance of a random variable W is just E(W^2) − [E(W)]^2.)
[120 points]
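After you obtain closed forms in (iii) and (iv), one hedged way to test them is to bin simulated Y pairs on Y1 and compare within-bin means and variances of Y2 against your formulas at the bin centers; the sketch below produces only the simulated side (the bin centers and widths are arbitrary).

```python
# Sketch: estimate E(Y2 | Y1 ≈ y1) and V(Y2 | Y1 ≈ y1) from simulation, for
# comparison with closed-form answers; reuses the rejection sampler for (4).
import random

random.seed(4)

def draw_X():
    while True:
        x1, x2 = random.random(), random.random()
        if random.random() < x1 * x2:
            return x1, x2

for center in (0.3, 0.6, 0.9):             # illustrative values of y1
    ys = []
    while len(ys) < 10_000:
        x1, x2 = draw_X()
        if abs(x1 - center) < 0.02:        # condition on Y1 = X1 near center
            ys.append(x1 * x2)             # record Y2 = X1 X2
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    print(f"y1 ≈ {center}:  E(Y2 | Y1) ≈ {mean:.3f}   V(Y2 | Y1) ≈ {var:.4f}")
```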
(e) Are Y1 and Y2 independent? Present any relevant calculations to support your answer. [10
points]
(f) Either work out the correlation ρ(Y1, Y2) between Y1 and Y2 or explain why no calculation is
necessary in correctly identifying the value of ρ. [10 points]
3. [100 total points] (moment-generating functions) Distributions may in general be skewed, but
there may be conditions on their parameters that make the skewness get smaller or even disappear.
This problem uses moment-generating functions (MGFs) to explore that idea for two important
discrete distributions, the Binomial and the Poisson.
(a) We saw in class that if X ∼ Binomial(n, p), for 0 < p < 1 and integer n ≥ 1, then the MGF
of X is given by
ψX(t) = [p e^t + (1 − p)]^n  (6)

for all real t, and we used this to work out the first three moments of X (note that the expression for E(X^3) is only correct for n ≥ 3):

E(X) = n p ,   E(X^2) = n p [1 + (n − 1) p] ,  (7)

E(X^3) = n p [1 + (n − 2)(n − 1) p^2 + 3 (n − 1) p] ,  (8)
from which we also found that V (X) = n p(1 − p). Show that the above facts imply that
skewness(X) = (1 − 2 p) / √[n p (1 − p)] .  (9)
Under what condition on p, if any, does the skewness vanish? Under what condition on n, if
any, does the skewness tend to 0? Explain briefly. [30 points]
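The sketch below offers a hedged cross-check of (9): it computes the exact skewness directly from the Binomial PMF and compares it with the closed form, at a few arbitrary (n, p) pairs chosen to display both limiting behaviors.

```python
# Sketch: compare formula (9) with the skewness computed directly from the
# Binomial(n, p) PMF; the (n, p) test points are arbitrary.
import math

def exact_skewness(n, p):
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    mu3 = sum((k - mean) ** 3 * w for k, w in enumerate(pmf))
    return mu3 / var**1.5

for n, p in [(10, 0.2), (10, 0.5), (40, 0.2), (160, 0.2)]:
    formula = (1 - 2 * p) / math.sqrt(n * p * (1 - p))
    print(f"n = {n:3d}, p = {p}:  exact {exact_skewness(n, p):+.5f}"
          f"   formula (9) {formula:+.5f}")
# p = 1/2 kills the skewness exactly; quadrupling n halves it.
```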
(b) In our brief discussion of stochastic processes we encountered the Poisson distribution: if
Y ∼ Poisson(λ), for λ > 0, then the PMF of Y is
fY (y) = { λ^y e^(−λ) / y!   for y = 0, 1, . . .
         { 0                 otherwise .  (10)
(i) Use this to show that for all real t the MGF of Y is
ψY (t) = e^(λ (e^t − 1)) .  (11)
[10 points]
(ii) Use ψY (t) to compute the first three moments of Y , the variance of Y and the skewness
of Y . Under what condition on λ, if any, does the skewness either disappear or tend to
0? Explain briefly. [60 points]
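As a hedged check on your moment calculations in (ii), the sketch below compares numerical derivatives of the MGF (11) at t = 0 with moments summed directly from the PMF (10), truncated where the remaining mass is negligible; the λ values and the step size are arbitrary.

```python
# Sketch: cross-check moments of Poisson(λ) two ways -- finite-difference
# derivatives of the MGF (11) at t = 0 versus direct sums over the PMF (10).
import math

def moments_from_pmf(lam, kmax=300):
    w, pmf = math.exp(-lam), []
    for y in range(kmax):
        pmf.append(w)
        w *= lam / (y + 1)                 # recursion avoids big factorials
    return [sum(y**r * q for y, q in enumerate(pmf)) for r in (1, 2, 3)]

def mgf(lam, t):
    return math.exp(lam * (math.exp(t) - 1))

h = 1e-4                                   # central-difference step
for lam in (0.5, 4.0, 25.0):
    m1, m2, m3 = moments_from_pmf(lam)
    d1 = (mgf(lam, h) - mgf(lam, -h)) / (2 * h)
    d2 = (mgf(lam, h) - 2 * mgf(lam, 0) + mgf(lam, -h)) / h**2
    skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / (m2 - m1**2) ** 1.5
    print(f"λ = {lam:4.1f}:  E(Y) {m1:.3f} ~ ψ'(0) {d1:.3f};"
          f"  E(Y²) {m2:.3f} ~ ψ''(0) {d2:.3f};  skewness {skew:.3f}")
# note how the skewness column changes as λ grows
```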
4. [140 total points] (archaeology) Paleobotanists estimate the moment in the remote past when a
given species became extinct by taking cylindrical, vertical core samples well below the earth’s surface and looking for the last occurrence of the species in the fossil record, measured in meters above
the point P at which the species was known to have first emerged. Letting {yi, i = 1, . . . , n} denote a sample of such distances above P at a random set of locations, the model (Yi | θ) IID∼ Uniform(0, θ)  (∗)
emerges from simple and plausible assumptions. In this model the unknown θ > 0 can be used,
through carbon dating, to estimate the species extinction time.
The marginal distribution of a single observation yi in this model may be written

pYi(yi | θ) = { 1/θ   if 0 ≤ yi ≤ θ
             { 0     otherwise
            = (1/θ) I(0 ≤ yi ≤ θ) ,  (12)
where I(A) = 1 if A is true and 0 otherwise.
(a) Briefly explain why the statement {0 ≤ yi ≤ θ for all i = 1, . . . , n} is equivalent to the statement {m = max(y1, . . . , yn) ≤ θ}, and use this to show that the joint distribution of Y = (Y1, . . . , Yn) (given θ) in this model is

fY1,...,Yn(y1, . . . , yn | θ) = I(m ≤ θ) / θ^n .  (13)
[20 points]
(b) Letting the observed values of (Y1, . . . , Yn) be y = (y1, . . . , yn), an important object in both frequentist and Bayesian inferential statistics is the likelihood function ℓ(θ | y), which is obtained from the joint distribution of (Y1, . . . , Yn) (given θ) simply by
(1) thinking of fY1,...,Yn(y1, . . . , yn | θ) as a function of θ for fixed y, and
(2) multiplying by an arbitrary positive constant c:

ℓ(θ | y) = c fY (y | θ) .  (14)

Using this terminology, in part (a) you showed that the likelihood function in this problem is ℓ(θ | y) = θ^(−n) I(θ ≥ m), where m is the largest of the yi values. Both frequentists and Bayesians are interested in something called the maximum likelihood estimator (MLE) θ̂_MLE, which is the value of θ that makes ℓ(θ | y) as large as possible.
(i) Make a rough sketch of the likelihood function, and use your sketch to show that the MLE in this problem is θ̂_MLE = m = max(y1, . . . , yn). [20 points]
(ii) Maximization of a function is usually accomplished by setting its first derivative to 0 and
solving the resulting equation. Briefly explain why that method won’t work in finding
the MLE in this case. [10 points]
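A hedged numerical illustration for (i) and (ii): tabulate ℓ(θ | y) = θ^(−n) I(θ ≥ m) on a grid for a small made-up sample and watch where it peaks; the data below are fabricated purely for illustration.

```python
# Sketch: evaluate the likelihood  ℓ(θ | y) = θ^(−n) I(θ ≥ m)  on a grid for
# a made-up sample, to visualize where it attains its maximum.
data = [1.2, 0.4, 2.7, 1.9, 0.8]           # fabricated depths (meters)
n, m = len(data), max(data)                 # here m = 2.7

def likelihood(theta, c=1.0):               # the constant c is irrelevant
    return c * theta ** (-n) if theta >= m else 0.0

for theta in (2.0, 2.5, 2.69, 2.70, 2.71, 3.0, 4.0):
    print(f"θ = {theta:4.2f}   ℓ(θ | y) = {likelihood(theta):.5f}")
# ℓ is 0 to the left of m, jumps to its largest value exactly at θ = m,
# and decays like θ^(−n) to the right.
```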
(c) A positive quantity W follows the Pareto distribution (written W ∼ Pareto(α, β)) if, for
parameters α, β > 0, it has density
fW (w) = { α β^α w^(−(α+1))   if w ≥ β
         { 0                  otherwise .  (15)

This distribution has mean α β / (α − 1) (if α > 1) and variance α β^2 / [(α − 1)^2 (α − 2)] (if α > 2).
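A hedged check of these stated moments, via inverse-CDF sampling from the Pareto (W = β U^(−1/α) for U uniform on (0, 1)): the values α = 5 and β = 2 are arbitrary, with α > 2 so that both moments exist.

```python
# Sketch: verify the stated Pareto mean and variance by simulation, sampling
# with the inverse CDF  W = β U^(−1/α);  α, β values are arbitrary (α > 2).
import random

random.seed(5)
alpha, beta = 5.0, 2.0
N = 1_000_000
draws = [beta * (1.0 - random.random()) ** (-1.0 / alpha) for _ in range(N)]

mean = sum(draws) / N
var = sum((w - mean) ** 2 for w in draws) / N
print("simulated mean:", round(mean, 4),
      "  α β/(α−1) =", alpha * beta / (alpha - 1))
print("simulated var :", round(var, 4),
      "  α β²/[(α−1)²(α−2)] =",
      round(alpha * beta**2 / ((alpha - 1) ** 2 * (alpha - 2)), 4))
```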