This is an assignment from the UK on an introduction to data science and statistical modelling.

 

therefore approaches and solutions shouldn’t be discussed with other students. Plagiarism and collusion with other students are examples of academic misconduct and will be reported. More information on academic honesty can be found here.

  1. The colour of the human eye is determined by a pair of genes. If both of these genes code the colour blue, then the given person will have blue eyes. If at least one of the genes codes the colour brown, then the person will have brown eyes. That is, if we denote by ‘A’ the gene coding the colour brown, and by ‘a’ the gene coding the colour blue, then we have the following:

Gene pair    Eye colour
AA           Brown
Aa           Brown
aA           Brown
aa           Blue

A child inherits one gene from each of their parents. That is, one gene is chosen randomly (with equal probability) from the gene-pair of their father, and one gene is chosen randomly (with equal probability) from the gene-pair of their mother. Below are two examples, where the entries of the tables show the possible gene-pairs of the children. Note that each of these gene-pairs has equal probability.

Example 1:

                  Father’s genes
                     A      a
Mother’s      A     AA     Aa
genes         a     aA     aa

Example 2:

                  Father’s genes
                     A      A
Mother’s      A     AA     AA
genes         a     aA     aA
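The two example tables can be reproduced with a few lines of R. This is only an illustrative sketch: the helper name child_pairs is an assumption of this example and is not part of the assignment. It lists the four equally likely gene-pairs of a child, with the mother's gene written first, as in the tables above.

```r
# Hypothetical helper (not from the assignment): enumerate the four
# equally likely gene-pairs of a child, mother's gene written first.
child_pairs <- function(mother, father) {
  tab <- outer(mother, father, paste0)   # rows: mother, columns: father
  dimnames(tab) <- list(mother = mother, father = father)
  tab
}

ex1 <- child_pairs(mother = c("A", "a"), father = c("A", "a"))
ex2 <- child_pairs(mother = c("A", "a"), father = c("A", "A"))
ex1
ex2

# Each cell has probability 1/4, so the probability of blue eyes is
# the share of cells equal to "aa".
mean(ex1 == "aa")
mean(ex2 == "aa")
```

The same table-based counting can be used to check any conditional probability by restricting attention to the cells that are compatible with the observed eye colours.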

Assume that Aaron and both of his parents have brown eyes, but Aaron’s sister has blue eyes.

(a) [3 marks] What is the probability that Aaron has a blue eye gene?

(b) [6 marks] Assume that Aaron’s wife has blue eyes. What is the probability that their first child will have blue eyes?

(c) [10 marks] Suppose that Aaron and his wife’s first child ended up having brown eyes (and not blue). How does this information change the probability that Aaron has a blue eye gene? What is the probability that their second child will have brown eyes too?

  2. Assume that a new Conservative Party leadership election has been triggered in the UK at a time when there are 361 Conservative MPs in Parliament. Two of these MPs, M and B, join the leadership contest, where the aim is to get the majority support of the remaining 359 Conservative MPs.

We further assume that on the day the leadership contest is announced 184 of these MPs support M, and the remaining 175 MPs support B in becoming the next party leader. The announcement is followed by an election campaign, during which MPs can decide to change their allegiance. In particular, we know that on any given day, there is a probability of 0.005 that an MP who has been supporting M will become a B supporter by the end of the day, while the probability that an MP who has been supporting B will become an M supporter by the end of the day is 0.004. Each MP makes their decision independently of each other, and independently of the decision they made the day before.
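As a rough illustration of the daily mechanism described above, one day of allegiance changes might be simulated as follows. Because the MPs decide independently, the number of B supporters who stay and the number of M supporters who switch are each binomial counts. The function name step_day and the seed are assumptions made for this sketch.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

p_m_to_b <- 0.005  # daily probability that an M supporter switches to B
p_b_to_m <- 0.004  # daily probability that a B supporter switches to M

# Hypothetical helper: given the current number of B supporters, return
# the number of B supporters at the end of the day.
step_day <- function(n_b, n_total = 359) {
  n_m <- n_total - n_b
  stay_b   <- rbinom(1, size = n_b, prob = 1 - p_b_to_m)  # B supporters who stay
  switch_b <- rbinom(1, size = n_m, prob = p_m_to_b)      # M supporters who join B
  stay_b + switch_b
}

step_day(175)
```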

(a) [4 marks] Introduce the following random variables:

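$$X_i^{(1)} = \begin{cases} 1, & \text{B supporter number } i \text{ still supports B at the end of day 1}, \\ 0, & \text{B supporter number } i \text{ changes to an M supporter at the end of day 1}, \end{cases} \quad \text{for } i = 1, \ldots, 175;$$

and

$$X_i^{(2)} = \begin{cases} 1, & \text{M supporter number } i \text{ changes to a B supporter at the end of day 1}, \\ 0, & \text{M supporter number } i \text{ still supports M at the end of day 1}, \end{cases} \quad \text{for } i = 1, \ldots, 184.$$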

Using these random variables express the number of B supporters at the end of the first day, then use your formula to find the expected number of B supporters at the end of the first day. Justify every step of your argument.

(b) [3 marks] Define random variables $\hat{X}_i^{(1)}$, $i = 1, \ldots, 175$, and $\hat{X}_i^{(2)}$, $i = 1, \ldots, 184$, whose sum gives you the number of M supporters at the end of the first day. What is the expected number of M supporters at the end of the first day?

(c) [6 marks] R: The election campaign is set to last for 2 weeks. This means that each MP would vote according to the allegiance they have at the end of day 14, that is, the candidate they would vote for is the one they are supporting after the first 14 days of the campaign. Using simulation, find the probability that in this election B would hold the majority of the votes among the 359 MPs.

(d) [3 marks] R: Now suppose that the election had to be postponed, and with the new date, candidates now have a 60-day campaign period (as opposed to 14 days). Adjust your code from part 2c to find the probability that B will win the delayed election. How does this probability compare to the one computed in part 2c?
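One possible shape for the simulation in parts 2c and 2d is sketched below. It is not a model answer. The helper names, the seed, and the number of simulated campaigns (n_sim) are all assumptions of this sketch, and a majority is taken to mean at least 180 of the 359 votes.

```r
set.seed(2024)  # arbitrary seed, used only for reproducibility

# Hypothetical helper: simulate one campaign and return the final number
# of B supporters among the 359 voting MPs.
simulate_campaign <- function(days, n_b = 175, n_total = 359,
                              p_m_to_b = 0.005, p_b_to_m = 0.004) {
  for (d in seq_len(days)) {
    n_m <- n_total - n_b
    # B supporters who stay, plus M supporters who switch to B
    n_b <- rbinom(1, n_b, 1 - p_b_to_m) + rbinom(1, n_m, p_m_to_b)
  }
  n_b
}

# Estimate P(B holds a majority) as the share of simulated campaigns in
# which B ends with at least 180 of the 359 votes.
estimate_b_win <- function(days, n_sim = 10000) {
  final_b <- replicate(n_sim, simulate_campaign(days))
  mean(final_b >= 180)
}

estimate_b_win(14)
estimate_b_win(60)
```

Keeping the campaign length as an argument means the only change between the two parts is the value passed for days. A larger n_sim reduces the Monte Carlo error of the estimate.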

  3. Observations $Y_1, Y_2, \ldots, Y_n$ are assumed to be independent and identically distributed samples from a data model following a Rayleigh distribution, with probability density function:

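$$f(y; \theta) = \frac{y \, e^{-y^2/(2\theta)}}{\theta} \quad \text{for } \theta > 0 \text{ and } 0 < y < \infty.$$

The mean of this distribution is $\mu = \sqrt{\frac{\pi\theta}{2}}$, and the variance is $\sigma^2 = \frac{\theta(4 - \pi)}{2}$.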

(Note that here π is not a parameter, it is the usual mathematical constant i.e. 3.14...)
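The stated mean and variance can be checked numerically. The sketch below assumes the density is $f(y;\theta) = (y/\theta)\,e^{-y^2/(2\theta)}$. The helper name drayleigh and the value theta = 2 are arbitrary choices made for illustration.

```r
# Hypothetical helper: the Rayleigh density in the parametrisation above.
drayleigh <- function(y, theta) (y / theta) * exp(-y^2 / (2 * theta))

theta <- 2  # arbitrary illustrative value

# Numerical integrals over (0, Inf)
total  <- integrate(drayleigh, 0, Inf, theta = theta)$value
mean_y <- integrate(function(y) y * drayleigh(y, theta), 0, Inf)$value
ey2    <- integrate(function(y) y^2 * drayleigh(y, theta), 0, Inf)$value
var_y  <- ey2 - mean_y^2   # variance = second moment minus squared mean

# Numerical values next to the closed-form expressions
c(total = total, mean = mean_y, variance = var_y)
c(mean_formula = sqrt(pi * theta / 2), var_formula = theta * (4 - pi) / 2)
```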

(a) [2 marks] Find the method of moments estimator $\tilde{\theta}$ of $\theta$.

(b) [5 marks] Is your estimator $\tilde{\theta}$ unbiased? If not, then suggest an adjustment to this estimator that would make it unbiased and report your final unbiased estimator. Hint: If $E(\tilde{\theta}) = c\theta$, then the estimator $\frac{1}{c}\tilde{\theta}$ is unbiased. Also, remember that we can express second moments using the formula of the variance.

(c) [4 marks] An alternative estimator is $\hat{\theta} = 2 \cdot \frac{1}{n} \sum_{i=1}^{n} Y_i^2$. Is this estimator unbiased? If not, suggest an adjustment that makes it unbiased. See hints given in part 3b.

(d) [5 marks] Using the fact that the random variable $X = Y^2$ is exponentially distributed with rate $\frac{1}{2\theta}$, assess whether the estimator $\hat{\theta}$ from part 3c is consistent.

(e) [6 marks] We have 150 samples from a Rayleigh distribution with sample mean 3.2. Using an appropriate point estimator of θ, suggest a suitable estimate of the variance, and use this variance estimate to construct an approximate 95% confidence interval for the mean of the distribution.

(You can use R to find the relevant quantiles).
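For the quantiles mentioned above, qnorm gives the standard normal quantile used in a normal-approximation interval. The sketch below assumes an interval of the usual form "estimate plus or minus quantile times standard error". The helper name approx_ci is hypothetical, and the value var_hat = 1 is a placeholder, not the variance estimate asked for in part (e).

```r
# Hypothetical helper: normal-approximation confidence interval for a mean,
# given a sample mean, an estimate of the variance, and the sample size.
approx_ci <- function(xbar, var_hat, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)        # upper quantile of N(0, 1)
  half_width <- z * sqrt(var_hat / n)    # quantile times standard error
  c(lower = xbar - half_width, upper = xbar + half_width)
}

qnorm(0.975)  # quantile used for a 95% interval

# var_hat = 1 is a PLACEHOLDER; substitute your own variance estimate.
approx_ci(xbar = 3.2, var_hat = 1, n = 150)
```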

  4. Consider the data set $Y_1, Y_2, \ldots, Y_n$ that is assumed to have arisen from the data model with probability density function

$$f(y; \theta) = \begin{cases} k(1 - y)\, y^{\theta + 1}, & 0 < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$

where $\theta > 0$.
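A numerical check can be useful alongside the algebra for the normalising constant. The sketch below assumes the density is proportional to $(1 - y)\,y^{\theta+1}$ on $(0, 1)$ and computes $k$ for a given $\theta$ by numerical integration. The helper names and the value theta = 1 are assumptions of this example.

```r
# Unnormalised density on (0, 1); hypothetical helper name.
unnormalised <- function(y, theta) (1 - y) * y^(theta + 1)

# k is the reciprocal of the integral of the unnormalised density.
k_numeric <- function(theta) {
  1 / integrate(unnormalised, lower = 0, upper = 1, theta = theta)$value
}

k_numeric(1)  # arbitrary illustrative value of theta
```

Comparing k_numeric at a few values of theta with a hand-derived formula is a quick way to catch algebra slips.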

(a) [4 marks] Find the constant k that makes the above function a probability density function.

(b) [6 marks] Show that the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is given by the solution to the equation:

$$\hat{\theta}^2 \sum_{i=1}^{n} \log(Y_i) + \hat{\theta} \left[ 5 \sum_{i=1}^{n} \log(Y_i) + 2n \right] + 6 \sum_{i=1}^{n} \log(Y_i) + 5n = 0.$$

(c) [5 marks] R: Let $y_1, \ldots, y_{30}$ below be 30 samples from this distribution:

0.573 0.770 0.652 0.827 0.821 0.789

0.898 0.718 0.382 0.668 0.647 0.477

0.661 0.380 0.870 0.794 0.783 0.732

0.629 0.777 0.600 0.724 0.553 0.693

0.687 0.935 0.494 0.411 0.530 0.478

To produce a maximum likelihood estimate for θ based on these data, use the polyroot function of R.

Hint: The polyroot function finds the roots of a polynomial. Its argument is the vector of polynomial coefficients in increasing order. For example, to find the roots of the polynomial $p(x) = x^2 + 2x - 3$ we can use

rt <- polyroot(c(-3,2,1))

Even though both roots that you will get are real, polyroot gives these roots in complex form (don’t worry about what this means). You can use the Re() function to extract the real part of complex numbers. That is, if the outcome of the polyroot function is stored in the variable rt, then we can use the following to get the desired roots.

rt_real <- Re(rt)

rt_real

## [1] 1 -3

Note that this code lists all the roots of a polynomial. You will have to check which one of these is a local maximum.
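A sketch of how the hint might be applied to the 30 observations is given below. It assumes the quadratic is $\hat{\theta}^2 S + \hat{\theta}(5S + 2n) + 6S + 5n = 0$ with $S = \sum_{i=1}^{n} \log(y_i)$, and that the density is proportional to $(1 - y)\,y^{\theta+1}$. The variable and function names are illustrative, and the normalising constant is computed numerically so that no closed form is assumed.

```r
y <- c(0.573, 0.770, 0.652, 0.827, 0.821, 0.789,
       0.898, 0.718, 0.382, 0.668, 0.647, 0.477,
       0.661, 0.380, 0.870, 0.794, 0.783, 0.732,
       0.629, 0.777, 0.600, 0.724, 0.553, 0.693,
       0.687, 0.935, 0.494, 0.411, 0.530, 0.478)

n <- length(y)
s <- sum(log(y))  # sum of the log-observations

# Coefficients in increasing order: constant, theta, theta^2
coefs <- c(6 * s + 5 * n, 5 * s + 2 * n, s)
roots <- Re(polyroot(coefs))
roots

# Log-likelihood, with the normalising constant found numerically.
loglik <- function(theta) {
  integral <- integrate(function(u) (1 - u) * u^(theta + 1), 0, 1)$value
  sum(log(1 - y) + (theta + 1) * log(y)) - n * log(integral)
}

# Only roots with theta > 0 are admissible; compare the log-likelihood
# at each admissible root with its value slightly to either side.
candidates <- roots[roots > 0]
sapply(candidates, function(th) c(left  = loglik(th - 0.1),
                                  root  = loglik(th),
                                  right = loglik(th + 0.1)))
```

A root where the log-likelihood is higher than at both neighbouring points behaves like a local maximum. The step of 0.1 is an arbitrary choice for this check.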