This R assignment is a data analysis group project.

Instructions

STA238 – Winter 2021


Part 1

Description

Recall from Assignment 2 that the sample variance, defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, is unbiased for $\sigma^2$. And we answered “Why are we dividing by $n - 1$ and not $n$?”. Now let’s take this one step further and investigate whether either of the estimators in Assignment 2 Part 1 is a maximum likelihood estimator of $\sigma^2$.

Step 1 (Mathematical Justification)

Assume that $X_1, \dots, X_n \overset{iid}{\sim} \text{Normal}(\mu = 0, \sigma^2)$ and derive the maximum likelihood estimator of $\sigma^2$.

NOTE: In this derivation you should: 1. explicitly identify the likelihood function (in a simplified form); 2. explicitly identify the log-likelihood function (in a simplified form); 3. use the second derivative test to ensure that the estimator is indeed a maximum; and 4. make sure you are differentiating with respect to $\sigma^2$ (and not with respect to $\sigma$).
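The four items in the note can be laid out roughly as follows. This is only an outline under stated assumptions, not a model answer: it treats $\mu = 0$ as known and writes $\theta = \sigma^2$, a shorthand introduced here so that the differentiation is visibly with respect to the variance.

```latex
% Sketch only: \theta = \sigma^2 is shorthand introduced for this outline;
% \mu = 0 is treated as known, so squares are taken about 0.
\begin{align*}
L(\theta)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta}}
     \exp\!\left(-\frac{x_i^2}{2\theta}\right)
   = (2\pi\theta)^{-n/2}
     \exp\!\left(-\frac{1}{2\theta}\sum_{i=1}^{n} x_i^2\right), \\
\ell(\theta)
  &= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta
     - \frac{1}{2\theta}\sum_{i=1}^{n} x_i^2, \\
\ell'(\theta)
  &= -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^{n} x_i^2 = 0
     \quad\Longrightarrow\quad
     \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i^2, \\
\ell''(\theta)
  &= \frac{n}{2\theta^2} - \frac{1}{\theta^3}\sum_{i=1}^{n} x_i^2,
     \qquad
     \ell''(\hat{\theta}) = -\frac{n}{2\hat{\theta}^2} < 0 .
\end{align*}
```

Under this assumption the maximizer has divisor $n$ rather than $n - 1$, and its sum of squares is taken about the known mean 0 rather than about $\bar{X}$.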

Step 2 (Simulation Justification):

Compare the likelihood (or log-likelihood) of $\sigma^2$ when evaluated at the two estimators

$$T_1 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

and

$$T_2 = S_*^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

Here you will simulate n Normal(0, σ²) random variables to represent your data. Select (at least) 10 different sample sizes. For each simulated sample (i.e., for each n), evaluate the likelihood at T1 and T2 and plot the ratio of these two likelihoods against n, so that n is on the x-axis and the ratio of the likelihoods is on the y-axis. One plot is therefore for one σ² and the 10 (or more) different sample sizes. Repeat this for another σ² (I would recommend choosing 2 different values for σ², one “large” and one “small”). You can put both lines on one plot (make sure it’s clear which line is for which σ²) or put each line on its own plot.
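One way the simulation could be organized is sketched below. Everything specific in it is an assumption made for illustration: the function names `loglik_sigma2` and `lik_ratio`, the ten sample sizes, the values 0.5 and 25 for σ², the seed, and the choice to plot L(T1) / L(T2) rather than the reciprocal. The likelihood is the Normal(0, σ²) likelihood with μ = 0 treated as known.

```r
# Log-likelihood of sigma^2 for Normal(0, sigma^2) data (mu = 0 treated as known).
loglik_sigma2 <- function(sigma2, x) {
  n <- length(x)
  -n / 2 * log(2 * pi * sigma2) - sum(x^2) / (2 * sigma2)
}

# Simulate one sample of size n and return the likelihood ratio L(T1) / L(T2).
lik_ratio <- function(n, sigma2) {
  x  <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  ss <- sum((x - mean(x))^2)
  t1 <- ss / (n - 1)  # T1 = S^2
  t2 <- ss / n        # T2 = S_*^2
  # Work on the log scale and exponentiate the difference to avoid underflow.
  exp(loglik_sigma2(t1, x) - loglik_sigma2(t2, x))
}

set.seed(238)  # arbitrary seed, only for reproducibility

# Hypothetical choices: ten sample sizes, one "small" and one "large" variance.
n_values    <- c(5, 10, 20, 30, 50, 75, 100, 200, 500, 1000)
ratio_small <- sapply(n_values, lik_ratio, sigma2 = 0.5)
ratio_large <- sapply(n_values, lik_ratio, sigma2 = 25)

# n on the x-axis (log scale for readability), likelihood ratio on the y-axis.
plot(n_values, ratio_small, type = "b", log = "x",
     ylim = range(c(ratio_small, ratio_large, 1)),
     xlab = "Sample size n", ylab = "L(T1) / L(T2)",
     main = "Likelihood ratio at T1 versus T2")
lines(n_values, ratio_large, type = "b", lty = 2, pch = 2)
abline(h = 1, col = "grey")  # reference line: equal likelihoods
legend("topright", legend = c("sigma^2 = 0.5", "sigma^2 = 25"),
       lty = 1:2, pch = 1:2)
```

Each point here comes from a single simulated sample, so the lines will be noisy; averaging over repeated samples at each n is a possible refinement.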

Once these plots are created, provide some commentary on whether your simulation results are in line with your derivations in Step 1.

Here is an example of one of the line plots for the comparison of the likelihood. This example is for data that is exponential with parameter (mean) θ. The likelihood is evaluated at the sample median and at the sample mean (the MLE). The code to produce this is in Assignment4.Rmd.
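For readers without the example figure at hand, a comparison of that kind might look like the sketch below. It is not the code from Assignment4.Rmd: the function name `loglik_exp`, the true mean of 2, the sample sizes, the seed, and the direction of the ratio (median over mean) are all assumptions made for illustration.

```r
# Log-likelihood of theta for Exponential data parameterized by its mean theta.
loglik_exp <- function(theta, x) {
  -length(x) * log(theta) - sum(x) / theta
}

set.seed(4)        # arbitrary seed, only for reproducibility
theta_true <- 2    # hypothetical true mean
n_values   <- c(5, 10, 20, 30, 50, 75, 100, 200, 500, 1000)

# For each n: simulate a sample, then compare the likelihood at the
# sample median with the likelihood at the sample mean (the MLE).
exp_ratio <- sapply(n_values, function(n) {
  x <- rexp(n, rate = 1 / theta_true)
  exp(loglik_exp(median(x), x) - loglik_exp(mean(x), x))
})

plot(n_values, exp_ratio, type = "b", log = "x", ylim = c(0, 1),
     xlab = "Sample size n", ylab = "L(median) / L(mean)",
     main = "Exponential data: likelihood at median vs. mean")
```

Because the sample mean maximizes this likelihood, the ratio in this direction cannot exceed 1.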

General Notes (for Part 1):

  • This question is open book, so you can use outside sources (e.g., textbooks, academic papers, credible websites, etc.), especially in Step 1, to prove/show the MLE derivation. Just make sure you properly credit any outside sources.
  • You will likely need to use LaTeX code in your Rmd file. Please have a look at our course Resources page, as well as the synchronous lecture in Week 4.
  • Grammar is not the main focus of the assessment, but it is important that you communicate in a clear and professional manner. I.e., no slang or emojis should appear.
  • You may want to include a bibliography in this section. If it is clear that you (or the reader) looked up something that is not common knowledge (and it was not cited) then you will lose points.
  • Use inline referencing

Part 2

Description:

In this question you will write a report on a data analysis in which your main methodology will be to derive at least two confidence intervals. One confidence interval should be for a mean and should be calculated via critical values (i.e., NOT via bootstrapping). The other confidence interval should be for another measure, that is not the mean, median, or a proportion (e.g., a percentile, ratio, variance, standard deviation, etc.) to be calculated via bootstrapping. Both confidence intervals should be meaningful/appropriate based on the data. The report will consist of 5 sections: Introduction, Data, Methods, Results, and Conclusions.
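The two kinds of interval described above can be sketched as follows. The data are simulated placeholders, not a real dataset, and the choices of a 95% level, a t critical value for the mean, the standard deviation as the second measure, a percentile bootstrap, and 2000 resamples are all assumptions made for illustration; the course may expect a different bootstrap variant.

```r
set.seed(2021)  # arbitrary seed, only for reproducibility

# Placeholder data standing in for a real variable of interest.
x <- rgamma(80, shape = 2, scale = 10)
n <- length(x)
alpha <- 0.05

# Interval 1: confidence interval for the mean via a t critical value.
t_crit  <- qt(1 - alpha / 2, df = n - 1)
ci_mean <- mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)

# Interval 2: percentile bootstrap interval for the standard deviation.
B <- 2000  # number of bootstrap resamples (hypothetical choice)
boot_sd <- replicate(B, sd(sample(x, size = n, replace = TRUE)))
ci_sd   <- quantile(boot_sd, probs = c(alpha / 2, 1 - alpha / 2))

# Collect the results in a small table suitable for formatted output.
results <- data.frame(
  measure  = c("Mean", "Standard deviation"),
  estimate = c(mean(x), sd(x)),
  lower    = c(ci_mean[1], ci_sd[[1]]),
  upper    = c(ci_mean[2], ci_sd[[2]])
)
print(results, digits = 4)
```

In a knitted report the chunk would be hidden (for example with `echo = FALSE`) and the table passed through a formatter such as `knitr::kable`, so that no raw code or console output appears.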

There should be no evidence that Part 2 is an assignment; I should be able to take a screenshot of this section and paste it into a newspaper/blog. There should be no raw code. All output, tables, figures, etc. should be nicely formatted.

