This assignment uses R to carry out a statistical analysis of website visit data.

STA238 – Winter 2021
Assignment 3 Instructions

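Part 1
A company is interested in running some A/B tests on its website in order to increase sales. To do so,
they first need to look at their current website usage. Let $X_i$ represent the number of customers who
visit the webpage in a given hour.
Assume $X_1, X_2, \dots, X_n \overset{iid}{\sim} \text{Poisson}(\lambda)$.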
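Let us use a Bayesian approach to make inferences about $\lambda$. Use $\lambda \sim \text{Exponential}(\beta)$ as the prior
distribution, where $\beta$ is the mean parameter. So $f(\lambda) = \frac{1}{\beta} e^{-\lambda/\beta}$ for $\lambda \ge 0$.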
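Step 1 (mathematical proof)
Derive the posterior distribution of $\lambda$. Identify the well-known distribution that the posterior follows, and
be sure to determine its parameters. Note: the posterior parameters should be expressed in terms of the
sample mean, the sample size, $\beta$, and numbers (not including …).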
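As a hedged sketch of how such a derivation is usually organised, one can multiply the Poisson likelihood by the prior density $f(\lambda) = \frac{1}{\beta} e^{-\lambda/\beta}$, drop every factor that does not involve $\lambda$, and match the remaining kernel to a known family. The notation $\bar{x}$ for the sample mean is an assumption of this sketch, and each step should be checked independently rather than taken as the expected solution.

```latex
\begin{align*}
f(\lambda \mid x_1, \dots, x_n)
  &\propto \left[ \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \right]
           \cdot \frac{1}{\beta} e^{-\lambda/\beta} \\
  &\propto \lambda^{\sum_{i=1}^{n} x_i} \, e^{-n\lambda} \, e^{-\lambda/\beta} \\
  &= \lambda^{(n\bar{x} + 1) - 1} \, e^{-\lambda \left( n + 1/\beta \right)},
  \qquad \lambda \ge 0 .
\end{align*}
```

Under these assumptions the last line is the kernel of a Gamma distribution with shape $n\bar{x} + 1$ and rate $n + 1/\beta$, which depends on the data only through the sample mean and the sample size.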
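Step 2 (simulation/graphical study of the posterior)
Here you will comment on how the posterior distribution changes depending on the prior and the data.
In this section, assume $\beta$ is some fixed value (you may pick any value, as long as it is appropriate,
e.g. 0.5 or 10). Then suppose you collect some data. What happens if you collect very little data versus
a lot of data (i.e. small n versus large n)? What happens if the sample mean is close to $\beta$ versus far
from $\beta$?
Here you should create some type of visualization to comment on how the posterior is affected by the
prior, by (changes in) the sample size, and by (changes in) the sample mean. I would suggest plots similar
to the visualization in Supplementary Material 11.1.2, with comments on the relationship between the
posterior and the prior as n and the sample mean $\bar{x}$ vary. You should have at least 2 plots, each
showing 4 comparisons. Each plot will include 4 curves (the prior curve and 3 posterior curves based on
different sample means): one plot with 4 curves for small n, and another with 4 curves for large n.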
This is the example from 11.1.2. The idea is to produce two plots like this (one for small n, one for large n):
[Figure: "Beta prior vs. posterior (Theta), 10 coin flips". x-axis: Theta (0.0 to 1.0); y-axis: Density (0 to 5). Blue: prior. Purple: 5 heads. Red: 0 heads. Orange: 10 heads.]
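Please provide some text describing the relationship between the posterior and the prior, the sample size,
and the sample mean. Your text should relate to the original context of the problem (i.e. visits to the
webpage). Make sure you know what n and X represent in the problem.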
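The kind of comparison described in Step 2 could be sketched in base R roughly as follows. This is an illustration under stated assumptions, not a required implementation: it assumes the standard conjugate result that an Exponential prior with mean $\beta$ combined with Poisson data gives a Gamma posterior with shape $n\bar{x} + 1$ and rate $n + 1/\beta$, and the values of `beta`, `xbars`, and the two sample sizes are arbitrary choices. The function names `posterior_params` and `plot_prior_vs_posterior` are hypothetical helpers introduced here.

```r
# Sketch: prior vs. posterior densities for lambda (mean visits per hour).
# Assumption: Exponential(mean = beta) prior + Poisson data gives a
# Gamma(shape = n * xbar + 1, rate = n + 1 / beta) posterior.

posterior_params <- function(n, xbar, beta) {
  list(shape = n * xbar + 1, rate = n + 1 / beta)
}

plot_prior_vs_posterior <- function(n, xbars, beta, lambda_max = 35) {
  lambda <- seq(0, lambda_max, length.out = 500)
  # Prior density: the exponential rate is 1 / mean
  prior <- dexp(lambda, rate = 1 / beta)
  # One posterior density column per sample mean
  post <- sapply(xbars, function(xbar) {
    p <- posterior_params(n, xbar, beta)
    dgamma(lambda, shape = p$shape, rate = p$rate)
  })
  curves <- cbind(prior, post)
  cols <- c("blue", "purple", "red", "orange")
  matplot(lambda, curves, type = "l", lty = 1, lwd = 2, col = cols,
          xlab = expression(lambda), ylab = "Density",
          main = paste0("Prior vs. posterior, n = ", n))
  legend("topright",
         legend = c("Prior", paste0("Posterior, sample mean = ", xbars)),
         col = cols, lty = 1, lwd = 2, bty = "n")
  invisible(curves)
}

beta <- 10               # illustrative prior mean
xbars <- c(2, 10, 20)    # sample means below, at, and above the prior mean

op <- par(mfrow = c(1, 2))
small_n <- plot_prior_vs_posterior(n = 2, xbars = xbars, beta = beta)
large_n <- plot_prior_vs_posterior(n = 50, xbars = xbars, beta = beta)
par(op)
```

Placing the two panels side by side makes it easier to see how strongly the curves concentrate around each sample mean as the number of observed hours grows.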
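General notes (Part 1):
• This question is open book, so you may use outside resources (e.g. textbooks, academic papers,
websites, etc.) to derive the posterior distribution. Just make sure you properly credit any external
sources.
• You will likely need to use LaTeX code in your Rmd file. Please take a look at our course resources
page, as well as the Week 4 synchronous lecture.
• Grammar is not the main focus of the assessment, but it is important that you communicate in a clear
and professional manner, i.e. no slang or emojis should appear.
• You may wish to include a bibliography in this section. If you (or the reader) can tell that you have
clearly looked up some knowledge that is not common knowledge (and it has not been cited), then you will lose marks.
• Use in-line referencing.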
Part 2
Description:
In this question you will write a report on a data analysis in which your main methodology will be to derive
at least two confidence intervals via bootstrapping. One bootstrap confidence interval should be for a proportion, and one should be for a mean/median. Both bootstrap confidence intervals
should be meaningful/appropriate based on the data. The report will consist of 5 sections: Introduction,
Data, Methods, Results, and Conclusions.
There should be no evidence that Part 2 is an assignment; I should be able to take a screenshot
of this section and paste it into a newspaper/blog. There should be no raw code. All output,
tables, figures, etc. should be nicely formatted.
Feel free to use the same data as you did on Assignment 1 and/or Assignment 2, just make sure that
the data is appropriate for your methods. Pick something that is interesting to investigate and has
variables appropriate for the methodology you are going to perform. NOTE: If your data is not appropriate
in performing two bootstrap CIs (one for a proportion and one for a mean/median) then you will not be
eligible for full marks on this assignment. Please visit office hours, post on Piazza or email our teaching
team at sta238@utoronto.ca for clarification on appropriate data. If you do use the same data as Assignment
1 and/or 2, there is enough variation in the methodology that you will need to amend your graphs
and text. Thus, you should NOT directly copy your previous assignment work. You may include an
amended/proofread/updated version of previous work, but it should not be a direct copy of a previous
submission. If your work is a direct copy of a previous submission this is considered an academic offense.
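For orientation, a percentile bootstrap confidence interval of the kind described above might be sketched in base R as follows. This is one common variant, shown under stated assumptions: the helper `bootstrap_ci`, the simulated variables `session_minutes` and `made_purchase`, the number of resamples, and the 95% level are all hypothetical choices for illustration, and other bootstrap interval constructions exist.

```r
# Sketch of a percentile bootstrap CI: resample the data with replacement,
# recompute the statistic on each resample, and take empirical quantiles
# of the resulting bootstrap replicates.
bootstrap_ci <- function(x, statistic, B = 2000, level = 0.95) {
  x <- x[!is.na(x)]
  reps <- replicate(B, statistic(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - level
  unname(quantile(reps, probs = c(alpha / 2, 1 - alpha / 2)))
}

set.seed(238)

# Hypothetical data, simulated purely for illustration
session_minutes <- rexp(200, rate = 1 / 4)          # time on site, in minutes
made_purchase <- rbinom(200, size = 1, prob = 0.3)  # 1 = visitor made a purchase

# CI for a median
ci_median <- bootstrap_ci(session_minutes, median)

# CI for a proportion: the mean of a 0/1 variable is the sample proportion
ci_prop <- bootstrap_ci(made_purchase, mean)

ci_median
ci_prop
```

Passing the statistic as a function keeps the resampling logic in one place, so the same helper serves both the proportion and the mean/median interval.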
Introduction
The goal of the Introduction section is to introduce the overall “problem” to the reader.
Your Introduction section should include the following:
• Describe the data and the problem in 2-3 clear sentences.
• Introduce the importance of the analysis.
• Get the reader interested/excited about the analysis.
• Provide some background/context explaining the global relevance of the problem/data/analysis.
• Introduce terminology and prep the reader for the following sections.
• Introduce hypotheses.
Data
The goal of the Data section is to introduce the reader to the data set, showcase some meaningful aspects
of the data, and get them thinking about potential hypotheses/findings.
Your Data section should include the following:
• A description of the data collection process.
• A summary of the cleaning process (if you cleaned the data).
• A description of the important variables.
• Some appropriate numerical summaries (at minimum center and spread, but something else may be
more appropriate). If there are a lot, please put them in a well formatted and labelled table.
• At least 1 aesthetically pleasing plot/graph/figure (No more than 4 plots).
• Text explaining/highlighting each table or figure.
• Some text (and perhaps graphical summaries) of the variables you will perform the bootstrap on (don’t
do the bootstrap here – just prep the reader for what is coming in later sections). This should help
prep the reader in understanding why the CI is important/interesting and whether it is appropriate.
• In-line referencing/text if needed.
• Reference the programming language/software used to complete this section.