This Mathematical Statistics assignment uses R to analyze historical temperatures recorded at locations in the Netherlands over the last century.
EBC2107 Mathematical Statistics Assignment
Analysis of Trends in Temperatures
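1 Introduction
For this assignment you will analyze whether there is evidence of an upward trend in the temperatures recorded over the last century at several locations in the Netherlands. To do so, you have to implement the statistical techniques learned in the course in the programming language R, and apply them to historical temperature data for several Dutch locations.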
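2 Data
On the course page you can find an Excel file and a .csv (comma-separated values) file containing average temperatures for three locations in the Netherlands: De Bilt, Eelde and Maastricht. The data run from 1907 to 2019 and are provided at three frequencies: daily, monthly and annual.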
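Daily data. These data are the average daily temperatures recorded at the three locations from 1 January 1907 to 31 December 2019. Because these data are rather noisy, too large to handle directly, and affected by seasonal variation, you do not have to use them. They are provided simply because they form the basis of the other data series.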
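Monthly data. These data are obtained by averaging all daily temperatures within a month. Because much of the noise is filtered out by the averaging, these data are easier to handle. However, seasonal patterns still affect the data: temperatures in January will clearly differ from those in July. This means that detecting a trend is still complicated, so you do not have to use these data either.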
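Smoothed monthly data. To be able to use the monthly data, they must be "seasonally adjusted" to remove the seasonal pattern. One way of doing this (not necessarily the best way) is to "smooth" the data, which means that each month is replaced by a (weighted) average of the months before and after it. Here we use a linear smoother; that is, we take an equally weighted average over all months within half a year of the month in question.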
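Formally, if we let $Y_i$ be the temperature in month $i$, then the smoothed temperature for month $i$, denoted $Y_i^s$, is given by
$$Y_i^s = \frac{1}{24} Y_{i-6} + \frac{1}{12} \sum_{j=-5}^{5} Y_{i+j} + \frac{1}{24} Y_{i+6}.$$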
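The smoother defined above can be sketched in R as follows. This is only an illustration under assumptions: the function name smooth_monthly is hypothetical, and the series y is synthetic placeholder data, not the course data. It relies on stats::filter with a symmetric two-sided window.

```r
# Centred 13-term smoother: weight 1/24 at offsets -6 and +6,
# weight 1/12 at offsets -5, ..., 5 (the weights sum to one).
smooth_monthly <- function(y) {
  w <- c(1 / 24, rep(1 / 12, 11), 1 / 24)
  # sides = 2 centres the window on month i; the first and last
  # six months have an incomplete window and are returned as NA.
  as.numeric(stats::filter(y, filter = w, method = "convolution", sides = 2))
}

# Synthetic placeholder series (NOT the course data):
# a seasonal cycle plus noise over 120 months.
set.seed(1)
months <- 1:120
y <- 10 + 8 * sin(2 * pi * months / 12) + rnorm(120)
ys <- smooth_monthly(y)
```

Because the window needs six months on either side, the smoothed series is shorter than the raw one; any analysis of the smoothed data has to drop or otherwise handle those missing end points.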
Figure 1 plots the raw and the smoothed monthly data side by side. Note the huge difference.
[Figure 1: Monthly Dutch temperatures 1907-2019. Panel (a): raw monthly temperatures; panel (b): smoothed monthly temperatures. Series: De Bilt, Eelde, Maastricht; x-axis: date; y-axis: °C.]
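Annual data. The annual data are simply calculated as the average over all days in a year. Because these data no longer contain any seasonal pattern, they can be used directly for trend analysis. These data form the main input for the assignment. Figure 2 plots the annual data.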
3 Programming in R
For the assignment you have to program the techniques we learn in the course in the statistical software package R. R is available as a free download at www.r-project.org. More information about R can be found on the course page.
4 Assignment
For your assignment you write a paper in which you try to answer the question of whether there is statistical evidence of an upward trend in the temperature series. The main focus should be on the annual data. Choose one series as your main series of interest, but check whether your conclusions change depending on which series you use.
[Figure 2: Annual Dutch temperatures 1907-2019. Series: De.Bilt, Eelde, Maastricht; x-axis: date; y-axis: °C.]
To guide you in the analysis, below you can find a list with specific questions to consider
in your analysis. Remember though that in the end you should provide one coherent analysis
in the paper, and not a point by point answering of the questions.
Compare average temperatures in different parts of the sample
Start by analyzing average temperatures over different parts of the sample. Split the sample into a number of subsamples, and compare average temperatures across the subsamples. You can vary how you split the sample. Make sure estimation uncertainty is taken into account, e.g. by constructing confidence intervals. You could also consider overlapping versus non-overlapping subsamples.
You can also consider a formal test for equality of the means in different subsamples. For example, you could split the sample in two and test whether the mean temperatures in both parts are equal.
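As a rough illustration of the subsample comparison, the sketch below splits an annual series in two, builds t-based confidence intervals for each mean, and runs a two-sample test. The series temp is synthetic placeholder data (not the course data), the split year 1963 and the helper mean_ci are hypothetical choices, and the Welch test is just one possible test.

```r
# Synthetic placeholder annual series (NOT the course data)
set.seed(42)
year <- 1907:2019
temp <- 9 + 0.01 * (year - 1907) + rnorm(length(year), sd = 0.7)

# Hypothetical split into two non-overlapping halves
early <- temp[year <= 1963]
late  <- temp[year >  1963]

# t-based confidence interval for a mean; this relies on the
# observations being i.i.d. and (approximately) normal
mean_ci <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)
  q  <- qt(1 - (1 - level) / 2, df = n - 1)
  c(mean = mean(x), lower = mean(x) - q * se, upper = mean(x) + q * se)
}

ci_early <- mean_ci(early)
ci_late  <- mean_ci(late)

# Welch two-sample t-test of H0: equal means against
# H1: the later mean is larger than the earlier mean
tt <- t.test(late, early, alternative = "greater", var.equal = FALSE)
```

Changing the split year, or using more than two subsamples, only requires changing the logical conditions that define early and late.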
Investigate the presence of a linear upward trend
Next we fit a linear regression model to the data. That is, if $Y_1, \ldots, Y_n$ are the temperature data, we fit the regression model
$$Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad (1)$$
and take $x_1, \ldots, x_n$ to represent a linear trend. Estimate the model, and provide measures of
estimation uncertainty such as confidence intervals. Also perform a hypothesis test to test if
an upward trend is visible in the data.
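A minimal sketch of fitting model (1) in R is given below. It assumes the trend variable is coded as years since the start of the sample, and it uses synthetic placeholder data rather than the course data. The one-sided test shown is the standard t-test, which relies on the usual regression assumptions.

```r
# Synthetic placeholder annual series (NOT the course data)
set.seed(42)
year <- 1907:2019
temp <- 9 + 0.01 * (year - 1907) + rnorm(length(year), sd = 0.7)

# Linear trend regressor: x_i = 0, 1, ..., n - 1
x <- year - min(year)

fit <- lm(temp ~ x)
est <- coef(summary(fit))               # estimates, std. errors, t values
ci  <- confint(fit, "x", level = 0.95)  # 95% confidence interval for beta

# One-sided test of H0: beta = 0 against H1: beta > 0 (upward trend)
t_stat      <- est["x", "t value"]
p_one_sided <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)
```

The p-value printed by summary(fit) is two-sided, which is why the one-sided p-value is computed separately from the t statistic.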
Investigate the presence of a linear upward trend in part of the sample
While we have data from 1907 on, some people claim temperatures only started to rise
significantly from the seventies on. Here we investigate that claim. We still consider model
(1), but now we take $x_1, \ldots, x_n$ such that the linear trend only kicks in from a starting point
late in the sample on. Find a reasonable point to start the trend at (e.g. somewhere in the
seventies) and motivate your choice. Motivation can come from either outside sources, or
from the data. In case you motivate the choice from the data, explain how this could be
misleading.
Analyse the model in the same way as for the overall linear trend and contrast your
findings with that case.
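One way to code a trend that only starts late in the sample is to set the regressor to zero before the starting point and let it increase linearly afterwards. The sketch below assumes this coding; the start year 1975 is purely hypothetical and the data are synthetic placeholders, not the course data.

```r
# Synthetic placeholder annual series (NOT the course data):
# flat until 1975, then rising
set.seed(42)
year <- 1907:2019
temp <- 9 + 0.03 * pmax(0, year - 1975) + rnorm(length(year), sd = 0.7)

# Hypothetical starting point of the trend
start <- 1975

# x_i = 0 up to the starting year, then 1, 2, 3, ...
x_break <- pmax(0, year - start)

fit_break <- lm(temp ~ x_break)
est_break <- coef(summary(fit_break))
ci_break  <- confint(fit_break, "x_break", level = 0.95)
```

Note that if start is picked by looking at the same data used for estimation, the reported standard errors and p-values do not account for that search.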
Implement the bootstrap
For the most interesting parts of your analysis above, construct the relevant hypothesis tests and confidence intervals using the bootstrap rather than the standard approach. Focus especially on implementing the bootstrap for the regression model. Discuss how this changes your results and how to interpret any changes.
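The sketch below shows one possible bootstrap for the regression slope: a residual bootstrap with a basic (reverse-percentile) interval and a bootstrap-t test. This particular variant is an assumption made for illustration; other schemes (such as resampling pairs) may be the ones intended in the course. The data are synthetic placeholders and the number of replications B is an arbitrary choice.

```r
# Synthetic placeholder annual series (NOT the course data)
set.seed(123)
year <- 1907:2019
temp <- 9 + 0.01 * (year - 1907) + rnorm(length(year), sd = 0.7)
x <- year - min(year)

fit      <- lm(temp ~ x)
beta_hat <- coef(fit)[["x"]]
t_obs    <- coef(summary(fit))["x", "t value"]  # tests H0: beta = 0
res      <- resid(fit)
fit_vals <- fitted(fit)

B <- 999  # number of bootstrap replications (arbitrary choice)
boot <- replicate(B, {
  # Residual bootstrap: keep x fixed, resample residuals with replacement
  y_star <- fit_vals + sample(res, replace = TRUE)
  s <- coef(summary(lm(y_star ~ x)))
  c(beta = s["x", "Estimate"],
    # t statistic centred at beta_hat, mimicking the null distribution
    t = (s["x", "Estimate"] - beta_hat) / s["x", "Std. Error"])
})

# Basic (reverse-percentile) 95% confidence interval for beta
q <- quantile(boot["beta", ] - beta_hat, c(0.025, 0.975))
ci_boot <- c(lower = beta_hat - q[[2]], upper = beta_hat - q[[1]])

# One-sided bootstrap p-value for H0: beta = 0 against H1: beta > 0
p_boot <- (1 + sum(boot["t", ] >= t_obs)) / (B + 1)
```

The residual bootstrap treats the errors as i.i.d., so it does not by itself protect against dependence or heteroskedasticity in the errors.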
Discuss the assumptions you need in the analysis
For all parts of your analysis, discuss carefully which assumptions you needed to make. For
example, do you need to assume normality? When do you need to assume independent and/or
identically distributed random variables? Can you give an asymptotic justification of your
methods that avoids some of the assumptions?
Once you set up the assumptions you need, discuss how likely it is that they are satisfied
for your data. You can also think about ways to check if your assumptions are satisfied, either
formally or informally. For example, can we check if normality, or independence, are satisfied
by the data?
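Some of these checks can be carried out on the regression residuals, for instance as sketched below. The choice of tests (Shapiro-Wilk for normality, Ljung-Box for autocorrelation) and the lag lengths are assumptions for illustration, and the data are synthetic placeholders, not the course data.

```r
# Synthetic placeholder annual series (NOT the course data)
set.seed(42)
year <- 1907:2019
temp <- 9 + 0.01 * (year - 1907) + rnorm(length(year), sd = 0.7)
x <- year - min(year)

fit <- lm(temp ~ x)
res <- resid(fit)

# Formal check of normality: H0 is that the residuals are normal
sw <- shapiro.test(res)

# Formal check of independence: H0 is no autocorrelation up to lag 5
lb <- Box.test(res, lag = 5, type = "Ljung-Box")

# Sample autocorrelations at lags 0, ..., 10 (no plot drawn here)
rho <- acf(res, lag.max = 10, plot = FALSE)

# Informal graphical checks (uncomment to draw):
# qqnorm(res); qqline(res)     # normality
# plot(fitted(fit), res)       # constant variance
# acf(res)                     # serial dependence
```

A non-rejection in these tests is not proof that the assumption holds, so the graphical checks remain useful alongside the formal ones.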
Extension to monthly data
So far you could use just the annual data for the analysis. A final issue that you could
consider is how using monthly data changes the picture. A straightforward extension is just
to apply the same techniques to the smoothed monthly data. You can take some of the most
interesting aspects of your previous analysis and repeat them using those data. If you do
so, carefully discuss how much added benefit it is to consider these smoothed monthly data
relative to the annual data, especially in light of the assumptions made.
Another option would be to use the monthly data in a different way. For example, you
could think of alternative ways to remove the seasonal effect. Alternatively, you could use the
information in the monthly data differently; rather than looking at overall trends you could
consider just summer or winter, or maybe look at the variation in temperatures within a year.
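The alternatives mentioned above could be sketched as follows. Subtracting calendar-month means is only one possible way of removing the seasonal effect, and the definition of summer as June to August is an assumption. All data here are synthetic placeholders, not the course data.

```r
# Synthetic placeholder monthly series (NOT the course data)
set.seed(7)
dates <- seq(as.Date("1907-01-01"), as.Date("2019-12-01"), by = "month")
month <- as.integer(format(dates, "%m"))
yr    <- as.integer(format(dates, "%Y"))
temp_m <- 9.5 + 7 * sin(2 * pi * (month - 4) / 12) +
  0.01 * (yr - 1907) + rnorm(length(dates))

# Alternative seasonal adjustment: subtract the mean of each
# calendar month (ave() returns the group mean for every observation)
adj <- temp_m - ave(temp_m, month)

# Summer-only series: yearly average over June, July and August
is_summer <- month %in% 6:8
summer <- tapply(temp_m[is_summer], yr[is_summer], mean)

# Within-year variation: standard deviation of the 12 monthly values
within_sd <- tapply(temp_m, yr, sd)
```

Each of the resulting series (adj, summer, within_sd) could then be fed into the same trend regression used for the annual data.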
Turn your analysis into a paper and conclude
Having performed all your analyses, you need to draw meaningful conclusions from them. Make sure your paper is coherent: writing a paper is more than just ticking boxes and doing exercises. In an academic paper you tell a story that has a logical flow from start to end.
Formulate research questions in the introduction, address these in the analyses, and provide
answers to the questions in the conclusion.
An important aspect of writing the story is to translate your statistical findings into societally meaningful conclusions. What is the meaning of your findings? Are there limitations
to the methods used, or assumptions they require, that might affect how you draw conclusions?
Try to address these issues in your paper.