MATH4068: Coursework 2021

Coursework
The file gap.csv is available on Moodle. It contains the GDP per capita and the life expectancy for 142
different countries from 1952 to 2007, at five-year intervals. This data is from gapminder.org.
Load the data into R using the commands
gap.raw <- read.csv('gap.csv')
gap <- gap.raw
gap[,3:14] <- log(gap.raw[,3:14])
Note that for GDP per capita, it is best to work with log(GDP) when doing statistical analysis, as the values
vary over several orders of magnitude between countries. For ease of plotting, it may be useful to split the
data into two data frames, one containing GDP per capita, and the other life expectancy data.
gdp <- exp(gap[,3:14])
years <- seq(1952, 2007, 5)
colnames(gdp) <- years
rownames(gdp) <- gap[,2]
lifeExp <- gap[,15:26]
colnames(lifeExp) <- years
rownames(lifeExp) <- gap[,2]
In this project, you will analyse this data using the methods we have looked at during the module.
• Begin by creating some basic exploratory data analysis plots, showing how GDP and life expectancy
have changed over the period covered by the data (1952 to 2007).
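One possible starting point for these plots is sketched below. It is only an illustration: the `gdp` and `lifeExp` frames here are hypothetical stand-ins filled with random numbers (six made-up countries) so that the code runs without gap.csv. With the real data, the frames built by the commands above would be used instead.

```r
# Stand-in data: hypothetical values only, NOT the gapminder data.
set.seed(1)
years <- seq(1952, 2007, 5)
n <- 6
lifeExp <- as.data.frame(matrix(runif(n * 12, 40, 80), nrow = n))
gdp <- as.data.frame(matrix(exp(runif(n * 12, 6, 10)), nrow = n))
colnames(lifeExp) <- colnames(gdp) <- years
rownames(lifeExp) <- rownames(gdp) <- paste0("Country", 1:n)

# matplot() draws one line per column of its y argument, so the
# countries-by-years frames are transposed to years-by-countries first.
op <- par(mfrow = c(1, 2))
matplot(years, t(lifeExp), type = "l", lty = 1, col = 1:n,
        xlab = "Year", ylab = "Life expectancy (years)")
matplot(years, t(gdp), type = "l", lty = 1, col = 1:n, log = "y",
        xlab = "Year", ylab = "GDP per capita (log scale)")
par(op)

# Cross-country medians per year give a compact summary of the trend.
med_life <- apply(lifeExp, 2, median)
med_gdp <- apply(gdp, 2, median)
```

Plotting GDP on a log axis follows the brief's remark that the values span several orders of magnitude. With 142 countries, a plot of every line may be hard to read, so per-continent summaries or boxplots by year could be a reasonable alternative.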
Principal component analysis
• Carry out principal component analysis on the log(GDP) data and on the life-expectancy data using
your preferred choice of S (the sample covariance matrix) or R (the sample correlation matrix).
• Calculate the proportion of variation explained by each of the principal components, and provide a
scree plot. Discuss how many principal components you would choose to retain in each case.
• Look at the leading principal components for the log(GDP) and the life expectancy data, and provide
an interpretation for each component you have chosen to retain.
• Provide scatter plots of combinations of the first three principal component scores, indicating on the
plot the names of the countries. Colour the data points by the continent they belong to. Identify and
discuss any countries that have interesting characteristics based on your analysis. Can you explain what
happened in any of these countries?
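The PCA steps listed above could be sketched roughly as follows, shown for a log(GDP)-style block only. This is not a model answer. It assumes that S and R in the brief denote the sample covariance and sample correlation matrices, and `logGDP` and `continent` are hypothetical stand-ins built from random numbers so that the code runs without gap.csv. With the real data one would presumably use gap[,3:14] and a continent column, whose position in the file is not stated in the brief.

```r
# Stand-in data: hypothetical values only, NOT the gapminder data.
set.seed(1)
years <- seq(1952, 2007, 5)
n <- 20
logGDP <- matrix(rnorm(n * 12, mean = 8, sd = 1), nrow = n,
                 dimnames = list(paste0("Country", 1:n), as.character(years)))
continent <- factor(sample(c("Africa", "Americas", "Asia", "Europe"),
                           n, replace = TRUE))

# scale. = FALSE bases the PCA on the covariance matrix (assumed to be S);
# scale. = TRUE standardises each column first, which corresponds to the
# correlation matrix (assumed to be R).
pca <- prcomp(logGDP, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component, and its running total.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(prop_var)

# Scree plot.
plot(prop_var, type = "b", xlab = "Principal component",
     ylab = "Proportion of variance explained")

# Loadings of the leading components, for interpretation.
loadings <- round(pca$rotation[, 1:3], 2)
print(loadings)

# Scores on PC1 and PC2, labelled by country and coloured by continent.
plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(logGDP),
     col = as.integer(continent), cex = 0.7)
legend("topright", legend = levels(continent),
       text.col = seq_along(levels(continent)), bty = "n")
```

The same pattern can be repeated for the other pairs of scores (PC1 against PC3, PC2 against PC3) and for the life expectancy block. Which of the covariance or correlation versions is more appropriate is left as the judgment call the brief asks for.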
Multidimensional scaling
• Perform multidimensional scaling using the combined dataset of log(GDP) and life expectancy, i.e.,
using
gap[,3:26]

Find and plot a 2-dimensional representation of the data. As before, colour each data point by the continent
it is on. Discuss the similarity of this plot with your previous plots.
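A minimal sketch of one way to obtain such a 2-dimensional representation is given below, using classical multidimensional scaling via `cmdscale()` on Euclidean distances. The choice of distance is an assumption made here, not something the brief prescribes, and `X` and `continent` are hypothetical stand-ins built from random numbers in place of gap[,3:26] and the continent labels.

```r
# Stand-in data: hypothetical values only, NOT the gapminder data.
set.seed(1)
n <- 20
X <- matrix(rnorm(n * 24), nrow = n,
            dimnames = list(paste0("Country", 1:n), NULL))
continent <- factor(sample(c("Africa", "Americas", "Asia", "Europe"),
                           n, replace = TRUE))

# Classical MDS on the Euclidean distances between countries (rows).
D <- dist(X)
mds <- cmdscale(D, k = 2)

# 2-dimensional configuration, labelled by country, coloured by continent.
plot(mds[, 1], mds[, 2], type = "n",
     xlab = "MDS coordinate 1", ylab = "MDS coordinate 2")
text(mds[, 1], mds[, 2], labels = rownames(X),
     col = as.integer(continent), cex = 0.7)
legend("topright", legend = levels(continent),
       text.col = seq_along(levels(continent)), bty = "n")

# With Euclidean distances, classical MDS coordinates agree with the
# covariance-based PCA scores of the same matrix, up to the sign of each axis.
pca <- prcomp(X)
max_diff <- max(abs(abs(mds) - abs(pca$x[, 1:2])))
```

Because log(GDP) and life expectancy are measured on different scales, it may be worth considering whether to standardise the columns (for example with `scale()`) before computing distances. Other distance measures would give a configuration that no longer coincides with PCA.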