This assignment uses SAS to produce charts and plots from existing data.

MATH 1309 MULTIVARIATE ANALYSIS
Assignment 2

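QUESTION 1 (40 marks)
The file THC.csv contains data on the concentrations of 13 different chemical compounds in marijuana
plants grown in the same region of Colombia that are derived from three different varieties.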
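1. Calculate the means and standard deviations of the 13 chemical concentrations in the THC data
via SAS. (1 mark)
2. Generate the correlation matrix and the scatterplot matrix in SAS, as relevant to a
principal component analysis. (2 marks)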
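3. Perform a principal component analysis on the raw data using SAS and assess how many PCs
need to be retained. Answer the following questions from the resultant output.
(10 marks, 2 marks each)
a) What percentage of the total sample variation is accounted for by the first, second and
third PCs?
b) Interpret the first 3 PCs.
c) Write the first, second and third PCs as linear functions of the original variables.
d) Can the data be effectively summarised in fewer than 13 dimensions? Justify your
answer.
e) Obtain or plot a scree plot via SAS to confirm your choice of the number of PCs.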
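4. Perform a principal component analysis on the correlation matrix using SAS. Answer
the following questions from the resultant SAS output.
a) What percentage of the total sample variation is accounted for by the first, second and
third PCs? (1.5 marks)
b) Interpret the first 3 PCs. (1.5 marks)
c) Write the first, second and third PCs as linear functions of the standardised
variables. (1.5 marks)
d) Can the data be effectively summarised in fewer than 13 dimensions? Justify your
answer and comment. (4.5 marks)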
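e) Obtain or plot a scree plot via SAS to confirm your choice of the number of PCs. (1 mark)
f) For the number of reduced dimensions you chose in part d), obtain the profile plot and
interpret it carefully. (4 marks)
g) Obtain the pattern plots for the first 4 principal components (PCs) and interpret them carefully. (4 marks)
h) Obtain the score plot of PC2 versus PC1. Are there any identifiable outliers or patterns
that you can comment on? (1 mark)
i) Using the covariance matrix, obtain the score plot of PC2 versus PC1. Are there any identifiable
outliers or patterns that you can comment on? (1 mark)
j) Using the correlation matrix, obtain the score plot of PC2 versus PC1. Are there any identifiable
outliers or patterns that you can comment on? (1 mark)
k) Using standardised PC2 and PC1, obtain the score plot of PC2 versus PC1. Are there any identifiable
outliers or patterns that you can comment on? (1 mark)
l) Obtain 95% confidence intervals for the first to fifth eigenvalues. (5 marks)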
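For part l), the assignment does not name a method, so the following is only a sketch of one common choice: the large-sample interval for eigenvalues of a sample covariance matrix, which assumes the observations are approximately multivariate normal and the population eigenvalues are distinct. Here \hat{\lambda}_i is the i-th sample eigenvalue, n is the number of observations, and z_{0.025} \approx 1.96 is the upper 2.5% point of the standard normal distribution.

```latex
% Large-sample result (multivariate normal data, distinct eigenvalues):
%   \sqrt{n}\,(\hat{\lambda}_i - \lambda_i) \approx N(0,\; 2\lambda_i^2)
% Inverting this gives an approximate 95% interval for each \lambda_i:
\[
  \frac{\hat{\lambda}_i}{1 + z_{0.025}\sqrt{2/n}}
  \;\le\; \lambda_i \;\le\;
  \frac{\hat{\lambda}_i}{1 - z_{0.025}\sqrt{2/n}},
  \qquad i = 1, \dots, 5 .
\]
```

The result is derived for covariance-matrix eigenvalues; applying it to eigenvalues of the correlation matrix is only a rough approximation and should be stated as such.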
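QUESTION 2 (20 marks)
Consider the following raw data set with 12 observations on 5 socioeconomic variables, called
Population, School, Employment, Services and HouseValue.
data SocioEconomics;
input Population School Employment Services HouseValue;
datalines;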
5700 12.8 2500 270 25000
1000 10.9 600 10 10000
3400 8.8 1000 10 9000
3800 13.6 1700 140 25000
4000 12.8 1600 140 25000
8200 8.3 2600 60 12000
1200 11.4 400 10 16000
9100 11.5 3300 60 14000
9900 12.5 3400 180 18000
9600 13.7 3600 390 25000
9600 9.6 3300 80 12000
9400 11.4 4000 100 13000
;
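proc factor data=SocioEconomics simple corr;
run;
Use the SAS statements above to perform a factor analysis.
Show your SAS code (it may differ from what I have suggested), output and answers in the
assignment pdf or docx that you submit in Canvas.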
1. Prepare the dataset for a Factor analysis via SAS. (1 mark)
2. Generate the means and standard deviations of the data. (1 mark)
3. Perform a Factor analysis on the raw data AND the correlation matrix using the code above
and answer the following questions. (6 marks)
4. From the eigenvalues of the correlation matrix and the factor loading matrix and
communalities outputted answer the following questions.
a) Do the first two principal components (factors) provide an adequate summary of the
data? (1 mark)
b) How much of the variation is accounted for by 2 factors? (1 mark)
c) How much of the variation is accounted for by 3 factors? (1 mark)
5. To display the scoring coefficients as eigenvectors, use PROC PRINCOMP as shown below, and
answer the following questions.
proc princomp data=SocioEconomics;
run;
a) What are the eigenvalues and the respective eigenvectors? (3 marks)
b) What is the proportion of the variance accounted for by the first and second component
respectively? (1 mark)
c) How much of the standardised variance do the first and second factors account for
together? (1 mark)
d) Do the final communality estimates show that all the variables are well accounted for, and
by how many components or factors? Justify your answer. (1 mark)
6. To obtain the component scores as linear combinations of the observed variables, request
the standardized scoring coefficients by adding the SCORE option to the PROC FACTOR statement,
then run the code below. The SCORE option requests the display of the
standardized scoring coefficients.
proc factor data=SocioEconomics n=5 score;
run;
As each factor/component can be expressed as a linear combination of the standardised
observed variables using the code above, answer the following questions:
a) Write down the first principal component or Factor1 in terms of the standardised
variables. (1 mark)
b) Write down the second principal component or Factor2 in terms of the standardised
variables. (1 mark)
c) Write the first and second PCs in terms of eigenvectors. (1 mark)
NOTES/HINTS:
• The SIMPLE option specified in the PROC FACTOR statement generates the means and
standard deviations of all observed variables in the analysis
• The CORR option specified in the PROC FACTOR statement generates the output of the
observed correlations.
• To express the observed variables as functions of the components (or factors), you inspect
the factor loading matrix.
• To obtain the component scores as linear combinations of the observed variables request
the standardized scoring coefficients by adding the SCORE option in the FACTOR statement:
The SCORE option in the code below requests the display of the standardized scoring
coefficients
proc factor data=SocioEconomics n=5 score;
run;
QUESTION 3 (20 marks)
Six variables measured on 100 genuine and 100 forged (counterfeit/fake) old Swiss 1000-franc bank
notes are available in an R library. See also the Excel file in your Module.
data(banknote)
A data.frame of dimension 200×7 with the following 7 variables:
Class      a factor with classes: genuine, counterfeit
Length     Length of bill (mm)
Left       Width of left edge (mm)
Right      Width of right edge (mm)
Bottom     Bottom margin width (mm)
Top        Top margin width (mm)
Diagonal   Length of diagonal (mm)
Note that the Swiss bank data has 7 columns which correspond to the following variables:
1. Length of the bank note, length
2. Height of the bank note, measured on the left, left
3. Height of the bank note, measured on the right, right
4. Distance of inner frame to the lower border, bottom
5. Distance of inner frame to the upper border, top
6. Length of the diagonal, diag
7. Genuine = 1 and Fake =2 indicator column.
Show your SAS code, SAS output and answers within your final assignment pdf or docx that you
submit in Canvas.
Perform a stepwise Discriminant analysis on the bank note data.
a) Show your SAS code. (5 marks)
b) Show your SAS output. (5 marks)
c) Interpret your results in detail. (10 marks)
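As a starting point only, a stepwise discriminant analysis in SAS is usually run with PROC STEPDISC. The sketch below assumes the Excel file has already been imported into a dataset called banknote (a hypothetical name), that the six measurements are named length, left, right, bottom, top and diag as in the list above, and that the genuine/fake indicator column is called type (also a hypothetical name). The entry and stay significance levels are written out explicitly so they are easy to change; the values shown are an illustrative choice, not a requirement of the assignment.

```sas
/* Sketch only: dataset name (banknote) and class variable name (type) are assumptions. */
/* type = 1 for genuine notes, 2 for fake notes, as described in the variable list.     */
proc stepdisc data=banknote method=stepwise slentry=0.15 slstay=0.15;
   class type;                              /* grouping variable                      */
   var length left right bottom top diag;   /* candidate discriminating variables     */
run;
```

The variables retained at the final step can then be passed to PROC DISCRIM to build and assess the classification rule; which variables those are depends on the output and is not assumed here.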