## R assignment: analyzing an advertising data set

PSTAT 131/231, Spring 2020
Due by June 10, 2020 at 11:59 PM
1. Advertising data

#Set this to your working directory where the data are
setwd("/Users/girigopalan/Desktop/UCSB/Teaching/PSTAT_131_231_S_20/Final")

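#Each row contains a different product, and the columns are the TV advertising budget, radio advertising budget,
#newspaper advertising budget, and total sales of the product (in thousands of dollars).
#First, we will perform some unsupervised learning methods on the advertising budgets,
#and hence only look at the advertising budgets for TV, radio, and newspaper.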

Y <- advertising[,5]
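a) (5 points) Perform hierarchical clustering on the TV, radio, and newspaper budget data (i.e., the feature data). Specifically, plot a dendrogram and extract four clusters. Report the number of data points in each cluster.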
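A minimal sketch of part a) follows. It runs on simulated budgets rather than the real advertising data, and the names `features`, `hc`, `clusters`, and `cluster_sizes` are placeholders chosen for illustration. The assignment does not state a distance or linkage, so the sketch assumes the `dist` and `hclust` defaults (Euclidean distance, complete linkage) on unscaled data.

```r
# Simulated stand-in for the TV, radio, and newspaper budgets
# (hypothetical values, not the real advertising data).
set.seed(1)
features <- data.frame(
  TV        = runif(200, 0, 300),
  radio     = runif(200, 0, 50),
  newspaper = runif(200, 0, 100)
)

# Assumption: Euclidean distance and complete linkage (the defaults).
hc <- hclust(dist(features), method = "complete")

# Dendrogram of the 200 simulated products.
plot(hc, labels = FALSE, main = "Hierarchical clustering of ad budgets")

# Cut the tree into four clusters and count the points in each.
clusters <- cutree(hc, k = 4)
cluster_sizes <- table(clusters)
print(cluster_sizes)
```

Because the budgets are on different scales, an unscaled Euclidean distance is dominated by the TV column; calling `scale(features)` before `dist` is a reasonable alternative if that is not intended.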
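b) (5 points) Determine the principal components of the feature data. Specifically, do not center but do scale the data before determining the principal components. Plot the feature data according to the first two principal components. Then, determine the proportion of variance explained by the principal components of the feature data.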
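One way to approach part b) is sketched below with `prcomp` on simulated budgets; the data and the names `features`, `pca`, `scores`, and `pve` are assumptions made for illustration. Note that with `center = FALSE` and `scale. = TRUE`, `prcomp` divides each column by its root mean square rather than its standard deviation, so the "variance" reported here is measured about zero rather than about the column means.

```r
# Simulated stand-in for the three budget columns (hypothetical values).
set.seed(1)
features <- data.frame(
  TV        = runif(200, 0, 300),
  radio     = runif(200, 0, 50),
  newspaper = runif(200, 0, 100)
)

# Scale but do not center, as the problem statement asks.
pca <- prcomp(features, center = FALSE, scale. = TRUE)

# Plot the data in the coordinates of the first two principal components.
scores <- pca$x[, 1:2]
plot(scores, xlab = "PC1", ylab = "PC2",
     main = "Feature data on the first two principal components")

# Proportion of variance explained by each component.
pve <- pca$sdev^2 / sum(pca$sdev^2)
print(round(pve, 3))
```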
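c) (10 points) Now we will use supervised learning to predict sales (Y) from the TV, radio, and newspaper advertising budgets (features). You will have to compare the following methods:
• Linear regression
• Random forest
• Ridge regression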

• Use ntree = 100, 500, 1000 in the random forest.
• Use lambda = 10^-2, 10^-1, …, 10^3
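The comparison could be set up roughly as follows. This is a sketch under several assumptions: the data are simulated, the methods are compared by mean squared error on a single held-out split (the text above does not say which criterion to use), the elided lambda values are read as successive powers of ten, and the `randomForest` and `glmnet` packages are assumed to be installed. All object names are hypothetical.

```r
library(randomForest)  # assumed installed
library(glmnet)        # assumed installed

# Simulated stand-in data; the sales signal below is made up.
set.seed(1)
n <- 200
features <- data.frame(
  TV        = runif(n, 0, 300),
  radio     = runif(n, 0, 50),
  newspaper = runif(n, 0, 100)
)
Y <- 3 + 0.045 * features$TV + 0.19 * features$radio + rnorm(n, sd = 1.5)
dat <- cbind(features, Y = Y)

# Assumption: compare by test-set MSE on one 150/50 split.
train <- sample(n, 150)
test  <- setdiff(seq_len(n), train)
mse   <- function(pred) mean((Y[test] - pred)^2)

# Linear regression.
lm_fit  <- lm(Y ~ ., data = dat, subset = train)
results <- c(linear = mse(predict(lm_fit, newdata = dat[test, ])))

# Random forest with the three requested values of ntree.
for (nt in c(100, 500, 1000)) {
  rf_fit <- randomForest(x = features[train, ], y = Y[train], ntree = nt)
  results[paste0("rf_ntree_", nt)] <-
    mse(predict(rf_fit, newdata = features[test, ]))
}

# Ridge regression (alpha = 0) over lambda = 10^-2, ..., 10^3.
X       <- as.matrix(features)
lambdas <- 10^(-2:3)
ridge_fit  <- glmnet(X[train, ], Y[train], alpha = 0, lambda = lambdas)
ridge_pred <- predict(ridge_fit, newx = X[test, ], s = lambdas)
ridge_mse  <- apply(ridge_pred, 2, mse)
names(ridge_mse) <- paste0("ridge_lambda_", lambdas)
results <- c(results, ridge_mse)

print(round(results, 3))
```

A k-fold cross-validated error would be a more stable basis for comparison than a single split; the split is used here only to keep the sketch short.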

2. Heart disease data
In the second problem, we will apply statistical machine learning methodology to the heart disease data set that
comes with the textbook, ISL.
#Set this to your working directory where the data are
setwd("/Users/girigopalan/Desktop/UCSB/Teaching/PSTAT_131_231_S_20/Final")
DAT <- model.matrix( ~ . , heart)
features <- DAT[,3:18]
Y <- DAT[,19]
set.seed(123)
The training features are in features, and the output labels (1 or 0 for heart disease) are in Y. For this problem you must:
• (5 points) Perform 10-fold cross validation to fit a ridge logistic regression model, in order to predict heart disease.
• (5 points) Perform 10-fold cross validation to fit a lasso logistic regression model, in order to predict heart disease.
• (5 points) Determine the predicted probabilities of having heart disease using both the best fitting ridge and lasso models. (On the features matrix.)
• (5 points) Plot the ROC curves for both the best fitting ridge logistic regression and lasso logistic regression, both on the same plot.
• (5 points) Determine the AUC for both the best fitting ridge logistic regression and lasso logistic regression models, both on the same plot.
Which is the best model based on AUC?
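To make the ROC and AUC items concrete, here is a base-R sketch that builds an ROC curve by sweeping the threshold over sorted predicted probabilities and computes the AUC with the trapezoid rule. The helper names `roc_points` and `auc_trapezoid`, the simulated labels, and the two sets of probabilities are all hypothetical stand-ins, not output from the heart data. The sketch assumes the probabilities contain no ties.

```r
# ROC curve: sort by decreasing probability and accumulate hits and false alarms.
roc_points <- function(prob, label) {
  label <- label[order(prob, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(label == 0) / sum(label == 0)),
    tpr = c(0, cumsum(label == 1) / sum(label == 1))
  )
}

# Area under the curve by the trapezoid rule.
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Simulated labels and two hypothetical sets of predicted probabilities.
set.seed(123)
label  <- rbinom(300, 1, 0.45)
prob_a <- plogis(1.5 * (label - 0.5) + rnorm(300))  # stronger signal
prob_b <- plogis(0.5 * (label - 0.5) + rnorm(300))  # weaker signal

roc_a <- roc_points(prob_a, label)
roc_b <- roc_points(prob_b, label)
auc_a <- auc_trapezoid(roc_a)
auc_b <- auc_trapezoid(roc_b)

# Both curves on the same plot, with the chance diagonal for reference.
plot(roc_a$fpr, roc_a$tpr, type = "l", col = "blue",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curves")
lines(roc_b$fpr, roc_b$tpr, col = "red")
abline(0, 1, lty = 2)
legend("bottomright", legend = c("Model A", "Model B"),
       col = c("blue", "red"), lty = 1)

print(c(auc_a = auc_a, auc_b = auc_b))
```

Packages such as `pROC` or `ROCR` provide the same quantities; the hand-rolled version is shown only to expose the calculation.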
HINT: You can use the function cv.glmnet from glmnet to perform cross validation, and you do not need to specify a value for the parameter gamma. See labs and lecture slides for more.
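A sketch of how `cv.glmnet` might be called for the ridge and lasso fits is given below. It uses a simulated 16-column feature matrix in place of the heart data, assumes the `glmnet` package is installed, and takes "best fitting" to mean the model at `lambda.min`, which is one possible reading (`lambda.1se` is another). The variable names are illustrative.

```r
library(glmnet)  # assumed installed

# Simulated stand-in: 16 features and a binary label (not the heart data).
set.seed(123)
n <- 300
p <- 16
features <- matrix(rnorm(n * p), n, p,
                   dimnames = list(NULL, paste0("x", 1:p)))
Y <- rbinom(n, 1, plogis(features[, 1] - 0.8 * features[, 2]))

# 10-fold cross validation: alpha = 0 is ridge, alpha = 1 is lasso.
cv_ridge <- cv.glmnet(features, Y, family = "binomial", alpha = 0, nfolds = 10)
cv_lasso <- cv.glmnet(features, Y, family = "binomial", alpha = 1, nfolds = 10)

# Predicted probabilities on the feature matrix at the CV-selected lambda.
prob_ridge <- as.numeric(predict(cv_ridge, newx = features,
                                 s = "lambda.min", type = "response"))
prob_lasso <- as.numeric(predict(cv_lasso, newx = features,
                                 s = "lambda.min", type = "response"))

print(c(ridge_lambda = cv_ridge$lambda.min, lasso_lambda = cv_lasso$lambda.min))
print(head(cbind(prob_ridge, prob_lasso)))
```

Since the probabilities are computed on the same rows used for fitting, any ROC or AUC derived from them describes in-sample fit rather than out-of-sample performance.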