This assignment uses R to analyze the Advertising data set.

PSTAT 131/231, Spring 2020
Due by June 10, 2020 at 11:59 PM
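1. Advertising data
In the first problem, we will apply statistical machine learning methodology to the Advertising data set that comes with the textbook, ISL.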
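#Set this to your working directory where the data are
setwd("/Users/girigopalan/Desktop/UCSB/Teaching/PSTAT_131_231_S_20/Final")
advertising <- read.csv('Advertising.csv')
#Each row contains a different product, and the columns are the TV advertising budget, radio advertising budget,
#newspaper advertising budget, and total sales of the product (in thousands of dollars).
#First, we will perform some unsupervised learning methods on the advertising budgets,
#so we only look at the TV, radio, and newspaper advertising budgets.
features <- advertising[, 2:4]
Y <- advertising[, 5]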
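a) (5 points) Perform hierarchical clustering on the TV, radio, and newspaper budget data (i.e., the features data). Specifically, plot a dendrogram and extract four clusters. Return the number of data points in each cluster.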
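The steps in part a) can be sketched as follows. This is a minimal illustration in base R, not a reference solution: it runs on simulated budgets rather than Advertising.csv, the column names are hypothetical, and the linkage is an assumption (the assignment does not name one, so the sketch keeps the default of hclust, complete linkage).

```r
# Simulated stand-in for the three budget columns
# (hypothetical data and column names, not Advertising.csv)
set.seed(1)
features <- data.frame(TV = runif(200, 0, 300),
                       radio = runif(200, 0, 50),
                       newspaper = runif(200, 0, 100))

# Euclidean distances between rows, then agglomerative clustering.
# Assumption: complete linkage, which is the default method of hclust().
hc <- hclust(dist(features))

# Draw the dendrogram (row labels suppressed for readability)
plot(hc, labels = FALSE, main = "Dendrogram of advertising budgets")

# Cut the tree into four clusters and count the points in each
clusters <- cutree(hc, k = 4)
cluster_sizes <- table(clusters)
print(cluster_sizes)
```

On the real data, features would be the object built from columns 2 to 4 of the CSV, and the same three calls (dist, hclust, cutree) apply unchanged.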
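b) (5 points) Determine the principal components of the features data. Specifically, do not center but do scale the data before determining the principal components. Plot the features data according to the first two principal components. Then determine the proportion of variance explained by the principal components of the features data.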
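A minimal sketch of part b) in base R, again on simulated budgets with hypothetical column names. It assumes that "do not center but do scale" maps to prcomp with center = FALSE and scale. = TRUE; with that setting R scales each column by its root mean square rather than its standard deviation.

```r
# Simulated stand-in for the three budget columns
# (hypothetical data and column names, not Advertising.csv)
set.seed(1)
features <- data.frame(TV = runif(200, 0, 300),
                       radio = runif(200, 0, 50),
                       newspaper = runif(200, 0, 100))

# PCA without centering but with scaling.
# With center = FALSE, each column is divided by sqrt(sum(x^2) / (n - 1)).
pca <- prcomp(features, center = FALSE, scale. = TRUE)

# Scores on the first two principal components
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
     main = "Features on the first two principal components")

# Proportion of variance explained by each component
pve <- pca$sdev^2 / sum(pca$sdev^2)
print(pve)
```

Because the data are not centered, the first component mostly captures the distance of the budgets from the origin, which is worth keeping in mind when reading the plot.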
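c) (10 points) Now we will apply supervised learning to predict sales (Y) from the TV, radio, and newspaper advertising budgets (features). You will have to compare the following methods:
• Linear regression
• Random forest
• Ridge regression
Note: because the number of features (3) is small, we will not compare LASSO regression. For each of the above three supervised learning methods, determine the 10-fold cross-validation error, and in particular:
• Use ntree = 100, 500, 1000 in random forest.
• Use lambda = 10^-2, 10^-1, ..., 10^3 in ridge regression.
Therefore, you should output 6 cross-validation errors in total for ridge regression, 3 cross-validation errors for random forest, and 1 cross-validation error for linear regression. In all cases, mean squared error should be used as the error metric. Be sure to indicate which supervised learning method results in the smallest cross-validation error.
Finally, you should call set.seed(123) before running the code for this part.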
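The 10-fold cross-validation loop described in part c) can be sketched in base R as below. The sketch covers only linear regression, on simulated data with hypothetical column names and a made-up response; the fold assignment scheme (a random permutation of fold labels 1 to 10) is one common choice, not something the assignment prescribes.

```r
set.seed(123)

# Simulated stand-in for the budgets and sales (hypothetical data)
n <- 200
features <- data.frame(TV = runif(n, 0, 300),
                       radio = runif(n, 0, 50),
                       newspaper = runif(n, 0, 100))
Y <- 3 + 0.05 * features$TV + 0.2 * features$radio + rnorm(n)
dat <- cbind(features, Y = Y)

# Assign each row to one of K folds at random, with equal fold sizes
K <- 10
fold <- sample(rep(1:K, length.out = n))

# Fit on K - 1 folds, predict the held-out fold, record its MSE
fold_mse <- numeric(K)
for (k in 1:K) {
  train <- dat[fold != k, ]
  test <- dat[fold == k, ]
  fit <- lm(Y ~ ., data = train)
  pred <- predict(fit, newdata = test)
  fold_mse[k] <- mean((test$Y - pred)^2)
}

# Cross-validation error: average of the per-fold mean squared errors
cv_mse_lm <- mean(fold_mse)
print(cv_mse_lm)
```

The same loop could serve the other two methods by swapping the fit and predict lines, for example randomForest::randomForest with the ntree argument, or glmnet::glmnet with alpha = 0 and a fixed lambda; reusing one fold vector across methods keeps the comparison on identical splits.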
2. Heart disease data
In the second problem, we will apply statistical machine learning methodology to the heart disease data set that comes with the textbook, ISL.
#Set this to your working directory where the data are
setwd("/Users/girigopalan/Desktop/UCSB/Teaching/PSTAT_131_231_S_20/Final")
heart <- read.csv('Heart.csv')
DAT <- model.matrix( ~ . , heart)
features <- DAT[,3:18]
Y <- DAT[,19]
set.seed(123)
The training features are in features, and the output labels (1 or 0 for heart disease) are in Y. For this problem you must:
• (5 points) Perform 10-fold cross validation to fit a ridge logistic regression model, in order to predict heart disease.
• (5 points) Perform 10-fold cross validation to fit a lasso logistic regression model, in order to predict heart disease.
• (5 points) Determine the predicted probabilities of having heart disease using both the best fitting ridge and lasso models. (On the features matrix.)
• (5 points) Plot the ROC curves for both the best fitting ridge logistic regression and lasso logistic regression, both on the same plot.
• (5 points) Determine the AUC for both the best fitting ridge logistic regression and lasso logistic regression models, both on the same plot.
Which is the best model based on AUC?
HINT: You can use the function cv.glmnet from glmnet to perform cross validation, and you do not need to specify a value for the parameter gamma. See labs and lecture slides for more.
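The workflow in the bullets above can be sketched as follows, under several assumptions: the glmnet package is installed, "best fitting" is read as the lambda.min value chosen by cv.glmnet, and the data are simulated rather than read from Heart.csv. The helpers auc and roc_points are hypothetical base R functions written for this sketch, not part of glmnet.

```r
library(glmnet)  # assumption: the glmnet package is installed

# Simulated stand-in for the 16 features and the 0/1 label (hypothetical data)
set.seed(123)
n <- 300
p <- 16
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# 10-fold CV over lambda: alpha = 0 is ridge, alpha = 1 is lasso
cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0, nfolds = 10)
cv_lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

# Predicted probabilities on the feature matrix at the CV-selected lambda
p_ridge <- as.numeric(predict(cv_ridge, newx = x, s = "lambda.min", type = "response"))
p_lasso <- as.numeric(predict(cv_lasso, newx = x, s = "lambda.min", type = "response"))

# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) identity
auc <- function(prob, label) {
  r <- rank(prob)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: ROC points from sweeping the threshold downward
roc_points <- function(prob, label) {
  lab <- label[order(prob, decreasing = TRUE)]
  data.frame(fpr = c(0, cumsum(lab == 0) / sum(lab == 0)),
             tpr = c(0, cumsum(lab == 1) / sum(lab == 1)))
}

roc_r <- roc_points(p_ridge, y)
roc_l <- roc_points(p_lasso, y)

# Both ROC curves on one plot, with the chance diagonal for reference
plot(roc_r$fpr, roc_r$tpr, type = "l", col = "blue",
     xlab = "False positive rate", ylab = "True positive rate")
lines(roc_l$fpr, roc_l$tpr, col = "red")
abline(0, 1, lty = 2)
legend("bottomright", legend = c("ridge", "lasso"), col = c("blue", "red"), lty = 1)

auc_ridge <- auc(p_ridge, y)
auc_lasso <- auc(p_lasso, y)
print(c(ridge = auc_ridge, lasso = auc_lasso))
```

These AUC values are computed on the same rows used for fitting, as the assignment asks, so they are optimistic compared with what held-out data would give.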

Posted on 2020-06-04 by easydue
