# FE590. Assignment #2.

# Instructions
In this assignment, you should use R markdown to answer the questions below. Simply type your R code into embedded chunks as shown above.
When you have completed the assignment, knit the document into a PDF file, and upload both the .pdf and .Rmd files to Canvas.
```{r}
CWID = -1 # Place your Campus-Wide ID number here. This will personalize
# your results while keeping them reproducible through the seed.
# If you ever need to reset the seed in this assignment, use this as your seed.
# Papers that use -1 as this CWID variable will earn 0's, so make sure you change
# this value before you submit your work.
personal = CWID %% 10000
set.seed(personal) # You can reset the seed at any time in your code,
# but please always set it to this seed.
```
# Question 1

Create a .csv file consisting of daily adjusted close prices for 10 different stocks and 2 ETFs. You should have at least two years of data for every stock and ETF, and you should include the date column so you can check that the series line up correctly. After creating the file, put it in your working directory (or move your working directory to where it is stored). Read the data into R.

```{r}
#insert r code here
```

## 1. List the names of the variables in the data set.

```{r}
#insert r code here
```

## 2. As the date will be unimportant, remove that field from your data frame

```{r}
#insert r code here
```

## 3. What is the range of each quantitative variable? Answer this question using the range() function with the sapply() function, e.g., sapply(cars, range). Print a simple table of the ranges of the variables. The rows should correspond to the variables. The first column should be the minimum value of the corresponding variable, and the second column should be the maximum value of the variable. The columns should be suitably labeled.

```{r}
#insert r code here
```
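
As a rough illustration of the table layout described in this part, here is a minimal sketch that uses the built-in `cars` data frame (the example named in the question) as a stand-in for the price data. The object name `ranges` and the column labels `Min` and `Max` are illustrative choices, not requirements.

```r
# Built-in `cars` data stands in for the price data frame (assumption).
ranges <- t(sapply(cars, range))      # sapply() returns 2 x p; transpose to p x 2
colnames(ranges) <- c("Min", "Max")   # label the two columns
print(ranges)
```

Transposing puts one variable per row, which is the orientation the question asks for.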

## 4. What are the mean and standard deviation of each variable? Create a simple table of the means and standard deviations.

```{r}
#insert r code here
```
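
One possible shape for such a table, sketched again on the built-in `cars` data rather than on the price data. The names `stats`, `Mean`, and `SD` are illustrative.

```r
# Built-in `cars` data stands in for the price data frame (assumption).
stats <- data.frame(Mean = sapply(cars, mean),
                    SD   = sapply(cars, sd))   # one row per variable
print(round(stats, 3))
```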

## 5. Using the regsubsets function in the leaps library, regress one of your ETFs on the remaining assets.

```{r}
#insert r code here
```

### a. Print a table showing what variables would be selected using best subset selection for all predictors up to order 2 (i.e., asset and asset^2). Determine the optimal model using AIC and output the model, including its coefficients.

```{r}
#insert r code here
```
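
`summary()` on a `regsubsets` fit reports `rss`, `cp`, `bic`, and `adjr2` but no AIC column, so AIC has to be computed separately. The sketch below shows one way to do that from the residual sum of squares, up to an additive constant. It assumes the leaps package is installed and runs on simulated data; the names `d`, `etf`, `a1`, and `a2` are hypothetical placeholders, not part of the assignment data.

```r
library(leaps)  # assumed installed: install.packages("leaps")

set.seed(1)
n <- 200
# Hypothetical stand-in data: `etf` plays the ETF, `a1` and `a2` two assets.
d <- data.frame(a1 = rnorm(n), a2 = rnorm(n))
d$etf <- 1 + 2 * d$a1 - 0.5 * d$a2^2 + rnorm(n, sd = 0.5)

# Best subset search over each asset and its square.
fit <- regsubsets(etf ~ a1 + a2 + I(a1^2) + I(a2^2), data = d,
                  nvmax = 4, method = "exhaustive")
s <- summary(fit)
print(s$which)                    # TRUE marks the terms kept at each model size

# AIC up to an additive constant: n * log(RSS / n) + 2 * (number of coefficients).
k    <- rowSums(s$which)          # coefficient count, intercept included
aic  <- n * log(s$rss / n) + 2 * k
best <- which.min(aic)
print(coef(fit, id = best))       # coefficients of the AIC-chosen model
```

Setting `method = "forward"` or `method = "backward"` switches the search strategy, and `s$bic` and `s$adjr2` hold the other two criteria.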

### b. Print a table showing what variables would be selected using forward subset selection for all predictors up to order 2 (i.e., asset and asset^2). Determine the optimal model using BIC and output the model, including its coefficients.

```{r}
#insert r code here
```

### c. Print a table showing what variables would be selected using backward subset selection for all predictors up to order 2 (i.e., asset and asset^2). Determine the optimal model using adjusted R^2 and output the model, including its coefficients.

```{r}
#insert r code here
```

# Question 2

Using the data set that you loaded for the first problem, choose the other ETF, and create a data frame consisting of simple lagged returns going up to 10 days back. Create another field in this data frame that records the direction in which the ETF moves one day into the future. This direction should be stored as a factor, not a number.
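
One way to lay out such a data frame is sketched below with base R's `embed()`. The price series is simulated, and the names `price`, `ret`, `lags`, `Lag1` to `Lag10`, and the labels `"Up"` and `"Down"` are illustrative assumptions.

```r
set.seed(1)
# Hypothetical price series standing in for the ETF's adjusted closes.
price <- 100 * cumprod(1 + rnorm(300, mean = 0, sd = 0.01))
ret   <- diff(price) / head(price, -1)   # simple daily returns

# embed() puts the most recent return in column 1 and lags 1..10 in columns 2..11.
m    <- embed(ret, 11)
lags <- as.data.frame(m[, -1])
names(lags) <- paste0("Lag", 1:10)

# Direction of the return that follows the ten lags, stored as a factor.
lags$Direction <- factor(ifelse(m[, 1] > 0, "Up", "Down"),
                         levels = c("Down", "Up"))
str(lags)
```

Each row then pairs ten past returns with the direction of the next day's move, so no row uses information from the day it is trying to predict.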

## 1. Split your data into a training set and a testing set and run LDA on the direction using all 10 different lags on the training set. How accurate is your model?

```{r}
#insert r code here
```

## 2. Create code to determine the estimate for the expected test error using K=5 cross-validation. Do this by actually splitting the data into five pieces and giving the average of the test errors, not just by using a command from a package.

## 3. Determine the LOOCV estimate of the expected test error of your model. How do your answers to each part of this question compare? Do you see any noticeable differences between your answers? Why do you think that is?
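
Parts 2 and 3 rest on the same fold-splitting idea, which might be sketched as follows. The example assumes the MASS package for `lda()` and uses simulated data. The helper `cv_error` and the columns `x1`, `x2`, and `Direction` are hypothetical, and folds are assigned at random here, which is only one of several reasonable ways to cut the data.

```r
library(MASS)  # provides lda(); ships with standard R installations

set.seed(1)
n <- 200
# Hypothetical stand-in data: two predictors and a two-level factor response.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$Direction <- factor(ifelse(d$x1 + rnorm(n) > 0, "Up", "Down"))

# Manual K-fold cross-validation error for an LDA classifier.
cv_error <- function(data, K) {
  fold <- sample(rep(1:K, length.out = nrow(data)))   # random fold labels
  errs <- sapply(1:K, function(k) {
    fit  <- lda(Direction ~ ., data = data[fold != k, ])
    pred <- predict(fit, data[fold == k, ])$class
    mean(pred != data$Direction[fold == k])           # misclassification rate
  })
  mean(errs)                                          # average over the folds
}

err5    <- cv_error(d, K = 5)         # 5-fold estimate
err_loo <- cv_error(d, K = nrow(d))   # K = n gives the LOOCV estimate
print(c(five_fold = err5, loocv = err_loo))
```

With K equal to the number of rows, every fold holds exactly one observation, so the same function covers both estimates.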

# Question 3

This question should be answered using the Weekly data set, which is part of the ISLR package. This data contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.

## 1. What does the data represent?

```{r}
#insert r code here
```

## 2. Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?

```{r}
#insert r code here
```

## 3. Fit a logistic regression model using a training data period from 1990 to 2008, using the predictors from the previous problem that you determined were statistically significant. Test your model on the held out data (that is, the data from 2009 and 2010) and express its accuracy.

```{r}
#insert r code here
```
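
The train-then-hold-out pattern in this part might look like the sketch below. It runs on simulated data shaped loosely like Weekly rather than on the ISLR data itself. The data frame `d`, its year range, and the choice of `Lag1` as the single predictor are all hypothetical and say nothing about which predictor is significant in the real data.

```r
set.seed(1)
n <- 400
# Hypothetical stand-in for Weekly: a Year column, one lag, and Direction.
d <- data.frame(Year = rep(2001:2010, each = 40), Lag1 = rnorm(n))
d$Direction <- factor(ifelse(0.8 * d$Lag1 + rnorm(n) > 0, "Up", "Down"),
                      levels = c("Down", "Up"))

train <- d$Year <= 2008                 # fit on the earlier years only
fit   <- glm(Direction ~ Lag1, data = d, subset = train, family = binomial)

# Predicted probability of the second factor level ("Up") on the held-out years.
prob <- predict(fit, newdata = d[!train, ], type = "response")
pred <- factor(ifelse(prob > 0.5, "Up", "Down"), levels = c("Down", "Up"))

conf_mat <- table(Predicted = pred, Actual = d$Direction[!train])
accuracy <- mean(pred == d$Direction[!train])
print(conf_mat)
print(accuracy)
```

Accuracy is the share of held-out weeks on the diagonal of the confusion matrix.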

## 4. Repeat Part 3 using LDA.

```{r}
#insert r code here
```

## 5. Repeat Part 3 using QDA.

```{r}
#insert r code here
```

## 6. Repeat Part 3 using KNN with K = 1, 2, 3.

```{r}
#insert r code here
```
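
`knn()` from the class package takes the training predictors, test predictors, and training labels as separate arguments, which differs from the formula interface used by `glm()` and `lda()`. A minimal sketch on simulated data follows. The data frame `d` and the predictor `Lag1` are hypothetical stand-ins.

```r
library(class)  # provides knn(); ships with standard R installations

set.seed(1)
n <- 400
# Hypothetical stand-in data with a Year column, one lag, and Direction.
d <- data.frame(Year = rep(2001:2010, each = 40), Lag1 = rnorm(n))
d$Direction <- factor(ifelse(0.8 * d$Lag1 + rnorm(n) > 0, "Up", "Down"))

train   <- d$Year <= 2008
train_x <- d[train, "Lag1", drop = FALSE]    # keep data-frame shape for knn()
test_x  <- d[!train, "Lag1", drop = FALSE]

# Held-out accuracy for K = 1, 2, 3.
acc <- sapply(1:3, function(k) {
  pred <- knn(train_x, test_x, cl = d$Direction[train], k = k)
  mean(pred == d$Direction[!train])
})
names(acc) <- paste0("K=", 1:3)
print(acc)
```

Because `knn()` breaks ties at random, setting the seed first keeps the results reproducible.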

## 7. Which of these methods in Parts 3, 4, 5, and 6 appears to provide the best results on this data?

```{r}
#insert r code here
```

# Question 4

## Write a function in R that gives you the parameters from a linear regression on a data set of $n$ predictors. You can assume all the predictors and the response are numeric. Include in the output the standard errors of your coefficient estimates. You cannot use the lm command in this function or any of the other built-in regression models.

```{r}
#insert r code here
```

## Compare the output of your function to that of the lm command in R.

```{r}
#insert r code here
```
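
One common route to such a function is the normal equations, $\hat\beta = (X^\top X)^{-1} X^\top y$, with standard errors taken from the diagonal of $\hat\sigma^2 (X^\top X)^{-1}$. The sketch below is a minimal version under that assumption. The function name `ols` and the use of the built-in `mtcars` data are illustrative, and a QR-based solve would be more numerically robust for ill-conditioned predictors.

```r
# Least-squares fit via the normal equations; returns estimates and standard errors.
ols <- function(X, y) {
  X       <- cbind(Intercept = 1, as.matrix(X))    # add the intercept column
  XtX_inv <- solve(crossprod(X))                   # (X'X)^{-1}
  beta    <- drop(XtX_inv %*% crossprod(X, y))     # coefficient estimates
  resid   <- y - drop(X %*% beta)
  sigma2  <- sum(resid^2) / (nrow(X) - ncol(X))    # residual variance estimate
  se      <- sqrt(diag(sigma2 * XtX_inv))          # standard errors
  cbind(Estimate = beta, Std.Error = se)
}

# Built-in mtcars data as a stand-in; compare against lm().
mine <- ols(mtcars[, c("wt", "hp")], mtcars$mpg)
ref  <- summary(lm(mpg ~ wt + hp, data = mtcars))$coefficients[, 1:2]
print(mine)
print(all.equal(unname(mine), unname(ref)))
```

The residual degrees of freedom are the number of rows minus the number of coefficients, intercept included, which matches what `lm()` uses.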

# Question 5

As you have probably seen in this homework, simply looking at the close prices and trying to run models on the variables is not terribly interesting. You’ve begun to see what types of techniques we will be studying in this class. Here is an excerpt from the final project/homework:

“In this assignment, you will be required to find a set of data to run regression on. This data set should be financial in nature, and of a type that will work with the models we have discussed this semester (hint: we didn’t look at time series). You may not use any of the data sets in the ISLR package that we have been looking at all semester. The data set that you choose should have both qualitative and quantitative variables (or variables that you can transform).

Provide a description of the data below, where you obtained it, what the variable names are and what it is describing.”

You don’t have to actually create the data set at this time, but what sort of problem are you looking to solve? What data set would you need to answer this question? Please provide what you are looking into and how you could approach the problem below.