Machine Learning and Big Data for Economics and Finance
Consider the three variables in the dataset Assign2.csv. We are interested in predicting the third variable given the first two variables as inputs.
1. Plot the data on a figure with the first variable on the x-axis, the second variable on the y-axis, and the color of each point determined by the third variable.
2. Fit a linear model to the data. Produce a confusion matrix to show how well the model fits the data.
3. Repeat the same exercise for each of the following:
a. Logistic regression.
b. Linear Discriminant Analysis.
c. Quadratic Discriminant Analysis.
4. Fit the model by k-nearest neighbor classification with k = 1, ..., 20. Produce a confusion matrix for each k.
5. Choose between all 24 methods by using 10-fold cross-validation. Try to justify the results based on your intuition regarding the data.
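
As a rough illustration of the mechanics behind steps 4 and 5, the sketch below implements k-nearest neighbor classification, a confusion matrix, and 10-fold cross-validation using only the Python standard library. Everything in it is an assumption made for illustration: the assignment does not prescribe a language, the layout of Assign2.csv is not known here, so the code runs on synthetic two-class data from a hypothetical helper make_synthetic, and the function names knn_predict, confusion_matrix and cv_error are invented for this sketch. It covers only the 20 k-NN models, not the linear, logistic, LDA or QDA fits.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest points.

    `train` is a list of ((x1, x2), label) pairs. Ties in the vote are broken
    in favor of the label that appears first among the nearest neighbors.
    """
    def sq_dist(row):
        (x1, x2), _ = row
        return (x1 - query[0]) ** 2 + (x2 - query[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted, labels=(0, 1)):
    """Return a matrix whose rows are true classes and columns predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def cv_error(data, k, n_folds=10, seed=0):
    """Estimate the misclassification rate of k-NN by n-fold cross-validation."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into n_folds roughly equal folds.
    folds = [rows[i::n_folds] for i in range(n_folds)]
    errors = 0
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors += sum(knn_predict(train, x, k) != y for x, y in held_out)
    return errors / len(rows)


def make_synthetic(n=200, seed=1):
    """Hypothetical stand-in for Assign2.csv: two overlapping Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


data = make_synthetic()
actual = [y for _, y in data]

# In-sample confusion matrix for k = 1: each point is its own nearest neighbor.
in_sample_k1 = confusion_matrix(actual, [knn_predict(data, x, 1) for x, _ in data])

# 10-fold cross-validated error for k = 1, ..., 20, and the k that minimizes it.
cv_errors = {k: cv_error(data, k) for k in range(1, 21)}
best_k = min(cv_errors, key=cv_errors.get)

print("in-sample confusion matrix, k=1:", in_sample_k1)
print("best k by 10-fold CV:", best_k, "with error", cv_errors[best_k])
```

Note that when the model is evaluated on the same points it was fitted to, k = 1 classifies every point by its own label, so its in-sample confusion matrix has no off-diagonal entries. This is one reason in-sample confusion matrices alone cannot rank the methods and a held-out estimate such as cross-validation is used in step 5.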