Question 1: Linear Regression

This question concerns methods that address the shortcomings of linear regression through variable selection, so that predictors that fail to explain the response significantly can be dropped. You will find that ridge regression, although it penalizes the coefficients, still does not perform variable selection. The lasso, on the other hand, resolves this by shrinking the insignificant coefficients all the way to zero. This question uses the Hitters dataset.

1. Load the Hitters dataset. Remove all rows of the Hitters dataset that have NA in the "Salary" column.

2. To construct the design matrix, use the function "model.matrix" to read all variables in the Hitters dataset except the salary, and store them in the variable x. Also read the salary variable and store it in the variable y. Generate a sequence of 100 values of λ between $10^{10}$ and $10^{-2}$, and call the function "glmnet" from the glmnet library. You can generate the sequence as 10^seq(10, -2, length = 100). For glmnet, set α = 0 and estimate the ridge coefficients for the 100 λ values. Then examine the set of coefficients at the two extreme values of λ, i.e. $10^{10}$ and $10^{-2}$. For which of these two values of λ are the coefficients closer to zero?

3. Now draw a plot of the $\ell_2$ norm of the coefficient values (excluding the intercept) against the logarithm of the λ values. Does this plot show that the optimal λ between $10^{10}$ and $10^{-2}$ cannot really be decided from it, and that it is better to use a plot of the mean squared error (MSE) against the λ values? Explain your reasoning.

4. The glmnet library already has a function "cv.glmnet" that performs ten-fold cross-validation (CV). You are going to use this function to select an optimal λ. First, set the seed of the random number generator to 10. Then randomly pick 131 samples (rows) from x, with all variables, and the corresponding samples from y to construct a training dataset. The remaining samples form the testing dataset. Using the training dataset, plot the cross-validation results and find the best λ (the one that gives the smallest CV error) and its corresponding test MSE (the MSE obtained on the testing dataset with the best λ); you may want to use the "predict" function here. Now refit the ridge regression model on the full dataset using the λ chosen by CV. Examine the coefficients: are they all present, as in the linear regression case?

5. This time set α = 1 (the lasso case). Again plot the cross-validation results, and find the best λ (using the training set) and its corresponding MSE (using the testing set). Now compute the coefficients again using the best λ just selected. Are all coefficients retained this time? Most of them are zero, are they not?
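The workflow in the five steps above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes the Hitters data come from the ISLR package (where the response column is named Salary), it lets cv.glmnet choose its own default λ path for the cross-validation, and the helper name fit_and_score is hypothetical.

```r
library(ISLR)    # assumed source of the Hitters data
library(glmnet)

# Step 1: keep only the rows with an observed Salary.
hitters <- Hitters[!is.na(Hitters$Salary), ]

# Step 2: design matrix without the intercept column, and the response.
x <- model.matrix(Salary ~ ., data = hitters)[, -1]
y <- hitters$Salary

grid <- 10^seq(10, -2, length = 100)             # from 1e10 down to 1e-2
ridge <- glmnet(x, y, alpha = 0, lambda = grid)  # alpha = 0 gives ridge

# Step 3: l2 norm of the coefficients (intercept excluded) for each lambda.
beta <- as.matrix(coef(ridge))[-1, ]             # one column per lambda
l2_norm <- sqrt(colSums(beta^2))
plot(log(ridge$lambda), l2_norm, type = "l",
     xlab = "log(lambda)", ylab = "l2 norm of coefficients")

# Step 4: train/test split with seed 10.
set.seed(10)
train <- sample(nrow(x), 131)

# Hypothetical helper: cross-validate on the training rows, score on the
# held-out rows, then refit on all rows with the selected lambda.
fit_and_score <- function(alpha) {
  cv <- cv.glmnet(x[train, ], y[train], alpha = alpha)  # 10-fold CV by default
  plot(cv)
  best <- cv$lambda.min
  pred <- predict(cv, newx = x[-train, ], s = best)
  full <- glmnet(x, y, alpha = alpha)
  list(best_lambda = best,
       test_mse = mean((pred - y[-train])^2),
       coefs = coef(full, s = best))
}

ridge_res <- fit_and_score(0)   # ridge
lasso_res <- fit_and_score(1)   # Step 5: lasso

print(c(ridge = ridge_res$test_mse, lasso = lasso_res$test_mse))
print(lasso_res$coefs)          # zero entries are predictors the lasso dropped
```

Comparing ridge_res$coefs with lasso_res$coefs is one way to see the contrast the question asks about: the ridge fit keeps every predictor, while the lasso fit can set some coefficients exactly to zero.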

Question 2: Model Selection

In this question we consider the analysis of three model selection criteria for selecting the order p of the following model

$$y_t = \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \eta_t, \qquad t = p+1, \dots, n, \qquad y_t \in \mathbb{R},$$

where the $\eta_t$ are independent and identically distributed (i.i.d.) $N(0, \sigma^2)$. The criteria we consider are

$$\mathrm{IC1} = \log \hat\sigma_p^2 + \frac{2(p+1)}{T}, \qquad \mathrm{IC2} = \log \hat\sigma_p^2 + \frac{T+p}{T-p-2}, \qquad \mathrm{IC3} = \log \hat\sigma_p^2 + \frac{p \log(T)}{T},$$

where $\hat\sigma_p^2 = \mathrm{RSS}_p / T = \|y - \hat y\|^2 / T$.

1. In the ICs given above, $T$ represents the number of samples that can be used for estimating the parameters of the model. For a model of order $p$ as above, what is $T$?

2. Find the least squares estimator of $\phi = (\phi_1, \dots, \phi_p)^\top$.

3. Provide the expression of $\hat\sigma_p^2$.

4. Generate two sets of 100 samples using the models

M1: $y_t = 0.434 y_{t-1} + 0.217 y_{t-2} + 0.145 y_{t-3} + 0.108 y_{t-4} + 0.087 y_{t-5} + \eta_t$, $\eta_t \sim N(0, 1)$

M2: $y_t = 0.682 y_{t-1} + 0.346 y_{t-2} + \eta_t$, $\eta_t \sim N(0, 1)$

5. Using these two sets, compute the values of IC1, IC2 and IC3 for $p = 1, \dots, 10$ for models M1 and M2. For each model, provide a figure illustrating the variations of IC1, IC2 and IC3 (plot the three criteria in a single figure for each model).

6. Using model M1, generate 1000 sets (vectors) of size 100 and provide a table of counts of the model selected by IC1, IC2 and IC3.

7. Using model M1, generate 1000 sets of size 15 and provide a table of counts of the model selected by IC1, IC2 and IC3.

8. Repeat questions 6 and 7 using model M2.

9. What do you observe from these tables?

10. Derive expressions for the probabilities of overfitting for the model selection criteria IC1, IC2 and IC3. For the derivation, assume the true model has order $p_0$ and consider overfitting by $L$ extra parameters.

11. Provide tables of the calculated probabilities for M1 in the cases $n = 25$ and $n = 100$ with $L = 1, \dots, 8$.

12. What are the important remarks that can be made from these probability tables?

13. The tables obtained in question 11 provide overfitting information as a function of the sample size. We are now interested in the case of a large sample size, i.e. $n \to \infty$ ($p_0$ and $L$ fixed). Derive the expressions of the probabilities of overfitting in this case.

14. What is the important observation that you can make?
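The simulation steps (items 4 to 6) can be sketched in base R as below. This is an illustrative sketch under several assumptions that the text does not fix: T is taken as n - p (the number of equations t = p+1, …, n), the model is fitted by least squares without an intercept, IC3 is taken with its penalty divided by T like IC1, a burn-in period is discarded when simulating, and the seed is arbitrary. The function names simulate_ar, info_criteria and select_orders are hypothetical.

```r
# Simulate n observations from an AR model with coefficients phi and N(0, 1)
# noise. The burn-in (an assumption) lets the zero start-up values wash out.
simulate_ar <- function(n, phi, burn = 100) {
  p <- length(phi)
  total <- n + burn + p
  y <- numeric(total)               # the first p values start at zero
  eta <- rnorm(total)
  for (t in (p + 1):total) {
    y[t] <- sum(phi * y[(t - 1):(t - p)]) + eta[t]
  }
  tail(y, n)
}

# Least squares fit of an AR(p) model and the three criteria.
# Assumption: T = n - p usable equations.
info_criteria <- function(y, p) {
  T_eff <- length(y) - p
  lagged <- embed(y, p + 1)         # each row: y_t, y_{t-1}, ..., y_{t-p}
  resp <- lagged[, 1]
  X <- lagged[, -1, drop = FALSE]
  phi_hat <- qr.solve(X, resp)      # least squares solution, no intercept
  s2 <- sum((resp - X %*% phi_hat)^2) / T_eff
  c(IC1 = log(s2) + 2 * (p + 1) / T_eff,
    IC2 = log(s2) + (T_eff + p) / (T_eff - p - 2),
    IC3 = log(s2) + p * log(T_eff) / T_eff)
}

set.seed(1)                         # arbitrary seed, for reproducibility only
phi_m1 <- c(0.434, 0.217, 0.145, 0.108, 0.087)
orders <- 1:10

# Item 5: the three criteria against p for one sample from M1.
y <- simulate_ar(100, phi_m1)
ic <- t(sapply(orders, function(p) info_criteria(y, p)))   # 10 x 3 matrix
matplot(orders, ic, type = "b", pch = 1:3, lty = 1:3, col = 1:3,
        xlab = "order p", ylab = "criterion value")
legend("topright", legend = colnames(ic), pch = 1:3, lty = 1:3, col = 1:3)

# Item 6: how often each order is selected over repeated samples.
select_orders <- function(n, phi, reps = 1000) {
  picks <- replicate(reps, {
    y <- simulate_ar(n, phi)
    ic <- sapply(orders, function(p) info_criteria(y, p))  # 3 x 10 matrix
    apply(ic, 1, which.min)         # selected order for each criterion
  })
  counts <- t(apply(picks, 1, tabulate, nbins = length(orders)))
  dimnames(counts) <- list(c("IC1", "IC2", "IC3"), paste0("p=", orders))
  counts
}

counts_m1 <- select_orders(100, phi_m1)
print(counts_m1)
```

Two caveats apply when reusing this sketch. With n = 15 and T = n - p, the denominator T - p - 2 of IC2 is no longer positive for larger p, so the range of candidate orders has to be restricted there. For M2 the coefficients sum to more than one (0.682 + 0.346 = 1.028), so the simulated series is not stationary and the length of the burn-in affects its scale; setting burn = 0 may be preferable in that case.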
