## This homework uses Matlab to work through linear regression models, real estate price estimation, and other financial programming tasks

Nonlinear econometrics for finance HOMEWORK 1

(Review of linear econometrics)

Problem 1. (18 points.) Consider the linear regression model $Y = X\beta + \varepsilon$ with pre-determined (i.e., non-stochastic) regressors $X$. All of the usual assumptions hold except that, instead of $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$, we have $\mathrm{Var}(\varepsilon) = E(\varepsilon\varepsilon') = \Sigma_\varepsilon$, where $\Sigma_\varepsilon$ is a generic symmetric positive definite matrix. In other words, we are assuming that the errors are not necessarily homoskedastic and uncorrelated.

(1) (2 points) The traditional least-squares estimator is $\hat\beta_{LS} = (X'X)^{-1}X'Y$.

Define, now, a new estimator $\hat\beta_{GLS} = (X'\Sigma_\varepsilon^{-1}X)^{-1}X'\Sigma_\varepsilon^{-1}Y$. Call it the “generalized least-squares” (GLS) estimator. Show that $E(\hat\beta_{GLS}) = \beta$, i.e., $\hat\beta_{GLS}$ is unbiased.

(2) (3 points) Show that the variance of $\hat\beta_{LS}$ under the new assumption is $(X'X)^{-1}X'\Sigma_\varepsilon X(X'X)^{-1}$.

(3) (3 points) Show that the variance of $\hat\beta_{GLS}$ under the new assumption is $(X'\Sigma_\varepsilon^{-1}X)^{-1}$.

(4) (4 points) Show that the variance of $\hat\beta_{GLS}$ is not larger than the variance of the traditional least-squares estimator $\hat\beta_{LS}$. In other words, show that $(X'X)^{-1}X'\Sigma_\varepsilon X(X'X)^{-1} - (X'\Sigma_\varepsilon^{-1}X)^{-1} \geq 0$.

[Hint: This is the same as showing $X'\Sigma_\varepsilon^{-1}X - (X'X)(X'\Sigma_\varepsilon X)^{-1}(X'X) \geq 0$. Write the expression in terms of a suitable “quadratic form” using a suitable “idempotent and symmetric matrix” and the proof is almost complete. The proof is very similar to the one showing that $\hat\beta_{LS}$ is BLUE in the standard regression model.]
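The inequality in part (4) can be sanity-checked numerically before attempting the proof. The sketch below is plain Python rather than the Matlab the assignment calls for, and it is only an illustration under strong simplifying assumptions: a single regressor ($k = 1$), a diagonal $\Sigma_\varepsilon$ (pure heteroskedasticity), and made-up numbers. With $k = 1$ both variances are scalars, so the matrix inequality reduces to an ordinary one. A numerical check of this kind is not a substitute for the proof.

```python
# Toy check that Var(beta_LS) - Var(beta_GLS) >= 0 for one regressor (k = 1)
# and a diagonal error covariance. All numbers are made up for illustration.

def var_ols(x, sigma_diag):
    """(x'x)^{-1} x' Sigma x (x'x)^{-1} for a column vector x and diagonal Sigma."""
    xx = sum(xi * xi for xi in x)
    x_sigma_x = sum(xi * xi * s for xi, s in zip(x, sigma_diag))
    return x_sigma_x / (xx * xx)

def var_gls(x, sigma_diag):
    """(x' Sigma^{-1} x)^{-1} for a column vector x and diagonal Sigma."""
    return 1.0 / sum(xi * xi / s for xi, s in zip(x, sigma_diag))

x = [1.0, 2.0, 3.0, 4.0]
sigma_diag = [1.0, 4.0, 9.0, 16.0]  # hypothetical error variances, growing with t

gap = var_ols(x, sigma_diag) - var_gls(x, sigma_diag)
print("Var(OLS):", var_ols(x, sigma_diag))
print("Var(GLS):", var_gls(x, sigma_diag))
print("difference:", gap)
```

When every entry of `sigma_diag` is the same constant, the two functions return the same value, which matches the fact that GLS collapses to least squares under homoskedastic, uncorrelated errors.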

Assume, now, that the regression Y = Xβ+ε is in a time-series context.

The error terms $\varepsilon_t$ are independent of $x_t$ and modeled as an MA(1) process. Specifically, write $\varepsilon_t = u_t - \theta u_{t-1}$, where $u_t$ is a white noise process with mean zero and variance $\sigma_u^2$.

(5) (3 points) Find the variance-covariance matrix of the regression’s residuals ($E(\varepsilon\varepsilon') = \Sigma_\varepsilon$) as a function of the MA parameters. [Hint: Compute $\mathrm{Var}(\varepsilon_t)$ for all $t$ and $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t+j})$ for all $t$ and $j$ as a function of the MA parameters, and then use these values to fill in $\Sigma_\varepsilon$.]

(6) (3 points) We have learned that under the assumed structure on the error terms the GLS estimator is more efficient than the least-squares estimator. Using your response in (5) above, how would you make the GLS estimator “feasible”? In other words, how would you estimate the matrix $\Sigma_\varepsilon$ (i.e., obtain $\hat\Sigma_\varepsilon$) to define an implementable version $\tilde\beta_{GLS} = (X'\hat\Sigma_\varepsilon^{-1}X)^{-1}X'\hat\Sigma_\varepsilon^{-1}Y$? Be as precise as possible.
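For the MA(1) errors defined above, the standard autocovariance calculation gives $\mathrm{Var}(\varepsilon_t) = \sigma_u^2(1 + \theta^2)$, $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t \pm 1}) = -\theta\sigma_u^2$, and zero covariance beyond lag 1, so $\Sigma_\varepsilon$ is a banded (tridiagonal) matrix. The sketch below, in plain Python rather than the Matlab the assignment calls for, fills in that matrix for given parameter values. The function name `ma1_covariance` and the parameter values are hypothetical. In a feasible version, `theta` and `sigma_u2` would be replaced by estimates; that estimation step is not shown here.

```python
def ma1_covariance(theta, sigma_u2, T):
    """T x T covariance matrix of eps_t = u_t - theta * u_{t-1}."""
    gamma0 = sigma_u2 * (1.0 + theta ** 2)  # Var(eps_t)
    gamma1 = -theta * sigma_u2              # Cov(eps_t, eps_{t+1})
    return [
        [gamma0 if i == j else gamma1 if abs(i - j) == 1 else 0.0
         for j in range(T)]
        for i in range(T)
    ]

# Hypothetical parameter values, chosen only to make the pattern visible.
sigma = ma1_covariance(theta=0.5, sigma_u2=2.0, T=4)
for row in sigma:
    print(row)
```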

Problem 2. (16 points.) Consider the regression model Y = Xβ + ε.

Assume $X$ is stochastic and $\varepsilon$ is such that $E(X'\varepsilon) \neq 0$. However, there is a matrix of variables $Z$ such that $E(Z'\varepsilon) = 0$ and $E(Z'X) \neq 0$. The dimension of the matrix $X$ is $T \times k$ ($T$ is the number of observations and $k$ is the number of regressors), whereas the dimension of the matrix $Z$ is $T \times q$ with $q > k$.

(1) (2 points) Is $\hat\beta$ from a regression of $Y$ on $X$ consistent for $\beta$?

(2) (2 points) Regress the matrix X on the matrix Z (i.e., you want to regress each column of X on the matrix Z). Express the fitted values compactly as a function of X and Z.

(3) (2 points) Regress the observations $Y$ on the fitted values from the previous regression (a $T \times k$ matrix). Express the new estimator compactly as a function of $X$, $Z$, and $Y$. (Note: you could use a very specific idempotent matrix here.)

(4) (3 points) Is the new estimator consistent for β?

(5) (3 points) Assume k = q. Does the form of the estimator simplify?

(6) (4 points) Interpret all of your previous results from an applied standpoint. Why are they useful?
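The two-step procedure in parts (2) and (3) can be illustrated in the simplest possible setting. The sketch below is plain Python rather than the Matlab the assignment calls for. It assumes one regressor and one instrument ($k = q = 1$), no intercept, and made-up data, so every matrix product becomes a dot product. It compares the two-stage estimate with the ratio $(z'x)^{-1}z'y$. The function names `two_stage` and `iv_ratio` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def two_stage(y, x, z):
    """Two-stage estimate with one regressor x and one instrument z (no intercept)."""
    pi_hat = dot(z, x) / dot(z, z)            # first stage: regress x on z
    x_hat = [pi_hat * zi for zi in z]         # fitted values of x
    return dot(x_hat, y) / dot(x_hat, x_hat)  # second stage: regress y on x_hat

def iv_ratio(y, x, z):
    """(z'x)^{-1} z'y for the scalar case k = q = 1."""
    return dot(z, y) / dot(z, x)

# Made-up numbers, for illustration only.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.5, 1.8, 3.4, 3.9, 5.2]
y = [2.9, 4.1, 6.2, 8.3, 9.8]

print(two_stage(y, x, z), iv_ratio(y, x, z))
```

In this scalar case the two numbers agree up to floating-point error, because the first-stage coefficient cancels in the second-stage ratio.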

Problem 3. (16 points) Real estate is a key asset. Investing in real estate represents the biggest investment decision for most households over their lifetimes. A real estate company in Baltimore wants to estimate a model relating house prices to several characteristics of the house. The data come from Zillow and consist of a sample of houses in the Baltimore area for the year 2014. The data are contained in the file housing data.xlsx and provide the following information:

• Zillow id of the house (id)

• price in dollars (price)

• street address (street)

• postal code (zip)

• year the house was built (yearBuilt)

• size of the house measured in square feet (sqft)

• number of bathrooms (bathrooms)

• number of bedrooms (bedrooms).

Given this information, you need to run a linear regression for the price of houses using Matlab.
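The descriptive statistics and log transformation requested in the first two parts below can be sketched as follows. This uses plain Python's `statistics` module rather than the Matlab the assignment calls for, and the `prices` list is made-up toy data, not the Zillow sample, so the pattern it shows need not carry over to the real data.

```python
import math
import statistics

# Made-up toy prices in dollars; one very expensive house creates a long right tail.
prices = [85000, 120000, 150000, 175000, 210000, 260000, 950000]

def describe(values):
    """Return the summary statistics asked for in part (1)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),          # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = describe(prices)
log_summary = describe([math.log(p) for p in prices])

print(summary)
print(log_summary)
```

In this toy sample the mean is well above the median, which is the usual sign of right skew; the same comparison on the log-prices shows a much smaller gap.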

(1) (1 point) Generate a histogram of the house prices and compute descriptive statistics (mean, median, variance, standard deviation, minimum, maximum). What do you notice?

(2) (1 point) Now take a log transformation of the house prices. Plot the histogram of the log-prices. What do you notice?

(3) (2 points) Run a regression of the log-prices on the explanatory variables:

$\log(\text{price}_i) = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{size}_i + \beta_3\,\text{bathrooms}_i + \beta_4\,\text{bedrooms}_i + u_i$,

where $u_i$ is an error term.

(4) (2 points) Give an economic interpretation of the estimated coefficients in the regression above. What does the model say about the house prices?

(5) (2 points) Why do you think $\beta_4$ is negative?

(6) (2 points) We want to test whether the coefficient $\beta_3$ for the number of bathrooms is statistically significant. What test would you use? Compute the test statistic and interpret the result.

(7) (2 points) Test whether the coefficients for the number of bathrooms and the number of bedrooms (i.e., $\beta_3$ and $\beta_4$) are jointly different from zero.

(8) (2 points) Test whether $\beta_3 = -\beta_4$.

(9) (2 points) Using your model, predict the price of a house with 4 bedrooms, 3 bathrooms, a size of 2,500 square feet, and built in 1945. Explain how you compute your prediction.
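The estimation and prediction steps in parts (3) and (9) can be sketched on toy data. The code below is plain Python rather than the Matlab the assignment calls for, and it rests on several assumptions that are not in the assignment: the eight houses and the coefficients in `true_beta` are made up, the log-prices are generated without noise so that least squares recovers the coefficients exactly, size is measured in thousands of square feet to keep the normal equations well scaled, and age is taken to be 2014 minus the year built.

```python
import math

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Columns: constant, age (years), size (thousands of sqft), bathrooms, bedrooms.
X = [
    [1.0, 10.0, 1.2, 1.0, 2.0],
    [1.0, 25.0, 1.5, 2.0, 3.0],
    [1.0, 40.0, 1.8, 2.0, 4.0],
    [1.0, 60.0, 2.2, 3.0, 4.0],
    [1.0, 15.0, 2.6, 3.0, 5.0],
    [1.0, 80.0, 1.0, 1.0, 3.0],
    [1.0, 35.0, 3.0, 4.0, 5.0],
    [1.0, 50.0, 1.4, 1.0, 2.0],
]
true_beta = [11.0, -0.002, 0.4, 0.10, -0.05]  # hypothetical coefficients
log_price = [sum(b * v for b, v in zip(true_beta, row)) for row in X]

beta_hat = ols(X, log_price)

# House from part (9): built in 1945, 2,500 sqft, 3 bathrooms, 4 bedrooms.
x_new = [1.0, 2014 - 1945, 2.5, 3.0, 4.0]
log_price_hat = sum(b * v for b, v in zip(beta_hat, x_new))
price_hat = math.exp(log_price_hat)  # naive back-transformation to dollars

print(beta_hat)
print(log_price_hat, price_hat)
```

Exponentiating the predicted log-price is the simplest way back to dollars. With noisy data it tends to understate the conditional mean price, so a retransformation correction may be worth discussing in the written answer.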