This homework uses Matlab to work through linear regression models, real-estate price estimation, and related topics.

Nonlinear econometrics for finance

HOMEWORK 1

(Review of linear econometrics)

Problem 1. (18 points.) Consider the linear regression model $Y = X\beta + \varepsilon$ with pre-determined (i.e., non-stochastic) regressors $X$. All of the usual assumptions hold except that, instead of $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$, we have $\mathrm{Var}(\varepsilon) = E(\varepsilon\varepsilon') = \Sigma_\varepsilon$, where $\Sigma_\varepsilon$ is a generic symmetric positive definite matrix. In other words, we are assuming that the errors are not necessarily homoskedastic and uncorrelated.

(1) (2 points) The traditional least-squares estimator is $\hat{\beta}_{LS} = (X'X)^{-1}X'Y$. Define, now, a new estimator $\hat{\beta}_{GLS} = (X'\Sigma_\varepsilon^{-1}X)^{-1}X'\Sigma_\varepsilon^{-1}Y$. Call it the "generalized least-squares" (GLS) estimator. Show that $E(\hat{\beta}_{GLS}) = \beta$, i.e., $\hat{\beta}_{GLS}$ is unbiased.

(2) (3 points) Show that the variance of $\hat{\beta}_{LS}$ under the new assumption is $(X'X)^{-1}X'\Sigma_\varepsilon X(X'X)^{-1}$.

(3) (3 points) Show that the variance of $\hat{\beta}_{GLS}$ under the new assumption is $(X'\Sigma_\varepsilon^{-1}X)^{-1}$.

(4) (4 points) Show that the variance of $\hat{\beta}_{GLS}$ is not larger than the variance of the traditional least-squares estimator $\hat{\beta}_{LS}$. In other words, show that $(X'X)^{-1}X'\Sigma_\varepsilon X(X'X)^{-1} - (X'\Sigma_\varepsilon^{-1}X)^{-1} \geq 0$.
[Hint: This is the same as showing $(X'\Sigma_\varepsilon^{-1}X) - (X'X)(X'\Sigma_\varepsilon X)^{-1}(X'X) \geq 0$. Write the expression in terms of a suitable "quadratic form" using a suitable "idempotent and symmetric matrix" and the proof is almost complete. The proof is very similar to the one showing that $\hat{\beta}_{LS}$ is BLUE in the standard regression model.]

Assume, now, that the regression $Y = X\beta + \varepsilon$ is in a time-series context. The error terms $\varepsilon_t$ are independent of $x_t$ and modeled as an MA(1) process. Specifically, write $\varepsilon_t = u_t - \theta u_{t-1}$, where $u_t$ is a white noise process with mean zero and variance $\sigma_u^2$.

(5) (3 points) Find the variance-covariance matrix of the regression's residuals ($E(\varepsilon\varepsilon') = \Sigma_\varepsilon$) as a function of the MA parameters.
[Hint: Compute $\mathrm{Var}(\varepsilon_t)$ for all $t$ and $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t+j})$ for all $t$ and $j$ as a function of the MA parameters and then use these values to fill in $\Sigma_\varepsilon$.]

(6) (3 points) We have learned that under the assumed structure on the error terms the GLS estimator is more efficient than the least-squares estimator. Using your response in (5) above, how would you make the GLS estimator "feasible"? In other words, how would you estimate the matrix $\Sigma_\varepsilon$ (i.e., obtain $\hat{\Sigma}_\varepsilon$) to define an implementable version $\tilde{\beta}_{GLS} = (X'\hat{\Sigma}_\varepsilon^{-1}X)^{-1}X'\hat{\Sigma}_\varepsilon^{-1}Y$? Be as precise as possible.
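The hint in item (5) can be illustrated with a minimal sketch. It is written in Python with the standard library only, purely for illustration (the assignment itself asks for Matlab). The function name `ma1_covariance` and the numerical values of `theta`, `sigma_u2`, and `n` are assumptions made for this example, not quantities from the problem set. The sketch uses the standard MA(1) moments for $\varepsilon_t = u_t - \theta u_{t-1}$: variance $(1 + \theta^2)\sigma_u^2$, first-order autocovariance $-\theta\sigma_u^2$, and zero autocovariance at longer lags.

```python
def ma1_covariance(theta, sigma_u2, n):
    """Return the n x n covariance matrix of eps_t = u_t - theta * u_{t-1}.

    Hypothetical helper for illustration; theta and sigma_u2 are treated
    as known here, whereas a feasible version would plug in estimates.
    """
    var0 = (1.0 + theta ** 2) * sigma_u2  # Var(eps_t), same for every t
    cov1 = -theta * sigma_u2              # Cov(eps_t, eps_{t+1})
    sigma = [[0.0] * n for _ in range(n)]  # lags beyond 1 stay at zero
    for i in range(n):
        sigma[i][i] = var0
        if i + 1 < n:
            sigma[i][i + 1] = cov1
            sigma[i + 1][i] = cov1
    return sigma


# Illustrative parameter values only (not estimates from any data set).
sigma_eps = ma1_covariance(theta=0.5, sigma_u2=2.0, n=4)
for row in sigma_eps:
    print(row)
```

The resulting matrix is banded: one value on the main diagonal, one value on the two adjacent diagonals, and zeros elsewhere, which is the structure a feasible GLS procedure would need to estimate.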

Problem 2. (16 points.) Consider the regression model $Y = X\beta + \varepsilon$. Assume $X$ is stochastic and $\varepsilon$ is such that $E(X'\varepsilon) \neq 0$. However, there is a matrix of variables $Z$ such that $E(Z'\varepsilon) = 0$ and $E(Z'X) \neq 0$. The dimension of the matrix $X$ is $T \times k$ ($T$ is the number of observations and $k$ is the number of regressors) whereas the dimension of the matrix $Z$ is $T \times q$ with $q > k$.

(1) (2 points) Is $\hat{\beta}$ from a regression of $Y$ on $X$ consistent for $\beta$?

(2) (2 points) Regress the matrix $X$ on the matrix $Z$ (i.e., you want to regress each column of $X$ on the matrix $Z$). Express the fitted values compactly as a function of $X$ and $Z$.

(3) (2 points) Regress the observations $Y$ on the fitted values from the previous regression (a $T \times k$ matrix). Express the new estimator compactly as a function of $X$, $Z$, and $Y$. (Note: you could use a very specific idempotent matrix here.)

(4) (3 points) Is the new estimator consistent for $\beta$?

(5) (3 points) Assume $k = q$. Does the form of the estimator simplify?

(6) (4 points) Interpret all of your previous results from an applied standpoint. Why are they useful?
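The setting of Problem 2 can be explored numerically with a small simulation. The sketch below is a Python illustration (standard library only) under strong simplifying assumptions that are not part of the problem: a single regressor and a single instrument ($k = q = 1$), no intercept, and a hypothetical data-generating process in which `x` and `eps` share a common shock `v`. In that scalar case the instrument-based estimator reduces to the ratio $z'y / z'x$, which is compared with the least-squares ratio $x'y / x'x$.

```python
import random


def simulate_ols_vs_iv(beta=2.0, n=20000, seed=0):
    """Compare least squares with a one-instrument estimator on simulated data.

    Hypothetical design (an assumption of this sketch): x = z + v and
    eps = v + 0.5 * e, so x is correlated with eps while z is not.
    """
    rng = random.Random(seed)
    sxx = sxy = szx = szy = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # instrument: drives x, unrelated to eps
        v = rng.gauss(0.0, 1.0)  # shock shared by x and eps
        e = rng.gauss(0.0, 1.0)  # idiosyncratic noise
        x = z + v
        eps = v + 0.5 * e
        y = beta * x + eps
        sxx += x * x
        sxy += x * y
        szx += z * x
        szy += z * y
    ols = sxy / sxx  # least-squares slope (no intercept)
    iv = szy / szx   # scalar instrument-based slope
    return ols, iv


ols_hat, iv_hat = simulate_ols_vs_iv()
print("least squares:", round(ols_hat, 3))
print("instrumented: ", round(iv_hat, 3))
```

Under this design the probability limit of the least-squares slope is $\beta + \mathrm{Cov}(x, \varepsilon)/\mathrm{Var}(x) = 2 + 1/2$, so the first number should sit near 2.5 while the second should sit near the true value 2.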

Problem 3. (16 points) Real estate is a key asset. Investing in real estate represents the biggest investment decision for most households over their lifetimes. A real estate company in Baltimore wants to estimate a model relating house prices to several characteristics of the house. The data come from Zillow and consist of a sample of houses in the Baltimore area for the year 2014. The data are contained in the file housing data.xslx and provide the following information:

• Zillow id of the house (id)

• price in dollars (price)

• street address (street)

• postal code (zip)

• year the house was built (yearBuilt)

• size of the house measured in square feet (sqft)

• number of bathrooms (bathrooms)

• number of bedrooms (bedrooms).

Given this information, you need to run a linear regression for the price of houses using Matlab.

(1) (1 point) Generate a histogram of the house prices and compute descriptive statistics (mean, median, variance, standard deviation, minimum, maximum). What do you notice?

(2) (1 point) Now take a log transformation of the house prices. Plot the histogram of the log-prices. What do you notice?

(3) (2 points) Run a regression of the log-prices on the explanatory variables:
$\log(\text{price}_i) = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{size}_i + \beta_3 \text{bathrooms}_i + \beta_4 \text{bedrooms}_i + u_i$,
where $u_i$ is an error term.

(4) (2 points) Give an economic interpretation of the estimated coefficients in the regression above. What does the model say about the house prices?

(5) (2 points) Why do you think $\beta_4$ is negative?

(6) (2 points) We want to test whether the coefficient $\beta_3$ for the number of bathrooms is statistically significant. What test would you use? Compute the test statistic and interpret the result.

(7) (2 points) Test whether the coefficients for the number of bathrooms and the number of bedrooms (i.e., $\beta_3$ and $\beta_4$) are jointly different from zero.

(8) (2 points) Test whether $\beta_3 = -\beta_4$.

(9) (2 points) Using your model, predict the price of a house with 4 bedrooms, 3 bathrooms, a size of 2500 square feet, and built in 1945. Explain how you compute your prediction.
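The descriptive statistics and log transformation requested in items (1) and (2) can be sketched as follows. This is a Python illustration using only the standard library (the assignment itself asks for Matlab). The `prices` list is hypothetical and is not taken from the Zillow file, and the helper name `describe` is an assumption of this example. The variance and standard deviation are the sample versions, with an $n - 1$ denominator.

```python
import math
import statistics


def describe(values):
    """Return the six descriptive statistics listed in item (1)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),          # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical prices in dollars, NOT values from the assignment's data file.
prices = [95000, 120000, 150000, 180000, 210000, 260000, 340000, 900000]
log_prices = [math.log(p) for p in prices]  # natural-log transformation

price_stats = describe(prices)
log_stats = describe(log_prices)

for name in ("mean", "median", "std", "min", "max"):
    print(name, price_stats[name], round(log_stats[name], 4))
```

In this made-up sample a single expensive house pulls the mean well above the median; comparing the two columns gives a quick sense of how the log transformation compresses the right tail.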
