Nonlinear econometrics for finance
HOMEWORK 1
(Review of linear econometrics)
Problem 1. (18 points.) Consider the linear regression model $Y = X\beta + \varepsilon$
with pre-determined (i.e., non-stochastic) regressors $X$. All of the usual assumptions
hold except that, instead of $\mathrm{Var}(\varepsilon) = \sigma^2 I_n$, we have
$\mathrm{Var}(\varepsilon) = E(\varepsilon\varepsilon') = \Sigma_\varepsilon$, where $\Sigma_\varepsilon$ is a generic symmetric positive definite matrix. In other words,
we are assuming that the errors are not necessarily homoskedastic or uncorrelated.
(1) (2 points) The traditional least-squares estimator is $\hat\beta_{LS} = (X'X)^{-1}X'Y$.
Define, now, a new estimator $\hat\beta_{GLS} = (X'\Sigma_\varepsilon^{-1}X)^{-1}X'\Sigma_\varepsilon^{-1}Y$ and call it the
“generalized least-squares” (GLS) estimator. Show that $E(\hat\beta_{GLS}) = \beta$,
i.e., $\hat\beta_{GLS}$ is unbiased.
(2) (3 points) Show that the variance of $\hat\beta_{LS}$ under the new assumption is
$(X'X)^{-1}X'\Sigma_\varepsilon X(X'X)^{-1}$.
(3) (3 points) Show that the variance of $\hat\beta_{GLS}$ under the new assumption
is $(X'\Sigma_\varepsilon^{-1}X)^{-1}$.
(4) (4 points) Show that the variance of $\hat\beta_{GLS}$ is not larger than the variance
of the traditional least-squares estimator $\hat\beta_{LS}$. In other words,
show that $(X'X)^{-1}X'\Sigma_\varepsilon X(X'X)^{-1} - (X'\Sigma_\varepsilon^{-1}X)^{-1} \geq 0$, where “$\geq 0$” means positive semidefinite.
[Hint: This is the same as showing $(X'\Sigma_\varepsilon^{-1}X) - (X'X)(X'\Sigma_\varepsilon X)^{-1}(X'X) \geq 0$.
Write the expression in terms of a suitable “quadratic form” using
a suitable “idempotent and symmetric matrix” and the proof is almost
complete. The proof is very similar to the one showing that $\hat\beta_{LS}$ is
BLUE in the standard regression model.]
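Parts (1) to (4) can be sanity-checked numerically. The sketch below uses Python/NumPy purely as an illustration; the assignment's own computations are in Matlab. It works on simulated data with fixed regressors and a hypothetical heteroskedastic, equicorrelated $\Sigma_\varepsilon$ chosen only for the example.

The sketch does four things:
- It computes both estimators and both variance formulas.
- It averages the GLS estimate over repeated error draws.
- It checks that the variance gap has no negative eigenvalues.
- It builds the hint's quadratic form for one possible choice of idempotent matrix, assuming the Cholesky factorization $\Sigma_\varepsilon = LL'$. Other square-root factorizations work equally well.

```python
import numpy as np

def ls_estimator(X, Y):
    """(X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def gls_estimator(X, Y, Sigma):
    """(X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y, with Sigma treated as known."""
    Si_X = np.linalg.solve(Sigma, X)                 # Sigma^{-1} X
    return np.linalg.solve(X.T @ Si_X, Si_X.T @ Y)   # Si_X.T = X' Sigma^{-1} since Sigma is symmetric

def ls_variance(X, Sigma):
    """Sandwich form (X'X)^{-1} X' Sigma X (X'X)^{-1}."""
    A = np.linalg.inv(X.T @ X)
    return A @ X.T @ Sigma @ X @ A

def gls_variance(X, Sigma):
    """(X' Sigma^{-1} X)^{-1}."""
    return np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))

def hint_forms(X, Sigma):
    """X'Sigma^{-1}X - (X'X)(X'Sigma X)^{-1}(X'X), computed directly and as A' M A."""
    L = np.linalg.cholesky(Sigma)                    # Sigma = L L'
    A = np.linalg.solve(L, X)                        # A'A = X' Sigma^{-1} X
    W = L.T @ X                                      # W'W = X' Sigma X and A'W = X'X
    M = np.eye(len(X)) - W @ np.linalg.solve(W.T @ W, W.T)  # symmetric and idempotent
    direct = X.T @ np.linalg.solve(Sigma, X) - X.T @ X @ np.linalg.solve(X.T @ Sigma @ X, X.T @ X)
    return direct, A.T @ M @ A

def example_setup(n=50, seed=0):
    """Fixed regressors, a true beta, and a hypothetical symmetric positive definite Sigma."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta = np.array([1.0, 2.0])
    sd = np.linspace(0.5, 2.0, n)                        # heteroskedastic standard deviations
    Sigma = np.outer(sd, sd) * (0.3 + 0.7 * np.eye(n))   # correlation 0.3 off the diagonal
    return X, beta, Sigma

def mean_gls_estimate(X, beta, Sigma, reps=2000, seed=1):
    """Average GLS estimate over repeated N(0, Sigma) error draws with X held fixed."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    eps = rng.normal(size=(reps, len(X))) @ L.T      # each row ~ N(0, Sigma)
    Y = X @ beta + eps                               # one simulated sample per row
    Si_X = np.linalg.solve(Sigma, X)
    B = np.linalg.solve(X.T @ Si_X, Si_X.T @ Y.T).T  # GLS estimate for every row at once
    return B.mean(axis=0)
```

With $\Sigma_\varepsilon = I$ the GLS function reduces to least squares. With $\Sigma_\varepsilon = \sigma^2 I$ the two variance formulas coincide, which is a quick consistency check on the algebra.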
Assume, now, that the regression $Y = X\beta + \varepsilon$ is in a time-series context.
The error terms $\varepsilon_t$ are independent of $x_t$ and modeled as an MA(1)
process. Specifically, write
$\varepsilon_t = u_t - \theta u_{t-1}$,
where $u_t$ is a white noise process with mean zero and variance $\sigma_u^2$.
(5) (3 points) Find the variance-covariance matrix of the regression’s error terms
($E(\varepsilon\varepsilon') = \Sigma_\varepsilon$) as a function of the MA parameters.
[Hint: Compute $\mathrm{Var}(\varepsilon_t)$ for all $t$ and $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t+j})$ for all $t$ and $j$ as
a function of the MA parameters and then use these values to fill in
$\Sigma_\varepsilon$.]
(6) (3 points) We have learned that, under the assumed structure on the
error terms, the GLS estimator is more efficient than the least-squares
estimator. Using your response in (5) above, how would you make the
GLS estimator “feasible”? In other words, how would you estimate
the matrix $\Sigma_\varepsilon$ (i.e., obtain $\hat\Sigma_\varepsilon$) to define an implementable version of the estimator,
$\tilde\beta_{GLS} = (X'\hat\Sigma_\varepsilon^{-1}X)^{-1}X'\hat\Sigma_\varepsilon^{-1}Y$? Be as precise as possible.
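One way to make parts (5) and (6) concrete is sketched below in Python/NumPy, as an illustration only. It assumes the standard MA(1) moments: $\mathrm{Var}(\varepsilon_t) = \sigma_u^2(1+\theta^2)$, $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t\pm 1}) = -\theta\sigma_u^2$, and zero covariance at lags of two or more. Under these moments $\Sigma_\varepsilon$ is tridiagonal.

For feasibility the sketch uses a two-step method-of-moments approach, which is only one possible choice. Maximum likelihood estimation of the MA(1) is a common alternative. The steps are:
- Run least squares and keep the residuals.
- Match their first-order autocorrelation to $-\theta/(1+\theta^2)$, keeping the invertible root $|\theta| < 1$.
- Recover $\sigma_u^2$ from the residual variance.
- Plug $\hat\Sigma_\varepsilon$ into the GLS formula.

Clipping the sample autocorrelation to the interval allowed by an MA(1) is an ad hoc assumption of this sketch. The simulated data and parameter values are hypothetical.

```python
import numpy as np

def ma1_covariance(T, theta, sigma_u2):
    """T x T covariance of eps_t = u_t - theta*u_{t-1}, u_t white noise with variance sigma_u2."""
    off = np.eye(T, k=1) + np.eye(T, k=-1)           # ones on the first off-diagonals
    # Var(eps_t) = sigma_u2*(1+theta^2); Cov(eps_t, eps_{t+-1}) = -theta*sigma_u2; zero beyond lag 1
    return sigma_u2 * ((1 + theta**2) * np.eye(T) - theta * off)

def fit_ma1_moments(e):
    """Method-of-moments (theta, sigma_u2) for an MA(1), matched to residual autocovariances."""
    g0 = np.mean(e**2)
    g1 = np.mean(e[1:] * e[:-1])
    r1 = float(np.clip(g1 / g0, -0.499, 0.499))      # an MA(1) has |rho_1| <= 1/2
    if abs(r1) < 1e-12:
        theta = 0.0
    else:
        # rho_1 = -theta/(1+theta^2)  <=>  r1*theta^2 + theta + r1 = 0; keep the root with |theta| < 1
        theta = (-1 + np.sqrt(1 - 4 * r1**2)) / (2 * r1)
    return theta, g0 / (1 + theta**2)

def feasible_gls(X, Y):
    """Two-step FGLS: LS residuals -> MA(1) moment estimates -> GLS with Sigma_hat."""
    beta_ls = np.linalg.solve(X.T @ X, X.T @ Y)
    theta_hat, s2_hat = fit_ma1_moments(Y - X @ beta_ls)
    Sigma_hat = ma1_covariance(len(Y), theta_hat, s2_hat)
    Si_X = np.linalg.solve(Sigma_hat, X)             # Sigma_hat^{-1} X
    return np.linalg.solve(X.T @ Si_X, Si_X.T @ Y), theta_hat, s2_hat

def simulate(T=1000, theta=0.5, sigma_u=1.0, seed=0):
    """Hypothetical data: a constant plus one regressor, with MA(1) errors."""
    rng = np.random.default_rng(seed)
    u = sigma_u * rng.normal(size=T + 1)
    eps = u[1:] - theta * u[:-1]
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    return X, X @ np.array([1.0, 2.0]) + eps
```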
Problem 2. (16 points.) Consider the regression model $Y = X\beta + \varepsilon$.
Assume $X$ is stochastic and $\varepsilon$ is such that $E(X'\varepsilon) \neq 0$. However, there is a
matrix of variables $Z$ such that $E(Z'\varepsilon) = 0$ and $E(Z'X) \neq 0$. The dimension
of the matrix $X$ is $T \times k$ ($T$ is the number of observations and $k$ is the number
of regressors), whereas the dimension of the matrix $Z$ is $T \times q$ with $q > k$.
(1) (2 points) Is $\hat\beta$ from a regression of $Y$ on $X$ consistent for $\beta$?
(2) (2 points) Regress the matrix X on the matrix Z (i.e., you want to
regress each column of X on the matrix Z). Express the fitted values
compactly as a function of X and Z.
(3) (2 points) Regress the observations Y on the fitted values from the
previous regression (a T × k matrix). Express the new estimator compactly
as a function of X, Z, and Y. (Note: you could use a very
specific idempotent matrix here.)
(4) (3 points) Is the new estimator consistent for β?
(5) (3 points) Assume k = q. Does the form of the estimator simplify?
(6) (4 points) Interpret all of your previous results from an applied standpoint.
Why are they useful?
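The estimators in Problem 2 can be illustrated with a hypothetical data-generating process that satisfies its assumptions. The sketch below uses Python/NumPy, not the course's Matlab, and the design is as follows:
- X has one endogenous regressor plus a constant, so k = 2.
- Z has two excluded instruments plus a constant, so q = 3.

The sketch first compares least squares with the two-step estimator built from the projection matrix $P_Z = Z(Z'Z)^{-1}Z'$. It then drops one instrument so that q = k and compares the two-step estimator with the simple IV estimator $(Z'X)^{-1}Z'Y$. All coefficients and correlations are arbitrary choices for the example.

```python
import numpy as np

def simulate_endogenous(T=100_000, seed=0):
    """Hypothetical DGP: y = 1 + 2x + eps, x endogenous, instruments z1, z2 (q = 3 > k = 2)."""
    rng = np.random.default_rng(seed)
    z1, z2, v, w = rng.normal(size=(4, T))
    x = z1 + 0.5 * z2 + v                  # E(z'x) != 0
    eps = 0.8 * v + w                      # E(x*eps) = 0.8 != 0, while E(z*eps) = 0
    X = np.column_stack([np.ones(T), x])
    Z = np.column_stack([np.ones(T), z1, z2])
    return X, Z, 1.0 + 2.0 * x + eps

def ols(X, Y):
    """(X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def first_stage_fitted(X, Z):
    """P_Z X: fitted values from regressing each column of X on Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

def two_stage_ls(X, Z, Y):
    """(X' P_Z X)^{-1} X' P_Z Y, computed without forming the T x T matrix P_Z."""
    Xhat = first_stage_fitted(X, Z)
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ Y)   # Xhat'X = X'P_Z X because P_Z is symmetric

def simple_iv(X, Z, Y):
    """(Z'X)^{-1} Z'Y; requires Z to have exactly as many columns as X (q = k)."""
    return np.linalg.solve(Z.T @ X, Z.T @ Y)
```

In this design the least-squares slope approaches $2 + \mathrm{Cov}(x, \varepsilon)/\mathrm{Var}(x) = 2 + 0.8/2.25 \approx 2.36$ rather than 2, however large T is. The two-step estimator approaches the true coefficients. Regressing Y on the fitted values $P_Z X$ gives the same numbers as the projection formula because $P_Z$ is idempotent.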
Problem 3. (16 points) Real estate is a key asset. Investing in real estate
represents the biggest investment decision for most households over their
lifetimes. A real estate company in Baltimore wants to estimate a model
to relate the house prices to several characteristics of the house. The data
come from Zillow and consist of a sample of houses in the Baltimore area
for the year 2014. The data are contained in the file housing data.xlsx and
provide the following information:
• Zillow id of the house (id)
• price in dollars (price)
• postal code (zip)
• year the house was built (yearBuilt)
• size of the house measured in square feet (sqft)
• number of bathrooms (bathrooms)
• number of bedrooms (bedrooms).
Given this information, you need to run a linear regression for the price
of houses using Matlab.
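The problem asks for Matlab. As a language-neutral illustration, here is a Python/NumPy sketch of the linear-regression computations the parts below rely on:
- OLS with classical (homoskedastic) standard errors.
- A t statistic for a single coefficient.
- A Wald F statistic for linear restrictions $R\beta = r$.

Loading the Zillow file and constructing age from yearBuilt are not shown. The p-value uses a large-sample normal approximation instead of the exact t distribution, an assumption made only to avoid a SciPy dependency.

```python
import math
import numpy as np

def ols(y, X):
    """OLS fit; X must include a column of ones for the intercept."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (T - k)                   # unbiased estimate of Var(u_i)
    se = np.sqrt(s2 * np.diag(XtX_inv))            # classical (homoskedastic) standard errors
    return b, se, s2, XtX_inv

def t_stat(b, se, j):
    """t statistic for H0: beta_j = 0 and a two-sided normal-approximation p-value."""
    t = b[j] / se[j]
    return t, math.erfc(abs(t) / math.sqrt(2))     # equals 2*(1 - Phi(|t|))

def wald_f(y, X, R, r):
    """F statistic for H0: R beta = r with m = rows(R); compare with F(m, T - k) under H0."""
    b, _, s2, XtX_inv = ols(y, X)
    d = R @ b - r
    m = R.shape[0]
    return d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (m * s2)
```

With the regressors ordered as (1, age, size, bathrooms, bedrooms), the calls for each part would be:
- A single-coefficient test on β3 uses `t_stat(b, se, 3)`.
- A joint test of β3 and β4 uses R with rows (0, 0, 0, 1, 0) and (0, 0, 0, 0, 1) and r = (0, 0).
- The restriction β3 = −β4 uses R = (0, 0, 0, 1, 1) and r = 0.

With a single restriction, the F statistic equals the square of the corresponding t statistic.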
(1) (1 point) Generate a histogram of the house prices and compute descriptive
statistics (mean, median, variance, standard deviation, minimum,
maximum). What do you notice?
(2) (1 point) Now take a log transformation of the house prices. Plot the
histogram of the log-prices. What do you notice?
(3) (2 points) Run a regression of the log-prices on the explanatory variables:
$\log(\text{price}_i) = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{size}_i + \beta_3 \text{bathrooms}_i + \beta_4 \text{bedrooms}_i + u_i$,
where $u_i$ is an error term.
(4) (2 points) Give an economic interpretation of the estimated coefficients
in the regression above. What does the model say about the house
prices?
(5) (2 points) Why do you think β4 is negative?
(6) (2 points) We want to test whether the coefficient β3 for the number
of bathrooms is statistically significant. What test would you use?
Compute the test statistic and interpret the result.
(7) (2 points) Test whether the coefficients for the number of bathrooms
and the number of bedrooms (i.e., β3 and β4) are jointly different from
zero.
(8) (2 points) Test whether β3 = −β4.
(9) (2 points) Using your model, predict the price of a house with 4 bedrooms,
3 bathrooms, size of 2500 square feet and built in 1945. Explain 