Econometrics II Winter 2020
OPTIONAL Final Exam
Due on March 19, 2020, by 3:30pm
Instructions: (1) Please do 2 of the 3 questions below; if you choose not to take
this exam, your grade will be based on your homework and midterm scores. (2) To
get full credit, please ensure that you present the model that you will be estimating
and that all of your estimates and conclusions are legible and properly organized. (3)
Please submit your answers in PDF format (you can scan handwritten notes using a
phone app or a scanner; typed notes can be saved in PDF format). (4) Please turn
in your estimation code (e.g., R file, Matlab *.m files, etc.), so your answers can be
replicated. You may use any software you wish (R, Matlab, EViews, Stata, etc.).
Hints: You may use any software you wish. If using R, you may use any package
you wish, including lm and glm. For glm, some examples are:
glm(y ~ X, family= …),
where for family you may use: binomial(link="probit"), binomial(link="logit"), and
"poisson". For panel data questions you will need the library plm. If you do not have
To use the library:
Problem 1: The data in exper.csv pertain to the dependent variable “income”
of 100 individuals observed over 5 years, 1986-1990. The individuals are working
men and women aged 18-65. Covariates in this dataset include a male dummy, a race
dummy, health, experience, and distance from work. You are also given a person
ID (i = 1, ..., 100) and a year variable (t = 1986, ..., 1990), so that you know how to
organize the data.
1. Write down a model for income that captures individual heterogeneity in the
intercept. Discuss how fixed effects estimation can be implemented in this case.
Will you be able to estimate the effects of race or gender?
2. Run the fixed effects regression that you specified in (1). In a well-labeled table,
please report your coefficient estimates and standard errors.
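As a rough illustration of the within (fixed effects) transformation asked about in Problem 1, here is a minimal Python/NumPy sketch. It runs on a synthetic panel, not on exper.csv: the data-generating process, the coefficient values, and the helper names `demean_by_group` and `within_estimator` are all assumptions made for this example. It demeans each variable by person, runs OLS on the demeaned data, and shows that a time-invariant dummy is wiped out by the demeaning.

```python
import numpy as np


def demean_by_group(z, group):
    """Subtract each group's mean from a 1-D array (the 'within' transformation)."""
    z = np.asarray(z, dtype=float)
    group_means = np.bincount(group, weights=z) / np.bincount(group)
    return z - group_means[group]


def within_estimator(y, X, ids):
    """OLS on person-demeaned data; the variance uses n - N - k degrees of freedom."""
    X = np.asarray(X, dtype=float)
    _, group = np.unique(ids, return_inverse=True)  # recode IDs as 0, ..., N-1
    y_dm = demean_by_group(y, group)
    X_dm = np.column_stack(
        [demean_by_group(X[:, k], group) for k in range(X.shape[1])]
    )
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    resid = y_dm - X_dm @ beta
    n_obs, k = X_dm.shape
    n_groups = group.max() + 1
    sigma2 = resid @ resid / (n_obs - n_groups - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_dm.T @ X_dm)))
    return beta, se


# Synthetic panel (NOT the exam data): 100 people observed for 5 years each.
rng = np.random.default_rng(0)
N, T = 100, 5
ids = np.repeat(np.arange(1, N + 1), T)
person = ids - 1
alpha = rng.normal(size=N)                            # unobserved individual intercepts
male = rng.integers(0, 2, size=N)[person]             # time-invariant dummy
experience = alpha[person] + rng.normal(size=N * T)   # correlated with alpha
health = rng.normal(size=N * T)
income = (alpha[person] + 2.0 * experience - 1.0 * health
          + rng.normal(scale=0.5, size=N * T))

beta_fe, se_fe = within_estimator(
    income, np.column_stack([experience, health]), ids
)
male_demeaned = demean_by_group(male, person)         # constant within person

print("FE coefficients (experience, health):", beta_fe)
print("FE standard errors:", se_fe)
print("Largest |demeaned male|:", np.abs(male_demeaned).max())
```

Because the demeaned dummy has no variation left, it cannot be included as a regressor in the within regression. In R, the plm package mentioned in the hints offers a packaged version of this estimator.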
Problem 2: The data set travel choice.csv contains data on personal travel choices
(public versus private transportation) including the variables CarTime (time in minutes taking a car to work), BusTime (time in minutes taking a bus to work), and
Dtime = (BusTime – CarTime)/10.
1. Estimate by OLS the linear probability model
Car = β_0 + β_1 Dtime + ε,
and explain why there is a problem with some of the fitted values.
2. Present one different model specification that does not suffer from the problem
in (1). Provide the new coefficient estimates and standard errors in a well-labeled table.
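The contrast in Problem 2 can be sketched in Python/NumPy on simulated data. Nothing here uses the exam's data file: the sample size, the "true" coefficients, and the helper names `fit_ols` and `fit_logit` are assumptions for illustration, and a logit is only one possible alternative (a probit would serve equally well). The sketch fits the linear probability model by OLS, counts fitted values outside [0, 1], and then fits a logit by Newton-Raphson, whose fitted probabilities stay strictly inside the unit interval.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for the linear probability model."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_logit(X, y, n_iter=25):
    """Logit MLE by Newton-Raphson; SEs from the inverse information matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)
        info = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(info, score)
    # Recompute probabilities and information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = (X * (p * (1.0 - p))[:, None]).T @ X
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, p


# Simulated data (NOT the exam data): Car = 1 is more likely when the bus is slower.
rng = np.random.default_rng(1)
n = 1000
dtime = rng.normal(0.0, 2.0, size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.2 + 0.8 * dtime)))
car = (rng.random(n) < p_true).astype(float)
X = np.column_stack([np.ones(n), dtime])       # constant and Dtime

beta_lpm = fit_ols(X, car)
fitted_lpm = X @ beta_lpm
beta_logit, se_logit, fitted_logit = fit_logit(X, car)

n_outside = int(((fitted_lpm < 0) | (fitted_lpm > 1)).sum())
print("LPM coefficients:", beta_lpm)
print("LPM fitted values outside [0, 1]:", n_outside)
print("Logit coefficients:", beta_logit)
print("Logit standard errors:", se_logit)
print("Logit fitted range:", fitted_logit.min(), fitted_logit.max())
```

The linear fit is unbounded in Dtime, so large positive or negative time differences push its fitted "probabilities" past 1 or below 0, while the logistic link maps any index into (0, 1).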
Problem 3: Suppose you are interested in estimating the number of medals won by
a country in the Olympic games as a function of the variables GDP (log Gross Domestic
Product, 1995 dollars) and Pop (log Population). These variables are in the file
games.csv (don’t forget to add a constant term in your regression).
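For a count outcome such as medals, one common specification is a Poisson regression with a log link, in which the conditional mean is exp(β_0 + β_1 GDP + β_2 Pop). The Python/NumPy sketch below fits such a model by Newton-Raphson on simulated data. It does not read games.csv: the sample size, the "true" coefficients, the helper name `fit_poisson`, and the evaluation point `x_new` are all assumptions chosen for illustration.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50):
    """Poisson MLE (log link) by Newton-Raphson.

    Assumes the first column of X is the constant term.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = (X * mu[:, None]).T @ X
        beta = beta + np.linalg.solve(info, score)
    # Recompute fitted means and information at the final estimate
    mu = np.exp(X @ beta)
    info = (X * mu[:, None]).T @ X
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, mu


# Simulated data (NOT the exam data): log GDP and log population for 500 units.
rng = np.random.default_rng(2)
n = 500
gdp = rng.normal(25.0, 1.5, size=n)            # log GDP
pop = rng.normal(16.0, 1.2, size=n)            # log population
X = np.column_stack([np.ones(n), gdp, pop])    # constant term first
beta_true = np.array([-15.0, 0.5, 0.15])       # assumed values for the simulation
medals = rng.poisson(np.exp(X @ beta_true)).astype(float)

beta_pois, se_pois, mu_hat = fit_poisson(X, medals)

# Predicted mean count at a hypothetical covariate value (constant, log GDP, log Pop)
x_new = np.array([1.0, 26.0, 16.5])
predicted_mean = float(np.exp(x_new @ beta_pois))

print("Poisson coefficients:", beta_pois)
print("Poisson standard errors:", se_pois)
print("Predicted mean at x_new:", predicted_mean)
```

With a constant in the regression, the first-order condition for the intercept forces the fitted means to sum to the observed total count, which is one reason the constant term matters here.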
1. Write down a Poisson model for these data. Present your coefficient estimates
and standard errors in a well-labeled table.
2. Australia had a population of 16.5 million and log-GDP of 26.49 (GDP was
≈ 3.2 × 10^11 AUD) when it won 14 medals. Compare this actual number of
medals won by Australia to the predicted (mean) number resulting from your