ECMT2150 INTERMEDIATE ECONOMETRICS
SEMESTER 2, 2019
ASSIGNMENT 2
Answer all questions
INSTRUCTIONS
• Due Date: Friday 25 October 2019 (5:00pm)
• Submission Instructions: Your assignment MUST be submitted as a single *.pdf
file online through Canvas, via Turnitin.
• You must type your assignment.
• Anonymous marking: Do NOT put your name anywhere on your assignment
or in the file name. Identify yourself only by your student number.
• Show your working; otherwise, only partial marks will be awarded.
• You will need to use STATA (or another regression software program) to complete
parts of this assignment. Please attach no more than 2 pages (your “do file”
of commands and/or key parts of your output) at the end of your assignment,
otherwise only partial marks will be awarded.
• The assignment is worth 10.0% of your final grade for this UoS.
Question 1:
The variable smokes is a binary variable equal to one if a person smokes, and zero otherwise.
Using the data in SMOKE1, we wish to estimate the following linear probability model:
$$\mathit{smokes} = \beta_0 + \beta_1 \log(\mathit{cigprice}) + \beta_2 \log(\mathit{income}) + \beta_3\,\mathit{educ} + \beta_4\,\mathit{age} + \beta_5\,\mathit{age}^2
+ \beta_6\,\mathit{restaurn} + \beta_7\,\mathit{white} + e$$
where log(cigprice) is the log of the price per cigarette, log(income) is the log of annual income,
educ is years of schooling, age is a person’s age in years, restaurn equals one if the person lives
in a state with restaurant smoking restrictions, and white equals one if the respondent is white.
i) Estimate the linear probability model for smokes using:
a. Usual standard errors (in parentheses)
b. Heteroskedasticity-robust standard errors (in square brackets)
ii) Are there any important differences between the two sets of standard errors?
iii) Holding other factors fixed, if education increases by four years, what happens to the
estimated probability of smoking?
iv) At what point does another year of age reduce the probability of smoking?
v) Interpret the coefficient on the binary variable restaurn.
vi) Compute the predicted probability of smoking for person number 206 (id=206) in the
data set.
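
Parts (iii) and (iv) rest on two standard properties of a linear probability model with a quadratic term. As a sketch only, with hats denoting OLS estimates (this notation is assumed here and is not part of the question):

```latex
% Change in the predicted probability when educ rises by four years,
% holding the other regressors fixed
\Delta \widehat{P}(\mathit{smokes}=1 \mid \mathbf{x}) = 4\,\hat\beta_3

% Partial effect of age implied by the quadratic in age
\frac{\partial\, \widehat{P}(\mathit{smokes}=1 \mid \mathbf{x})}{\partial\, \mathit{age}}
  = \hat\beta_4 + 2\hat\beta_5\,\mathit{age}

% Turning point: the age at which the partial effect changes sign
\mathit{age}^{*} = -\frac{\hat\beta_4}{2\hat\beta_5}
```

If the estimates satisfy $\hat\beta_4 > 0$ and $\hat\beta_5 < 0$, the predicted probability rises with age up to $\mathit{age}^{*}$ and falls thereafter.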
Question 2:
Use the data in WAGE2 to answer this question. The model of interest is:
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{age} + \beta_4\,\mathit{married} + e,$$
where log(wage) is the natural log of monthly earnings and educ is the number of years of
education.
i) How many people are in the sample? What percentage of these people have more than
12 years of education?
ii) Estimate the above equation by OLS. What is the estimate of $\beta_1$? What is its 95%
confidence interval?
iii) Using sibs as an instrument for educ, estimate the reduced form for educ. What is
the t statistic for sibs? Is there evidence of a weak instrument problem?
iv) Estimate the above equation by IV, using sibs as an IV for educ. How do the
estimate and 95% CI compare with the OLS quantities?
v) Test the null hypothesis that educ is exogenous. What is the p-value of the test?
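
For parts (iii) and (v), the reduced form and one common regression-based test of exogeneity can be sketched as follows. The symbols $\pi_j$, $v$, $\delta$ and $\mathit{error}$ are notation introduced here for illustration. The question does not prescribe this particular test, so treat it as one possible approach.

```latex
% Reduced form for educ: the instrument plus all exogenous regressors
\mathit{educ} = \pi_0 + \pi_1\,\mathit{sibs} + \pi_2\,\mathit{exper}
              + \pi_3\,\mathit{age} + \pi_4\,\mathit{married} + v

% Regression-based test: add the reduced-form residuals \hat{v}
% to the structural equation and estimate by OLS
\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper}
                    + \beta_3\,\mathit{age} + \beta_4\,\mathit{married}
                    + \delta\,\hat{v} + \mathit{error}

% Null hypothesis that educ is exogenous
H_0\colon \delta = 0
```

With a single instrument, the first-stage F statistic for instrument relevance equals the square of the t statistic on sibs in the reduced form. A widely used rule of thumb treats an F statistic below 10 as a sign of a weak instrument.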
Question 3:
Use the data in WAGE2 for this exercise.
i) Estimate the model
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + \beta_4\,\mathit{married} + \beta_5\,\mathit{black}
+ \beta_6\,\mathit{south} + \beta_7\,\mathit{urban} + u,$$
and report the results in the usual form. Holding other factors fixed, what is the
approximate difference in monthly salary between blacks and nonblacks? Is this
difference statistically significant?
ii) Add the variables $\mathit{exper}^2$ and $\mathit{tenure}^2$ to the equation and show that they are jointly
insignificant at even the 20% level.
iii) Extend the original model to allow the return to education to depend on race and test
whether the return to education does depend on race.
iv) Again, start with the original model, but now allow wages to differ across four groups
of people: married and black, married and nonblack, single and black, and single and
nonblack. What is the estimated wage differential between married blacks and married
nonblacks?
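
Parts (iii) and (iv) can both be handled with interaction terms. The specifications below are one possible parameterisation, shown as a sketch. The coefficients $\delta$ and $\gamma$ are notation introduced here, and part (iv) could equivalently use three group dummies with one group as the base.

```latex
% (iii) Let the return to education differ by race
\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper}
  + \beta_3\,\mathit{tenure} + \beta_4\,\mathit{married} + \beta_5\,\mathit{black}
  + \beta_6\,\mathit{south} + \beta_7\,\mathit{urban}
  + \delta\,(\mathit{black}\cdot\mathit{educ}) + u,
\qquad H_0\colon \delta = 0

% (iv) Let wages differ across the four marital-status and race groups
\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper}
  + \beta_3\,\mathit{tenure} + \beta_4\,\mathit{married} + \beta_5\,\mathit{black}
  + \beta_6\,\mathit{south} + \beta_7\,\mathit{urban}
  + \gamma\,(\mathit{married}\cdot\mathit{black}) + u

% Married black versus married nonblack, other factors fixed:
% (\beta_4 + \beta_5 + \gamma) - \beta_4
\text{differential in } \log(\mathit{wage}) = \beta_5 + \gamma
```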
Question 4:
The following table contains data taken from 10 students. It shows their math test scores (math,
range 0-100), whether they attend a Catholic high school (cathhs), and whether their parents are
Catholic (parcath).
Student ID   math    cathhs   parcath
1            48.12   0        1
2            58.77   1        1
3            51.51   1        1
4            56.55   1        1
5            59.69   1        1
6            50.88   1        1
7            37.06   0        0
8            56.52   0        0
9            50.12   0        0
10           49.14   0        1
Answer the questions below using only a calculator and show your work. Do not use Excel
or STATA.
i) Estimate the relationship between math and cathhs using IV estimation; that is, obtain
the intercept and slope estimates in an IV regression of math on cathhs, where parcath is
an IV for cathhs.
ii) Comment on the direction of the relationship between math and cathhs. Does the
intercept have a useful interpretation here? Explain. How much higher is math predicted
to be if a student attended a Catholic high school?
iii) Compute the fitted values and residuals for each observation and verify that the
residuals (approximately) sum to zero.
iv) How much of the variation in math for these 10 students is explained by cathhs?
Explain.
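
The question requires hand calculation, but the arithmetic can be checked afterwards with a short script. The sketch below applies the standard single-regressor IV formulas (slope equal to the sample covariance of the instrument with the outcome divided by its covariance with the regressor, intercept from the sample means) to the data in the table. The names `iv_simple`, `mean`, and `math_scores` are hypothetical helpers chosen for this illustration.

```python
# Data copied from the table in Question 4 (students 1 to 10, in order).
math_scores = [48.12, 58.77, 51.51, 56.55, 59.69, 50.88, 37.06, 56.52, 50.12, 49.14]
cathhs = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
parcath = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def iv_simple(y, x, z):
    """Return (intercept, slope) from a simple IV regression of y on x,
    using z as the instrument for x."""
    y_bar, x_bar, z_bar = mean(y), mean(x), mean(z)
    # Sums of cross-products in deviations from means; the common 1/(n-1)
    # factor cancels in the ratio, so it is omitted.
    cov_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    cov_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    slope = cov_zy / cov_zx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = iv_simple(math_scores, cathhs, parcath)

# Fitted values and residuals for part (iii).
fitted = [intercept + slope * xi for xi in cathhs]
residuals = [yi - fi for yi, fi in zip(math_scores, fitted)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

Because the intercept is defined from the sample means, the residuals sum to zero up to floating-point rounding, which is the property part (iii) asks to verify.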