ECON 400: Introduction to Econometrics
Problem Set #4
Due: 03/02/2020
Problem #1:
Use the Birthweight_Smoking data set available on Canvas to answer the following questions. To
begin, run three regressions:
1. Birthweight on Smoker
2. Birthweight on Smoker, Alcohol, and Nprevist
3. Birthweight on Smoker, Alcohol, Nprevist, and Unmarried
a. What is the value of the estimated effect of smoking on birth weight in each of the
regressions?
b. Construct a 95% confidence interval for the effect of smoking on birth weight, using each of
the regressions.
c. Does the coefficient on Smoker in regression (1) suffer from omitted variable bias? Explain.
d. Does the coefficient on Smoker in regression (2) suffer from omitted variable bias? Explain.
e. Consider the coefficient on Unmarried in regression (3).
i. Construct a 95% confidence interval for the coefficient.
ii. Is the coefficient statistically significant? Explain.
iii. Is the magnitude of the coefficient large? Explain.
iv. A family advocacy group notes that the large coefficient suggests that public policies
that encourage marriage will lead, on average, to healthier babies. Do you agree?
Discuss some of the various factors that Unmarried may be controlling for and how
this affects the interpretation of its coefficient.
f. Consider the various other control variables in the data set. Which do you think should be
included in the regression? Discuss the robustness of the confidence interval you
constructed in (b). What is a reasonable 95% confidence interval for the effect of smoking on
birth weight?
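The three regressions in Problem #1 could be set up along the following lines. This is only a sketch: the file name Birthweight_Smoking.dta, the lowercase variable names, and the use of heteroskedasticity-robust standard errors are all assumptions, so adjust them to match the file on Canvas.

```stata
* Assumption: the data file is Birthweight_Smoking.dta in the working directory
* and the variables are named in lowercase.
use Birthweight_Smoking, clear

* Regressions (1)-(3); vce(robust) is an assumption, not a stated requirement
regress birthweight smoker, vce(robust)
regress birthweight smoker alcohol nprevist, vce(robust)
regress birthweight smoker alcohol nprevist unmarried, vce(robust)

* The regress output already reports a 95% confidence interval.
* The same interval computed by hand after the last regression:
display _b[smoker] - invttail(e(df_r), 0.025)*_se[smoker]
display _b[smoker] + invttail(e(df_r), 0.025)*_se[smoker]
```

The by-hand interval is the estimate plus or minus the t critical value times the standard error; in a large sample the critical value is close to 1.96.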
Problem #2:
Download the dataset Gender.dta from Canvas. This file contains fabricated data on gender, income,
education, and job experience. We will use this file to examine group differences in the parameters
of this model, this time using dummy variables and interaction effects.
a) Suppose we are interested in the relationship between income and both education and job
experience. In particular, suppose you want to know whether the models differ for men and
women. Estimate the following three models using dummy
variables and interaction effects (use Stata’s factor variable notation to do so):
a. Start with a model that assumes there are no differences by gender – that the
relationships between these variables are identical for men and women.
b. Next, estimate a model in which the constant terms differ by gender, but wherein
the effects of education and job experience are the same for both men and women.
c. Finally, estimate a model wherein both the constant terms and slopes differ by
gender. In this model, all parameters should be allowed to differ by gender.
b) Identify the “best” model of the three you estimated and explain why it was selected. Be sure
to discuss what insights the model provides concerning gender differences. To help you with
the discussion, run the following commands after your preferred model.
quietly margins female, at(educ=(0(1)20) jobexp=0)
marginsplot, noci ytitle("Predicted Income") ylabel(#10) scheme(sj) name(educ)
quietly margins female, at(jobexp=(0(1)20) educ=0)
marginsplot, noci ytitle("Predicted Income") ylabel(#10) scheme(sj) name(jobexp)

c) In the models you previously estimated, the coefficient on the indicator variable for
being female changes from negative to positive once interaction terms are included in the
model. Should this be a concern? Explain why or why not. Explain how the interpretation of
the coefficient associated with the binary variable Female changes once interaction terms
are included.
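For part (a) of Problem #2, the three models might be written in factor variable notation roughly as follows. The names female, educ, and jobexp come from the margins commands in part (b); the outcome name income and the file location are assumptions.

```stata
* Assumption: Gender.dta is in the working directory and the outcome is named income.
use Gender, clear

* (a) No differences by gender
regress income c.educ c.jobexp

* (b) Constant term differs by gender, common slopes
regress income i.female c.educ c.jobexp

* (c) Constant term and both slopes differ by gender
regress income i.female##c.educ i.female##c.jobexp

* One way to compare (c) with (b): jointly test the two interaction terms
testparm i.female#c.educ i.female#c.jobexp
```

Entering female as i.female matters here, because margins female only works when the model treats female as a factor variable.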
Problem #3:
This problem is inspired by a study of the gender gap in earnings in top corporate jobs. The study
compares total compensation among top executives in a large set of U.S. public corporations in the
1990s. (Each year these publicly traded corporations must report total compensation levels for their
top five executives.)
a. Let Female be an indicator variable that is equal to 1 for females and 0 for males. A
regression of the logarithm of earnings on Female yields
\[ \widehat{\ln(\mathit{Earnings})} = \underset{(0.01)}{6.48} - \underset{(0.05)}{0.44}\,\mathit{Female}, \quad SER = 2.65 \]
i. The estimated coefficient on Female is −0.44. Explain what this value means.
ii. The SER is 2.65. Explain what this value means.
iii. Does this regression suggest that female top executives earn less than top male
executives? Explain.
iv. Does this regression suggest that there is sex discrimination? Explain.
b. Two new variables, the market value of the firm (a measure of firm size, in millions of
dollars) and stock return (a measure of firm performance, in percentage points), are added
to the regression:
\[ \widehat{\ln(\mathit{Earnings})} = \underset{(0.03)}{3.86} - \underset{(0.04)}{0.28}\,\mathit{Female} + \underset{(0.004)}{0.37}\,\ln(\mathit{MarketValue}) + \underset{(0.003)}{0.0004}\,\mathit{Return}; \]
\[ \bar{R}^2 = 0.345, \quad n = 46{,}670 \]

i. The coefficient on ln(MarketValue) is 0.37. Explain what this value means.
ii. The coefficient on Female is now −0.28. Explain why it has changed from the
regression in (a).
c. Are large firms more likely than small firms to have female top executives? Explain.
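As a general reference for the regressions in Problem #3, the standard textbook readings of coefficients in logarithmic specifications are summarized below. The symbols Y, X, and the betas are generic placeholders, not the variables of this problem.

```latex
\[
\begin{aligned}
\text{log-linear:}\quad & \ln(Y) = \beta_0 + \beta_1 X + u
  && \Rightarrow\ \text{a one-unit change in } X \text{ is associated with roughly a } 100\,\beta_1\% \text{ change in } Y,\\
\text{log-log:}\quad & \ln(Y) = \beta_0 + \beta_1 \ln(X) + u
  && \Rightarrow\ \text{a } 1\% \text{ change in } X \text{ is associated with a } \beta_1\% \text{ change in } Y,\\
\text{exact, for a 0/1 regressor:}\quad & \frac{Y_{X=1} - Y_{X=0}}{Y_{X=0}} = e^{\beta_1} - 1
  && \text{(holding } u \text{ fixed)}.
\end{aligned}
\]
```

The approximation in the first line is accurate only when the coefficient is small in absolute value; for larger coefficients the exact expression in the last line can differ noticeably.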

Problem #4:
Suppose a researcher collects data on houses that have sold in a particular neighborhood over the
past year and obtains the regression results in the table shown below.
a. Using the results in column (1), what is the expected change in price of building a 500-
square-foot addition to a house? Construct a 95% confidence interval for the percentage
change in price.
b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?
c. Using column (2), what is the estimated effect of a pool on price? (Make sure you get the
units right.) Construct a 95% confidence interval for this effect.
d. The regression in column (3) adds the number of bedrooms to the regression. How large is
the estimated effect of an additional bedroom? Is the effect statistically significant? Why do
you think the estimated effect is so small? (Hint: Which other variables are being held
constant?)
e. Is the quadratic term [ln(Size)]^2 important?
f. Use the regression in column (5) to compute the expected change in price when a pool is
added to a house that doesn’t have a view. Repeat the exercise for a house that has a view.
Is there a large difference? Is the difference statistically significant?
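For part (a) of Problem #4, a confidence interval for the effect of a given change in a regressor can be obtained by scaling the interval for its coefficient. The expression below is a generic sketch, with the estimated coefficient and its standard error to be read from the regression table.

```latex
\[
\Delta\widehat{Y} = \hat{\beta}\,\Delta X,
\qquad
\text{95\% CI: } \Delta X \times \bigl[\hat{\beta} - 1.96\,SE(\hat{\beta}),\ \hat{\beta} + 1.96\,SE(\hat{\beta})\bigr]
\quad (\Delta X > 0).
\]
```

If the dependent variable is the logarithm of price (an assumption here, suggested by the request for a percentage change), multiplying both endpoints by 100 gives the interval in approximate percentage terms.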

Problem #5:
On Canvas you will find a dataset titled CPS2015, which contains data for full-time, full-year
workers, ages 25–34, with a high school diploma or B.A./B.S. as their highest degree. A detailed
description is given in the file CPS2015_Description, also available on Canvas. In this exercise, you
will investigate the relationship between a worker’s age and earnings. (Generally, older workers
have more job experience, leading to higher productivity and higher earnings.)
a. Run a regression of average hourly earnings (AHE) on age (Age), sex (Female), and
education (Bachelor). If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
b. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Female,
and Bachelor. If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female,
and Bachelor. If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
d. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Age^2, Female,
and Bachelor. If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
e. Do you prefer the regression in (c) to the regression in (b)? Explain.
f. Do you prefer the regression in (d) to the regression in (b)? Explain.
g. Do you prefer the regression in (d) to the regression in (c)? Explain.
h. Plot the regression relation between Age and ln(AHE) from (b), (c), and (d) for males with a
high school diploma. Describe the similarities and differences between the estimated
regression functions. Would your answer change if you plotted the regression function for
females with college degrees?
i. Run a regression of ln(AHE) on Age, Age^2, Female, Bachelor, and the interaction
term Female × Bachelor. What does the coefficient on the interaction term measure? Alexis
is a 30-year-old female with a bachelor’s degree. What does the regression predict for her
value of ln(AHE)? Jane is a 30-year-old female with a high school diploma. What does the
regression predict for her value of ln(AHE)? What is the predicted difference between
Alexis’s and Jane’s earnings? Bob is a 30-year-old male with a bachelor’s degree. What does
the regression predict for his value of ln(AHE)? Jim is a 30-year-old male with a high school
diploma. What does the regression predict for his value of ln(AHE)? What is the predicted
difference between Bob’s and Jim’s earnings?
j. Is the effect of Age on earnings different for men than for women? Specify and estimate a
regression that you can use to answer this question.
k. Is the effect of Age on earnings different for high school graduates than for college
graduates? Specify and estimate a regression that you can use to answer this question.
l. After running all these regressions (and any others that you want to run), summarize the
effect of age on earnings for young workers.
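The regressions in Problem #5 might be set up as sketched below. The file name CPS2015.dta, the lowercase variable names ahe, age, female, and bachelor, the generated names lnahe and lnage, and the robust standard errors are assumptions; check CPS2015_Description for the actual names.

```stata
* Assumption: CPS2015.dta is in the working directory with variables
* ahe, age, female, bachelor (lowercase).
use CPS2015, clear
generate lnahe = ln(ahe)
generate lnage = ln(age)

* Parts (a) through (d)
regress ahe age i.female i.bachelor, vce(robust)
regress lnahe age i.female i.bachelor, vce(robust)
regress lnahe lnage i.female i.bachelor, vce(robust)
regress lnahe c.age##c.age i.female i.bachelor, vce(robust)

* After (d), the change in predicted ln(AHE) when Age goes from 25 to 26 is
* b_age + b_age2*(26^2 - 25^2) = b_age + 51*b_age2; from 33 to 34 the factor is 67.
lincom age + 51*c.age#c.age
lincom age + 67*c.age#c.age

* Part (i): add the Female x Bachelor interaction
regress lnahe c.age##c.age i.female##i.bachelor, vce(robust)

* Part (j), one possible specification: let both Age terms differ by gender,
* then test the gender-by-age interactions jointly
regress lnahe i.female##c.age##c.age i.bachelor, vce(robust)
testparm i.female#c.age i.female#c.age#c.age
```

Writing the quadratic as c.age##c.age rather than generating a separate squared variable lets lincom and margins recognize that both terms move together when Age changes.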
