This assignment examines survey data on smokers and analyzes related questions.

ECON 400: Introduction to Econometrics

Problem Set #4

Due: 03/02/2020

Problem #1:

Use the Birthweight_Smoking data set available on Canvas to answer the following questions. To begin, run three regressions:

1. Birthweight on Smoker

2. Birthweight on Smoker, Alcohol, and Nprevist

3. Birthweight on Smoker, Alcohol, Nprevist, and Unmarried

a. What is the value of the estimated effect of smoking on birth weight in each of the regressions?

b. Construct a 95% confidence interval for the effect of smoking on birth weight, using each of the regressions.

c. Does the coefficient on Smoker in regression (1) suffer from omitted variable bias? Explain.

d. Does the coefficient on Smoker in regression (2) suffer from omitted variable bias? Explain.

e. Consider the coefficient on Unmarried in regression (3).

i. Construct a 95% confidence interval for the coefficient.

ii. Is the coefficient statistically significant? Explain.

iii. Is the magnitude of the coefficient large? Explain.

iv. A family advocacy group notes that the large coefficient suggests that public policies that encourage marriage will lead, on average, to healthier babies. Do you agree? Discuss some of the various factors that Unmarried may be controlling for and how this affects the interpretation of its coefficient.

f. Consider the various other control variables in the data set. Which do you think should be included in the regression? Discuss the robustness of the confidence interval you constructed in (b). What is a reasonable 95% confidence interval for the effect of smoking on birth weight?
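
The intervals requested in (b) and (e)(i) follow the usual large-sample rule, estimate ± 1.96 × standard error. The arithmetic can be sketched in Python as below (the course itself works in Stata). The coefficient and standard error are made-up placeholders, not estimates from Birthweight_Smoking, and `confidence_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Return (lower, upper) for a large-sample normal confidence interval."""
    # Two-sided critical value, about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin


# Placeholder numbers for illustration only (NOT regression output).
lower, upper = confidence_interval(estimate=-100.0, std_error=20.0)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")

# A coefficient is significant at the 5% level when its 95% interval excludes 0.
print("Excludes zero:", not (lower <= 0.0 <= upper))
```

Replacing the placeholders with the coefficient and standard error reported by each regression gives the corresponding interval.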

Problem #2:

Download the dataset Gender.dta from Canvas. This file contains fabricated data on gender, income, education, and job experience. We will use this file to examine group differences in the parameters of this model, this time using dummy variables and interaction effects.

a) Suppose we are interested in how income is related to education and job experience. In particular, suppose you want to know whether there are any differences in the models for men and women. Estimate the following three models using dummy variables and interaction effects (use Stata’s factor variable notation to do so):

a. Start with a model that assumes there are no differences by gender, so that the relationships between these variables are identical for men and women.

b. Next, estimate a model in which the constant terms differ by gender, but in which the effects of education and job experience are the same for both men and women.

c. Finally, estimate a model in which both the constant terms and the slopes differ by gender. In this model, all parameters should be allowed to differ by gender.

b) Identify the “best” model of the three you estimated and explain why it was selected. Be sure to discuss what insights the model provides concerning gender differences. To help you with the discussion, run the following commands after your preferred model.

quietly margins female, at(educ=(0(1)20) jobexp=0)

marginsplot, noci ytitle("Predicted Income") ylabel(#10) scheme(sj) name(educ)

quietly margins female, at(jobexp=(0(1)20) educ=0)

marginsplot, noci ytitle("Predicted Income") ylabel(#10) scheme(sj) name(jobexp)

c) In the models you previously estimated, the coefficient on the indicator variable for being female changes from negative to positive once interaction terms are included in the model. Should this be a concern? Explain why or why not. Explain how the interpretation of the coefficient associated with the binary variable Female changes once interaction terms are added to the model.
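
When the intercept and both slopes are allowed to differ by gender, the predicted gap between women and men depends on the values of education and job experience at which it is evaluated. The Python sketch below illustrates that arithmetic (the course itself works in Stata). Every coefficient is a made-up placeholder, not an estimate from Gender.dta, and the function names and dictionary keys are hypothetical.

```python
def predicted_income(educ, jobexp, female, coef):
    """Prediction from a model whose intercept and slopes differ by gender."""
    base = coef["cons"] + coef["educ"] * educ + coef["jobexp"] * jobexp
    # Extra terms that apply only when female == 1.
    shift = (
        coef["female"]
        + coef["female_x_educ"] * educ
        + coef["female_x_jobexp"] * jobexp
    )
    return base + female * shift


def gender_gap(educ, jobexp, coef):
    """Predicted income for women minus predicted income for men."""
    return predicted_income(educ, jobexp, 1, coef) - predicted_income(educ, jobexp, 0, coef)


# Placeholder coefficients for illustration only (NOT estimates from Gender.dta).
coef = {
    "cons": 10.0, "educ": 2.0, "jobexp": 1.0,
    "female": 3.0, "female_x_educ": -0.5, "female_x_jobexp": -0.25,
}

print("Gap at educ=0, jobexp=0:  ", gender_gap(0, 0, coef))
print("Gap at educ=12, jobexp=10:", gender_gap(12, 10, coef))
```

With these placeholder values, the gap at educ = 0 and jobexp = 0 equals the coefficient on the female indicator alone, while the gap at other values also includes the interaction terms.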

Problem #3:

This problem is inspired by a study of the gender gap in earnings in top corporate jobs. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.)

a. Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings on Female yields

$$\widehat{\ln(\text{Earnings})} = \underset{(0.01)}{6.48} - \underset{(0.05)}{0.44}\,\text{Female}, \quad SER = 2.65$$

i. The estimated coefficient on Female is −0.44. Explain what this value means.

ii. The SER is 2.65. Explain what this value means.

iii. Does this regression suggest that female top executives earn less than top male executives? Explain.

iv. Does this regression suggest that there is sex discrimination? Explain.

b. Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:

$$\widehat{\ln(\text{Earnings})} = \underset{(0.03)}{3.86} - \underset{(0.04)}{0.28}\,\text{Female} + \underset{(0.004)}{0.37}\,\ln(\text{MarketValue}) + \underset{(0.003)}{0.0004}\,\text{Return}, \quad R^2 = 0.345, \quad n = 46{,}670$$

i. The coefficient on ln(MarketValue) is 0.37. Explain what this value means.

ii. The coefficient on Female is now −0.28. Explain why it has changed from the regression in (a).

c. Are large firms more likely than small firms to have female top executives? Explain.
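
Coefficients in a regression with ln(Earnings) on the left-hand side are often read as approximate percentage changes (100 × coefficient). That approximation loosens as the coefficient grows in magnitude. The Python sketch below compares it with the exact conversion, 100 × (exp(b) − 1), using the coefficients reported in (a) and (b). The helper names are hypothetical, and the code only illustrates the arithmetic.

```python
import math


def exact_percent_change(log_coef):
    """Exact % change in Y implied by a coefficient in a ln(Y) regression."""
    return 100.0 * (math.exp(log_coef) - 1.0)


def percent_change_log_log(elasticity, pct_change_x):
    """Exact % change in Y for a given % change in X when both are in logs."""
    return 100.0 * ((1.0 + pct_change_x / 100.0) ** elasticity - 1.0)


# Coefficients on Female taken from the two regressions in the problem.
for label, b in [("regression (a)", -0.44), ("regression (b)", -0.28)]:
    print(f"{label}: approx {100.0 * b:.0f}%, exact {exact_percent_change(b):.1f}%")

# Coefficient on ln(MarketValue): effect of a 1% increase in market value.
print(f"1% higher market value: {percent_change_log_log(0.37, 1.0):.3f}% higher earnings")
```

For small changes the log-log coefficient can be read directly as an elasticity, which is why the last line prints a value close to 0.37.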

Problem #4:

Suppose a researcher collects data on houses that have sold in a particular neighborhood over the past year and obtains the regression results in the table shown below.

a. Using the results in column (1), what is the expected change in price of building a 500-square-foot addition to a house? Construct a 95% confidence interval for the percentage change in price.

b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?

c. Using column (2), what is the estimated effect of a pool on price? (Make sure you get the units right.) Construct a 95% confidence interval for this effect.

d. The regression in column (3) adds the number of bedrooms to the regression. How large is the estimated effect of an additional bedroom? Is the effect statistically significant? Why do you think the estimated effect is so small? (Hint: Which other variables are being held constant?)

e. Is the quadratic term $\ln(\text{Size})^2$ important?

f. Use the regression in column (5) to compute the expected change in price when a pool is added to a house that doesn’t have a view. Repeat the exercise for a house that has a view. Is there a large difference? Is the difference statistically significant?

Problem #5:

On Canvas you will find a dataset titled CPS2015, which contains data for full-time, full-year workers, ages 25–34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in the file CPS2015_Description, also available on Canvas. In this exercise, you will investigate the relationship between a worker’s age and earnings. (Generally, older workers have more job experience, leading to higher productivity and higher earnings.)

a. Run a regression of average hourly earnings (AHE) on age (Age), sex (Female), and education (Bachelor). If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

b. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

d. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, $Age^2$, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

e. Do you prefer the regression in (c) to the regression in (b)? Explain.

f. Do you prefer the regression in (d) to the regression in (b)? Explain.

g. Do you prefer the regression in (d) to the regression in (c)? Explain.

h. Plot the regression relation between Age and ln(AHE) from (b), (c), and (d) for males with a high school diploma. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for females with college degrees?

i. Run a regression of ln(AHE) on Age, $Age^2$, Female, Bachelor, and the interaction term Female × Bachelor. What does the coefficient on the interaction term measure? Alexis is a 30-year-old female with a bachelor’s degree. What does the regression predict for her value of ln(AHE)? Jane is a 30-year-old female with a high school diploma. What does the regression predict for her value of ln(AHE)? What is the predicted difference between Alexis’s and Jane’s earnings? Bob is a 30-year-old male with a bachelor’s degree. What does the regression predict for his value of ln(AHE)? Jim is a 30-year-old male with a high school diploma. What does the regression predict for his value of ln(AHE)? What is the predicted difference between Bob’s and Jim’s earnings?

j. Is the effect of Age on earnings different for men than for women? Specify and estimate a regression that you can use to answer this question.

k. Is the effect of Age on earnings different for high school graduates than for college graduates? Specify and estimate a regression that you can use to answer this question.

l. After running all these regressions (and any others that you want to run), summarize the effect of age on earnings for young workers.
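
Parts (b) through (d) ask the same two questions under three functional forms, and the arithmetic for the implied change in ln(AHE) differs in each. The Python sketch below illustrates it (the course itself works in Stata). All coefficient values are made-up placeholders, not CPS2015 estimates, and the function names are hypothetical.

```python
import math


def change_linear_in_age(b_age, age_from, age_to):
    """Spec (b): ln(AHE) on Age. Same change in ln(AHE) for every one-year step."""
    return b_age * (age_to - age_from)


def change_log_age(b_ln_age, age_from, age_to):
    """Spec (c): ln(AHE) on ln(Age). The change shrinks in magnitude as Age rises."""
    return b_ln_age * (math.log(age_to) - math.log(age_from))


def change_quadratic_age(b_age, b_age2, age_from, age_to):
    """Spec (d): ln(AHE) on Age and Age^2. The change depends on the starting age."""
    return b_age * (age_to - age_from) + b_age2 * (age_to**2 - age_from**2)


# Placeholder coefficients for illustration only (NOT CPS2015 estimates).
B_AGE, B_LN_AGE, B_AGE_Q, B_AGE2_Q = 0.02, 0.6, 0.10, -0.001

for start in (25, 33):
    end = start + 1
    print(
        f"Age {start} -> {end}: "
        f"(b) {change_linear_in_age(B_AGE, start, end):.4f}, "
        f"(c) {change_log_age(B_LN_AGE, start, end):.4f}, "
        f"(d) {change_quadratic_age(B_AGE_Q, B_AGE2_Q, start, end):.4f}"
    )
```

Multiplying a change in ln(AHE) by 100 gives the approximate percentage change in AHE. In the levels regression of part (a), the coefficient on Age is instead a change in dollars per hour.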
