ECON 400: Introduction to Econometrics
Problem Set #4
Problem #1:
Use the Birthweight_Smoking data set available on Canvas to answer the following questions. To begin, run three regressions:
1. Birthweight on Smoker
2. Birthweight on Smoker, Alcohol, and Nprevist
3. Birthweight on Smoker, Alcohol, Nprevist, and Unmarried
a. What is the value of the estimated effect of smoking on birth weight in each of the regressions?
b. Construct a 95% confidence interval for the effect of smoking on birth weight, using each of the regressions.
c. Does the coefficient on Smoker in regression (1) suffer from omitted variable bias? Explain.
d. Does the coefficient on Smoker in regression (2) suffer from omitted variable bias? Explain.
e. Consider the coefficient on Unmarried in regression (3).
i. Construct a 95% confidence interval for the coefficient.
ii. Is the coefficient statistically significant? Explain.
iii. Is the magnitude of the coefficient large? Explain.
iv. A family advocacy group notes that the large coefficient suggests that public policies that encourage marriage will lead, on average, to healthier babies. Do you agree?
Discuss some of the various factors that Unmarried may be controlling for and how this affects the interpretation of its coefficient.
f. Consider the various other control variables in the data set. Which do you think should be included in the regression? Discuss the robustness of the confidence interval you constructed in (b). What is a reasonable 95% confidence interval for the effect of smoking on birth weight?
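Parts (b) and (e) rely on the large-sample rule that a 95% confidence interval is the estimate plus or minus 1.96 standard errors. The following is a minimal Python sketch of that arithmetic. The helper names `ci95` and `t_stat` and the numbers are illustrative placeholders, not estimates from the Birthweight_Smoking data.

```python
def ci95(estimate, std_error, z=1.96):
    """Return (lower, upper) for a large-sample 95% confidence interval."""
    margin = z * std_error
    return estimate - margin, estimate + margin


def t_stat(estimate, std_error, null_value=0.0):
    """t-statistic for the null hypothesis coefficient == null_value."""
    return (estimate - null_value) / std_error


# Hypothetical coefficient and standard error, for illustration only.
beta_hat, se = -200.0, 25.0
low, high = ci95(beta_hat, se)
t = t_stat(beta_hat, se)
print(f"95% CI: [{low:.1f}, {high:.1f}]")
print(f"t = {t:.2f}; significant at the 5% level: {abs(t) > 1.96}")
```

A coefficient is statistically significant at the 5% level exactly when this interval excludes zero, which is the same condition as |t| > 1.96.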
Problem #2:
Download the dataset Gender.dta from Canvas. This file contains fabricated data on gender, income, education, and job experience. We will use this file to examine group differences in the parameters of this model, this time using dummy variables and interaction effects.
a) Suppose we are interested in how income relates to education and job experience. In particular, suppose you want to know whether the model differs for men and women. Estimate the following three models using dummy variables and interaction effects (use Stata’s factor variable notation to do so):
a. Start with a model that assumes there are no differences by gender – that the relationships between these variables are identical for men and women.
b. Next, estimate a model in which the constant terms differ by gender, but wherein the effects of education and job experience are the same for both men and women.
c. Finally, estimate a model wherein both the constant terms and slopes differ by gender. In this model, all parameters should be allowed to differ by gender.
b) Identify the “best” model of the three you estimated and explain why it was selected. Be sure to discuss what insights the model provides concerning gender differences. To help you with the discussion, run the following commands after your preferred model:
quietly margins female, at(educ=(0(1)20) jobexp=0)
marginsplot, noci ytitle("Predicted Income") ylabel(#10) scheme(sj) name(educ)
quietly margins female, at(jobexp=(0(1)20) educ=0)
marginsplot, noci ytitle("Predicted Income") ylabel(#10) scheme(sj) name(jobexp)
c) In the models you previously estimated, the coefficient on the indicator variable for being female changes from negative to positive once interaction terms are included in the model. Should this be a concern? Explain why or why not. Explain how the interpretation of the coefficient on the binary variable Female changes once interaction terms are added to the model.
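When every slope is interacted with the gender dummy, the coefficient on the dummy alone is the predicted gap only at the point where the interacted regressors equal zero. The Python sketch below illustrates this with made-up coefficients. The dictionary keys and all numbers are hypothetical and are not estimates from Gender.dta.

```python
def predicted_income(female, educ, jobexp, b):
    """Prediction from a fully interacted model (intercept and slopes differ by gender)."""
    return (b["const"] + b["female"] * female
            + b["educ"] * educ + b["jobexp"] * jobexp
            + b["female_x_educ"] * female * educ
            + b["female_x_jobexp"] * female * jobexp)


def gender_gap(educ, jobexp, b):
    """Predicted female-minus-male income gap at the given covariate values."""
    return predicted_income(1, educ, jobexp, b) - predicted_income(0, educ, jobexp, b)


# Hypothetical coefficients, for illustration only.
b = {"const": 10.0, "female": 2.0, "educ": 3.0, "jobexp": 1.0,
     "female_x_educ": -0.5, "female_x_jobexp": -0.25}

print(gender_gap(0, 0, b))   # equals b["female"]: the gap at educ = 0 and jobexp = 0
print(gender_gap(12, 5, b))  # the gap at 12 years of education and 5 of experience
```

With these placeholder values the dummy's coefficient is positive while the gap at typical covariate values is negative, so a sign change on the dummy need not signal a problem by itself.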
Problem #3:
This problem is inspired by a study of the gender gap in earnings in top corporate jobs. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.)
a. Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings on Female yields
\[
\widehat{\ln(\mathit{Earnings})} = \underset{(0.01)}{6.48} - \underset{(0.05)}{0.44}\,\mathit{Female}, \qquad \mathit{SER} = 2.65,
\]
with standard errors in parentheses.
i. The estimated coefficient on Female is −0.44. Explain what this value means.
ii. The SER is 2.65. Explain what this value means.
iii. Does this regression suggest that female top executives earn less than top male executives? Explain.
iv. Does this regression suggest that there is sex discrimination? Explain.
b. Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:
\[
\widehat{\ln(\mathit{Earnings})} = \underset{(0.03)}{3.86} - \underset{(0.04)}{0.28}\,\mathit{Female} + \underset{(0.004)}{0.37}\,\ln(\mathit{MarketValue}) + \underset{(0.003)}{0.0004}\,\mathit{Return},
\]
\[
R^2 = 0.345, \qquad n = 46{,}670.
\]
i. The coefficient on ln(MarketValue) is 0.37. Explain what this value means.
ii. The coefficient on Female is now −0.28. Explain why it has changed from the regression in (a).
c. Are large firms more likely than small firms to have female top executives? Explain.
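Coefficients in a regression with a logarithmic dependent variable are usually read as approximate percentage differences, and the approximation loosens as the coefficient grows. The Python sketch below compares the approximate reading (100 times the coefficient) with the exact one, 100(e^β − 1), for the two Female coefficients reported above. The function name `exact_pct_change` is an illustrative choice.

```python
import math


def exact_pct_change(log_coef):
    """Exact percentage difference implied by a coefficient when the outcome is in logs."""
    return 100.0 * (math.exp(log_coef) - 1.0)


for coef in (-0.44, -0.28):  # Female coefficients reported in (a) and (b)
    approx = 100.0 * coef
    exact = exact_pct_change(coef)
    print(f"coefficient {coef:+.2f}: approximate {approx:.0f}%, exact {exact:.1f}%")

# In a log-log term the coefficient is an elasticity: the percentage change
# in the outcome associated with a 1% change in the regressor.
elasticity = 0.37  # coefficient on ln(MarketValue) in (b)
print(f"1% higher market value goes with about {elasticity:.2f}% higher earnings")
```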
Problem #4:
Suppose a researcher collects data on houses that have sold in a particular neighborhood over the past year and obtains the regression results in the table shown below.
a. Using the results in column (1), what is the expected change in price from building a 500-square-foot addition to a house? Construct a 95% confidence interval for the percentage change in price.
b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?
c. Using column (2), what is the estimated effect of a pool on price? (Make sure you get the units right.) Construct a 95% confidence interval for this effect.
d. The regression in column (3) adds the number of bedrooms to the regression. How large is the estimated effect of an additional bedroom? Is the effect statistically significant? Why do you think the estimated effect is so small? (Hint: Which other variables are being held constant?)
e. Is the quadratic term [ln(Size)]^2 important?
f. Use the regression in column (5) to compute the expected change in price when a pool is added to a house that doesn’t have a view. Repeat the exercise for a house that has a view.
Is there a large difference? Is the difference statistically significant?
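The regression table itself is not reproduced here, so the arithmetic behind parts (a) and (f) can only be sketched with placeholder numbers. The Python sketch below assumes the dependent variable is ln(Price), that column (1) includes Size in levels, and that column (5) includes a Pool × View interaction. All coefficient values and function names are hypothetical.

```python
def pct_change_ci(beta, se, delta_x, z=1.96):
    """Approximate % change in price for a change delta_x in a regressor when
    the dependent variable is ln(Price), with a large-sample 95% interval."""
    point = 100.0 * beta * delta_x
    margin = 100.0 * z * se * delta_x
    return point, point - margin, point + margin


def pool_effect_pct(b_pool, b_pool_x_view, view):
    """Approximate % effect of a pool when Pool is interacted with View (0 or 1)."""
    return 100.0 * (b_pool + b_pool_x_view * view)


# Hypothetical coefficient and standard error on Size, for illustration only.
point, low, high = pct_change_ci(beta=0.0004, se=0.00005, delta_x=500)
print(f"500 sq ft addition: {point:.1f}% (95% CI {low:.1f}% to {high:.1f}%)")

# Hypothetical Pool and Pool x View coefficients, for illustration only.
print(f"pool, no view: {pool_effect_pct(0.07, 0.01, view=0):.1f}%")
print(f"pool, with view: {pool_effect_pct(0.07, 0.01, view=1):.1f}%")
```

The difference between the two pool effects is the interaction coefficient alone, so its t-statistic is what answers whether the difference is statistically significant.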
Problem #5:
On Canvas you will find a dataset titled CPS2015, which contains data for full-time, full-year workers, ages 25–34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in the file CPS2015_Description, also available on Canvas. In this exercise, you will investigate the relationship between a worker’s age and earnings. (Generally, older workers have more job experience, leading to higher productivity and higher earnings.)
a. Run a regression of average hourly earnings (AHE) on age (Age), sex (Female), and education (Bachelor). If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
b. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
d. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Age^2, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change?
If Age increases from 33 to 34, how are earnings expected to change?
e. Do you prefer the regression in (c) to the regression in (b)? Explain.
f. Do you prefer the regression in (d) to the regression in (b)? Explain.
g. Do you prefer the regression in (d) to the regression in (c)? Explain.
h. Plot the regression relation between Age and ln(AHE) from (b), (c), and (d) for males with a high school diploma. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for females with college degrees?
i. Run a regression of ln(AHE) on Age, Age^2, Female, Bachelor, and the interaction term Female × Bachelor. What does the coefficient on the interaction term measure? Alexis is a 30-year-old female with a bachelor’s degree. What does the regression predict for her value of ln(AHE)? Jane is a 30-year-old female with a high school diploma. What does the regression predict for her value of ln(AHE)? What is the predicted difference between Alexis’s and Jane’s earnings? Bob is a 30-year-old male with a bachelor’s degree. What does the regression predict for his value of ln(AHE)? Jim is a 30-year-old male with a high school diploma. What does the regression predict for his value of ln(AHE)? What is the predicted difference between Bob’s and Jim’s earnings?
j. Is the effect of Age on earnings different for men than for women? Specify and estimate a regression that you can use to answer this question.
k. Is the effect of Age on earnings different for high school graduates than for college graduates? Specify and estimate a regression that you can use to answer this question.
l. After running all these regressions (and any others that you want to run), summarize the effect of age on earnings for young workers.
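Parts (b) through (d) differ only in how Age enters the regression, which determines whether the predicted effect of one more year is the same at 25 as at 33. The Python sketch below computes the predicted change in ln(AHE) under each functional form. The function names and every coefficient value are hypothetical and are not estimates from CPS2015.

```python
import math


def change_log_linear(b_age, age_from, age_to):
    """Change in ln(AHE) when ln(AHE) is linear in Age, as in (b)."""
    return b_age * (age_to - age_from)


def change_log_log(b_ln_age, age_from, age_to):
    """Change in ln(AHE) when ln(AHE) is linear in ln(Age), as in (c)."""
    return b_ln_age * (math.log(age_to) - math.log(age_from))


def change_log_quadratic(b_age, b_age2, age_from, age_to):
    """Change in ln(AHE) when ln(AHE) is quadratic in Age, as in (d)."""
    return b_age * (age_to - age_from) + b_age2 * (age_to ** 2 - age_from ** 2)


# Hypothetical coefficients, for illustration only.
for start in (25, 33):
    lin = change_log_linear(0.03, start, start + 1)
    log = change_log_log(0.8, start, start + 1)
    quad = change_log_quadratic(0.10, -0.001, start, start + 1)
    print(f"Age {start} to {start + 1}: linear {100 * lin:.2f}%, "
          f"log-log {100 * log:.2f}%, quadratic {100 * quad:.2f}%")
```

Only the linear-in-Age specification forces the same change at both ages. The other two let the effect vary with age, which is the basis for the comparisons in (e) through (g). Multiplying a change in ln(AHE) by 100 gives an approximate percentage change in earnings.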