This assignment uses R to carry out a statistical investigation of the relationship between earnings and age.

Homework 3
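1) You have learned that earnings functions are one of the most investigated relationships in
economics. These typically relate the logarithm of earnings to a series of explanatory variables
such as education, work experience, gender, race, etc.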
(a) Why do you think that researchers have preferred a log-linear specification over a linear
specification?
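(b) To establish age-earnings profiles, you regress ln(Earn) on Age, where Earn is weekly
earnings in dollars and Age is in years. Plotting the residuals of the regression against age
for the 1,744 individuals looks as shown in the figure:
Do you sense a problem?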
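The figure itself is not reproduced here, so the following Python sketch only illustrates the kind of pattern part (b) is asking about. It fits a straight line to a made-up concave profile (the numbers in `ln_earn` are hypothetical, not the homework data, and `ols_line` is an illustrative helper) and shows that the residuals are systematically negative at both ends of the age range and positive in the middle.

```python
# Illustrative only: this concave profile is invented, it is NOT the homework data.
ages = list(range(20, 65))
ln_earn = [4.0 + 0.10 * a - 0.0012 * a * a for a in ages]  # hypothetical profile


def ols_line(x, y):
    """Closed-form OLS estimates for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_line(ages, ln_earn)
residuals = [y - (b0 + b1 * a) for a, y in zip(ages, ln_earn)]

# Residuals at the youngest, middle, and oldest ages in the made-up sample
for a in (20, 42, 64):
    print(a, round(residuals[ages.index(a)], 3))
```

A residual plot with this inverted-U shape suggests that a single straight line in Age is misspecified, which is what motivates letting the line differ by age group in part (c).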
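(c) Given your knowledge of age-earnings profiles, you decide to allow the regression line to
differ for the below-40 and the 40-and-above age categories. Accordingly, you create a binary
variable, Dage, that takes the value one for individuals aged 39 and below, and is zero
otherwise. Estimating the earnings equation results in the following output (using
heteroskedasticity-robust standard errors):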
$\widehat{\ln(Earn)} = 6.92 - 3.13 \times Dage - 0.019 \times Age + 0.085 \times (Dage \times Age)$, $R^2 = 0.20$, SER = 0.721.
(38.33) (0.22) (0.004) (0.005)
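Sketch both regression lines: one for the age category 39 years and under, and one for 40 and
above. Does it make sense to have a negative sign on the Age coefficient? Predict ln(earnings)
for a 30-year-old and a 50-year-old. What is the percentage difference between these two?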
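As a rough check on the arithmetic in part (c), the Python sketch below plugs ages 30 and 50 into the reported coefficients. It assumes the regressors are, in order, Dage, Age, and Dage × Age, following the definition of the binary variable above; the function name `predict_ln_earn` is made up for illustration.

```python
import math

# Reported coefficients: intercept, Dage, Age, Dage x Age (assumed ordering)
B0, B_DAGE, B_AGE, B_INTER = 6.92, -3.13, -0.019, 0.085


def predict_ln_earn(age):
    """Predicted ln(weekly earnings); Dage = 1 for age 39 and below."""
    dage = 1 if age <= 39 else 0
    return B0 + B_DAGE * dage + B_AGE * age + B_INTER * dage * age


# Implied intercept and slope of each regression line
young_line = (B0 + B_DAGE, B_AGE + B_INTER)  # ages 39 and under
old_line = (B0, B_AGE)                       # ages 40 and above

ln30 = predict_ln_earn(30)
ln50 = predict_ln_earn(50)
log_diff = ln50 - ln30                       # difference in log points
exact_pct = (math.exp(log_diff) - 1) * 100   # exact percentage difference

print("39 and under: intercept %.2f, slope %.3f" % young_line)
print("40 and above: intercept %.2f, slope %.3f" % old_line)
print("ln(Earn) at 30: %.2f, at 50: %.2f" % (ln30, ln50))
print("log difference: %.2f, exact percent: %.1f" % (log_diff, exact_pct))
```

The log difference times 100 is the usual approximation to the percentage difference; the exponentiated version is reported alongside it because the approximation becomes loose for gaps of this size.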
(d) The F-statistic for the hypothesis that both slopes and intercepts are the same is 124.43.
Can you reject the null hypothesis?
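One way to frame part (d) is sketched below in Python. It assumes the null hypothesis imposes q = 2 restrictions (the coefficients on Dage and on Dage × Age are both zero) and uses the large-sample F(2, ∞) distribution, which is reasonable with 1,744 observations. For q = 2 the critical values have a closed form, so no statistics library is needed; the helper names are illustrative.

```python
import math

F_STAT = 124.43
Q = 2  # assumed number of restrictions: Dage and Dage x Age coefficients are zero


def f_critical_q2(alpha):
    """Critical value of F(2, inf) at significance level alpha.

    F(2, inf) is chi-squared(2) / 2, and chi-squared(2) has survival
    function exp(-x / 2), so the critical value is -ln(alpha).
    """
    return -math.log(alpha)


def p_value_q2(f_stat):
    """Large-sample p-value for an F-statistic with q = 2 restrictions."""
    return math.exp(-f_stat)


crit_5 = f_critical_q2(0.05)
crit_1 = f_critical_q2(0.01)
reject_at_1pct = F_STAT > crit_1

print("5%% critical value: %.2f" % crit_5)
print("1%% critical value: %.2f" % crit_1)
print("reject at 1%:", reject_at_1pct)
print("p-value: %.3g" % p_value_q2(F_STAT))
```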
(e) What other functional forms should you consider?
2) After analyzing the age-earnings profile for 1,744 workers as shown in the figure, it becomes
clear to you that the relationship cannot be approximately linear.
You estimate the following polynomial regression model, controlling for the effect of gender by
using a binary variable that takes on the value of one for females and is zero otherwise:
$\widehat{Earn} = -795.90 + 82.93 \times Age - 1.69 \times Age^2 + 0.015 \times Age^3 - 0.0005 \times Age^4 - 163.19 \times Female$
(283.11) (29.29) (1.06) (0.016) (0.0009) (12.45)

$R^2 = 0.225$, SER = 259.78

(a) Test for the significance of the $Age^4$ coefficient. Describe the general strategy to determine
the appropriate degree of the polynomial.
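The test in part (a) comes down to a single t-ratio. The Python sketch below computes it from the reported coefficient and standard error on the fourth-power term, and adds a two-sided p-value under the large-sample normal approximation; the variable names are illustrative.

```python
import math

coef_age4 = -0.0005  # reported coefficient on the fourth-power Age term
se_age4 = 0.0009     # its reported standard error

t_stat = coef_age4 / se_age4

# Two-sided p-value under the standard normal: P(|Z| > |t|) = erfc(|t| / sqrt(2))
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

reject_at_5pct = abs(t_stat) > 1.96

print("t-statistic: %.3f" % t_stat)
print("p-value: %.3f" % p_value)
print("reject H0 at 5%:", reject_at_5pct)
```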
(b) You run two further regressions. Present an argument as to which one you should use for
further analysis.
$\widehat{Earn} = -683.21 + 65.83 \times Age - 1.05 \times Age^2 + 0.005 \times Age^3 - 163.23 \times Female$
(120.13) (9.27) (0.22) (0.002) (12.45)

$R^2 = 0.225$, SER = 259.73
$\widehat{Earn} = -344.88 + 41.48 \times Age - 0.45 \times Age^2 - 163.81 \times Female$
(51.58) (2.64) (0.03) (12.47)

$R^2 = 0.222$, SER = 260.22
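A common way to choose among these specifications is sequential testing from the top down: start with the highest-degree polynomial, test the coefficient on its highest power, drop that term if it is insignificant, and repeat. The Python sketch below applies this to the highest-order term of each of the three reported regressions at the 5% level. It is one possible decision rule, not the only defensible one, and the list name `top_terms` is illustrative.

```python
CRITICAL_5PCT = 1.96  # two-sided 5% critical value, large-sample normal

# (polynomial degree, coefficient on the highest power of Age, standard error)
top_terms = [
    (4, -0.0005, 0.0009),  # quartic regression
    (3, 0.005, 0.002),     # cubic regression
    (2, -0.45, 0.03),      # quadratic regression
]

chosen_degree = None
for degree, coef, se in top_terms:
    t_stat = coef / se
    significant = abs(t_stat) > CRITICAL_5PCT
    print("degree %d: t = %.2f, significant = %s" % (degree, t_stat, significant))
    if significant:
        # Stop at the first (highest) degree whose top term is significant
        chosen_degree = degree
        break

print("chosen degree:", chosen_degree)
```

Under this rule the loop stops at the cubic regression, so the quadratic line is never reached; it is listed only to show where the search would continue if the cubic term were also insignificant.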

(c) Sketch the graph of fitted earnings of males against age of your preferred regression. Does
this make sense? Are you concerned about the negative coefficient on the regression intercept?
What is the implication for female earners in this sample?
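For the sketch in part (c), it can help to tabulate fitted values first. The Python example below assumes the cubic regression is the preferred specification and evaluates it for males and females at a few ages; it also locates the turning points of the fitted profile by setting the derivative with respect to Age to zero. The function name `fitted_earnings` is made up for illustration.

```python
import math

# Coefficients of the cubic regression (assumed here to be the preferred one)
B0, B1, B2, B3, B_FEMALE = -683.21, 65.83, -1.05, 0.005, -163.23


def fitted_earnings(age, female=0):
    """Fitted weekly earnings in dollars; female is 1 for women, 0 for men."""
    return B0 + B1 * age + B2 * age**2 + B3 * age**3 + B_FEMALE * female


# Turning points solve 3*B3*age^2 + 2*B2*age + B1 = 0
a, b, c = 3 * B3, 2 * B2, B1
disc = math.sqrt(b * b - 4 * a * c)
peak_age = (-b - disc) / (2 * a)    # smaller root: local maximum since B3 > 0
trough_age = (-b + disc) / (2 * a)  # larger root: local minimum

for age in range(20, 70, 10):
    male = fitted_earnings(age)
    female = fitted_earnings(age, female=1)
    print("age %d: male %.2f, female %.2f" % (age, male, female))

print("fitted profile peaks near age %.1f" % peak_age)
```

The negative intercept corresponds to Age = 0, which lies far outside the range of working ages in the sample, and the female profile is the male profile shifted down by the same dollar amount at every age.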
(d) Explain how you would calculate the effect of changing age by one year on earnings, holding
constant the gender variable. Calculate the effect of change in age from 30 to 31 on earnings.
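Because the model is nonlinear in Age, the effect of one more year depends on the starting age, so it is computed as a difference in fitted values with Female held fixed. The Python sketch below does this for the change from 30 to 31, again assuming the cubic regression is the one carried forward; `age_effect` is an illustrative helper.

```python
# Age coefficients of the cubic regression (assumed preferred specification).
# The intercept and the Female term cancel in the difference, so they are omitted.
B1, B2, B3 = 65.83, -1.05, 0.005


def age_effect(age_from, age_to):
    """Change in fitted weekly earnings when Age moves from age_from to age_to."""
    return (
        B1 * (age_to - age_from)
        + B2 * (age_to**2 - age_from**2)
        + B3 * (age_to**3 - age_from**3)
    )


effect_30_31 = age_effect(30, 31)
print("effect of age 30 -> 31: %.2f dollars per week" % effect_30_31)
```

The same function can be called with other ages, for example `age_effect(50, 51)`, to see how the effect shrinks and eventually changes sign along the profile.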