This assignment is a statistical analysis of the effect of a cigarette price increase on smokers.

Econ 526 Final Empirical Project

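Instructions:
This empirical exercise gives you the opportunity to estimate different smoking equations
using regression models in STATA. I have posted smoke.dta, the source data after data cleaning.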
Paper:
Kim, D.H., Park, H.J. (2021) “The Effect of a Cigarette Price Increase on the Smoking Behavior of Smokers
and Non-Smokers.” Forthcoming.
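The paper above estimates the effect of the 2015 cigarette price increase on smoking outcomes
among Koreans. The draft posted on Blackboard is the pre-peer-review version (the submitted draft, not the
final one). If you want some background on the policy, you can look at that draft.
(Note: we will not replicate the same results as in the paper.)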
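You will use smoke.dta to estimate the smoking rate and cigarette consumption after the cigarette price
increased in January 2015. The data source is the Korea Health Panel (KHP), a
representative panel survey, with 49,752 observations in total. The data period analyzed runs
from 2011 to 2016. The 2015 data represent individual behavior after the price increase.
Thus, we have four periods before (2011-2014) and two periods after (2015 and 2016) the policy was
implemented. As you might expect, the analysis should by nature be a panel data analysis, but we will analyze
the data in a repeated cross-section framework (you can simply use the “reg” command as before). Although we
cannot track the same individual's behavior over time (as in a panel data analysis), we can still
estimate the different responses of individuals across years. You should show your regression output
for each question (where applicable). No credit will be given for answers without the proper STATA
results attached.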
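You are strongly encouraged to submit your answers to this assignment as a Word/PDF file. Do not
try to write STATA tables/results on paper. All answers should be reasonably clear.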
Variable Generation and Descriptive Statistics.
1. Generate a price increase variable that indicates the periods after the cigarette price increased in 2015
(year>=2015). Name it “price”. Also, generate the log of the household income variable. Finally, generate
a linear time trend variable that represents the time trend from 2011 through 2016: =1 in year 2011,
=2 in year 2012, …, =6 in year 2016. Name it “trend”. (5 pts)
2. Print out the descriptive statistics for all variables in the data by smoking status. Then briefly explain
the results (Hint: use the tabstat command with the options “by(smoke) statistics(mean sd n) longstub
format(%9.1g)”). (5 pts)
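The variable generation and summary steps in questions 1 and 2 could be sketched as follows. This is only a minimal sketch, not the required solution: the names year, smoke, price, and trend come from the questions above, while hhincome (household income) and lnincome are hypothetical names, so check the actual variable names in smoke.dta with the describe command before running it.

```stata
* Load the cleaned data set posted for the assignment
use smoke.dta, clear

* Post-increase indicator: 1 in 2015 and 2016, 0 in 2011-2014
generate price = (year >= 2015) if !missing(year)

* Log of household income. "hhincome" is a hypothetical variable name;
* ln() returns missing for zero or negative income values.
generate lnincome = ln(hhincome)

* Linear time trend: 1 in 2011, 2 in 2012, ..., 6 in 2016
generate trend = year - 2010

* Collect the numeric variables, then summarize them by smoking status
ds, has(type numeric)
tabstat `r(varlist)', by(smoke) statistics(mean sd n) longstub format(%9.1g)
```

Restricting tabstat to numeric variables is a design choice here, on the assumption that the file may contain string identifiers, which tabstat cannot summarize.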
Analysis of Smoking Participation.
3. Run a standard OLS regression of smoke on price, marital status, age, educational level, working
status, chronic disease status, number of family members, (log) household income, drinking status, and
trend. Interpret the estimated coefficients and the regression output. (10 pts)
4. Run the same regression as above without the trend variable. How do the results differ? Do you think it
is better to include the trend variable in the regression or not? (5 pts)
5. Test for heteroskedasticity using the White test. If heteroskedasticity is present, use robust
standard errors for the remaining questions. (5 pts)
6. Run the same regression separately for the male and female samples. Do the results differ from the
regression in question 3? Comment on the differences (Hint: use the if qualifier). (10 pts)
7. Because “smoke” is a binary dependent variable, the OLS regression above is a linear probability model
(LPM). We can instead use a Logit or Probit model to examine the probability of smoking. Run the Logit model
with the same specification as in question 3. Then report the average marginal effects and interpret
them (Hint: use the margins, dydx(*) command). (10 pts)
8. Run the Probit model with the same specification as above and report the average marginal effects.
Compare the average marginal effects reported by the LPM, Logit, and Probit
models and comment on the comparison. (15 pts)
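The sequence of commands for questions 3 through 8 might look like the sketch below. It assumes that price, lnincome, and trend have already been generated as in question 1. All control variable names (married, age, educ, work, chronic, famsize, drink) and the sex indicator male are hypothetical placeholders for whatever smoke.dta actually calls them. The robust option is shown on the later regressions only for illustration, since whether to use it depends on the outcome of the White test.

```stata
* Hypothetical control names; replace with the actual names in smoke.dta
local controls married age educ work chronic famsize lnincome drink

* Q3: linear probability model including the time trend
regress smoke price `controls' trend

* Q5: White's general test for heteroskedasticity
* (must follow a regress without robust standard errors)
estat imtest, white

* Q4: same specification without the trend
regress smoke price `controls'

* Q3 specification again with heteroskedasticity-robust standard errors
regress smoke price `controls' trend, vce(robust)

* Q6: separate regressions by sex ("male" is a hypothetical 0/1 indicator)
regress smoke price `controls' trend if male == 1, vce(robust)
regress smoke price `controls' trend if male == 0, vce(robust)

* Q7: Logit with average marginal effects
logit smoke price `controls' trend, vce(robust)
margins, dydx(*)

* Q8: Probit with average marginal effects
probit smoke price `controls' trend, vce(robust)
margins, dydx(*)
```

If educational level is stored as a categorical code rather than years of schooling, entering it as i.educ would be the usual choice, and margins would then report one effect per category relative to the base level.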