This assignment analyzes data from a study to evaluate the electoral effects of ‘Progresa’.

Homework 4

POL 850

Spring 2020

This homework is due by 5 PM on Friday, April 24. Please use this R Markdown template to report your code, output, and written answers in a single document. You may also submit your R script, output, and typed written answers separately. In either case, upload a single pdf of your final document to the Assignment portal on Classes. Comment your code. Report results in the correct units of measurement. Do not report more than two digits to the right of the decimal point.

Question 1: The Electoral Effects of Conditional Cash Transfers

In this exercise, we analyze the data from a study that estimated the electoral impacts of ‘Progresa’, Mexico’s conditional cash transfer program (CCT program). The original study relied on a randomized evaluation of the CCT program in which eligible villages were randomly assigned to receive the program either 21 months (Early Progresa) or 6 months (Late Progresa) before the 2000 Mexican presidential election. The authors hypothesized that the CCT program would mobilize voters, leading to both an increase in turnout and an increase in support for the incumbent party (PRI in this case). The analysis was based on a sample of precincts that contained at most one participating village in the evaluation. The data we analyze is available as the CSV file progresa.csv. The names and descriptions of variables in the data set are listed in Table 1.

Question 1.1

First, create two new variables that measure a) the change in turnout between 1994 and 2000, as shares of the voting eligible population (using t1994 and t2000), and b) the change in incumbent party (PRI) support between 1994 and 2000, as shares of the voting eligible population (using pri2000s and pri1994s). Then, estimate the impact of the earlier availability of the CCT program on the changes in turnout and PRI support using two different strategies. First, construct difference-in-means estimators by comparing the average changes in outcomes in the ‘treated’ (Early Progresa) precincts versus the ones observed in ‘control’ (Late Progresa) precincts. Next, estimate these effects by regressing the outcome change variables on the treatment variable. Interpret and compare the estimates under these approaches. Do the results support the hypothesis? Provide a brief interpretation.
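As a minimal sketch of how the two strategies relate, the code below uses simulated data (not progresa.csv) with a hypothetical binary treatment column and a hypothetical outcome column named change. It only illustrates that, with a single binary regressor, the regression slope and the difference in means should agree; it is not the analysis of the actual data.

```r
# Simulated data only: these values are NOT from progresa.csv,
# and the column name `change` is a hypothetical placeholder.
set.seed(850)
n <- 200
sim <- data.frame(treatment = rbinom(n, 1, 0.5))
sim$change <- 1 + 2 * sim$treatment + rnorm(n)

# Difference-in-means estimator: treated mean minus control mean
diff_means <- mean(sim$change[sim$treatment == 1]) -
  mean(sim$change[sim$treatment == 0])

# Regression of the outcome change on the binary treatment indicator
fit <- lm(change ~ treatment, data = sim)
slope <- unname(coef(fit)["treatment"])

# With a single binary regressor, the two estimates coincide
all.equal(diff_means, slope)
```

The intercept of such a regression is the control-group mean, which is why the slope recovers the gap between the two group means.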

## insert code here

Insert written answer here.

Question 1.2

Now, fit a regression model for each outcome change variable that includes the average poverty level in a precinct (avgpoverty), the total precinct population in 1994 (pobtot1994), the total number of voters who turned out in the previous election (votos1994), and the total number of votes cast for each of the three main competing parties in the previous election (pri1994 for PRI, pan1994 for Partido Acción Nacional or PAN, and prd1994 for Partido de la Revolución Democrática or PRD). Use the same outcome change variables as in the previous question. According to this model, what are the estimated average effects of the program’s earlier availability on changes in turnout and support for the incumbent party? Are these results different from what you obtained in the previous question?
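The general pattern of adding pre-treatment covariates to a regression can be sketched as follows. This is an illustration on simulated data, with hypothetical covariates x1 and x2 standing in for the precinct-level controls; none of the numbers come from progresa.csv.

```r
# Simulated data only; x1 and x2 are hypothetical stand-ins for covariates.
set.seed(851)
n <- 200
sim <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  x1 = rnorm(n),
  x2 = rnorm(n)
)
sim$change <- 1 + 2 * sim$treatment + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

# Model without covariates, then with covariates added to the formula
fit_simple <- lm(change ~ treatment, data = sim)
fit_adjusted <- lm(change ~ treatment + x1 + x2, data = sim)

# Treatment coefficient under each specification, rounded to two digits
round(c(simple = unname(coef(fit_simple)["treatment"]),
        adjusted = unname(coef(fit_adjusted)["treatment"])), 2)
```

Under random assignment the covariates are unrelated to treatment on average, so the two coefficients would typically be close, with the adjusted model often giving a smaller standard error.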

## insert code here

Insert written answer here.

Table 1: Variable descriptions in progresa.csv dataset

| Variable | Description |
|---|---|
| treatment | Whether an electoral precinct contains a village where households received Early Progresa |
| pri2000s | PRI votes in the 2000 election as a share of precinct population above 18 |
| pri2000v | Official PRI vote share in the 2000 election |
| t2000 | Turnout in the 2000 election as a share of precinct population above 18 |
| t2000r | Official turnout in the 2000 election |
| pri1994 | Total PRI votes in the 1994 presidential election |
| pan1994 | Total PAN votes in the 1994 presidential election |
| prd1994 | Total PRD votes in the 1994 presidential election |
| pri1994s | Total PRI votes in the 1994 election as a share of precinct population above 18 |
| pan1994s | Total PAN votes in the 1994 election as a share of precinct population above 18 |
| prd1994s | Total PRD votes in the 1994 election as a share of precinct population above 18 |
| pri1994v | Official PRI vote share in the 1994 election |
| pan1994v | Official PAN vote share in the 1994 election |
| prd1994v | Official PRD vote share in the 1994 election |
| t1994 | Turnout in the 1994 election as a share of precinct population above 18 |
| t1994r | Official turnout in the 1994 election |
| votos1994 | Total votes cast in the 1994 presidential election |
| avgpoverty | Precinct average of village poverty index |
| pobtot1994 | Total population in the precinct |
| villages | Number of villages in the precinct |

Question 1.3

Some variables such as population or income are often skewed in their distributions, and don’t have nice linear relationships with outcome variables that are not skewed. We often use the log transformations of such variables in regression models so that we can estimate linear relationships between variables. To see this, make a scatterplot with precinct population on the x axis, and turnout in 2000 as a share of population 18 and over on the y axis. Label the axes and give your plot a title. Does it look like there is a linear relationship between population and turnout? Next, plot the natural logarithm transformation of precinct population, or log(pobtot1994), on the x axis, and turnout in 2000 as a share of population 18 and over on the y axis. Label the axes and give your plot a title. Does it look like there is a linear relationship between the log of population and turnout? Also, in both graphs, do you notice anything unusual about the distribution of the turnout variable?
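To illustrate why a log transformation can help with a skewed predictor, here is a small sketch using simulated data. The variables pop and y are hypothetical and are generated so that y is linear in log(pop) by construction; they say nothing about what the real precinct data look like.

```r
# Simulated data only: `pop` and `y` are hypothetical, not from progresa.csv.
set.seed(852)
n <- 300
# Log-normal draws give a right-skewed, strictly positive "population"
pop <- rlnorm(n, meanlog = 7, sdlog = 1)
# Outcome constructed to be linear in log(pop), plus noise
y <- 0.9 - 0.05 * log(pop) + rnorm(n, sd = 0.05)

# Side-by-side scatterplots on the raw and the log scale
old_par <- par(mfrow = c(1, 2))
plot(pop, y,
     xlab = "Population (simulated)", ylab = "Outcome (simulated)",
     main = "Raw scale")
plot(log(pop), y,
     xlab = "Log of population (simulated)", ylab = "Outcome (simulated)",
     main = "Log scale")
par(old_par)  # restore the previous plotting layout
```

On the raw scale most points bunch up near the left edge because of a few very large values, while the log scale spreads them out.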

## insert code here

Insert written answer here.


Question 1.4

Now, consider an alternative model specification. Use the same regression model as in Question 1.2, but include the electoral variables in the previous election measured as shares of the voting age population (t1994, pri1994s, pan1994s, and prd1994s) instead of measured in counts. In addition, include the natural logarithm transformation of the precinct population variable instead of the raw population (simply include log(pobtot1994) as a predictor in your regression). Are the results based on this new model specification different from what we obtained in Question 1.2? If the results are different, which model fits the data better? To compare model fit, use the adjusted R-squared.
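One way to compare adjusted R-squared across two specifications of the same outcome is sketched below on simulated data. The variables pop and y are hypothetical, and the manual calculation uses the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors excluding the intercept.

```r
# Simulated data only: `pop` and `y` are hypothetical, not from progresa.csv.
set.seed(853)
n <- 300
sim <- data.frame(pop = rlnorm(n, meanlog = 7, sdlog = 1))
sim$y <- 0.9 - 0.05 * log(sim$pop) + rnorm(n, sd = 0.05)

# Two specifications for the same outcome: raw versus logged predictor
fit_raw <- lm(y ~ pop, data = sim)
fit_log <- lm(y ~ log(pop), data = sim)

# Adjusted R-squared as reported by summary()
adj_r2 <- c(raw = summary(fit_raw)$adj.r.squared,
            log = summary(fit_log)$adj.r.squared)
round(adj_r2, 2)

# Manual check of the formula for the logged model (p = 1 predictor)
r2 <- summary(fit_log)$r.squared
p <- 1
manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(manual, adj_r2[["log"]])
```

The comparison is only meaningful when both models are fit to the same outcome variable on the same observations.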

## insert code here

Insert written answer here.
