Homework 4 POL 850 Spring 2020

This homework is due by 5 PM on Friday, April 24. Please use this R Markdown template to report your code, output, and written answers in a single document. You may also submit your R script, output, and typed written answers separately. In either case, upload a single PDF of your final document to the Assignment portal on Classes. Comment your code. Report results in the correct units of measurement, and do not report more than two digits to the right of the decimal point.

Question 1: The Electoral Effects of Conditional Cash Transfers
In this exercise, we analyze the data from a study that estimated the electoral impacts of ‘Progresa’, Mexico’s conditional cash transfer program (CCT program). The original study relied on a randomized evaluation of the CCT program in which eligible villages were randomly assigned to receive the program either 21 months (Early Progresa) or 6 months (Late Progresa) before the 2000 Mexican presidential election. The authors hypothesized that the CCT program would mobilize voters, leading to both an increase in turnout and an increase in support for the incumbent party (PRI in this case). The analysis was based on a sample of precincts that contained at most one participating village in the evaluation. The data we analyze are available as the CSV file progresa.csv. The names and descriptions of variables in the data set are listed in Table 1.

Question 1.1
First, create two new variables that measure a) the change in turnout between 1994 and 2000, as shares of the voting eligible population (using t1994 and t2000), and b) the change in incumbent party (PRI) support between 1994 and 2000, as shares of the voting eligible population (using pri2000s and pri1994s).
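For reference, the two change variables can be written as follows for precinct $i$. The symbols $\Delta T_i$ and $\Delta P_i$ are our own shorthand, not notation from the original study; a positive value means that turnout or PRI support, as a share of the population above 18, rose between 1994 and 2000.

$$
\Delta T_i = \text{t2000}_i - \text{t1994}_i, \qquad
\Delta P_i = \text{pri2000s}_i - \text{pri1994s}_i
$$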
Then, estimate the impact of the earlier availability of the CCT program on the changes in turnout and PRI support using two different strategies. First, construct difference-in-means estimators by comparing the average changes in outcomes in the ‘treated’ (Early *Progresa*) precincts with those in the ‘control’ (Late *Progresa*) precincts. Next, estimate these effects by regressing the outcome change variables on the treatment variable. Interpret and compare the estimates under these two approaches. Do the results support the hypothesis? Provide a brief interpretation.
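A minimal R sketch of both strategies follows. It runs on a small hypothetical data frame that only borrows the column names from Table 1; the numbers are made up and say nothing about Progresa. With the real data you would start from `read.csv("progresa.csv")` instead. The names `turnout_change`, `pri_change`, and the helper `diff_in_means()` are illustrative choices, not names required by the assignment. The sketch also shows a standard OLS fact: with an intercept and a single binary regressor, the slope equals the difference in group means.

```r
# Hypothetical toy data (NOT the study's data); only the column names match Table 1.
# With the real file: progresa <- read.csv("progresa.csv")
progresa <- data.frame(
  treatment = c(1, 1, 1, 0, 0, 0),
  t1994     = c(0.60, 0.55, 0.50, 0.62, 0.58, 0.54),
  t2000     = c(0.70, 0.66, 0.59, 0.64, 0.60, 0.55),
  pri1994s  = c(0.40, 0.35, 0.30, 0.42, 0.38, 0.34),
  pri2000s  = c(0.45, 0.41, 0.34, 0.41, 0.39, 0.33)
)

# Change variables, 1994 -> 2000, as shares of the population above 18
progresa$turnout_change <- progresa$t2000 - progresa$t1994
progresa$pri_change     <- progresa$pri2000s - progresa$pri1994s

# Strategy 1: difference in means, Early (treatment == 1) minus Late (treatment == 0)
diff_in_means <- function(y, d) mean(y[d == 1]) - mean(y[d == 0])
dim_turnout <- diff_in_means(progresa$turnout_change, progresa$treatment)
dim_pri     <- diff_in_means(progresa$pri_change, progresa$treatment)

# Strategy 2: regress each change variable on the treatment indicator
fit_turnout <- lm(turnout_change ~ treatment, data = progresa)
fit_pri     <- lm(pri_change ~ treatment, data = progresa)

# With one binary regressor plus an intercept, the OLS slope equals the difference in means
round(c(diff_in_means = dim_turnout, ols = unname(coef(fit_turnout)["treatment"])), 2)
round(c(diff_in_means = dim_pri,     ols = unname(coef(fit_pri)["treatment"])), 2)
```

Because `lm()` drops rows with missing values by default, if the real file has missing outcomes the means should be computed with `mean(..., na.rm = TRUE)` so that both strategies use the same precincts.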
## insert code here
Insert written answer here.

Question 1.2
Now, fit a regression model for each outcome change variable that includes the treatment variable along with the average poverty level in a precinct (avgpoverty), the total precinct population in 1994 (pobtot1994), the total number of voters who turned out in the previous election (votos1994), and the total number of votes cast for each of the three main competing parties in the previous election (pri1994 for PRI, pan1994 for Partido Acción Nacional or PAN, and prd1994 for Partido de la Revolución Democrática or PRD). Use the same outcome change variables as in the previous question. According to this model, what are the estimated average effects of the program’s earlier availability on changes in turnout and support for the incumbent party? Are these results different from what you obtained in the previous question?
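The covariate-adjusted regressions can be sketched in R as below. Because progresa.csv is not reproduced here, the sketch simulates a stand-in data frame with the Table 1 column names; the seed, sample size, and distributions are arbitrary assumptions that only make the code runnable and carry no information about the study. With the real data, replace the simulation with `read.csv("progresa.csv")` and the change variables from Question 1.1. The coefficient on `treatment` is the estimate the question asks about.

```r
set.seed(850)  # arbitrary seed so the simulated example is reproducible
n <- 50

# Simulated stand-in for progresa.csv (hypothetical values, NOT the study's data)
progresa <- data.frame(
  treatment  = rep(c(1, 0), length.out = n),
  avgpoverty = runif(n, 3, 5),
  pobtot1994 = round(rlnorm(n, meanlog = 7, sdlog = 1)),
  pri1994    = rpois(n, 300),
  pan1994    = rpois(n, 80),
  prd1994    = rpois(n, 60)
)
progresa$votos1994 <- progresa$pri1994 + progresa$pan1994 + progresa$prd1994 + rpois(n, 20)
progresa$turnout_change <- rnorm(n, mean = 0.05, sd = 0.05)
progresa$pri_change     <- rnorm(n, mean = 0.02, sd = 0.05)

# Outcome ~ treatment + 1994 covariates (counts), as listed in the question
fit_turnout_cov <- lm(turnout_change ~ treatment + avgpoverty + pobtot1994 +
                        votos1994 + pri1994 + pan1994 + prd1994, data = progresa)
# Same right-hand side, different outcome: update() swaps only the left-hand side
fit_pri_cov <- update(fit_turnout_cov, pri_change ~ .)

# Estimated average effect of earlier availability, holding the covariates fixed
round(coef(fit_turnout_cov)["treatment"], 2)
round(coef(fit_pri_cov)["treatment"], 2)
```

Using `update()` keeps the two specifications identical on the right-hand side, which makes the comparison with the simple regressions from Question 1.1 easier to read.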
## insert code here

Insert written answer here.

Table 1: Variable descriptions in the progresa.csv dataset

| Variable | Description |
|---|---|
| treatment | Whether an electoral precinct contains a village where households received Early *Progresa* |
| pri2000s | PRI votes in the 2000 election as a share of precinct population above 18 |
| pri2000v | Official PRI vote share in the 2000 election |
| t2000 | Turnout in the 2000 election as a share of precinct population above 18 |
| t2000r | Official turnout in the 2000 election |
| pri1994 | Total PRI votes in the 1994 presidential election |
| pan1994 | Total PAN votes in the 1994 presidential election |
| prd1994 | Total PRD votes in the 1994 presidential election |
| pri1994s | Total PRI votes in the 1994 election as a share of precinct population above 18 |
| pan1994s | Total PAN votes in the 1994 election as a share of precinct population above 18 |
| prd1994s | Total PRD votes in the 1994 election as a share of precinct population above 18 |
| pri1994v | Official PRI vote share in the 1994 election |
| pan1994v | Official PAN vote share in the 1994 election |
| prd1994v | Official PRD vote share in the 1994 election |
| t1994 | Turnout in the 1994 election as a share of precinct population above 18 |
| t1994r | Official turnout in the 1994 election |
| votos1994 | Total votes cast in the 1994 presidential election |
| avgpoverty | Precinct average of village poverty index |
| pobtot1994 | Total population in the precinct |
| villages | Number of villages in the precinct |

Question 1.3
Variables such as population or income often have skewed distributions and do not have linear relationships with outcome variables that are not skewed. We often use log transformations of such variables in regression models so that we can estimate linear relationships between variables. To see this, make a scatterplot with precinct population (pobtot1994) on the x axis and turnout in 2000 as a share of the population 18 and over (t2000) on the y axis. Label the axes and give your plot a title. Does it look like there is a linear relationship between population and turnout? Next, plot the natural logarithm of precinct population, log(pobtot1994), on the x axis, with the same turnout variable on the y axis.
Label the axes and give your plot a title. Does it look like there is a linear relationship between the log of population and turnout? Also, in both graphs, do you notice anything unusual about the distribution of the turnout variable?
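A base-R sketch of the two scatterplots follows, drawn side by side with `par(mfrow = c(1, 2))`. The data frame is simulated (a log-normal, hence right-skewed, population and a turnout share drawn independently of it), so the plots illustrate only the plotting calls and axis labelling, not what the real data look like. The last lines are one possible numeric companion to the plots for the final part of the question; they are an addition, not something the question requires.

```r
set.seed(850)  # arbitrary seed for the simulated example
n <- 200

# Simulated stand-in (hypothetical values, NOT the study's data): a right-skewed
# population variable and a turnout share drawn independently of it
progresa <- data.frame(
  pobtot1994 = round(rlnorm(n, meanlog = 7, sdlog = 1.2)),
  t2000      = runif(n, 0.3, 0.9)
)

# log() of a zero population is -Inf, so count such precincts before transforming
sum(progresa$pobtot1994 <= 0, na.rm = TRUE)

old_par <- par(mfrow = c(1, 2))  # two panels side by side; keep the old settings
plot(progresa$pobtot1994, progresa$t2000,
     xlab = "Precinct population, 1994",
     ylab = "Turnout in 2000 (share of pop. 18+)",
     main = "Turnout vs. population")
plot(log(progresa$pobtot1994), progresa$t2000,
     xlab = "Log of precinct population, 1994",
     ylab = "Turnout in 2000 (share of pop. 18+)",
     main = "Turnout vs. log population")
par(old_par)  # restore the previous layout

# A quick numeric look at the turnout variable to go with the plots,
# e.g. how many shares fall outside the [0, 1] interval
summary(progresa$t2000)
sum(progresa$t2000 < 0 | progresa$t2000 > 1, na.rm = TRUE)
```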
## insert code here
Insert written answer here.

Question 1.4
Now, consider an alternative model specification. Use the same regression model as in Question 1.2, but include the 1994 electoral variables measured as shares of the voting-age population (t1994, pri1994s, pan1994s, and prd1994s) instead of as counts. In addition, include the natural logarithm of the precinct population instead of the raw population (simply include log(pobtot1994) as a predictor in your regression). Are the results based on this new model specification different from what we obtained in Question 1.2? If the results are different, which model fits the data better?
To compare model fit, use the adjusted R-squared.
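The comparison can be sketched as follows: fit both specifications to the same simulated, hypothetical precincts and read the adjusted R-squared from `summary()`. The 1994 counts are derived from the shares under an assumed 18+ population of half the total, purely so that both models can be fit to one data frame; none of the values come from the study. The last lines recompute the adjusted R-squared by hand, $\bar R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$ with $k$ predictors, to show the penalty it applies.

```r
set.seed(850)  # arbitrary seed for the simulated example
n <- 60

# Simulated stand-in for progresa.csv (hypothetical values, NOT the study's data)
progresa <- data.frame(
  treatment  = rep(c(1, 0), length.out = n),
  avgpoverty = runif(n, 3, 5),
  pobtot1994 = round(rlnorm(n, meanlog = 7, sdlog = 1)),
  t1994      = runif(n, 0.4, 0.8),
  pri1994s   = runif(n, 0.2, 0.5),
  pan1994s   = runif(n, 0.02, 0.15),
  prd1994s   = runif(n, 0.02, 0.15)
)
# Hypothetical counts implied by the shares, assuming half the population is 18+
pop18 <- round(0.5 * progresa$pobtot1994)
progresa$votos1994 <- round(progresa$t1994 * pop18)
progresa$pri1994   <- round(progresa$pri1994s * pop18)
progresa$pan1994   <- round(progresa$pan1994s * pop18)
progresa$prd1994   <- round(progresa$prd1994s * pop18)
progresa$turnout_change <- rnorm(n, mean = 0.05, sd = 0.05)

# Question 1.2 specification: raw population and 1994 counts
fit_counts <- lm(turnout_change ~ treatment + avgpoverty + pobtot1994 +
                   votos1994 + pri1994 + pan1994 + prd1994, data = progresa)
# Question 1.4 specification: log population and 1994 shares
fit_shares <- lm(turnout_change ~ treatment + avgpoverty + log(pobtot1994) +
                   t1994 + pri1994s + pan1994s + prd1994s, data = progresa)

# Adjusted R-squared as reported by summary.lm(); higher means better fit
# after the penalty for the number of predictors
adj_r2 <- c(counts = summary(fit_counts)$adj.r.squared,
            shares = summary(fit_shares)$adj.r.squared)
round(adj_r2, 2)

# The same quantity by hand: 1 - (1 - R^2) * (n - 1) / (n - k - 1), k predictors
r2 <- summary(fit_shares)$r.squared
k  <- length(coef(fit_shares)) - 1
adj_r2_manual <- 1 - (1 - r2) * (nobs(fit_shares) - 1) / (nobs(fit_shares) - k - 1)
```

Both specifications use seven predictors, so when they are fit to the same precincts the penalty is identical and the model with the higher adjusted R-squared also has the higher R-squared.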
## insert code here
Insert written answer here.