Problem set 5
This problem set is due on Courseworks on May 8th by midnight. Please name the file you upload LastName_FirstName_ps5 (example: Superti_Chiara_ps5).
In the markdown document that you submit, you must include all the R code used to answer the questions (combined with comments, figures, and other outputs) and the answers to all the questions. The document can be in HTML or Word format.
Discussion and the exchange of ideas are essential to doing academic work. For assignments in this course such as problem sets, you are encouraged to consult with your classmates as you work on problem sets. However, after discussions with peers, make sure that you can work through the problem on your own and ensure that any answers you submit for evaluation are the result of your own efforts. You also must list the names of students with whom you have collaborated on problem sets.
In addition, you must cite any books, articles, websites, lectures, etc. that have helped you with your answers, using appropriate citation practices.
PART I
QUESTION A: 7 points
Researchers are interested in understanding the background of American politicians who are or have been elected to the US Congress. They gathered the list of nationally elected politicians from the Republican and Democratic Parties. Phone surveys were administered to a random sample of these past or current politicians across all states. Although none of the selected politicians in four Southern states and in four Midwestern states answered the survey, the researchers gathered information on 500 senators and House representatives across time.
- What is the population of interest for this research question? (0.5 points)
- They decide to test whether the average age of first-time elected politicians among US national politicians is the same as that of EU countries’ politicians, which has been calculated as 45. What will be their null hypothesis? (0.5 points)
- Does the Central Limit Theorem apply to this sample? Why or why not? (1 point)
- If you answered yes, could you suggest to the researchers what other information they need to gather about the sample to be able to run the test of question A.2? If you answered no, could you tell the researchers what they should change to be able to apply the CLT? Moreover, could you tell them what other information they will need to gather to be able to run the test? (2 points)
- Researchers want to test the statement that states with higher economic development have had more women representatives in the last 10 years than those states with low economic development. How would you frame this theory in a causal way? How would you test it? What hypothesis would you test? (3 points)
QUESTION B: 6 points
A researcher wants to study the number of asylum seekers from Syria across the world in 2018. She uses the data from UNHCR available here (you should take a minute to look at the website to see where the data comes from).
From this data, the researcher discovers that the monthly mean of asylum applications lodged in 38 European countries and 6 non-European countries is 229.8, while the median is 52.
- Why is the median so much lower than the mean? Which other descriptive statistics that we discussed in class could have given you similar information? (2 points)
- Belgium’s monthly mean of asylum seekers is 281, Germany’s is 2750, and the United States’ is 23. The standard deviations are: Belgium 81, Germany 482, United States 6. Write a brief paragraph explaining what story this information can tell us about the three countries. (2 points)
- The researcher claims that well-established democracies are more likely to accept applications from asylum seekers. Could you test this theory with this data? If so, how would you test it? If not, what kind of data would you like to have and how would you use it to test the theory? (2 points)
QUESTION C: 12 points
Please report all the R code you used. Download the dataset named “ps1_Titanic.csv” from Courseworks, load it into R, and take a look at it. Using R, calculate and report how many observations (i.e., rows) and how many variables (i.e., columns) this dataset has. The data contain information about sex, age, traveling class, fare paid for tickets, and whether the passenger survived or not. The passengers are identified with an id.
- Look at the variables contained in this dataset (excluding the id). Choose and calculate one summary statistic (mean, median or mode) for each and explain your choice. (5 points)
- Calculate and report the variance and standard deviation of the age of passengers. (1 point)
- Pick two other variables (not age) from the dataset and plot a histogram and a barplot. Relabel the axes in both graphs with labels of your choice. Report these figures in your write-up. (2 points)
- Calculate the standard error of the mean of the age of passengers. How is this value different from what you calculated in 2? What does this number represent? (Explain in your own words.) (2 points)
- What are the maximum and minimum fares paid by the passengers? In which class did these passengers travel? (2 points)
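As a reminder of the R calls this kind of question involves, here is a minimal sketch on a made-up six-row data frame. The column names and values are assumptions for illustration only and need not match the variables in ps1_Titanic.csv; with the real file the data would be loaded with read.csv instead.

```r
# Toy stand-in for the real dataset; every name and value here is made up.
titanic <- data.frame(
  id       = 1:6,
  sex      = c("female", "male", "male", "female", "male", "female"),
  age      = c(29, 35, NA, 4, 58, 22),
  class    = c("1st", "3rd", "3rd", "2nd", "1st", "3rd"),
  fare     = c(211.3, 8.05, 7.75, 23.0, 26.55, 7.25),
  survived = c(1, 0, 0, 1, 0, 1)
)
# With the real file one would instead call:
# titanic <- read.csv("ps1_Titanic.csv")

dim(titanic)                              # number of rows, then number of columns

age <- titanic$age[!is.na(titanic$age)]   # drop missing ages before summarising
age_var <- var(age)                       # sample variance (n - 1 denominator)
age_sd  <- sd(age)                        # sample standard deviation
age_se  <- age_sd / sqrt(length(age))     # standard error of the mean

# Rows holding the highest and the lowest fare, with the class travelled in
titanic[titanic$fare %in% range(titanic$fare), c("id", "class", "fare")]
```

The standard deviation describes the spread of individual ages, while the standard error divides it by the square root of the number of non-missing observations and so describes the precision of the sample mean.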
PART II
QUESTION A: 11 points
Using the data from PART I and assuming that this is a random sample of all the passengers of the Titanic, answer the following questions.
- A group of researchers would like to understand more about the differences between passengers from the different classes. First use R to list the classes that passengers could travel in on the Titanic. (0.5 points)
- What is the median age of passengers in each of the classes? (1 point)
- What is the mean fare paid by passengers in each class? (1 point)
- Plot the distributions (density plots) of fare by class. (1.5 points)
- Please test (using the R t.test command) the hypothesis that the mean fare paid by women is different from the mean fare paid by men. What is your finding? Please present the conclusions for each of the three methods mentioned in class. Are you surprised by your findings? Why or why not? (3 points)
- Some scholars think that individuals with lower socio-economic status were given the possibility of escaping the Titanic later than passengers with higher socio-economic status. How could you test this statement? (Note that you do not have a direct measure of socio-economic status in your dataset, so you will have to think about the next best way to approach this question with the data you have.) State the null and alternative hypotheses, run the test in R, and present the conclusions for each of the three methods mentioned in class (i.e., rejection area, confidence interval, and p-value). (4 points)
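The sketch below illustrates, on simulated data, how per-group summaries and a two-sample t.test can be read through the rejection area, the confidence interval, and the p-value. The data frame toy and its columns are invented for illustration and are not the Titanic data; note also that t.test defaults to the Welch (unequal variance) version of the test.

```r
set.seed(1)                                  # reproducible fake data
n <- 200
toy <- data.frame(
  sex   = sample(c("female", "male"), n, replace = TRUE),
  class = sample(c("1st", "2nd", "3rd"), n, replace = TRUE),
  age   = round(runif(n, 1, 70)),
  fare  = round(rexp(n, rate = 1 / 30), 2)
)

unique(toy$class)                            # classes present in the data
tapply(toy$age,  toy$class, median)          # median age per class
tapply(toy$fare, toy$class, mean)            # mean fare per class

# Two-sided test; H0: the mean fare is the same for women and men
res <- t.test(fare ~ sex, data = toy,
              alternative = "two.sided", conf.level = 0.95)

# 1) Rejection area: reject H0 if |t| exceeds the critical value
t_stat <- unname(res$statistic)
crit   <- qt(0.975, df = res$parameter)
reject_by_area <- abs(t_stat) > crit

# 2) Confidence interval: reject H0 if the 95% CI for the difference excludes 0
ci <- res$conf.int
reject_by_ci <- ci[1] > 0 || ci[2] < 0

# 3) p-value: reject H0 if p < 0.05
reject_by_p <- res$p.value < 0.05

c(area = reject_by_area, ci = reject_by_ci, p = reject_by_p)
```

At the same significance level the three readings always lead to the same decision, since they are different views of the same test statistic.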
QUESTION B: 12 points
You were assigned the job of investigating the effect of the government’s distribution of post-flooding aid relief in Colombia on citizens’ perception of the Colombian government. You intend to use pamphlets describing the amount of aid distributed in various cities and mail them to different households in the country. Before (and after) that, you will run surveys both in the households that are expected to receive the pamphlet and those that are not.
- What specific outcome (y) variable do you think you should measure? Think about which factors could or should be influenced by this program. (1 point)
- What is the “treatment” in this context? And how would you select a control and treatment group? (2 points)
- Why should you check the balance of your control and treatment group pre-treatment? What does it mean to “check the balance” and which variables/features of the household would you like to check? Since you have limited resources, please pick 5 characteristics you would like to check the balance of. (2.5 points)
- When checking the balance, what null hypotheses are you setting for each comparison (give two examples)? Do you want to reject or fail to reject them? (1.5 points)
- What does a problem of non-compliance look like in your case? Do you think you would be likely to run into this problem? If you do, what could you do? (2.5 points)
- Are you worried about spill-over effects? How would they look in your case and what could you do to avoid them? (hint: think about some of the examples we saw in class) (2.5 points)
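For reference, random assignment followed by a pre-treatment balance check might be sketched as follows. The households data frame and the covariates income and hh_size are hypothetical, chosen only to show the mechanics, and are not suggested answers.

```r
set.seed(42)                                 # reproducible fake data
n <- 400
households <- data.frame(
  income  = rnorm(n, mean = 50, sd = 10),    # hypothetical pre-treatment covariate
  hh_size = rpois(n, lambda = 4)             # hypothetical pre-treatment covariate
)

# Simple random assignment: exactly half of the households get the pamphlet
households$treated <- sample(rep(c(0, 1), each = n / 2))

# For each covariate, H0: the treatment and control group means are equal
balance_p <- sapply(c("income", "hh_size"), function(v) {
  x <- households[[v]]
  t.test(x[households$treated == 1], x[households$treated == 0])$p.value
})
balance_p                                    # one p-value per covariate
```

Each p-value comes from a separate two-sample test comparing the treated and control households on one covariate measured before the pamphlets are mailed.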
PART III
QUESTION A: 11 points
Use the same dataset called WDI_Data.csv. Read the explanation of the variables in the Word document called WDI Data Documentation.
- Is this a panel dataset? Why or why not? (0.5 points)
- How many years are present in the study? From when to when (i.e., range of years)? (0.5 points)
- Run a bivariate linear regression where the dependent variable is GDP per capita and the independent variable is the number of radios per 1,000 individuals. Show the code and the output. Please interpret your (beta) coefficient and its significance level. (2 points)
- Draw a plot to show how the dependent variable and explanatory variable change across time (you can do this in two separate graphs). (2 points)
- Add unit-of-analysis fixed effects and time fixed effects (for all time periods) to your model from question 3. Show the code you used and the output from R, but please do not report the coefficients of all the fixed effects. (2 points)
- Is the new coefficient significant? Is it the same as in answer 3? How has your analysis changed from 3 to 5? (i.e. what do the fixed effects used in the model of question 5 capture in this case?) (3 points)
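One common way to add unit and time fixed effects in base R is to include factor() dummies in lm. The sketch below does this on a simulated panel; the names panel, country, year, radios, and gdp_pc are assumptions and will differ from the variable names listed in the WDI Data Documentation.

```r
set.seed(7)                                  # reproducible fake panel
units <- paste0("C", 1:8)
panel <- expand.grid(country = units, year = 2000:2009,
                     stringsAsFactors = FALSE)

# A time-invariant country trait that shifts both variables
shift <- setNames(rnorm(length(units)), units)
u <- unname(shift[panel$country])

panel$radios <- 100 + 20 * u + rnorm(nrow(panel), sd = 5)
panel$gdp_pc <- 1000 + 3 * panel$radios + 500 * u +
  50 * (panel$year - 2000) + rnorm(nrow(panel), sd = 30)

# Pooled bivariate regression
pooled <- lm(gdp_pc ~ radios, data = panel)

# Same model with country and year fixed effects entered as dummies
fe <- lm(gdp_pc ~ radios + factor(country) + factor(year), data = panel)

# Report only the coefficient of interest, not the fixed-effect dummies
coef(summary(pooled))["radios", ]
coef(summary(fe))["radios", ]
```

Indexing the coefficient table by row name keeps the output to the radios row, which avoids printing every country and year dummy.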
QUESTION B: 6 points
You are hired to come up with the best model to predict countries’ most effective responses to COVID-19 (an effective approach limits the number of deaths).
- From the list below, pick the variables you would include in your predictive model (there is no limit on the number). (1 point)
– Presidential-Parliamentary systems
– Level of federalism
– Partisanship of government
– Number of Doctors per capita
– Public or Private health system
– Imports amount
– Exports amount
– Membership of multilateral organization
– Age population
– Average temperature
– Other (which?)
- Imagine that you now have to identify the impact of the partisanship of the government on the effectiveness of the response. Would you use the same criteria to select your variables, given that you are now trying to identify causality? Why or why not? Can you give an example of a variable you would use in 1 but not here? (5 points)
QUESTION C: 12 points
- You want to study the attitudes of college students in New York City toward President Donald Trump, and you run a survey on three university campuses. Read the following survey questions.
QUESTIONS FROM THE (FAKE) SURVEY:
“Do you agree or disagree with the following statement: College students are more liberal than the average American voter?”
- Highly agree
- Highly disagree
“Do you agree or disagree with the following statement: some experts claim that the response to COVID 19 of the United States government was not as effective as that of other governments”
- Highly agree
- Highly disagree
“What is your level of disapproval of the leadership of President Trump in the last 4 years?”
- Highly approve
- Highly disapprove
- Explain what each of the biases listed below is and why the survey could suffer from it. (6 points)
- Social desirability bias
- Framing bias
- Order bias
- If you had to create a better version of these questions to get to a less biased understanding of the attitudes of college students toward the President of the United States, what would you do? Please try to rewrite the questions with the improvements you think are necessary for each bias (6 points)