This assignment uses R to assess claims of gender discrimination in wages, among other questions.

 

HW 6 POL 850 Spring 2020
This homework is due by 5 PM on Friday, May 8. Please note that this is a HARD DEADLINE because of the impending end of the semester. Please use this R Markdown template to report your code, output, and written answers in a single document. You may also submit your R script, output, and typed written answers separately. In either case, upload a single PDF of your final document to the Assignment portal on Classes.

Comment your code. Report results in the correct units of measurement.
Question 1
Example 1: You are part of a team that has been hired to assess claims of gender discrimination in wages at a large social media firm. The CEO of the firm tells the team that the average wage in the firm is $800/week (think of this as the population mean). You draw a random sample of 15 female employees and find that, for this sample, the average wage = $690/week and the standard deviation (s) = $90. Using these data, do the following:
1.1 Using the normal distribution, give the 95% confidence interval for the mean weekly wage of female employees, based on this sample. Show your work in R.
#insert code here
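A minimal sketch of 1.1, working only from the summary statistics above (mean $690, s = $90, n = 15). The helper name `normal_ci_mean` is hypothetical, not part of any package. It uses the normal critical value from `qnorm()` as the question asks; with only 15 observations, a t-based interval using `qt(0.975, df = 14)` would be somewhat wider.

```r
# Hypothetical helper: normal-approximation confidence interval for a mean,
# computed from summary statistics (sample mean, sample SD, sample size).
normal_ci_mean <- function(xbar, s, n, level = 0.95) {
  se <- s / sqrt(n)                    # standard error of the sample mean
  z  <- qnorm(1 - (1 - level) / 2)     # two-sided critical value (about 1.96 for 95%)
  c(lower = xbar - z * se, upper = xbar + z * se)
}

# Female employees: mean weekly wage $690, s = $90, n = 15
ci_wage <- normal_ci_mean(xbar = 690, s = 90, n = 15)
print(round(ci_wage, 2))   # interval in dollars per week
```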
1.2 Your null hypothesis is that the population mean wage of female employees in the firm is $800/week (the same as the overall population mean wage). With 95% confidence, can you reject this null hypothesis? Why or why not?
[Insert written answer here]
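One way to check the decision in 1.2 is sketched below: it asks whether the hypothesized mean of $800 lies inside the 95% normal interval and computes the matching two-sided z-test. Both routes give the same decision because the interval excludes the null value exactly when |z| > 1.96. Variable names are illustrative.

```r
xbar <- 690   # sample mean weekly wage of the 15 female employees
s    <- 90    # sample standard deviation
n    <- 15
mu0  <- 800   # mean under H0 (firm-wide average)

se <- s / sqrt(n)
z  <- (xbar - mu0) / se                  # distance from H0 in standard errors
p_two_sided <- 2 * pnorm(-abs(z))        # P(|Z| at least this large) under H0
ci_95 <- xbar + c(-1, 1) * qnorm(0.975) * se
reject_h0 <- mu0 < ci_95[1] || mu0 > ci_95[2]   # same decision as p_two_sided < 0.05
c(z = z, p = p_two_sided, reject = reject_h0)
```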
Example 2: Grades on a standardized test are known to have a mean of 1000 for students in the United States (think of this as the population mean). The test is administered to 453 randomly selected students in New York City; in this sample, the mean is 1013 and the standard deviation is 108.
1.3 The mayor of NYC argues that his education policy was effective, since the sample mean test score of the NYC students is higher than the national population mean. Using the normal distribution, construct a 95% confidence interval around the sample mean test score of NYC students. Show your work in R.
#insert code here
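A sketch for the NYC sample (mean 1013, s = 108, n = 453). It also computes a t-based interval only for comparison: with n = 453 the two are nearly identical, which is why the normal approximation is reasonable here. The last line records whether the national mean of 1000 falls inside the normal interval, which bears on 1.4.

```r
xbar <- 1013; s <- 108; n <- 453   # NYC sample
mu0  <- 1000                       # national population mean

se <- s / sqrt(n)
ci_normal <- xbar + c(-1, 1) * qnorm(0.975) * se
# For comparison only: t-based interval with n - 1 degrees of freedom
ci_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se
contains_national_mean <- mu0 >= ci_normal[1] && mu0 <= ci_normal[2]
list(ci_normal = round(ci_normal, 2), ci_t = round(ci_t, 2),
     contains_1000 = contains_national_mean)
```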
1.4 Your null hypothesis is that the population mean test score of students in NYC is the same as the national population mean test score (1000). With 95% confidence, can you reject this null hypothesis? Why or why not?
[Insert written answer here]
1.5 Another 503 students are selected at random from NYC. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95. The prep course provider then claims that their course makes a difference. Using the normal distribution, construct the 95% confidence interval for the difference in mean test scores between these 503 students and the 453 students in the first NYC sample. Show your work in R.
#insert code here
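A sketch of the difference-of-means interval for 1.5, assuming the two NYC samples are independent so that their variances add. The final check, whether the interval contains 0, is the quantity 1.6 asks about.

```r
m_prep <- 1019; s_prep <- 95;  n_prep <- 503   # students who took the prep course
m_nyc  <- 1013; s_nyc  <- 108; n_nyc  <- 453   # first NYC sample

diff_means <- m_prep - m_nyc
se_diff <- sqrt(s_prep^2 / n_prep + s_nyc^2 / n_nyc)   # SE for independent samples
ci_diff <- diff_means + c(-1, 1) * qnorm(0.975) * se_diff
includes_zero <- ci_diff[1] <= 0 && ci_diff[2] >= 0    # TRUE -> cannot reject "no difference"
list(diff = diff_means, se = se_diff, ci = round(ci_diff, 2), includes_zero = includes_zero)
```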
1.6 Your null hypothesis is that the mean test scores of the two groups of NYC students (with and without the prep course) are the same. With 95% confidence, can you reject this null hypothesis? Why or why not? What does this tell you about the claim made by the prep course provider?
[insert written answer here]

Question 2
Example 1: A cable TV company conducted two surveys in 2019, one before (September) and one after (November) public hearings in Congress, on public opinion over whether the President ought to be impeached and removed from office.
Table 1: Public Opinion on Trump Impeachment

| Month | # Respondents | Proportion Support Impeachment | Proportion Oppose Impeachment |
|---|---|---|---|
| September | 1,100 | 0.48 | 0.52 |
| November | 1,007 | 0.51 | 0.49 |
2.1 Using the normal distribution, calculate the 95% confidence intervals for support for impeachment in each survey. Show your work in R.
#insert code here
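A sketch of 2.1 using the values in Table 1. It applies the usual normal-approximation standard error for a sample proportion, sqrt(p(1 - p)/n); the helper name `prop_ci_summary` is hypothetical.

```r
# Hypothetical helper: normal-approximation CI for a proportion from summary data.
prop_ci_summary <- function(p, n, level = 0.95) {
  se <- sqrt(p * (1 - p) / n)          # standard error of each sample proportion
  z  <- qnorm(1 - (1 - level) / 2)
  cbind(lower = p - z * se, upper = p + z * se)
}

# Values from Table 1 (proportion supporting impeachment)
surveys <- data.frame(month = c("September", "November"),
                      n = c(1100, 1007),
                      p_support = c(0.48, 0.51))
ci <- prop_ci_summary(surveys$p_support, surveys$n)
print(cbind(surveys, round(ci, 4)))
```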
2.2 A news reporter from the cable TV company argues that the survey results indicate that public support for the impeachment of President Trump increased after public hearings in Congress. You want to evaluate this claim using the two confidence intervals you calculated in the previous question. What is your null hypothesis? With 95% confidence, can you reject the null? Why or why not?
[insert written answer here]
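The comparison in 2.2 can be sketched as an overlap check between the two 95% intervals. Overlap is a conservative criterion: non-overlapping intervals imply a significant difference, but overlapping ones do not by themselves rule one out. The sketch therefore also computes a direct z-test on the difference in proportions (unpooled standard error), shown only as a cross-check.

```r
ci_prop <- function(p, n) p + c(-1, 1) * qnorm(0.975) * sqrt(p * (1 - p) / n)
ci_sep <- ci_prop(0.48, 1100)
ci_nov <- ci_prop(0.51, 1007)

# H0: support for impeachment was the same in September and November.
# If the 95% intervals overlap, this check does not let us reject H0.
intervals_overlap <- ci_sep[2] >= ci_nov[1] && ci_nov[2] >= ci_sep[1]

# Cross-check: z-test on the difference in proportions (unpooled SE)
se_diff <- sqrt(0.48 * 0.52 / 1100 + 0.51 * 0.49 / 1007)
z_diff  <- (0.51 - 0.48) / se_diff
p_diff  <- 2 * pnorm(-abs(z_diff))
c(overlap = intervals_overlap, z = z_diff, p = p_diff)
```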
Example 2: A researcher is concerned that Latino and non-Latino citizens experience different response times to queries from local officials. She conducts an ‘audit’ experiment to see if this is true or not. In this experiment, a (fake) email is sent from a citizen with a Latino or non-Latino sounding name, and the time taken for the local government official to respond is recorded.
2.3 The researcher has limited resources. She sends 9 emails from a Latino name, and 14 emails from a non-Latino name. For the Latino names, the mean response time was 421 minutes (standard deviation of 82 minutes). For the non-Latino names, mean response time was 366 minutes (standard deviation of 101 minutes). Calculate the difference in means and the standard error for the difference in means. Show your work in R.
#insert code here
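A sketch of 2.3, assuming the unpooled standard error sqrt(s1^2/n1 + s2^2/n2); a pooled-variance standard error is another common choice when the two groups are assumed to share a variance, so use whichever formula your course specifies.

```r
# Latino-name emails
m_lat <- 421; s_lat <- 82;  n_lat <- 9
# Non-Latino-name emails
m_non <- 366; s_non <- 101; n_non <- 14

diff_means <- m_lat - m_non                              # minutes
se_diff    <- sqrt(s_lat^2 / n_lat + s_non^2 / n_non)    # unpooled SE, minutes
c(diff_means = diff_means, se_diff = se_diff)
```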
2.4 The researcher’s hypothesis is that emails from citizens with Latino sounding names will have longer response times than emails from citizens with non-Latino sounding names. What is the null hypothesis? What is the researcher’s directional alternative hypothesis?
[insert written answer here]
2.5 Calculate the value of the t-statistic for a one-tailed test. Show your work in R.
#insert code here
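A sketch of the t-statistic for 2.5, again assuming the unpooled standard error. The comments state one conventional way of writing the hypotheses; note that the statistic itself is the same for one- and two-tailed tests, and only the tail probability used afterwards differs.

```r
# H0: mean response time is the same for Latino and non-Latino names
#     (mu_latino - mu_non_latino = 0)
# HA (directional): mu_latino - mu_non_latino > 0
diff_means <- 421 - 366                          # minutes
se_diff    <- sqrt(82^2 / 9 + 101^2 / 14)        # unpooled SE, minutes
t_stat     <- diff_means / se_diff
t_stat
```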
2.6 Using R’s pt() function, what is the probability that we would see a t-statistic that large or larger if the null hypothesis were true, for a one-tailed test (remember to use the right degrees of freedom)? Can we reject the null hypothesis at the 0.05 level for the one-tailed test? Can we reject the null hypothesis at the 0.05 level for a two-tailed test?
#insert code here
[insert written answer here]
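The question leaves the degrees-of-freedom rule to the course. The sketch below assumes the conservative rule df = min(n1, n2) - 1 = 8 and also computes the Welch-Satterthwaite value for comparison; substitute whichever rule your course uses. `pt(q, df, lower.tail = FALSE)` gives the upper-tail probability P(T >= q).

```r
t_stat <- (421 - 366) / sqrt(82^2 / 9 + 101^2 / 14)

# Assumed df rule: conservative min(n1, n2) - 1
df_cons <- min(9, 14) - 1

# Welch-Satterthwaite df, shown only for comparison
v_lat <- 82^2 / 9
v_non <- 101^2 / 14
df_welch <- (v_lat + v_non)^2 / (v_lat^2 / (9 - 1) + v_non^2 / (14 - 1))

# One-tailed p-value: P(T >= t_stat) under H0 (upper tail, matching HA)
p_one <- pt(t_stat, df = df_cons, lower.tail = FALSE)
# Two-tailed p-value: P(|T| >= |t_stat|) under H0
p_two <- 2 * pt(-abs(t_stat), df = df_cons)

c(t = t_stat, df = df_cons, df_welch = df_welch, p_one = p_one, p_two = p_two)
```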

Question 3
In an earlier homework, we analyzed data from an important field experiment by Devah Pager about the effect of race and criminal record on employment (“The Mark of a Criminal Record”). This is a follow-up exercise using the same data set. Last time you described the different callback rates between groups. Now we are going to use what we’ve learned about statistical inference to better understand those patterns. You are welcome – and even encouraged – to reuse code from that exercise. In fact, in practice you often have to work with the same dataset many times, and writing good code the first time helps you reuse the code in future projects.
The dataset is called criminalrecord.csv. You may not need to use all of these variables for this activity.
We’ve kept these unnecessary variables in the dataset because it is common to receive a dataset with much more information than you need.
Table 2: Criminal Record Dataset

| Variable | Description |
|---|---|
| jobid | job identifier |
| callback | 1 if the tester received a callback, 0 if the tester did not receive a callback |
| black | 1 if the tester is black, 0 if the tester is white |
| crimrec | 1 if the tester has a criminal record, 0 if the tester does not |
| interact | 1 if the tester interacted with the employer during the job application, 0 if the tester did not interact with the employer |
| city | 1 if the job is located in the city center, 0 if the job is located in the suburbs |
| distance | job’s average distance to downtown |
| custserv | 1 if the job is in the customer service sector, 0 if it is not |
| manualskill | 1 if the job requires manual skills, 0 if it does not |
This problem will give you practice with:
• re-using old code (optional)
• constructing confidence intervals
• difference-of-means tests
• p-values
• type I and type II errors
3.1 Begin by loading the data into R. How many cases are there in the data? In how many cases is the tester black? In how many cases is he white?
#insert code here
[insert written answer here]
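A sketch for 3.1. It assumes criminalrecord.csv sits in the working directory and that `black` is coded as in Table 2; the helper name `count_by_race` is hypothetical. The loading step is wrapped in `file.exists()` so the block runs harmlessly when the file is absent.

```r
# Hypothetical helper: number of cases overall and by tester race.
# Assumes `black` is coded 1 = black tester, 0 = white tester (Table 2).
count_by_race <- function(df) {
  c(total = nrow(df),
    black = sum(df$black == 1, na.rm = TRUE),
    white = sum(df$black == 0, na.rm = TRUE))
}

# Runs only if criminalrecord.csv is in the working directory (assumed location)
if (file.exists("criminalrecord.csv")) {
  crim <- read.csv("criminalrecord.csv")
  print(count_by_race(crim))
  print(table(crim$black))   # same counts via a frequency table
}
```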
3.2 Now we examine the central question of the study. Calculate the proportions of callbacks for: white applicants with a criminal record, white applicants without a criminal record, black applicants with a criminal record, and black applicants without a criminal record.
#insert code here
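A sketch for 3.2 that relies on a simple fact: the mean of a 0/1 variable is the proportion of 1s, so the group means of `callback` are the callback rates. The helper name `callback_rates` is hypothetical; it returns a 2 x 2 table of race by criminal record.

```r
# Hypothetical helper: callback rates by tester race and criminal record.
# Assumes 0/1 columns `callback`, `black`, `crimrec` as described in Table 2.
callback_rates <- function(df) {
  race   <- ifelse(df$black == 1, "black", "white")
  record <- ifelse(df$crimrec == 1, "record", "no record")
  # The mean of a 0/1 variable is the proportion of 1s, i.e. the callback rate
  tapply(df$callback, list(race = race, record = record), mean, na.rm = TRUE)
}
```

With the data loaded (for example `crim <- read.csv("criminalrecord.csv")`), `callback_rates(crim)` gives all four proportions at once.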
3.3 Now consider the callback rate for white applicants with a criminal record. Using the normal distribution, construct a 95% confidence interval around this estimate. Also, construct a 99% confidence interval around this estimate (again using the normal distribution).
#insert code here
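A sketch for 3.3: a normal-approximation interval for a proportion computed from a raw 0/1 vector, with the confidence level as a parameter so the same helper produces both the 95% and 99% intervals. The name `prop_ci` is hypothetical. The 99% interval is wider because its critical value (about 2.58) is larger than 1.96.

```r
# Hypothetical helper: normal-approximation CI for a proportion from a 0/1 vector.
prop_ci <- function(x, level = 0.95) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  p  <- mean(x)                        # estimated callback rate
  se <- sqrt(p * (1 - p) / n)          # standard error of a sample proportion
  z  <- qnorm(1 - (1 - level) / 2)     # about 1.96 for 95%, 2.58 for 99%
  c(estimate = p, lower = p - z * se, upper = p + z * se)
}
```

Usage, assuming `crim` holds the loaded data: `white_rec <- crim$callback[crim$black == 0 & crim$crimrec == 1]`, then `prop_ci(white_rec)` and `prop_ci(white_rec, level = 0.99)`.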
3.4 Calculate the estimated treatment effect of a criminal record for white applicants by subtracting the callback rate in the control condition (“no criminal record”) from the callback rate in the treatment condition (“having a criminal record”). Using the normal distribution, create a 95% confidence interval around this estimated treatment effect. Next, describe the estimated treatment effect and confidence interval in a way that could be understood by a general audience.
#insert code here
[insert written answer here]
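A sketch for 3.4: the estimated effect is the treatment rate minus the control rate, and, assuming the two groups are independent, its standard error combines the two proportion variances (unpooled). The helper name `diff_prop_ci` is hypothetical.

```r
# Hypothetical helper: CI for a difference in proportions (treatment - control),
# using the unpooled normal-approximation standard error.
diff_prop_ci <- function(treat, control, level = 0.95) {
  treat   <- treat[!is.na(treat)]
  control <- control[!is.na(control)]
  p_t <- mean(treat)
  p_c <- mean(control)
  se  <- sqrt(p_t * (1 - p_t) / length(treat) + p_c * (1 - p_c) / length(control))
  z   <- qnorm(1 - (1 - level) / 2)
  est <- p_t - p_c                      # estimated treatment effect
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}
```

Usage, assuming `crim` holds the loaded data: `white <- subset(crim, black == 0)`, then `diff_prop_ci(white$callback[white$crimrec == 1], white$callback[white$crimrec == 0])`.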
3.5 Assuming a null hypothesis that there is no difference in callback rates between white people with a criminal record and white people without a criminal record, what is the probability that we would observe a difference as large as or larger than the one we observed, in a sample of this size? Use the normal distribution and the correct probability for a two-tailed test.
#insert code here
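A sketch for 3.5. Under the null of no difference, a common choice is the pooled standard error, which uses a single callback rate for both groups; the `pooled` argument lets you switch to the unpooled form used for the interval in 3.4. The helper name `diff_prop_pvalue` is hypothetical. As a cross-check, `prop.test(..., correct = FALSE)` on the same counts should give the same p-value, because its chi-squared statistic equals z squared for this pooled test.

```r
# Hypothetical helper: two-tailed z-test for a difference in proportions.
diff_prop_pvalue <- function(treat, control, pooled = TRUE) {
  treat   <- treat[!is.na(treat)]
  control <- control[!is.na(control)]
  n_t <- length(treat); n_c <- length(control)
  p_t <- mean(treat);   p_c <- mean(control)
  if (pooled) {
    p  <- (sum(treat) + sum(control)) / (n_t + n_c)   # common rate under H0
    se <- sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
  } else {
    se <- sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
  }
  z <- (p_t - p_c) / se
  c(z = z, p_value = 2 * pnorm(-abs(z)))   # two-tailed probability
}
```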
3.6 Imagine that we set up a hypothesis test where the null hypothesis is that there is no difference in callback rates between whites with and without a criminal record. In the context of this problem, what would it mean to commit a type I error? What would it mean to commit a type II error? If we set α = 0.05 for a two-tailed test, are we specifying the probability of a type I error or a type II error?
[insert written answer here]
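Question 3.6 is conceptual, but the meaning of α can be checked by simulation. The sketch below uses assumed, hypothetical parameters (150 testers per group, a true callback rate of 0.2 in both groups, seed 850), not the Pager data. Because every simulated experiment satisfies H0, each rejection is a type I error, so the rejection rate should land near α = 0.05.

```r
# One simulated experiment under H0: both groups share the same true callback rate.
# Returns TRUE if a two-tailed pooled z-test rejects at level alpha (a type I error).
one_null_experiment <- function(n_per_group, p_true, alpha) {
  treat   <- rbinom(n_per_group, size = 1, prob = p_true)
  control <- rbinom(n_per_group, size = 1, prob = p_true)
  p_pool  <- mean(c(treat, control))
  if (p_pool == 0 || p_pool == 1) return(FALSE)   # no variation: test cannot reject
  se <- sqrt(p_pool * (1 - p_pool) * (2 / n_per_group))
  z  <- (mean(treat) - mean(control)) / se
  2 * pnorm(-abs(z)) < alpha
}

set.seed(850)  # arbitrary seed for reproducibility
type1_rate <- mean(replicate(10000, one_null_experiment(150, 0.2, 0.05)))
type1_rate     # share of null experiments wrongly rejected; should be near 0.05
```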
*Score per Question in Homework 6*

| Question | Q1 Score | Q2 Score | Q3 Score |
|---|---|---|---|
| 1 | 5 | 5 | 5 |
| 2 | 5 | 5 | 5 |
| 3 | 5 | 5 | 10 |
| 4 | 5 | 5 | 10 |
| 5 | 5 | 5 | 5 |
| 6 | 5 | 5 | 5 |
| Total | 30 | 30 | 40 |