This assignment uses R to evaluate claims of gender discrimination in wages, among other topics.

HW 6
POL 850
Spring 2020
This homework is due by 5 PM on Friday May 8. Please note this is a HARD DEADLINE due to the
impending end of the semester. Please use this R Markdown template to report your code, output, and
written answers in a single document. You may also submit your R script, output, and typed written answers
separately. In either case, upload a single PDF of your final document to the Assignment portal on Classes.
Comment your code. Report results in the correct units of measurement.
Question 1
Example 1: You are part of a team that has been hired to assess claims of gender discrimination in wages in
a large social media firm. The CEO of the firm tells the team that the average wage in the firm is $800/week
(think of this as the population mean). You draw a random sample of 15 female employees and find that for
this sample, the average wage = $690 and the standard deviation (s) = $90. Using these data, do the following:
1.1 Using the normal distribution, give the 95% confidence interval for the mean income of this sample of
female employees. Show your work in R.
#insert code here
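As a minimal sketch of the calculation that 1.1 describes, the block below builds a normal-based 95% interval from the summary statistics given in Example 1. The variable names are illustrative and not part of the template. Note that with n = 15 a t-based interval would be wider; the normal distribution is used here only because the question asks for it.

```r
# Summary statistics reported in Example 1
n     <- 15    # sample size
x_bar <- 690   # sample mean wage ($/week)
s     <- 90    # sample standard deviation ($/week)

# Standard error of the sample mean
se <- s / sqrt(n)

# Critical value for a 95% interval under the normal distribution
z <- qnorm(0.975)

# Lower and upper bounds of the interval, in $/week
ci_wage <- c(lower = x_bar - z * se, upper = x_bar + z * se)
print(round(ci_wage, 2))
```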
1.2 Your null hypothesis is that the population mean wage of female employees in the firm is $800/week (the
same as the overall population mean wage). With 95% confidence, can you reject this null hypothesis? Why
or why not?
[Insert written answer here]
Example 2: Grades on a standardized test are known to have a mean of 1000 for students in the United
States (think of this as the population mean). The test is administered to 453 randomly selected students in
New York City; in this sample, the mean is 1013 and the standard deviation is 108.
1.3 The mayor of NYC argues that his education policy was effective since the sample mean test score of the
NYC students is higher than the national population mean. Using the normal distribution, construct a 95%
confidence interval for the sample mean test score for NYC students. Show your work in R.
#insert code here
1.4 Your null hypothesis is that the population mean test score of students in NYC is the same as the national
population mean test score (1000). With 95% confidence, can you reject this null hypothesis? Why or why
not?
[Insert written answer here]
1.5 Another 503 students are selected at random from NYC. They are given a three-hour preparation course
before the test is administered. Their average test score is 1019 with a standard deviation of 95. The prep
course provider then claims that their course makes a difference. Using the normal distribution, construct
the 95% confidence interval for the difference in test scores. Show your work in R.
#insert code here
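One way to set up the interval in 1.5 is sketched below. It assumes the two samples are independent and uses the unpooled standard error for a difference in means; the variable names are illustrative only.

```r
# Sample without the prep course (Example 2)
n1 <- 453; m1 <- 1013; s1 <- 108
# Sample with the prep course (Question 1.5)
n2 <- 503; m2 <- 1019; s2 <- 95

# Difference in sample means (prep course minus no prep course), in test points
diff_means <- m2 - m1

# ASSUMPTION: independent samples, unpooled standard error of the difference
se_diff <- sqrt(s1^2 / n1 + s2^2 / n2)

# 95% interval under the normal distribution
z <- qnorm(0.975)
ci_diff <- c(lower = diff_means - z * se_diff, upper = diff_means + z * se_diff)
print(round(ci_diff, 2))
```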
1.6 Your null hypothesis is that the two sample mean test scores of students in NYC are the same. With 95%
confidence, can you reject this null hypothesis? Why or why not? What does this tell you about the claim
made by the test provider?
[insert written answer here]
Question 2
Example 1: A cable TV company conducted two surveys in 2019, one before (September) and one after
(November) public hearings in Congress, on public opinion over whether the President ought to be impeached
and removed from office.
Table 1: Public Opinion on Trump Impeachment
Month       # Respondents   Proportion Support Impeachment   Proportion Oppose Impeachment
September   1,100           0.48                             0.52
November    1,007           0.51                             0.49
2.1 Using the normal distribution, calculate the 95% confidence intervals for support for impeachment for
each survey. Show your work in R.
#insert code here
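The interval for a sample proportion in 2.1 can be sketched with a small helper. `prop_ci` is a hypothetical function name introduced here for illustration; it applies the usual normal approximation with standard error sqrt(p(1 - p)/n) to the values in Table 1.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion
prop_ci <- function(p, n, level = 0.95) {
  se <- sqrt(p * (1 - p) / n)          # standard error of the sample proportion
  z  <- qnorm(1 - (1 - level) / 2)     # two-sided critical value
  c(lower = p - z * se, upper = p + z * se)
}

# Support for impeachment in each survey (Table 1)
ci_sept <- prop_ci(p = 0.48, n = 1100)
ci_nov  <- prop_ci(p = 0.51, n = 1007)

print(round(ci_sept, 3))
print(round(ci_nov, 3))
```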
2.2 A news reporter from the cable TV company argues that the survey results indicate that public support
for the impeachment of President Trump increased after public hearings in Congress. You want to evaluate
this claim using the two confidence intervals you calculated in the previous question. What is your null
hypothesis? With 95% confidence, can you reject the null? Why or why not?
[insert written answer here]
Example 2: A researcher is concerned that Latino and non-Latino citizens experience different response
times to queries from local officials. She conducts an ‘audit’ experiment to see if this is true or not. In this
experiment, a (fake) email is sent from a citizen with a Latino or non-Latino sounding name, and the time
taken for the local government official to respond is recorded.
2.3 The researcher has limited resources. She sends 9 emails from a Latino name, and 14 emails from a
non-Latino name. For the Latino names, the mean response time was 421 minutes (standard deviation of
82 minutes). For the non-Latino names, mean response time was 366 minutes (standard deviation of 101
minutes). Calculate the difference in means and the standard error for the difference in means. Show your
work in R.
#insert code here
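A possible setup for 2.3 is sketched below. It assumes the unpooled formula for the standard error of a difference in means; if the course uses a pooled variance instead, the second step would change. Variable names are illustrative.

```r
# Response times in minutes, from Question 2.3
n_latino <- 9;  m_latino <- 421; s_latino <- 82
n_non    <- 14; m_non    <- 366; s_non    <- 101

# Difference in mean response time (Latino minus non-Latino), in minutes
diff_minutes <- m_latino - m_non

# ASSUMPTION: unpooled standard error of the difference, in minutes
se_minutes <- sqrt(s_latino^2 / n_latino + s_non^2 / n_non)

print(diff_minutes)
print(round(se_minutes, 2))
```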
2.4 The researcher’s hypothesis is that emails from citizens with Latino sounding names will have longer
response times than emails from citizens with non-Latino sounding names. What is the null hypothesis?
What is the researcher’s directional alternative hypothesis?
[insert written answer here]
2.5 Calculate the value of the t-statistic for a one-tailed test. Show your work in R.
#insert code here
2.6 Using R’s pt() function, what is the probability that we would see a t-statistic that large or larger if the
null hypothesis were true, for a one-tailed test (remember to use the right degrees of freedom)? Can we reject
the null hypothesis at the 0.05 level for the one-tailed test? Can we reject the null hypothesis at the 0.05
level for a two-tailed test?
#insert code here
[insert written answer here]
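The block below sketches how `pt()` could be used for 2.5 and 2.6. The degrees of freedom are an assumption: n1 + n2 - 2 is a common classroom rule, but Welch's approximation or min(n1, n2) - 1 are also used, so substitute whichever rule the course specifies. The standard error is the unpooled one, which is also an assumption.

```r
# Inputs from Question 2.3 (minutes)
diff_minutes <- 421 - 366
se_minutes   <- sqrt(82^2 / 9 + 101^2 / 14)   # ASSUMPTION: unpooled standard error

# t-statistic for the difference in means
t_stat <- diff_minutes / se_minutes

# ASSUMPTION: degrees of freedom taken as n1 + n2 - 2
df_assumed <- 9 + 14 - 2

# One-tailed: probability of a t-statistic this large or larger under the null
p_one_tailed <- pt(t_stat, df = df_assumed, lower.tail = FALSE)

# Two-tailed: probability of a t-statistic this extreme in either direction
p_two_tailed <- 2 * pt(abs(t_stat), df = df_assumed, lower.tail = FALSE)

print(round(c(t = t_stat, one_tailed = p_one_tailed, two_tailed = p_two_tailed), 4))
```

Each p-value is then compared with the 0.05 threshold named in the question.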
Question 3
In an earlier homework, we analyzed data from an important field experiment by Devah Pager about the
effect of race and criminal record on employment (“The Mark of a Criminal Record”). This is a follow-up
exercise using the same data set. Last time you described the different callback rates between groups. Now
we are going to use what we’ve learned about statistical inference to better understand those patterns. You
are welcome – and even encouraged – to reuse code from that exercise. In fact, in practice you often have to
work with the same dataset many times, and writing good code the first time helps you reuse the code in
future projects.
The dataset is called criminalrecord.csv. You may not need to use all of these variables for this activity.
We’ve kept these unnecessary variables in the dataset because it is common to receive a dataset with much
more information than you need.
Table 2: Criminal Record Dataset
Variable      Description
jobid         job ID
callback      1 if tester received a callback, 0 if the tester did not receive a callback
black         1 if the tester is black, 0 if the tester is white
crimrec       1 if the tester has a criminal record, 0 if the tester does not
interact      1 if tester interacted with employer during the job application, 0 if tester did not interact with employer
city          1 if job is located in the city center, 0 if job is located in the suburbs
distance      job’s average distance to downtown
custserv      1 if job is in the customer service sector, 0 if it is not
manualskill   1 if job requires manual skills, 0 if it does not
This problem will give you practice with:
• re-using old code (optional)
• constructing confidence intervals
• difference-of-means tests
• p-values
• type I and type II errors
3.1 Begin by loading the data into R. How many cases are there in the data? In how many cases is the tester
black? In how many cases is he white?
#insert code here
[insert written answer here]
3.2 Now we examine the central question of the study. Calculate the proportions of callbacks for: white
applicants with a criminal record, white applicants without a criminal record, black applicants with a criminal
record, and black applicants without a criminal record.
#insert code here
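The counting and grouping steps in 3.1 and 3.2 can be sketched as follows. The data frame `toy` is a made-up stand-in so that the block runs on its own; its values are not from the study. With the real file, the same calls would be applied to the result of `read.csv("criminalrecord.csv")`.

```r
# ASSUMPTION: toy stand-in for criminalrecord.csv; values are invented for illustration
toy <- data.frame(
  black    = c(0, 0, 0, 0, 1, 1, 1, 1),
  crimrec  = c(0, 0, 1, 1, 0, 0, 1, 1),
  callback = c(1, 0, 0, 0, 1, 0, 0, 0)
)
# With the real data: toy <- read.csv("criminalrecord.csv")

# Number of cases in the data
n_cases <- nrow(toy)

# Number of cases by tester race (0 = white, 1 = black)
by_race <- table(toy$black)

# Callback proportion in each race-by-record cell;
# the mean of a 0/1 variable is the proportion of 1s
rates <- tapply(toy$callback,
                list(black = toy$black, crimrec = toy$crimrec),
                mean)

print(n_cases)
print(by_race)
print(rates)
```

Rows of `rates` index `black` and columns index `crimrec`, so `rates["0", "1"]` is the callback rate for white testers with a criminal record.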
3.3 Now consider the callback rate for white applicants with a criminal record. Using the normal distribution,
construct a 95% confidence interval around this estimate. Also, construct a 99% confidence interval around
this estimate (again using the normal distribution).
#insert code here
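Moving from a 95% to a 99% interval in 3.3 only changes the critical value taken from the normal distribution. The sketch below illustrates this with placeholder inputs: `p_hat` and `n` are hypothetical values, not estimates from the dataset.

```r
# ASSUMPTION: placeholder estimate and sample size, not taken from criminalrecord.csv
p_hat <- 0.20
n     <- 100

# Standard error of the sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)

# Two-sided critical values: about 1.96 for 95% and about 2.58 for 99%
z95 <- qnorm(0.975)
z99 <- qnorm(0.995)

ci95 <- c(lower = p_hat - z95 * se, upper = p_hat + z95 * se)
ci99 <- c(lower = p_hat - z99 * se, upper = p_hat + z99 * se)

print(round(ci95, 3))
print(round(ci99, 3))
```

The 99% interval is wider because a higher confidence level requires a larger critical value.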
3.4 Calculate the estimated treatment effect of a criminal record for white applicants by subtracting the
callback rate in the control condition (“no criminal record”) from the callback rate in the treatment
condition (“having a criminal record”). Using the normal distribution, create a 95% confidence
interval around this estimated treatment effect. Next, describe the estimated treatment effect and confidence
interval in a way that could be understood by a general audience.
#insert code here
[insert written answer here]
3.5 Assuming a null hypothesis that there is no difference in callback rates between white people with a
criminal record and white people without a criminal record, what is the probability that we would observe a
difference as large or larger than the one that we observed in a sample of this size? Use the normal distribution
and the correct probability for a two-tailed test.
#insert code here
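A two-tailed probability like the one in 3.5 can be sketched with `pnorm()`. `two_prop_p_value` is a hypothetical helper name, and the inputs in the example call are placeholders rather than values from the dataset. The sketch uses the unpooled standard error; a pooled standard error under the null is another common choice, so follow whichever the course uses.

```r
# Hypothetical helper: two-tailed p-value for a difference in two proportions,
# using the normal distribution and an unpooled standard error (ASSUMPTION)
two_prop_p_value <- function(p1, n1, p2, n2) {
  diff <- p1 - p2
  se   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z    <- diff / se
  # Area in both tails beyond |z|
  2 * pnorm(abs(z), lower.tail = FALSE)
}

# ASSUMPTION: placeholder callback rates and group sizes, for illustration only
p_val <- two_prop_p_value(p1 = 0.30, n1 = 100, p2 = 0.10, n2 = 100)
print(p_val)
```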
3.6 Imagine that we set up a hypothesis test where the null hypothesis is that there is no difference in
callback rates between whites with and without a criminal record. In the context of this problem, what would
it mean to commit a type I error? In the context of this problem, what would it mean to commit a type II
error? If we set α = 0.05 for a two-tailed test, are we specifying the probability of type I error or type II
error?
[insert written answer here]
* Score per Question in Homework 6 *
Part    Q1 Score   Q2 Score   Q3 Score
1       5          5          5
2       5          5          5
3       5          5          10
4       5          5          10
5       5          5          5
6       5          5          5
Total   30         30         40