HW 6

POL 850

Spring 2020

This homework is due by 5 PM on Friday, May 8. Please note this is a HARD DEADLINE due to the impending end of the semester. Please use this R Markdown template to report your code, output, and written answers in a single document. You may also submit your R script, output, and typed written answers separately. In either case, upload a single PDF of your final document to the Assignment portal on Classes. Comment your code. Report results in the correct units of measurement.

Question 1

Example 1: You are part of a team that has been hired to assess claims of gender discrimination in wages in a large social media firm. The CEO of the firm tells the team that the average wage in the firm is $800/week (think of this as the population mean). You draw a random sample of 15 female employees and find that for this sample, the average wage is $690/week and the standard deviation (s) is $90. Using these data, do the following:

1.1 Using the normal distribution, give the 95% confidence interval for the mean wage of this sample of female employees. Show your work in R.
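
As a hint for the mechanics, here is a minimal sketch of a normal-approximation confidence interval for a mean. The helper name `mean_ci` and all input values are made up for illustration and are not the assignment's numbers; substitute the values from the question.

```r
# Normal-approximation confidence interval for a mean.
# `mean_ci` is a hypothetical helper; the inputs below are made-up values.
mean_ci <- function(xbar, s, n, level = 0.95) {
  se <- s / sqrt(n)                    # standard error of the sample mean
  z  <- qnorm(1 - (1 - level) / 2)     # critical value, about 1.96 for 95%
  c(lower = xbar - z * se, upper = xbar + z * se)
}

ci <- mean_ci(xbar = 50, s = 10, n = 25)  # hypothetical sample summary
print(round(ci, 2))
```

The interval is reported in the same units as the mean, so label your answer accordingly.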

#insert code here

1.2 Your null hypothesis is that the population mean wage of female employees in the firm is $800/week (the same as the overall population mean wage). With 95% confidence, can you reject this null hypothesis? Why or why not?

[Insert written answer here]

Example 2: Grades on a standardized test are known to have a mean of 1000 for students in the United States (think of this as the population mean). The test is administered to 453 randomly selected students in New York City; in this sample, the mean is 1013 and the standard deviation is 108.

1.3 The mayor of NYC argues that his education policy was effective since the sample mean test score of the NYC students is higher than the national population mean. Using the normal distribution, construct a 95% confidence interval for the sample mean test score for NYC students. Show your work in R.

#insert code here

1.4 Your null hypothesis is that the population mean test score of students in NYC is the same as the national population mean test score (1000). With 95% confidence, can you reject this null hypothesis? Why or why not?

[Insert written answer here]

1.5 Another 503 students are selected at random from NYC. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95. The prep course provider then claims that their course makes a difference. Using the normal distribution, construct the 95% confidence interval for the difference in test scores. Show your work in R.
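
For reference, a confidence interval for a difference in two independent sample means can be sketched as follows. The helper `diff_means_ci` and the numbers passed to it are hypothetical, chosen only so the arithmetic is easy to check by hand.

```r
# Normal-approximation CI for a difference in two independent sample means.
# `diff_means_ci` is a hypothetical helper; inputs are made-up values.
diff_means_ci <- function(xbar1, s1, n1, xbar2, s2, n2, level = 0.95) {
  diff <- xbar1 - xbar2                   # estimated difference in means
  se   <- sqrt(s1^2 / n1 + s2^2 / n2)     # standard error of the difference
  z    <- qnorm(1 - (1 - level) / 2)      # critical value for the chosen level
  c(diff = diff, se = se, lower = diff - z * se, upper = diff + z * se)
}

res <- diff_means_ci(xbar1 = 75, s1 = 12, n1 = 144,
                     xbar2 = 72, s2 = 9,  n2 = 81)
print(round(res, 3))
```

Whether the interval contains zero is what matters when you compare it with a null hypothesis of no difference.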

#insert code here

1.6 Your null hypothesis is that the two sample mean test scores of students in NYC are the same. With 95% confidence, can you reject this null hypothesis? Why or why not? What does this tell you about the claim made by the prep course provider?

[insert written answer here]

Question 2

Example 1: A cable TV company conducted two surveys in 2019, one before (September) and one after (November) public hearings in Congress, on public opinion over whether the President ought to be impeached and removed from office.

Table 1: Public Opinion on Trump Impeachment

Month      # Respondents  Proportion Support Impeachment  Proportion Oppose Impeachment
September  1,100          0.48                            0.52
November   1,007          0.51                            0.49

2.1 Using the normal distribution, calculate the 95% confidence intervals for support for impeachment for each survey. Show your work in R.
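
A confidence interval for a sample proportion follows the same pattern as one for a mean, with the standard error computed from the proportion itself. The sketch below uses a hypothetical helper `prop_ci` and made-up survey values, not the figures in Table 1.

```r
# Normal-approximation CI for a sample proportion.
# `prop_ci` is a hypothetical helper; inputs are made-up values.
prop_ci <- function(p_hat, n, level = 0.95) {
  se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of a proportion
  z  <- qnorm(1 - (1 - level) / 2)      # critical value for the chosen level
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

ci_prop <- prop_ci(p_hat = 0.40, n = 600)  # hypothetical survey result
print(round(ci_prop, 4))
```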

#insert code here

2.2 A news reporter from the cable TV company argues that the survey results indicate that public support for the impeachment of President Trump increased after public hearings in Congress. You want to evaluate this claim using the two confidence intervals you calculated in the previous question. What is your null hypothesis? With 95% confidence, can you reject the null? Why or why not?

[insert written answer here]

Example 2: A researcher is concerned that Latino and non-Latino citizens experience different response times to queries from local officials. She conducts an ‘audit’ experiment to see whether this is true. In this experiment, a (fake) email is sent from a citizen with a Latino- or non-Latino-sounding name, and the time taken for the local government official to respond is recorded.

2.3 The researcher has limited resources. She sends 9 emails from a Latino name, and 14 emails from a non-Latino name. For the Latino names, the mean response time was 421 minutes (standard deviation of 82 minutes). For the non-Latino names, mean response time was 366 minutes (standard deviation of 101 minutes). Calculate the difference in means and the standard error for the difference in means. Show your work in R.

#insert code here

2.4 The researcher’s hypothesis is that emails from citizens with Latino-sounding names will have longer response times than emails from citizens with non-Latino-sounding names. What is the null hypothesis? What is the researcher’s directional alternative hypothesis?

[insert written answer here]

2.5 Calculate the value of the t-statistic for a one-tailed test. Show your work in R.
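
A t-statistic for a difference in means is the estimated difference, minus the value under the null, divided by its standard error. A minimal sketch, with a hypothetical helper `t_stat` and made-up inputs:

```r
# t-statistic for a difference in two independent sample means.
# `t_stat` is a hypothetical helper; inputs are made-up values.
t_stat <- function(xbar1, s1, n1, xbar2, s2, n2, null_diff = 0) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)       # standard error of the difference
  (xbar1 - xbar2 - null_diff) / se        # distance from the null in SE units
}

t_val <- t_stat(xbar1 = 30, s1 = 6, n1 = 9,
                xbar2 = 25, s2 = 8, n2 = 16)
print(round(t_val, 3))
```

The order of subtraction should match the direction of the alternative hypothesis, so that a positive t-statistic supports it.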

#insert code here

2.6 Using R’s pt() function, what is the probability that we would see a t-statistic that large or larger if the null hypothesis were true, for a one-tailed test (remember to use the right degrees of freedom)? Can we reject the null hypothesis at the 0.05 level for the one-tailed test? Can we reject the null hypothesis at the 0.05 level for a two-tailed test?
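
The sketch below shows how `pt()` turns a t-statistic into one-tailed and two-tailed probabilities. The t-statistic and degrees of freedom are made-up values; the degrees of freedom for this question depend on the rule taught in class, so treat `df_example` as a placeholder rather than the answer.

```r
# One-tailed and two-tailed p-values from a t-statistic using pt().
# Both inputs are hypothetical, not the values for this question.
t_example  <- 2.0
df_example <- 20   # placeholder; use the degrees-of-freedom rule from class

# P(T >= t) under the null: upper-tail area for a one-tailed test
p_one_tailed <- pt(t_example, df = df_example, lower.tail = FALSE)

# Two-tailed test: area in both tails beyond |t|
p_two_tailed <- 2 * pt(abs(t_example), df = df_example, lower.tail = FALSE)

print(c(one_tailed = p_one_tailed, two_tailed = p_two_tailed))
```

Because the two-tailed probability is twice the one-tailed one, the same t-statistic can lead to different conclusions at the 0.05 level depending on which test is used.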

#insert code here

[insert written answer here]

Question 3

In an earlier homework, we analyzed data from an important field experiment by Devah Pager about the effect of race and criminal record on employment (“The Mark of a Criminal Record”). This is a follow-up exercise using the same data set. Last time you described the different callback rates between groups. Now we are going to use what we’ve learned about statistical inference to better understand those patterns. You are welcome – and even encouraged – to reuse code from that exercise. In fact, in practice you often have to work with the same dataset many times, and writing good code the first time helps you reuse the code in future projects.

The dataset is called criminalrecord.csv. You may not need to use all of these variables for this activity. We’ve kept these unnecessary variables in the dataset because it is common to receive a dataset with much more information than you need.

Table 2: Criminal Record Dataset

Variable     Description
jobid        job ID number
callback     1 if tester received a callback, 0 if the tester did not receive a callback
black        1 if the tester is black, 0 if the tester is white
crimrec      1 if the tester has a criminal record, 0 if the tester does not
interact     1 if tester interacted with employer during the job application,
             0 if tester did not interact with employer
city         1 if job is located in the city center, 0 if job is located in the suburbs
distance     job’s average distance to downtown
custserv     1 if job is in the customer service sector, 0 if it is not
manualskill  1 if job requires manual skills, 0 if it does not

This problem will give you practice with:

• re-using old code (optional)

• constructing confidence intervals

• difference-of-means tests

• p-values

• type I and type II errors

3.1 Begin by loading the data into R. How many cases are there in the data? In how many cases is the tester black? In how many cases is the tester white?
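
The counting step can be sketched as below. So that the example runs on its own, it builds a small made-up data frame with the same column names as Table 2; with the real data you would instead load criminalrecord.csv (the `read.csv()` call is shown as a comment and assumes the file is in your working directory).

```r
# With the real data you would load the file, for example:
# crim <- read.csv("criminalrecord.csv")
# The toy data frame below is made up so this sketch is self-contained.
crim <- data.frame(
  black    = c(0, 0, 0, 1, 1, 1, 1, 0),
  crimrec  = c(0, 1, 1, 0, 1, 0, 1, 0),
  callback = c(1, 0, 1, 0, 0, 1, 0, 1)
)

n_cases <- nrow(crim)             # total number of cases (rows)
n_black <- sum(crim$black == 1)   # cases where the tester is black
n_white <- sum(crim$black == 0)   # cases where the tester is white

print(c(cases = n_cases, black = n_black, white = n_white))
print(table(crim$black))          # the same counts as a frequency table
```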

#insert code here

[insert written answer here]

3.2 Now we examine the central question of the study. Calculate the proportions of callbacks for: white applicants with a criminal record, white applicants without a criminal record, black applicants with a criminal record, and black applicants without a criminal record.
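
Because callback is coded 0/1, the mean of callback within a group is that group's callback proportion. One way to get all four group proportions at once is `tapply()`, sketched here on a made-up toy data frame (not the real data):

```r
# Toy data with the same column names as the assignment's dataset (made up).
toy <- data.frame(
  black    = c(0, 0, 0, 1, 1, 1, 1, 0),
  crimrec  = c(0, 1, 1, 0, 1, 0, 1, 0),
  callback = c(1, 0, 1, 0, 0, 1, 0, 1)
)

# Mean of a 0/1 variable = proportion of 1s, computed within each
# combination of race (rows) and criminal record (columns).
rates <- tapply(toy$callback,
                list(black = toy$black, crimrec = toy$crimrec),
                mean)
print(rates)

# Individual cells can be pulled out by their labels, e.g. white testers
# (black = 0) with a criminal record (crimrec = 1):
white_record_rate <- rates["0", "1"]
```

Subsetting with logical conditions, such as `mean(toy$callback[toy$black == 0 & toy$crimrec == 1])`, gives the same numbers one group at a time.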

#insert code here

3.3 Now consider the callback rate for white applicants with a criminal record. Using the normal distribution, construct a 95% confidence interval around this estimate. Also, construct a 99% confidence interval around this estimate (again using the normal distribution).
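
Changing the confidence level only changes the critical value from `qnorm()`. A minimal sketch on a made-up vector of 0/1 outcomes (hypothetical data, not the callback variable):

```r
# Hypothetical 0/1 outcomes: 30 successes out of 100 trials.
x <- c(rep(1, 30), rep(0, 70))

p_hat <- mean(x)                          # sample proportion
n     <- length(x)                        # sample size
se    <- sqrt(p_hat * (1 - p_hat) / n)    # standard error of the proportion

z95 <- qnorm(0.975)                       # critical value for 95% (about 1.96)
z99 <- qnorm(0.995)                       # critical value for 99% (about 2.58)

ci95 <- c(lower = p_hat - z95 * se, upper = p_hat + z95 * se)
ci99 <- c(lower = p_hat - z99 * se, upper = p_hat + z99 * se)

print(round(rbind(ci95, ci99), 4))
```

The 99% interval is wider than the 95% interval because more confidence requires covering more of the sampling distribution.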

#insert code here

3.4 Calculate the estimated treatment effect of a criminal record for white applicants by subtracting the callback rate in the treatment condition (“having a criminal record”) from the callback rate in the control condition (“no criminal record”), for white applicants. Using the normal distribution, create a 95% confidence interval around this estimated treatment effect. Next, describe the estimated treatment effect and confidence interval in a way that could be understood by a general audience.
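
A confidence interval for a difference in two proportions can be sketched as follows. The helper `diff_prop_ci` computes p1 minus p2 with an unpooled standard error, which is an assumption of this sketch; which group is p1 and which is p2 should follow the wording of the question. All inputs are made-up values.

```r
# Normal-approximation CI for a difference in two independent proportions.
# `diff_prop_ci` is a hypothetical helper; inputs are made-up values.
diff_prop_ci <- function(p1, n1, p2, n2, level = 0.95) {
  diff <- p1 - p2                                      # estimated difference
  se   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) # unpooled standard error
  z    <- qnorm(1 - (1 - level) / 2)                   # critical value
  c(diff = diff, se = se, lower = diff - z * se, upper = diff + z * se)
}

effect <- diff_prop_ci(p1 = 0.30, n1 = 100, p2 = 0.20, n2 = 100)
print(round(effect, 4))
```

Multiplying the difference and its bounds by 100 expresses them in percentage points, which is usually easier for a general audience to read.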

#insert code here

[insert written answer here]

3.5 Assuming a null hypothesis that there is no difference in callback rates between white people with a criminal record and white people without a criminal record, what is the probability that we would observe a difference as large or larger than the one that we observed in a sample of this size? Use the normal distribution and the correct probability for a two-tailed test.
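
A two-tailed p-value under the normal distribution is the area in both tails beyond the observed z-statistic, which `pnorm()` provides. The sketch uses made-up proportions and sample sizes and an unpooled standard error; if your class uses a pooled standard error under the null, swap that line accordingly.

```r
# Two-tailed p-value for a difference in proportions (normal approximation).
# All inputs are hypothetical, not values from the dataset.
p1 <- 0.50; n1 <- 200
p2 <- 0.40; n2 <- 200

diff <- p1 - p2
se   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE (assumption)
z    <- diff / se                                      # z-statistic under H0: diff = 0

# Area beyond |z| in both tails of the standard normal distribution
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(round(c(z = z, p_value = p_value), 4))
```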

#insert code here

3.6 Imagine that we set up a hypothesis test where the null hypothesis is that there is no difference in callback rates between whites with and without a criminal record. In the context of this problem, what would it mean to commit a type I error? In the context of this problem, what would it mean to commit a type II error? If we set α = 0.05 for a two-tailed test, are we specifying the probability of type I error or type II error?
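
To build intuition for what α controls, the simulation sketch below repeatedly draws two samples from the same population, so the null hypothesis is true by construction, and records how often a two-tailed test at α = 0.05 rejects it anyway. The sample size, true proportion, number of simulations, and seed are all arbitrary choices made for illustration.

```r
# Simulation: how often does a test reject a null hypothesis that is true?
# All settings below are arbitrary illustrative choices.
set.seed(850)
n_sims <- 2000    # number of simulated experiments
n      <- 200     # observations per group in each experiment
p_true <- 0.3     # same true proportion in both groups, so H0 is true
alpha  <- 0.05

# Simulated sample proportions for each group in each experiment
p1 <- rbinom(n_sims, size = n, prob = p_true) / n
p2 <- rbinom(n_sims, size = n, prob = p_true) / n

se       <- sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
z        <- (p1 - p2) / se
p_values <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Share of experiments in which the true null is (wrongly) rejected
rejection_rate <- mean(p_values < alpha)
print(rejection_rate)
```

The rejection rate should land near α, apart from simulation noise and the normal approximation.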

[insert written answer here]

* Score per Question in Homework 6 *

Q1     Score   Q2     Score   Q3     Score
1      5       1      5       1      5
2      5       2      5       2      5
3      5       3      5       3      10
4      5       4      5       4      10
5      5       5      5       5      5
6      5       6      5       6      5
Total  30             30             40
