


Part I

For the questions in Part I, please show how you can use both the normal distribution and the binomial distribution when the sample size is large enough.

(a) A car is classed as “highly efficient” if it gets 45 miles per gallon or more. A lobbyist for the car industry claims that at least 50% of 2011 model cars are highly efficient. Conduct a test of this claim with a level of significance of 10%.
Dataset: “All the efficiency”

(b) A research paper claims that at least 70% of published books have more than 250 pages. Conduct a test of this claim with a level of significance of 5%.
Dataset: “Amazon books”

(c) A publication claims that 25% of babies are born prematurely. Conduct a test of this claim with a level of significance of 10%. (see Preemie variable)
Dataset: “Babysamp 98”

(d) The advertising for a diet product claims that at least 20% of men have a body fat percentage greater than 35. Test this claim with a level of significance of 5%.
Dataset: “Bodyfat”
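
As a rough illustration of the two approaches requested for Part I, the sketch below tests a claim of the form "at least p0" (as in questions (a), (b) and (d)) using both the normal approximation and the exact binomial distribution. The function names and the counts in the usage example (40 successes out of 100) are hypothetical and are not taken from any of the datasets above; substitute the counts from the relevant dataset. Question (c) states an exact proportion, so it would call for a two-sided version instead.

```python
from math import comb, erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term.

    Suitable for moderate n; very large n may overflow float conversion.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lower_tail_proportion_test(successes: int, n: int, p0: float, alpha: float) -> dict:
    """Test H0: p >= p0 against H1: p < p0 (hypothetical helper name)."""
    p_hat = successes / n
    # Normal approximation: standardise the sample proportion under H0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_normal = normal_cdf(z)
    # Exact binomial: probability of seeing this few successes or fewer under H0.
    p_exact = binomial_cdf(successes, n, p0)
    return {
        "p_hat": p_hat,
        "z": z,
        "p_normal": p_normal,
        "p_exact": p_exact,
        "reject_normal": p_normal < alpha,
        "reject_exact": p_exact < alpha,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, NOT from the assignment datasets.
    result = lower_tail_proportion_test(successes=40, n=100, p0=0.50, alpha=0.10)
    for name, value in result.items():
        print(f"{name}: {value}")
```

The normal approximation is usually considered adequate when both n*p0 and n*(1 - p0) are reasonably large (a common rule of thumb is at least 5 or 10); comparing the two p-values shows how close the approximation is for a given sample.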

Part II

Consider the following data:

Number of accidents per day:    0    1    2    3    4
Number of days:                18  121  126   90   10

(a) What is the average number of events per day?

(b) Construct a Poisson distribution for that average. Comment, using your own judgment, on how it compares with the data above.

(c) Discuss reasons why data may not follow the Poisson distribution. Include at least one example in your answer.
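
One possible way to set up the calculations for questions (a) and (b) is sketched below. It assumes that the second row of the table gives the number of days on which each accident count was observed (the row total is 365) and that no day had more than 4 accidents; the variable names are illustrative only.

```python
from math import exp, factorial

# Data from the table above. Assumption: the second row counts days.
accidents = [0, 1, 2, 3, 4]
days = [18, 121, 126, 90, 10]


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


total_days = sum(days)
# Weighted mean: total accidents divided by total days.
mean = sum(k * d for k, d in zip(accidents, days)) / total_days

# Expected number of days for each count under a Poisson model with that mean.
expected = [total_days * poisson_pmf(k, mean) for k in accidents]
# Remaining expected days fall in the "5 or more" category.
expected_5_plus = total_days - sum(expected)

if __name__ == "__main__":
    print(f"total days: {total_days}, mean accidents per day: {mean:.4f}")
    for k, obs, exp_days in zip(accidents, days, expected):
        print(f"{k} accidents: observed {obs:4d} days, expected {exp_days:7.2f} days")
    print(f"5 or more:   observed    0 days, expected {expected_5_plus:7.2f} days")
```

Placing the observed and expected day counts side by side gives a direct basis for the comparison asked for in (b).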

Part III

The DASL website provides many datasets, each containing data for a number of variables. Choose any one variable from across these datasets such that:

1) The dataset is a random cross-section sample from a population (real or invented)
2) The variable is numerical and it is meaningful to calculate its mean (the numbers are not just labels for categories and they are not ranks or dates)
3) It’s not one of the variables used for other questions in this assignment