STAT5002 Introduction to Statistics – Semester 1, 2020
1. If I select a random household from Ames, estimate the probability that
(a) the selected household has a basement;
(b) the selected household has a pool;
(c) the selected household has both a pool and a basement.
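One natural approach is to estimate each probability by the corresponding sample proportion in the Ames Housing data. The sketch below assumes each house is a dict. It also assumes that a basement is indicated by a positive "Total.Bsmt.SF" and a pool by a positive "Pool.Area". These column names are assumptions, so check them against the codebook. Part (c) is estimated from the joint count directly rather than as the product of (a) and (b), because multiplying the marginals would assume that basements and pools are independent.

```python
def estimate_probabilities(houses):
    """Estimate P(basement), P(pool) and P(pool and basement) by sample proportions.

    houses is a list of dicts, one per house. The keys "Total.Bsmt.SF" and
    "Pool.Area" are ASSUMED column names. A missing value (None) is treated as 0,
    i.e. as "no basement" / "no pool", which is also an assumption.
    """
    n = len(houses)
    has_bsmt = [(h["Total.Bsmt.SF"] or 0) > 0 for h in houses]
    has_pool = [(h["Pool.Area"] or 0) > 0 for h in houses]
    # Joint event counted directly, with no independence assumption.
    both = sum(b and p for b, p in zip(has_bsmt, has_pool))
    return {
        "basement": sum(has_bsmt) / n,
        "pool": sum(has_pool) / n,
        "pool_and_basement": both / n,
    }
```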
2. In this question, consider the four variables SalePrice ($Y$), Lot.Area ($x_1$), Overall.Qual ($x_2$) and
MS.SubClass ($x_3$).
(a) Consider the four simple linear regression models:
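$Y_{ij} = \beta_0 + \beta_1 x_{1i} + \varepsilon_{ij}$   (1)
$\log(Y_{ij}) = \beta_0 + \beta_1 x_{1i} + \varepsilon_{ij}$   (2)
$Y_{ij} = \beta_0 + \beta_1 \log(x_{1i}) + \varepsilon_{ij}$   (3)
$\log(Y_{ij}) = \beta_0 + \beta_1 \log(x_{1i}) + \varepsilon_{ij}$   (4)
where $\varepsilon_{ij} \sim N(0, \sigma^2)$.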
By considering some diagnostic plots and the coefficient of determination, $r^2$, explain which of
the four models is best.
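As a starting point for part (a), the sketch below fits each of models (1) to (4) by least squares and reports $r^2$. The function names and model labels are hypothetical. Models (1) and (3) have $Y$ as the response, while (2) and (4) have $\log Y$. Because the responses are on different scales, their $r^2$ values are not strictly comparable, so this comparison is only a rough guide. It should be read alongside residual plots and normal Q-Q plots.

```python
import math


def fit_slr(x, y):
    """Least-squares simple linear regression; returns (b0, b1, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared sample correlation in SLR
    return b0, b1, r2


def compare_models(lot_area, sale_price):
    """Return r^2 for models (1)-(4); labels are hypothetical names."""
    log_x = [math.log(v) for v in lot_area]
    log_y = [math.log(v) for v in sale_price]
    candidates = {
        "(1) Y ~ x1": (lot_area, sale_price),
        "(2) log Y ~ x1": (lot_area, log_y),
        "(3) Y ~ log x1": (log_x, sale_price),
        "(4) log Y ~ log x1": (log_x, log_y),
    }
    return {name: fit_slr(x, y)[2] for name, (x, y) in candidates.items()}
```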
(b) Using only $Y$, $x_1$, $x_2$ and $x_3$, what is the best (parsimonious) regression model that fits the data?
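Choosing a parsimonious model means weighing the gain in fit against the number of terms. The sketch below shows two standard tools for this, adjusted $R^2$ and the partial F-test for nested models, written as plain formulas. The function names are hypothetical. The sums of squared errors would come from whichever candidate fits you run. As far as we understand the Ames codebook, MS.SubClass is a numeric code for dwelling type. If so, it would enter a model as a factor, and each dummy column then counts toward $p$.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (excluding the intercept) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def partial_f_statistic(sse_reduced, sse_full, q, n, p_full):
    """F statistic for dropping q predictors from a full model with p_full predictors.

    Under H0 (the q dropped coefficients are all zero) this follows an
    F(q, n - p_full - 1) distribution.
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full - 1))
```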
Now consider the model
$\log(Y_{ij}) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_{ij}$   (5)
assuming $\varepsilon_{ij} \sim N(0, \sigma^2)$.
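Part i asks for the fitted version of model (5), that is, the least-squares estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2$. In R this is typically `lm(log(SalePrice) ~ Lot.Area + Overall.Qual, data = ames)`, where `ames` is a hypothetical data-frame name. The dependency-free sketch below solves the normal equations $X^\top X \beta = X^\top z$ with $z = \log Y$ to show what such a fit computes. R's `lm` uses a QR decomposition instead, which is numerically more robust. The function names here are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_model5(sale_price, lot_area, overall_qual):
    """Least-squares fit of log(Y) = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    rows = [[1.0, x1, x2] for x1, x2 in zip(lot_area, overall_qual)]
    z = [math.log(y) for y in sale_price]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    return solve_linear(xtx, xtz)
```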
i. Write the fitted model for (5).
ii. Are there any outliers under model (5)?
iii. You inspect a property with a lot area of 10,000 square feet and an overall quality rated as
“Excellent”, using the same rating standard as in the Ames Housing data. What is your
expected sale price under model (5)?
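For parts ii and iii, a rough sketch is below; the function names are hypothetical. For part ii, it flags residuals whose standardized value $e_i / s$ exceeds a rule-of-thumb cutoff, where $s = \sqrt{\text{SSE}/(n-3)}$. This ignores leverage, so it is a simplification of studentized residuals such as those from R's `rstandard`. For part iii, the prediction from model (5) is on the log scale. $\exp(\hat\mu)$ estimates the median sale price, while under log-normal errors the mean is $\exp(\hat\mu + \sigma^2/2)$. We assume that "Excellent" corresponds to Overall.Qual = 9, as we understand the Ames codebook; please verify this before relying on it.

```python
import math


def flag_outliers(residuals, n_params=3, threshold=3.0):
    """Indices whose standardized residual |e_i / s| exceeds threshold.

    s = sqrt(SSE / (n - n_params)); n_params = 3 counts beta_0, beta_1, beta_2
    of model (5). Leverage is ignored, a deliberate simplification.
    """
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    return [i for i, e in enumerate(residuals) if abs(e / s) > threshold]


def predict_sale_price(b0, b1, b2, lot_area, overall_qual, sigma2=None):
    """Back-transform the model (5) prediction of log(Y) to the price scale.

    Without sigma2, returns exp(mu), an estimate of the median price. With
    sigma2 (the estimated error variance on the log scale), returns
    exp(mu + sigma2 / 2), the mean of a log-normal distribution.
    """
    mu = b0 + b1 * lot_area + b2 * overall_qual
    if sigma2 is None:
        return math.exp(mu)
    return math.exp(mu + sigma2 / 2)


EXCELLENT = 9  # ASSUMED Overall.Qual code for "Excellent"; check the codebook.
# Hypothetical usage with your own fitted b0, b1, b2 and residual variance s2:
# predict_sale_price(b0, b1, b2, lot_area=10_000, overall_qual=EXCELLENT, sigma2=s2)
```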