This case study, shared by an Australian statistics assignment-help service, mainly covers linear regression problems.

RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
REGRESSION MODELLING STAT2008/STAT2014/STAT4038/STAT6014/STAT6038
Assignment 1 (Total Marks: 50)
Submit by 5pm on Tuesday 20 Apr 2021

Question 1 [13 Marks]

As we know, b0 and b1 are the least squares estimators of the unknown parameters β0 and β1 of the simple linear regression model, respectively. In this question, we will study the correlation between b0 and b1 through both theory and numerical simulation. The simulation code is provided below:

    # Set your Uni ID number as the seed of random number generator.
    # For example,  if your Uni ID is u1234567, then use
    # set.seed(1234567)
    x <- 1:10
    n <- length(x)
    estimates <- matrix(NA, 1000, 2)
    # colnames() labels the two columns; names() on a matrix would not
    colnames(estimates) <- c("b0","b1")
    for(r in 1:1000) {
      y <- 1 + 2*x + rnorm(n,0,2)
      estimates[r,] <- lm(y~x)$coefficients
    }

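(a) [4 marks] Show that the covariance of b0 and b1 is $\mathrm{Cov}(b_0, b_1) = -\bar{x}\,\sigma^2 / S_{xx}$, where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\sigma^2$ is the error variance.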

Note that you cannot use the matrix approach introduced in week 6.
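
One way to reach the result in part (a) without matrices is to write both estimators in terms of the responses and use the usual covariance rules. The outline below is a sketch, not the required solution. It assumes uncorrelated errors with common variance $\sigma^2$, and the weights $k_i$ are notation introduced here for illustration.

$$
\begin{aligned}
b_1 &= \sum_{i=1}^{n} k_i y_i, \qquad k_i = \frac{x_i - \bar{x}}{S_{xx}}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \\
b_0 &= \bar{y} - b_1 \bar{x}, \\
\mathrm{Cov}(\bar{y}, b_1) &= \frac{\sigma^2}{n} \sum_{i=1}^{n} k_i = 0
  \quad \text{because } \sum_{i=1}^{n} (x_i - \bar{x}) = 0, \\
\mathrm{Var}(b_1) &= \sigma^2 \sum_{i=1}^{n} k_i^2 = \frac{\sigma^2}{S_{xx}}, \\
\mathrm{Cov}(b_0, b_1) &= \mathrm{Cov}(\bar{y}, b_1) - \bar{x}\,\mathrm{Var}(b_1) = -\frac{\bar{x}\,\sigma^2}{S_{xx}}.
\end{aligned}
$$

The sign depends only on $\bar{x}$: when the predictor values are centred above zero, overestimating the slope tends to go with underestimating the intercept.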

(b) [4 marks] Write down the true model and the distribution of the error terms used in the simulation. Based on this model, calculate the theoretical covariance and correlation of b0 and b1.

(c) [2 marks] First set your Uni ID number as the seed of the random number generator, e.g., if your Uni ID is u1234567, run set.seed(1234567). Then run the simulation. Make a scatterplot of b1 against b0 based on the simulation output. Do these estimates appear to be correlated?

(d) [3 marks] Following part (c), calculate the empirical covariance and correlation of b0 and b1. Comparing the results with part (b), what do you notice?
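
Parts (b) to (d) can be checked numerically along the following lines. This is a minimal sketch, not a model answer. The seed is the placeholder from the assignment text and should be replaced by your own Uni ID. The names `cov_theory`, `corr_theory`, `cov_emp` and `corr_emp` are introduced here for illustration, and the theoretical values use the standard scalar formulas for the variances and covariance of the least squares estimators.

```r
# Placeholder seed (assumption): substitute your own Uni ID number.
set.seed(1234567)

x <- 1:10
n <- length(x)
sigma <- 2                      # error standard deviation used in rnorm(n, 0, 2)

# Theoretical values from the scalar formulas
xbar <- mean(x)
Sxx  <- sum((x - xbar)^2)
cov_theory  <- -xbar * sigma^2 / Sxx
var_b0      <- sigma^2 * (1 / n + xbar^2 / Sxx)
var_b1      <- sigma^2 / Sxx
corr_theory <- cov_theory / sqrt(var_b0 * var_b1)

# Simulation as in the assignment, with named columns for convenience
estimates <- matrix(NA_real_, 1000, 2, dimnames = list(NULL, c("b0", "b1")))
for (r in 1:1000) {
  y <- 1 + 2 * x + rnorm(n, 0, sigma)
  estimates[r, ] <- coef(lm(y ~ x))
}

# Scatterplot of b1 against b0, then the empirical counterparts
plot(estimates[, "b0"], estimates[, "b1"], xlab = "b0", ylab = "b1")
cov_emp  <- cov(estimates[, "b0"], estimates[, "b1"])
corr_emp <- cor(estimates[, "b0"], estimates[, "b1"])

print(c(cov_theory = cov_theory, cov_emp = cov_emp))
print(c(corr_theory = corr_theory, corr_emp = corr_emp))
```

With 1000 replications the empirical values should sit close to the theoretical ones, though they will not match exactly and will change with the seed.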

Question 2 [37 Marks]

The data file “mammal.csv” (available to download from Wattle) contains the average mass (Mass) in kg, metabolic rate (Metab) in kJ per day and average lifespan (Life) in years for 95 species of mammals. It has been suggested that metabolic rate is one of the best single predictors of species lifespan.

(a) [3 marks] Make a scatterplot of Life against Metab and visually check whether there are any high leverage observations. What are the names and species of these mammals?

(b) [2 marks] Comment on the relationship between Life and Metab for the majority of observations. You may need to adjust the x and y coordinate ranges.

(c) [5 marks] Apply a natural log transformation to Metab. Then fit a simple linear regression model by regressing Life on the transformed Metab. Provide the fitted results. Then conduct model diagnostics. Provide the appropriate plots and discuss your findings regarding model assumptions and unusual observations.

(d) [4 marks] Following the model in part (c), experiment with applying a natural log transformation and a square root transformation to the response variable. Select the best model with the help of scatterplots and sample correlations. Write down the mathematical form of your selected regression model.

(e) [4 marks] Following your selected model in part (d), fit a simple linear regression model. Write down the fitted model as a mathematical equation. Conduct model diagnostics, provide the appropriate plots and discuss the related results.

(f) [3 marks] Interpret the estimated slope of the fitted model in part (e). Obtain a 95% confidence interval for the slope parameter.

(g) [5 marks] Use the ANOVA approach to test whether the model in part (e) is significant. You need to write down the hypotheses and provide the ANOVA table. What are the test statistic, the rejection region or p-value, and your conclusion for this test?

(h) [4 marks] With the model in part (e), find a 90% prediction interval for the lifespan in years of a mammal with a metabolic rate of 8000 kJ per day. Interpret this interval.

(i) [7 marks] Kleiber’s law states that, on average, the metabolic rate of an animal species is proportional to its mass raised to the power of 3/4. Propose a simple linear regression model and appropriate hypotheses to check the adequacy of this theory, and explain why. Using this dataset, fit the model and provide the fitted results. Then test your proposed hypotheses. What is your conclusion?
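
The workflow in parts (c) to (i) might be sketched in R as follows. Several things here are assumptions rather than part of the assignment: the data frame is a synthetic stand-in with the column names Mass, Metab and Life, because mammal.csv is not reproduced here; the log-log model is picked purely for illustration, since part (d) leaves the choice to you; and all object names are hypothetical. The numbers produced say nothing about the real data.

```r
# Synthetic stand-in for mammal.csv (assumption): simulated values only.
# With the real file, use instead: mammal <- read.csv("mammal.csv")
set.seed(1)
n_species <- 95
Mass  <- exp(rnorm(n_species, mean = 1, sd = 2))                  # kg
Metab <- exp(5.5 + 0.75 * log(Mass) + rnorm(n_species, 0, 0.3))   # kJ per day
Life  <- exp(1.5 + 0.15 * log(Metab) + rnorm(n_species, 0, 0.4))  # years
mammal <- data.frame(Mass, Metab, Life)

# Part (d): compare response transformations via sample correlation with log(Metab)
cors <- c(raw  = cor(log(mammal$Metab), mammal$Life),
          sqrt = cor(log(mammal$Metab), sqrt(mammal$Life)),
          log  = cor(log(mammal$Metab), log(mammal$Life)))

# Part (e): assumed choice for illustration, log(Life) on log(Metab)
fit <- lm(log(Life) ~ log(Metab), data = mammal)
par(mfrow = c(2, 2))
plot(fit)                       # standard lm diagnostic plots

# Part (f): 95% confidence interval for the slope
slope_ci <- confint(fit, "log(Metab)", level = 0.95)

# Part (g): ANOVA table with the overall F test
anova_tab <- anova(fit)

# Part (h): 90% prediction interval at Metab = 8000, back-transformed to years
pi_log   <- predict(fit, newdata = data.frame(Metab = 8000),
                    interval = "prediction", level = 0.90)
pi_years <- exp(pi_log)

# Part (i): Kleiber's law as log(Metab) = beta0 + beta1 * log(Mass) + error,
# testing H0: beta1 = 3/4 against a two-sided alternative
kfit   <- lm(log(Metab) ~ log(Mass), data = mammal)
est    <- coef(summary(kfit))["log(Mass)", ]
t_stat <- (est["Estimate"] - 0.75) / est["Std. Error"]
p_val  <- 2 * pt(abs(t_stat), df = df.residual(kfit), lower.tail = FALSE)
```

Taking logs on both sides turns the proportionality in Kleiber's law into a straight line whose slope is the exponent, which is why the hypothesis is stated on the slope. If a log-log model is also chosen for lifespan, the slope in part (f) reads as the approximate percentage change in lifespan per one percent change in metabolic rate.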