The assignment should be typed. Note that you can copy and paste selected SAS syntax and SAS output into a Word document. For questions requiring the use of SAS, you must provide a copy of your SAS program (i.e. syntax) as well as the relevant output (and any requested additional commentary and hand calculation). Show your working and reasoning, and write in complete sentences.
Total marks for each question are shown. Marks will be deducted for incorrect or incomplete answers, inadequate explanation, poor-quality comments and interpretation sentences, and poor presentation.
Community survey dataset
The community survey dataset (survey.sas7bdat) is used for both questions in this assignment. The following formats apply to its coded variables:
proc format;
  value yesnof 0='b No' 1='a Yes';
  value smokef 1='d Never' 2='c Former' 3='b <15 cig/d' 4='a 15+ cig/d';
  value drinkf 1='e Never' 2='d Former' 3='c <20gms/d' 4='b 20-60gms/d' 5='a >60gms/d';
  value sexf 0='Male' 1='Female';
run;
Question 1 [11 marks] Use the community survey dataset to do the following.
(a) [1 mark] Use Proc GLM to fit a linear model that compares mean CHOL for adults with RXHYPER=yes vs no. Obtain the estimated difference in mean CHOL for adults with RXHYPER=yes vs no, its 95% confidence interval and the associated p-value. Write a sentence that includes and interprets these results.
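A minimal sketch of the kind of program this first part calls for. It assumes the dataset has been made available as `survey` in the WORK library and that the `yesnof` format has already been defined; only CHOL and RXHYPER come from the question itself.

```sas
/* Assumption: survey.sas7bdat is readable as WORK.SURVEY and the
   yesnof format has been created with Proc FORMAT beforehand. */
proc glm data=survey;
  class rxhyper;
  format rxhyper yesnof.;
  /* SOLUTION prints the parameter estimates; CLPARM adds 95% confidence limits */
  model chol = rxhyper / solution clparm;
run;
quit;
```

Proc GLM orders class levels by formatted value by default and treats the last level as the reference. With the `a Yes` / `b No` labels, the printed RXHYPER estimate should therefore be the yes-minus-no difference, but it is worth confirming this against the class level information in the output.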
(b) [3 marks] Investigate the impact of SEX on the relationship between RXHYPER and CHOL using two separate approaches:
- Consider SEX as a confounder of the relationship between RXHYPER and CHOL.
- Consider SEX as an effect modifier of the relationship between RXHYPER and CHOL.
For each approach, fit a single model and write down the algebraic representation of the fitted model, provide an interpretation of the parameter estimates from the fitted model and give an overall assessment of the impact of sex on the relationship between RXHYPER and CHOL.
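As a notational sketch only, the two approaches can be written in general (unfitted) form as follows. The indicator coding is an assumption introduced here: x = 1 for RXHYPER=yes and s = 1 for female, matching the 0/1 codes in the dataset; the β symbols are placeholders, not estimates.

```latex
\begin{align*}
\text{Confounder model:}\quad
  & E[\mathrm{CHOL}] = \beta_0 + \beta_1 x + \beta_2 s \\
\text{Effect-modifier model:}\quad
  & E[\mathrm{CHOL}] = \beta_0 + \beta_1 x + \beta_2 s + \beta_3 x s
\end{align*}
```

Under this coding, β1 in the first model is the yes-vs-no difference in mean CHOL adjusted for sex. In the second model, β1 is that difference for men and β1 + β3 is the difference for women, so β3 = 0 corresponds to no effect modification. Proc GLM with a CLASS statement uses its own reference levels, so the fitted coefficients may need re-expressing to match this form.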
(c) [2 marks] Examine the relationship between cholesterol (CHOL) and age (AGE) by using Proc GLM to fit a linear model for CHOL that allows a quadratic relationship with AGE (i.e. include AGE and AGE*AGE).
Write down the algebraic representation of the fitted (i.e. with estimated coefficients) quadratic relationship.
Provide evidence from your output as to whether you believe the relationship between cholesterol and age is curved or straight.
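A hedged sketch of the quadratic fit, again assuming the dataset is available as `survey` in the WORK library. AGE is treated as quantitative, so there is no CLASS statement.

```sas
/* Assumption: survey.sas7bdat is readable as WORK.SURVEY. */
proc glm data=survey;
  /* age*age adds the squared term to the straight-line model */
  model chol = age age*age / solution clparm;
run;
quit;
```

In a fit like this, the t-test and confidence interval for the `age*age` coefficient are the natural evidence to cite: a coefficient that is not distinguishable from zero is consistent with a straight-line relationship.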
(d) [3 marks] Use Proc GLM to fit a single linear model for CHOL that allows separate quadratic relationships with AGE for men and women.
Write down the algebraic representation of the fitted quadratic relationships for men and women. Perform (and interpret) a test of whether the fitted quadratic curve for men is significantly different from the quadratic curve for women.
Obtain estimates (from the fitted model) of mean cholesterol for men and for women with age 40, 50, 60, and 70 years and describe the difference in estimated mean cholesterol for men and women at each age.
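One way to write a single model with separate quadratics, shown in general form. The notation is an assumption made here for illustration: a is age, s = 1 for women and 0 for men, and the β and γ symbols are placeholders rather than estimates.

```latex
\begin{align*}
E[\mathrm{CHOL}] &= \beta_0 + \beta_1 a + \beta_2 a^2
                   + s\,(\gamma_0 + \gamma_1 a + \gamma_2 a^2) \\
\text{Men } (s=0):\quad   & \beta_0 + \beta_1 a + \beta_2 a^2 \\
\text{Women } (s=1):\quad & (\beta_0+\gamma_0) + (\beta_1+\gamma_1)\,a + (\beta_2+\gamma_2)\,a^2 \\
H_0 &: \gamma_0 = \gamma_1 = \gamma_2 = 0 \\
F &= \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/3}{\mathrm{SSE}_F/(n-6)} \sim F_{3,\;n-6} \text{ under } H_0
\end{align*}
```

Here SSE_F is the error sum of squares from the full six-parameter model and SSE_R is that from the reduced model with a common quadratic for both sexes. The difference in estimated means at a given age a is γ0 + γ1 a + γ2 a² (women minus men) under this coding.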
(e) [2 marks] Use Proc GLM to fit a linear model that tests whether there is a difference in mean CHOL for people with RXHYPER yes and no, after adjustment for sex and age (use your results from (b) and (d) to decide how to fully adjust for sex and age).
Describe how adjustment for sex and age has changed your results comparing mean CHOL in people with RXHYPER yes and no.
Question 2 [9 marks]
(a) [2 marks] Use Proc GLMSELECT to perform a stepwise (backward) search for predictors of BMI. Use the following list of potential predictors: sex, age, smoking, drinking, diabetes and exercise. In your analysis, treat sex, drinking, smoking and diabetes as categorical (i.e. class) variables and the remainder as quantitative variables. In your search for predictors, consider main effects and interactions with sex (but do not consider squares of quantitative variables) and use a p=0.05 criterion for dropping variables. Provide the output that shows the order in which terms were dropped and the output showing the fitted final model (i.e. its estimated coefficients).
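A sketch of how such a backward search might be set up. The variable names below (`sex`, `age`, `smoking`, `drinking`, `diabetes`, `exercise`, `bmi`) are hypothetical stand-ins taken from the wording of the question; the actual names in survey.sas7bdat may differ. The `hierarchy=single` option is an optional choice made here, not something the question specifies.

```sas
/* Assumption: survey.sas7bdat is readable as WORK.SURVEY and the
   variable names below match those in the dataset. */
proc glmselect data=survey;
  class sex drinking smoking diabetes;
  model bmi = sex age smoking drinking diabetes exercise
              sex*age sex*smoking sex*drinking sex*diabetes sex*exercise
        / selection=backward(select=sl sls=0.05)  /* drop terms with p > 0.05 */
          hierarchy=single                        /* optional: keep main effects while their interaction stays */
          details=all;
run;
```

The selection summary table in the output lists the terms in the order they were removed, and the parameter estimates table gives the coefficients of the selected model.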
(b) [3 marks] Use Proc GLM to fit the final model from GLMSELECT and obtain and interpret the estimated effect (on BMI) of each variable in the final model.
(c) [1 mark] Demonstrate which continuous predictor from the final model in (b) has the largest influence on BMI.
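One possible way to compare the influence of continuous predictors measured on different scales is through standardized coefficients. This is offered as an illustration of a common approach, not as the required method; b_j denotes the estimated coefficient of predictor x_j, and s denotes a sample standard deviation.

```latex
\[
b_j^{*} = b_j \,\frac{s_{x_j}}{s_{y}}
\]
```

The standardized coefficient is the estimated change in BMI, in standard deviations of BMI, for a one standard deviation increase in x_j, so the predictor with the largest absolute value has the largest influence on this scale.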
(d) [1 mark] Obtain and interpret a measure of how well this final model predicts BMI.
(e) [2 marks] Use the final model from (b) to obtain the 95% prediction interval for BMI for an individual with the following characteristics: sex=1, age=45, drinking=1, exercise=1, smoking=3, and diabetes=0. State any assumptions. Write a sentence that includes and interprets this prediction interval.
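For reference, the general form of a 95% prediction interval for a single new individual in a linear model is sketched below. The notation is introduced here: x_0 is the covariate vector for the individual, X is the design matrix (assumed full rank), p is the number of estimated coefficients, and MSE is the residual mean square.

```latex
\[
\hat{y}_0 \;\pm\; t_{0.975,\;n-p}\,
\sqrt{\mathrm{MSE}\,\bigl(1 + x_0^{\top}(X^{\top}X)^{-1}x_0\bigr)},
\qquad \hat{y}_0 = x_0^{\top}\hat{\beta}
\]
```

The leading 1 inside the square root accounts for individual variation around the mean, which is why a prediction interval is wider than the confidence interval for the mean at the same covariate values. The interval relies on the usual linear model assumptions of independent, normally distributed errors with constant variance.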