**Q1. DATA SET WBC (60 marks)**

In an imaging experiment, 50 white blood cells (WBCs) from a non-diseased patient group and 50 WBCs from a diseased patient group were analysed.

For each WBC, the following characteristics of the cell image were measured:

- `eccen`: cell eccentricity
- `arn`: cell area
- `perin`: perimeter of the cell
- `soln`: solidity of the cell
- `ext`: extent of the cell
- `diam`: diameter of the cell

**Part A:**

**The aim of the study on WBCs is to test whether the two groups have identical population mean vectors.**

**NOTE:** Show your SAS code, relevant formulae, output, and interpretation.

**a)** Find the group-specific mean vectors and the variance-covariance and correlation matrices. **(5 marks)**

**b)** State your null and alternative hypotheses, and give the mathematical formula for the appropriate test statistic and the formulation for finding its critical value and associated p-value. **(5 marks)**
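The two-sample Hotelling's T² statistic with a pooled covariance matrix, and its conversion to an F statistic, can be sketched as below. This is a minimal pure-Python illustration, assuming equal population covariance matrices; the function names are hypothetical and it is meant only as a cross-check of the SAS output, not as the required SAS solution.

```python
# Two-sample Hotelling's T^2 with a pooled covariance matrix (pure Python).
# Each sample is a list of observations; each observation is a list of p values.

def mean_vector(rows):
    """Column means of a list of p-dimensional observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix using the n - 1 divisor."""
    n, m = len(rows), mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hotelling_two_sample(x1, x2):
    """Return (T2, F, df1, df2) for H0: mu1 = mu2, assuming equal covariances."""
    n1, n2, p = len(x1), len(x2), len(x1[0])
    s1, s2 = covariance(x1), covariance(x2)
    # Pooled covariance: [(n1 - 1) S1 + (n2 - 1) S2] / (n1 + n2 - 2)
    sp = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
           for j in range(p)] for i in range(p)]
    d = [a - b for a, b in zip(mean_vector(x1), mean_vector(x2))]
    k = 1 / n1 + 1 / n2
    # T^2 = d' [(1/n1 + 1/n2) Sp]^{-1} d
    w = solve([[k * v for v in row] for row in sp], d)
    t2 = sum(di * wi for di, wi in zip(d, w))
    # Under H0: (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T^2 ~ F(p, n1 + n2 - p - 1)
    df1, df2 = p, n1 + n2 - p - 1
    return t2, t2 * df2 / ((n1 + n2 - 2) * p), df1, df2
```

With p = 6 variables and n1 = n2 = 50, the reference distribution is F(6, 93). The p-value could then be obtained in SAS as `1 - PROBF(F, 6, 93)` and the critical value as `FINV(0.95, 6, 93)`; for two groups, the exact F for Wilks' lambda from the MANOVA statement of PROC GLM should agree with this F.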

**c)** Carry out appropriate multivariate procedures to determine whether the WBCs differ between the diseased and non-diseased populations. Show your SAS code, relevant formulae, output, and interpretation. **(15 marks)**

**d)** What underlying assumptions does the above test procedure involve? **(5 marks)**

**e)** Now test which of the WBC characteristics differ significantly between diseased and non-diseased cells, using the pooled variance of the relevant variable. **(5 marks)**
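The variable-by-variable comparison in e) amounts to a pooled-variance two-sample t-test for each characteristic. A minimal sketch, assuming each group is supplied as a dictionary mapping a variable name (for example `arn`) to its list of values; the function names are hypothetical.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    # statistics.variance uses the n - 1 divisor (sample variance)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2


def univariate_tests(group1, group2):
    """Apply pooled_t to every variable; groups map variable name -> values."""
    return {name: pooled_t(group1[name], group2[name]) for name in group1}
```

With 50 cells per group each statistic has 98 degrees of freedom, so |t| could be compared with a critical value such as SAS's `TINV(0.975, 98)`. These are separate univariate tests, so they do not control the overall error rate across all six variables.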

**f)** Obtain the 95% simultaneous confidence intervals for the differences and state your α value. Create a table of the resulting confidence intervals. **(5 marks)**

**g)** Obtain the analogous 95% Bonferroni confidence intervals for the differences. Create a table of the resulting confidence intervals. **(5 marks)**
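The intervals in f) and g) share the same centre and standard error and differ only in their multiplier. The sketch below assumes the usual textbook forms: the T² simultaneous multiplier sqrt((n1 + n2 - 2)p / (n1 + n2 - p - 1) · F(p, n1 + n2 - p - 1; α)) and the Bonferroni multiplier t(n1 + n2 - 2; α/(2p)). The critical values are passed in rather than computed, and the function name is hypothetical.

```python
from statistics import mean, variance


def difference_intervals(group1, group2, f_crit, t_crit):
    """T^2 simultaneous and Bonferroni intervals for mu1_i - mu2_i.

    group1, group2: dicts mapping variable name -> list of values.
    f_crit: upper-alpha quantile of F(p, n1 + n2 - p - 1).
    t_crit: upper alpha/(2p) quantile of t(n1 + n2 - 2).
    """
    names = list(group1)
    p = len(names)
    n1, n2 = len(group1[names[0]]), len(group2[names[0]])
    # T^2 multiplier: c = sqrt((n1 + n2 - 2) p / (n1 + n2 - p - 1) * F)
    c = ((n1 + n2 - 2) * p / (n1 + n2 - p - 1) * f_crit) ** 0.5
    table = {}
    for v in names:
        a, b = group1[v], group2[v]
        # Pooled variance of this variable (diagonal element of S_pooled)
        sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
        se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
        d = mean(a) - mean(b)
        table[v] = {"T2": (d - c * se, d + c * se),
                    "Bonferroni": (d - t_crit * se, d + t_crit * se)}
    return table
```

For the six WBC variables with 50 cells per group, the inputs would be `FINV(0.95, 6, 93)` and `TINV(1 - 0.05/12, 98)` in SAS. An interval that excludes zero indicates a significant difference for that variable.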

**h)** Based on your answers to f) and g), on which WBC measures, if any, do diseased and non-diseased cells differ significantly? **(5 marks)**

**Part B:**

**NOTE:** Show your SAS code, relevant formulae, output, and interpretation.

**a)** For each group, use PROC CORR to plot pairwise 90% prediction ellipses for the pair of variables with the most significant negative correlation and for the pair with the most significant positive correlation. **(5 marks)**

**b)** For each group, produce plots to assess the multivariate normality of the data. Are the data multivariate normal? **(5 marks)**
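One common plot for b) is a chi-square Q-Q plot of ordered squared Mahalanobis distances against chi-square quantiles with p degrees of freedom at plotting positions (j - 0.5)/n. The sketch below computes the plot coordinates in pure Python. It assumes an even p (which covers the six WBC variables) so that the closed-form chi-square CDF can be inverted by bisection; for odd p, SAS's `CINV` function would supply the quantiles. All names are hypothetical.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix using the n - 1 divisor."""
    n, m = len(rows), mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df = 2k: 1 - exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    h, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= h / i
        total += term
    return 1 - math.exp(-h) * total


def chi2_quantile_even(prob, df):
    """Invert chi2_cdf_even by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < prob:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi_square_qq(rows):
    """Pairs (chi-square quantile, ordered squared distance) for a Q-Q plot."""
    n, p = len(rows), len(rows[0])
    m, s = mean_vector(rows), covariance(rows)
    d2 = []
    for r in rows:
        dev = [r[i] - m[i] for i in range(p)]
        # Squared Mahalanobis distance (x - xbar)' S^{-1} (x - xbar)
        d2.append(sum(a * b for a, b in zip(dev, solve(s, dev))))
    d2.sort()
    q = [chi2_quantile_even((j - 0.5) / n, p) for j in range(1, n + 1)]
    return list(zip(q, d2))
```

Points lying close to a straight line through the origin with unit slope are consistent with multivariate normality; marked curvature or a few very large distances suggest departures from it.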

**NOTE:** In all parts of the question, show your SAS code, relevant formulae, output, and interpretation.

**Q2. DATA SET WBC (70 marks)**

For the WBC data set analysed in **Question 1**:

**For the diseased group:**

**a)** Perform a principal component analysis (PCA). Show your full SAS code and all SAS output. **(5 marks)**

**b)** Write out the formulae for the first three principal components, Prin*j*, *j* = 1, 2, 3 (PC1, PC2, PC3). **(3 marks)**

**c)** For the full set of principal components, find the variance explained by each component and the cumulative proportion of variance explained. **(3 marks)**
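The proportions in c) follow directly from the eigenvalues: each eigenvalue divided by their sum, plus a running total. A minimal sketch, assuming the eigenvalues are copied from the PROC PRINCOMP "Eigenvalues" table (which by default is based on the correlation matrix, so the eigenvalues of six standardized variables sum to 6). The eigenvalues in the usage lines are hypothetical, not WBC results.

```python
def variance_explained(eigenvalues):
    """Proportion and cumulative proportion of total variance per component."""
    total = sum(eigenvalues)
    table, cum = [], 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        cum += lam
        table.append((f"Prin{k}", lam, lam / total, cum / total))
    return table


# Hypothetical eigenvalues, NOT the WBC results:
for row in variance_explained([3.0, 2.0, 1.0]):
    print("%-6s eigenvalue=%.3f proportion=%.3f cumulative=%.3f" % row)
```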

**d)** Create the **Principal Component Pattern Profile** plot and interpret all the principal components. Justify your answers carefully with reference to your **Principal Component Pattern Profile** plot. **(8 marks)**

**e)** Based on the scree plot, how many principal components (PCs) would you retain? Justify your answer. **(2 marks)**

**f)** Perform formal statistical tests to determine the optimal number of principal components to retain. HINT: Test the significance of the “larger” components, that is, the components corresponding to the larger eigenvalues. **(4 marks)**

**g)** Construct the 95% confidence interval for λ₁, the first (largest) eigenvalue. Show your formula and working along with the result. **(2 marks)**

**h)** Construct the 95% confidence interval for λ₂, the second eigenvalue. Show your formula and working along with the result. **(2 marks)**
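A widely used large-sample interval for a population eigenvalue, valid under multivariate normality, is λ̂ᵢ / (1 + z·√(2/n)) ≤ λᵢ ≤ λ̂ᵢ / (1 − z·√(2/n)), where z is the upper α/2 standard normal quantile. The sketch below assumes this formula is the intended one for g) and h). It is strictly derived for covariance-matrix eigenvalues, so applying it to correlation-matrix eigenvalues is an approximation. The function name is hypothetical.

```python
from statistics import NormalDist


def eigenvalue_ci(lam_hat, n, conf=0.95):
    """Large-sample CI for a population eigenvalue lambda_i.

    lam_hat / (1 + z * sqrt(2/n)) <= lambda_i <= lam_hat / (1 - z * sqrt(2/n)),
    where z is the upper (1 - conf)/2 standard normal quantile.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    h = z * (2 / n) ** 0.5
    if h >= 1:
        raise ValueError("n too small for this large-sample interval")
    return lam_hat / (1 + h), lam_hat / (1 - h)
```

With 50 cells per group, `eigenvalue_ci(lam1_hat, 50)` would give the interval for λ₁, where `lam1_hat` is the first eigenvalue reported by PROC PRINCOMP.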

**i)** Which variables contribute the most to PC2? **(1 mark)**

**For the non-diseased group:**

**a)** Perform a principal component analysis (PCA). Show your full SAS code and all SAS output. **(5 marks)**

**b)** Write out the formulae for the first three principal components, Prin*j*, *j* = 1, 2, 3 (PC1, PC2, PC3). **(3 marks)**

**c)** For the full set of principal components, find the variance explained by each component and the cumulative proportion of variance explained. **(3 marks)**

**d)** Create the **Principal Component Pattern Profile** plot and interpret all the principal components. Justify your answers carefully with reference to your **Principal Component Pattern Profile** plot. **(8 marks)**

**e)** Based on the scree plot, how many principal components (PCs) would you retain? Justify your answer. **(2 marks)**

**f)** Perform formal statistical tests to determine the optimal number of principal components to retain. HINT: Test the significance of the “larger” components, that is, the components corresponding to the larger eigenvalues. **(4 marks)**

**g)** Construct the 95% confidence interval for λ₁, the first (largest) eigenvalue. Show your formula and working along with the result. **(2 marks)**

**h)** Construct the 95% confidence interval for λ₂, the second eigenvalue. Show your formula and working along with the result. **(2 marks)**

**i)** Which variables contribute the most to PC2? **(1 mark)**

**j)** Comment on the differences and similarities between the PCA results for the diseased and non-diseased groups, based on their PC pattern profiles and the first two PCs. **(10 marks)**

**NOTE:** In all parts of the question, show your SAS code, relevant formulae, working, and output, and write out your conclusions and interpretation carefully.

**Q3. DATA SET TWIN (25 marks)**

**The personality traits (TCIs) of a sample of identical twins, as discussed in a psychometric case study, were investigated.**

**A total of 30 twin pairs were questioned. The following questions were put to the twins:**

- X1: What level of Novelty Seeking (NS) do you observe in your twin?
- X2: What level of Novelty Seeking (NS) does your twin see in you?
- X3: What level of Harm Avoidance (HA) do you observe in your twin?
- X4: What level of Harm Avoidance (HA) does your twin see in you?

**Responses were recorded on a five-point scale** with the following rank values:

- **None** of the trait in question
- **Very low** level of the trait in question
- **Some level** of the trait in question
- **A great deal** of the trait in question
- **Huge level** of the trait in question

**The aim of the study was to ascertain whether the twins accurately perceive/rank the NS and HA levels of their twin.**

**a)** Perform the appropriate Hotelling’s T² test. Show your formula, SAS code, SAS output, the hypotheses being tested, the test statistic, and the p-value. **(10 marks)**
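Because the two responses within a pair are not independent, one natural approach is a one-sample T² test on the within-pair differences, with H0: mean difference vector = 0. The sketch below assumes the differences have already been formed, for example d₁ = X1 - X2 for NS and d₂ = X3 - X4 for HA; that pairing, the treatment of the rank codes as numeric scores, and the function names are all assumptions for illustration. It also returns the mean vector and covariance matrix of the differences, whose diagonal holds the variances asked for in b).

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix using the n - 1 divisor."""
    n, m = len(rows), mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def paired_hotelling(diffs):
    """One-sample T^2 on within-pair differences, H0: mean difference = 0."""
    n, p = len(diffs), len(diffs[0])
    dbar, sd = mean_vector(diffs), covariance(diffs)
    # T^2 = n * dbar' S_d^{-1} dbar
    t2 = n * sum(a * b for a, b in zip(dbar, solve(sd, dbar)))
    # Under H0: (n - p) / ((n - 1) p) * T^2 ~ F(p, n - p)
    f_stat = t2 * (n - p) / ((n - 1) * p)
    return dbar, sd, t2, f_stat, (p, n - p)
```

With 30 pairs and two differences, the reference distribution would be F(2, 28), and the p-value could be computed in SAS as `1 - PROBF(F, 2, 28)`.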

**b)** Using SAS, provide the sample means and variances of the differences in responses between the twins. **(5 marks)**

**c)** Does the first twin accurately perceive the level of NS or HA in the second twin? Justify your conclusion. **(10 marks)**

**NOTE:** In all parts of the question, show your SAS code (or SAS/IML code), relevant formulae, output, working, and interpretation. Write out your conclusions carefully.

