MSCI527 SAS Programming for Business Analytics
Submission date: Monday 7th October 2019 (10:00 am)
The coursework consists of two questions.
Question 1 covers the input, manipulation and output of data including statistical analyses using SAS. It also requires you to expand upon what you have learned during the course by applying SAS procedures or formats that you may not have seen before. The final part of the question requires you to adapt your code using macros to allow for repetitive analysis.
Question 2 concerns a multiple linear regression modelling exercise.
- A short report for each question detailing the analyses carried out. Assume a technical audience. These do not require an Executive Summary, Contents or Appendices but should include a brief introduction, summary of the analyses carried out and a conclusion.
- A record of the SAS code used to create the output for each question. The code should be written in such a way that a SAS user familiar with the data can clearly follow the code you have written. You must submit the SAS code as a SAS program file.
A small Internet retailer has asked you to write a SAS program in order to analyse their laptop sales. In particular they are interested in the sales performance of their standard laptop AP3965 and the higher specification laptop AT3600. They have provided you with data covering the four-week period 1/10/2017 – 28/10/2017 contained within twelve datasets. For each week there are three datasets available in the files ‘transtime_y1718w#.csv’, ‘transdetail_y1718w#.csv’ and ‘price_y1718w#.csv’. y1718 denotes the sales year (i.e., 2017-18) and w# denotes the sales week in the data, here weeks 27, 28, 29 and 30. The contents of the files are described below in the order specified, X denotes that this variable is not contained in the dataset and the data are separated by commas in the datasets:
|Transaction ID||Unique transaction (sale) identification number||1||1|
|Date of Sale||Date of the transaction||2||X|
|Time of Sale||Time of the transaction||3||X|
|Laptop Model||Either AP3965 or AT3600||X||2|
|Units Sold||Number of laptops sold in the transaction||X||3|
|Warranty||Extended warranty for laptop(s) purchased.
0 = NO, 1 = YES.
|AP3965 Price||Sale price of AP3965 laptop||1|
|AT3600 Price||Sale price of AT3600 laptop||2|
|Warranty Price||Cost of extended warranty||3|
- The retailer wants to compare sales performance in weeks 27 and 28 (Period 1) to weeks 29 and 30 (Period 2).
- During both periods the sale price of the AT3600 laptop was £1,199.99.
- During Period 1 the sale price of the AP3965 laptop was £699.99. During Period 2 this laptop was available on special offer at £499.99.
- During both periods extended warranty cost £39.00 per laptop purchased. This price was the same for both laptop models.
- Laptops purchased before 3pm will be shipped that day. Laptops purchased after 3pm will be shipped the following day.
The retailer requires you to provide a short summary sales report which must contain the following:
- A comprehensive breakdown of revenues generated over the four week period.
- The proportion of transactions that qualify for ‘same day’ shipment.
- Basic summary statistics for the daily number of each laptop model sold in each period.
- Histograms of the daily number of each laptop model sold in each period.
- Tests to determine if there is a significant difference in average daily number of laptops sold in the two sales periods for each laptop model.
In addition, the retailer would like to be able to compare any two sales periods (i.e., any number of weeks from the historical data). They have weekly data from week 1 in sales year 2014-15 through to week 20 in sales year 2019-20. (Data not provided.) The data is stored in exactly the same format that you have already been supplied. They would like a generic program that will produce the output required for a)-e) above for any two specified periods.
- Use macros to amend your code so that it can produce the same analysis output for any two specified periods. It should only require the following inputs
- The location of the raw data
- The start and end of the two sales periods under consideration (sales year/week)
HINT: Assuming that the raw data was located in C:\laptopsales\, the minimal amount of information required for the original requested analysis above would be
Data location: C:\laptopsales\
Period 1 Start (year): 1718; Period 1 Start (week): 27
Period 1 End (year): 1718; Period 1 End (week): 28
Period 2 Start (year): 1718 Period 2 Start (week): 29
Period 2 End (year): 1718; Period 2 End (week): 30
HINT: To read/output in numeric data with a £ symbol use the SAS informat/format: NLMNLGBPw.d. See the SAS documentation for details.
An online consumer magazine provides up to date reviews on laptops. In their reviews, a score is produced by an expert reviewer which is based upon fixed characteristics of the laptop (screen size, weight, DVD re-writer, etc.), a series of benchmark tests on laptop performance and on a road-test of the laptop undertaken by the reviewer – to see how well the laptop performs day-to-day. Given that there are a large number of laptops on the market and the cost of a full professional review is expensive, the magazine would like to attempt to automate the review process. In doing so, the magazine aim to produce a pre-review metric based purely upon the laptop performance in the benchmark tests and on a select number of a laptop’s fixed characteristics. In order to develop the automatic metric, the magazine have provided you with data from the previous fifty laptop reviews, which is contained in the SAS dataset ‘laptoprev’. The data is as follows:
|Review||The number associated with the laptop review.|
|Score||The laptop score produced by the reviewer. The larger its value the better the laptop.|
|GHz||The (reported) processor speed in gigahertz.|
|Price||The laptop price at the time of assessment in £’s.|
|Processor||A benchmark test involving the processor, recorded in seconds.|
|Memory||A benchmark test involving the (random access) memory, recorded in seconds.|
|Disk||A benchmark test involving the hard disk, recorded in seconds.|
|Video||A benchmark test determining how long it takes to produce and run through images, recorded in seconds|
|Battery||A benchmark test involving the battery life, recorded in minutes|
The benchmark tests alluded to above consist of a separate test on each component. The same tests are carried out on each laptop. For components Processor, Memory, Disk and Video these represent the length of time it takes to perform certain tasks specific to the component. Here, the less time the laptop takes, the better the laptop at that kind of task. For the battery component it represents the life of the battery, from fully charged, while undergoing a certain task.
The magazine has asked you to analyse the data and to develop a regression model suitable for generating a pre-review metric. They require a brief report detailing your analyses and have asked you to add address the points:
- The magazine feels that a regression model that takes into account all available variables would be best. For the report, produce the model requested by the magazine and, also develop your own model that you feel best explains the review score from the available data. Justify your choice of model using both common sense and statistical arguments. Do not attempt to validate any models at this stage in the analysis.
- Check for model adequacy for your preferred model in part a). (i.e., residual analysis, outliers/influential data, etc.) Explore potential improvements accordingly. If you improve your model justify any actions you take.
- The magazine want to know if they can use your model on a particular type of laptop called a netbook. Netbooks are ultra-portable laptops. They typically have much worse performance than standard laptops and often lack attributes of standard laptops, such as a DVD re-writer, large screen, etc. However, netbooks are incredibly small and light, and are ideal for the mobile user who wants to stay in contact. Data are provided on a single netbook review and is given below. For your report, produce an estimate for the review score using your preferred model along with 95% prediction intervals for the netbook. Do you recommend that the magazine use this model for scoring netbooks?