ISCG6425 Data Warehousing

Assignment 2

Semester 2, 2019

School of Computing and Information Technology Due Date: Monday 14 October 2019

Deadline Time:  5:30pm

 

 

  Total Marks:  100

Course Weighting:  30%

 

 

This is an individual assignment.

Purpose of Assignment 

  • Understand how to analyse OLAP requirements used for decision making
  • Develop SQL script to perform Data Quality checking and logging
  • Implement a complete data warehouse solution

 

Instructions 

In year 1996, Northwind Company deployed MS-SQL database management systems (OLTP database) to store their transactional data related to their customers, products, and sales transactions. They implemented two databases named northwind3 and northwind4 to serve customers who live in America and Europe, respectively. Later, the Sales department manager wants to integrate sales data stored in the two databases into a single database to help them have a central access to all data and perform data analysis. Therefore, they have decided to implement a data warehouse for this purpose. Fortunately, the same schema is designed for both northwind3 and northwind4, as shown in Figure 1.

Later in year 1998, the Sales department manager wants to integrate sales data stored in the two databases. Therefore, they have decided to implement a data warehouse (OLAP database) system for this purpose. The Data Warehouse schema is shown in Figure 2.

Figure 1.  Northwinds 3 and 4 OLTP Schema

 

Figure 2. OLAP Database Schema

Tasks 

Assume that you employed as a database analyst for the company and are assigned to carry out the following tasks:

  1. Create the OLAP database based on the schema given in Figure 2.
  2. Generate Data Quality Rules to ensure that all of the data imported to the Data Warehouse are clean, i.e. they are all accurate, valid, consistent, complete and uniform. It must also include the Data Quality (DQ) Rules mentioned in Table 1. You are also required to add two additional DQ rules of your choice.
  3. Check each of the data from northwind3 and northwind4 that needs to be imported to the OLAP database against the Data Quality Rules. Log all records that violate the Data Quality Rules that you have created on Task 2 to a DQLog table and summarize it in a DQ Summary report.
  4. Import the clean data from northwind3 and northwind4 to the OLAP database.
  5. Check that the data that you have imported are all correct.

 

Table 1. Data Quality Rules 

Rule No. Description Action (If-Then)
1 UnitPrice  checking in Products Reject if Unit Price is negative or 0
2 Quantity checking in Order Details Reject if Quantity is negative or null
3 Discount checking for Order Details Allow for Discount > 10% (0.1) on a product that has UnitPrice greater than $300
4 Country checking in Customers and Suppliers Fix the Country format regardless capital or non-capital cases, e.g.

US, United States, UNITED STATES à USA (in northwind3)

United Kingdom, UNITED KINGDOM, Britain, BRITAIN à UK (in northwind4)

5 CategoryID and SupplierID checking in Products Reject for the product that belongs to a non-existing

category/supplier (i.e., invalid CategoryID/SupplierID or null CategoryID/SupplierID)

6 ProductID checking in Order Details Reject if ProductID doesn’t exist or is null
7 CustomerID, ShipAddress and ShipCity checking in Orders Reject if (CustomerID doesn’t exist or is null) and (both ShipAddress and ShipCity are null)
8 ShipVia checking in Orders Allow for a non-existing Shipper or a null ShipVia
9 Freight checking in Orders Allow for a Freight valued more than the 12% of total cost of the entire order
10 OrderDate checking in Orders Reject if OrderDate is null

 

You are required to write a report that

  1. Describes the ETL and Data Quality Assurance process that you have done to ensure that your proposed design has met all of the requirements above.
  2. Includes the Data Quality Rules that you have designed in task 2 to ensure that all of the data are all accurate, valid, consistent, complete and uniform. For each of the rules, please briefly explain why you have designed that rule and which criterion is it testing.
  3. Contains the DQ Summary Report table (from task 3) and present your evaluation and critique based on the results.

 

Presentation

You will be required to present your solution design and rationale, key technique/technology used, and conduct a live demo of your solution to the class. Durations are limited to 10 minutes for the presentation and the demo. You will be required to answer 5-10 random questions regarding the work. This is to verify that you have understood the assignment and undertaken the work by yourself.

Assignment Submission 

  • Your submission must include
    1. A report covering all of the information mentioned above.
    2. An SQL file containing SQL codes for ETL, Data Quality Assurance and Evaluation process.
    3. A PDF or PowerPoint file of your presentation.
  • Submit all files on Moodle.
  • You must acknowledge and declare on the coversheet of assignment (download from Moodle) that your submitted work has been completed by yourself and it shows no attempt of plagiarisms. If any case of plagiarisms is suspected, you are required to declare the ownership and authenticity of your work to the marker. Maximum mark deduction may apply to your work.
  • **Similarity Check** If you copy contents (texts, images, tables, etc.) from other sources, the Turnitin will detect that. In general, over 30% similarity is NOT
    1. If your work is between 30%-50% similar, your mark will be deducted proportionally (for example, 2% marks deduction per 1% similarity).
    2. If your work is found over 50% similar, then you need to resubmit the assignment as well as still get some penalties.

 

Late Submission of Assignments:

Assignments submitted after the due date and time without having received an extension through Special Assessment Circumstances (SAC) will be penalised according to the following:

  • 10% of marks deducted if submitted within 24hrs of the deadline,
  • 20% of marks deducted if submitted after 24hrs and up to 48hrs of the deadline,
  • 30% of marks deducted if submitted after 48hrs and up to 72hrs of the deadline,
  • No grade will be given for an assignment that is submitted later than 72hrs after the deadline.

 

Special Assessment Circumstances:

A student, who due to circumstances beyond his or her control, misses a test, final exam or an assignment deadline or considers his or her performance in a test, final exam or an assignment to have been adversely affected, should complete the Special Assessment Circumstances (SAC) form available from the Student Central.

When requesting an SAC for an assignment, the SAC must be submitted (along with work completed to date) within the time frame of the extension requested; i.e. if the Doctor’s certificate is for one (1) day, then the SAC and work completed must be submitted within one (1) day.

Assistance to other Students:

Students themselves can be an excellent resource to assist the learning of fellow students, but there are issues that arise in assessments that relate to the type and amount of assistance given by students to other students. It is important to recognise what types of assistance are beneficial to another’s learning and also what types of assistance are unacceptable in an assessment.

Beneficial Assistance:

  • Study Groups
  • Discussion
  • Sharing Reading Material
  • Reading the available online and library resources

Unacceptable Assistance:

  • Working together on one copy of the assessment and submitting it as own work
  • Giving another student your work
  • Copying someone else’s work, this includes work done by someone not on the course
  • Changing or correcting another student’s work
  • Copying from books, the Internet etc. and submitting it as own work; anything taken directly from another source must be acknowledged correctly; show the source alongside the quotation

Marking Schedule

Task Marking Criteria Marks
SQL 2 for writing the correct SQL queries to create fact table & attributes based on the chosen OLAP design

4 for writing the correct SQL queries to create dimension tables & attributes based on the chosen OLAP design

2 for writing the correct SQL queries for adding constraints on all the attributes in the tables

25 for writing the correct SQL queries for checking each of the compulsory DQ rules (2 marks for each rule)

6 writing the correct SQL queries for checking each of your proposed additional DQ rules (3 marks for each rule)

3 for writing the correct SQL queries to populate DQ Log Table

4 for writing the correct SQL queries to populate DQ Summary Table

8 for writing the correct SQL queries to populate fact table

10 for writing the correct SQL queries to populate dimension tables

2 for writing the correct SQL queries to validate data in the fact table

2 for writing the correct SQL queries to validate data in the dimension tables

 

 

 

 

68
Report 5 Introduction and title page added and good report structure to report

10 for correctly explaining the reasons for each rule being added to DQ rules and step by step process required to move data from OLTP to OLAP schema

2 for describing the step by step process involved in Data Quality Assurance

2 for including the correct DQ Summary Report

3 for briefly explaining the advantages of the proposed solution

3 for briefly explaining the disadvantages/limitations of the proposed solution

2 for listing all references and using the correct APA referencing

 

 

 

 

27
Presentation & demo 5 marks for presentation and demo

 

 

 

 

5
Total Marks 100