- Plagiarism: Please follow the guidelines laid down by our department.
- You may discuss the assignment with your classmates; however, direct copying and pasting is PROHIBITED and will be treated as PLAGIARISM.
- Assignments will be marked on logic, presentation, and understanding of the problem, not only on accuracy.
1. (25%) Exploratory Data Analysis
a. Download the Synthetic Financial Datasets For Fraud Detection dataset from the link below:
b. Using R, conduct an exploratory analysis of the downloaded dataset.
2. (25%) Cluster Analysis
a. Download the Credit Card Fraud Detection Data dataset from the link below:
b. Using R, conduct a cluster analysis of the downloaded dataset.
NOTE: A sample R script is provided, but you still need to complete the program. Alternatively, you may build the model yourself and use whatever library you like.
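For orientation only, the cluster-analysis workflow might be sketched as below. This is not the sample script mentioned in the NOTE. It runs on a small synthetic matrix (`toy`, with hypothetical columns `feature_1` and `feature_2`) rather than the assignment dataset, and it assumes the elbow method for choosing k, which is one common option among several.

```r
# Hypothetical stand-in data: two numeric features forming three loose groups.
# Replace `toy` with the numeric columns of your own dataset.
set.seed(42)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.5), ncol = 2)
)
colnames(toy) <- c("feature_1", "feature_2")

# Standardise so that no single feature dominates the Euclidean distance.
scaled <- scale(toy)

# Elbow method: total within-cluster sum of squares for k = 1..8.
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 20)$tot.withinss
})

# Look for the k after which the curve flattens out.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Fit the final model with the chosen k (3 here, by construction of the toy data).
fit <- kmeans(scaled, centers = 3, nstart = 20)

# Cluster sizes and per-cluster feature means help when naming the clusters.
table(fit$cluster)
aggregate(as.data.frame(toy), by = list(cluster = fit$cluster), FUN = mean)
```

The per-cluster means in the last line are computed on the unscaled values, so they can be read in the original units when interpreting each cluster.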
3. (50%) Write a 2-3 page essay on the following:
a. Describe the dataset based on the exploratory analysis results, including:
- Summary description of the dataset
- Univariate analysis
- Bi-/Multi-variate analysis
- Missing data/Outlier analysis
b. Describe the following based on the k-means clustering results:
- Explain how you found the optimal number “k”, i.e. the number of clusters used to build the k-means model
- Name the clusters found and interpret what each cluster represents
4. Submission on Moodle:
a. R script
b. A PDF version of the report