Overview of Dataset
Our dataset is a credit card approval prediction dataset. It includes two CSV files: application_record.csv and credit_record.csv. The application_record.csv file contains applicants' personal information, which we can use as features for prediction. The credit_record.csv file records users' credit card behavior.
We found this dataset on Kaggle and have already downloaded both CSV files. The link for the dataset
The application_record.csv file has 438,557 observations of 18 variables, and the credit_record.csv file has 1,048,575 observations of three variables. There are no missing values in credit_record.csv, but about one third of the occupation type values in application_record.csv are missing, which leaves us with about 300,000 complete entries. We think that is enough to predict the credit card approval pattern. Alternatively, we could simply drop the occupation variable from the model. We will need to look into that later.
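A quick check of the missing-value count could look like the sketch below. It assumes the occupation column is named `OCCUPATION_TYPE` and that missing entries appear as blank fields. The inline sample is made up for illustration and only stands in for application_record.csv.

```python
import csv
import io

# Made-up sample standing in for application_record.csv (assumption:
# the occupation column is called OCCUPATION_TYPE and blanks mean missing).
SAMPLE = """ID,AMT_INCOME_TOTAL,OCCUPATION_TYPE
1,427500.0,
2,112500.0,Security staff
3,270000.0,Sales staff
"""

def missing_summary(handle, column):
    """Return (total rows, rows where `column` is blank)."""
    total = 0
    missing = 0
    for row in csv.DictReader(handle):
        total += 1
        if not (row.get(column) or "").strip():
            missing += 1
    return total, missing

total, missing = missing_summary(io.StringIO(SAMPLE), "OCCUPATION_TYPE")
complete = total - missing
print(f"{missing} of {total} rows lack an occupation; {complete} are complete")
```

To run the same check on the real file, the `io.StringIO(SAMPLE)` argument could be replaced with `open("application_record.csv", newline="")`.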
Overview of Research Questions
Our project focuses on predicting credit card approval. The main research question is what the credit card issuance criteria are. Among the 18 variables in application_record.csv, we think that annual income, number of children, education level, age, days employed, and number of family members are the predictors that could have a larger effect on the credit card approval decision.
The question will be best answered with both regression and classification approaches, since the 18 variables include both numbers (quantitative) and characters (qualitative).
One thing to note: the use of regression or classification methods depends on the form of the
outcome variable. Since it sounds like your outcome variable is categorical (approval or not), you’ll
most likely end up using classification machine learning models.
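Predictors of both kinds can feed the same classification model once the qualitative ones are turned into numbers. A minimal sketch of one-hot encoding follows. The field names and values are hypothetical and are not taken from the dataset.

```python
def one_hot(rows, column):
    """Replace a qualitative column with 0/1 indicator columns."""
    # Collect every distinct level so each row gets the same set of columns.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded

# Hypothetical applicants: one quantitative and one qualitative predictor.
applicants = [
    {"income": 112500.0, "education": "Higher"},
    {"income": 270000.0, "education": "Secondary"},
]
encoded = one_hot(applicants, "education")
```

The quantitative column passes through unchanged, so the choice between regression and classification still rests on the outcome variable alone.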
The project report should be written similarly to a paper, with figures, code, and results included
throughout to illustrate your points and findings. Text should be included to guide the reader. I
recommend reading through the example report to get an idea of this layout. More specifically, your
report should contain:
– An introduction section: Describes the data, the research questions, provides any background
readers need to understand your project, etc.
– A conclusion section: Discusses the outcome(s) of models you fit. Which models performed well,
which performed poorly? Were you surprised by model performance? Next steps? General
– A table of contents
– A section for exploratory data analysis: This should contain at least 3 to 5 visualizations and/or tables
and their interpretation/discussion. At minimum your group should create a univariate visualization of
the outcome(s), a bi-variate or multivariate visualization of the relationship(s) between the outcome
and select predictors, etc. Part of an EDA involves asking questions about your data and exploring
your data to find the answers.
– A section discussing data splitting and cross-validation: Describe your process of splitting data into
training, test, and/or validation sets. Describe the process of cross-validation.
– A section discussing model fitting: Describe the types of models you fit, their parameter values, and
– Model selection and performance: A table and/or graph describing the performance of your best
fitting model on testing data. Describe your best-fitting model however you choose, and the quality of
its predictions, etc.
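The data splitting and cross-validation item above could be approached along the lines of the sketch below. It uses only the Python standard library, and the labels, the 80/20 split, and the choice of five folds are all assumptions made for illustration. The split is stratified by class, while the folds are plain shuffled folds.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices into train/test, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def k_fold(indices, k=5, seed=0):
    """Yield (fit, validate) index lists; each index is validated exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out

# Hypothetical binary outcome: 1 = approved, 0 = not approved.
labels = [1] * 80 + [0] * 20
train, test = stratified_split(labels, test_fraction=0.2)
folds = list(k_fold(train, k=5))
```

Cross-validation runs only on the training indices, so the test rows stay untouched until the final performance check.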