MAT012 Credit Risk Scoring

Assignment 2020/21

This assignment forms 100% of your assessment for this module.
There are two parts to this assessment.

Part A contains THREE short essay-based questions and counts for 50% of the final mark.
Each essay should be around 1,000-1,500 words in length.

Part B contains FIVE tasks to establish a scorecard using the given dataset and counts for
50% of the final mark. You may use Excel, SAS, R or Python to assist in the scorecard
preparation.

You must answer ALL questions.

PART A

1. Critically examine what needs to be considered when developing a credit risk scoring
model.
[20 marks]

2. Discuss how survival analysis and multi-state (Markov chain) models may be used in
credit risk modelling and the challenges they present.
[15 marks]

3. Detail the history of the Basel Accords and discuss the challenges in modelling the
credit risk on a portfolio of consumer loans.
[15 marks]

PART B

1. Split the dataset into two subsets as follows:

Subset 1: the applicants with Checking = 1 or Checking = 2
Subset 2: the applicants with Checking = 3 or Checking = 4

Clean the subsets if necessary.
[5 marks]
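
For illustration only, the split described in Task 1 might be sketched in Python as below. The records, the field names other than Checking, and the cleaning rule (dropping rows with a missing value) are assumptions made for the example; the real dataset may need a different treatment.

```python
# Hypothetical records; the real dataset's columns and coding may differ.
applicants = [
    {"id": 1, "Checking": 1, "Amount": 1200},
    {"id": 2, "Checking": 4, "Amount": 5400},
    {"id": 3, "Checking": 2, "Amount": None},  # missing value to be cleaned
    {"id": 4, "Checking": 3, "Amount": 800},
    {"id": 5, "Checking": 2, "Amount": 2300},
]


def split_by_checking(rows):
    """Return (subset1, subset2): Checking in {1, 2} and Checking in {3, 4}."""
    subset1 = [r for r in rows if r["Checking"] in (1, 2)]
    subset2 = [r for r in rows if r["Checking"] in (3, 4)]
    return subset1, subset2


def drop_incomplete(rows):
    """One possible cleaning rule (an assumption): drop rows with a missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]


subset1, subset2 = split_by_checking(applicants)
subset1_clean = drop_incomplete(subset1)
subset2_clean = drop_incomplete(subset2)
print(len(subset1), len(subset1_clean), len(subset2_clean))  # prints: 3 2 2
```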

2. For each subset, establish a training set and validation set. Explain:
a. what principle you have used to decide on these;
b. why both training and validation sets are needed;
c. any issues encountered during the splitting exercise.
[5 marks]
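
One common principle for Task 2 is a stratified random split, which keeps the proportion of bad accounts roughly equal in the training and validation sets. The sketch below assumes a 70/30 split, a hypothetical 0/1 field named bad, and a fixed seed for reproducibility; none of these is prescribed by the assignment.

```python
import random


def stratified_split(rows, label_key, train_fraction=0.7, seed=0):
    """Split rows into (train, valid), sampling each label group separately."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    train, valid = [], []
    for label in sorted({r[label_key] for r in rows}):
        group = [r for r in rows if r[label_key] == label]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


# Hypothetical data: 20 goods (bad = 0) and 10 bads (bad = 1).
rows = [{"id": i, "bad": 1 if i % 3 == 0 else 0} for i in range(30)]
train, valid = stratified_split(rows, "bad")

# Both sets keep the overall bad rate of one third.
print(len(train), sum(r["bad"] for r in train))  # prints: 21 7
print(len(valid), sum(r["bad"] for r in valid))  # prints: 9 3
```

With a small subset, a stratum may contain too few bads to split sensibly, which is the kind of issue item (c) asks you to report.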

3. For each training set, choose four variables which are suitable for building a scorecard. For
each training set, the variables must include (i) at least one variable that is continuous before binning;
(ii) at least one categorical variable with more than two categories, so you can see whether
categories can be combined.

Explain the rationale behind your choice of variables (using supporting statistics, e.g.
chi-square). Should you be unable to choose variables satisfying the above criteria, explain the
problem you have encountered and the compromise you have made in the variable
selection.
[10 marks]
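
As a hedged illustration of the supporting statistics mentioned in Task 3, the sketch below computes the Pearson chi-square statistic for a categorical variable against the good/bad outcome, and then recomputes it after combining two categories. The category names and counts are invented for the example, and the helper names are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square for a table given as {category: (goods, bads)}."""
    total_good = sum(g for g, b in table.values())
    total_bad = sum(b for g, b in table.values())
    n = total_good + total_bad
    stat = 0.0
    for goods, bads in table.values():
        row_total = goods + bads
        for observed, col_total in ((goods, total_good), (bads, total_bad)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


def merge_categories(table, names, new_name):
    """Combine the listed categories into one coarse class."""
    merged = {k: v for k, v in table.items() if k not in names}
    merged[new_name] = (
        sum(table[k][0] for k in names),
        sum(table[k][1] for k in names),
    )
    return merged


# Invented counts: B and C have the same bad rate (60%), A differs (20%).
table = {"A": (40, 10), "B": (20, 30), "C": (20, 30)}
coarse = merge_categories(table, ["B", "C"], "B+C")

print(chi_square(table))   # 3 categories, 2 degrees of freedom
print(chi_square(coarse))  # 2 categories, 1 degree of freedom
```

In this constructed case the two statistics are equal, because B and C have identical bad rates, so combining them loses no discriminating information while using one fewer degree of freedom. For reference, the 5% critical values are 5.991 for 2 degrees of freedom and 3.841 for 1.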

4. Use the binary variables obtained from the coarse classification in the above exercise to
build two scorecards for each training set, one using linear regression, the other using
logistic regression. Note this means you should have four scorecards in total:
(i) using linear regression for Checking = 1 or 2;
(ii) using logistic regression for Checking = 1 or 2;
(iii) using linear regression for Checking = 3 or 4;
(iv) using logistic regression for Checking = 3 or 4.

Note that the file you submit should include, in the Appendix, a table that gives the binary
variables you used, together with the coefficients for those variables calculated in each
regression.
[15 marks]
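
The difference between the two regressions in Task 4 can be sketched on a single coarse-classified variable. The example below is an illustration under stated assumptions, not a prescribed method: the outcome is coded 1 for bad and 0 for good, the three classes A, B and C and their counts are invented, A is taken as the reference class, and both models are fitted with small hand-written routines (ordinary least squares via the normal equations, and Newton-Raphson for the logistic model) rather than a statistics package.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_logistic(X, y, iterations=25):
    """Maximum likelihood by Newton-Raphson steps, starting from zero."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        probs = [1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, r))))
                 for r in X]
        grad = [sum(r[i] * (t - q) for r, t, q in zip(X, y, probs))
                for i in range(p)]
        hess = [[sum(r[i] * r[j] * q * (1 - q) for r, q in zip(X, probs))
                 for j in range(p)] for i in range(p)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Invented counts per coarse class: (applicants, bads). A is the reference.
groups = {"A": (10, 2), "B": (10, 5), "C": (10, 8)}
X, y = [], []
for name, (n, bads) in groups.items():
    row = [1.0, float(name == "B"), float(name == "C")]  # intercept, dummies
    X += [row] * n
    y += [1] * bads + [0] * (n - bads)

linear = fit_linear(X, y)
logistic = fit_logistic(X, y)
for label, lin, logit in zip(["intercept", "is_B", "is_C"], linear, logistic):
    print(f"{label:10s} linear={lin:8.4f} logistic={logit:8.4f}")
```

With a single classed variable the model is saturated, so the linear coefficients reproduce the class bad rates (0.2 for A, plus 0.3 for B, plus 0.6 for C), while the logistic coefficients are the corresponding log-odds and log-odds differences. With several variables fitted jointly the coefficients no longer have this closed form, but the same routines apply with more dummy columns.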

5. Derive ROC curves for all scorecards using the validation set applicable to each, showing in
detail how sensitivity and specificity have been calculated. Estimate the Gini coefficient and
KS values for each. Explain and comment on your results.
[15 marks]
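
The calculations in Task 5 can be sketched as follows. This is a minimal example on invented scores, assuming that a higher score means a higher predicted probability of being bad, that bad is the positive class, and that the area under the curve is estimated by the trapezium rule; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return [(threshold, sensitivity, specificity)] for each distinct score,
    flagging an applicant as bad when score >= threshold."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        sensitivity = tp / n_bad        # bads correctly flagged
        specificity = 1 - fp / n_good   # goods correctly not flagged
        points.append((t, sensitivity, specificity))
    return points


def gini_and_ks(points):
    """Gini = 2 * AUC - 1; KS = largest gap between the cumulative bad and
    good distributions, i.e. max of sensitivity - (1 - specificity)."""
    # ROC curve: x = 1 - specificity, y = sensitivity, starting at (0, 0).
    curve = [(0.0, 0.0)] + [(1 - spec, sens) for _, sens, spec in points]
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(curve, curve[1:]))
    ks = max(sens - (1 - spec) for _, sens, spec in points)
    return 2 * auc - 1, ks


# Invented validation scores and outcomes (1 = bad, 0 = good).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
gini, ks = gini_and_ks(points)
print(gini, ks)  # prints: 0.625 0.5
```

Here 13 of the 16 possible (bad, good) pairs are ranked correctly, giving an area under the curve of 0.8125 and hence a Gini of 0.625.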
