MAT012 Credit Risk Scoring
Assignment 2020/21
This forms your assessment (100%) of this module.
There are two parts to this assessment.
Part A contains THREE short essay-based questions and counts for 50% of the final mark.
Each essay should be around 1,000-1,500 words in length.
Part B contains FIVE tasks to establish a scorecard using the given dataset and counts for
50% of the final mark. You may use Excel, SAS, R or Python to assist in the scorecard
preparation.
You must answer ALL questions.
PART A
1. Critically examine what needs to be considered when developing a credit risk scoring
model.
[20 marks]
2. Discuss how survival analysis and multi-state (Markov chain) models may be used in
credit risk modelling and the challenges they present.
[15 marks]
3. Detail the history of the Basel Accords and discuss the challenges in modelling the
credit risk on a portfolio of consumer loans.
[15 marks]
PART B
1. Split the dataset into two subsets as follows:
Subset 1: the applicants with Checking = 1 or Checking = 2
Subset 2: the applicants where Checking = 3 or Checking = 4
Clean the subsets if necessary.
[5 marks]
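As a rough illustration of the split in task 1, the sketch below uses plain Python on a handful of made-up records. The field names other than `Checking`, the sample values, and the cleaning rule (dropping rows with a missing field) are assumptions for illustration only; the real dataset may need a different treatment.

```python
# Hypothetical records; the real dataset's columns and codes may differ.
applicants = [
    {"id": 1, "Checking": 1, "Amount": 1200},
    {"id": 2, "Checking": 3, "Amount": 800},
    {"id": 3, "Checking": 2, "Amount": None},  # missing value to be cleaned
    {"id": 4, "Checking": 4, "Amount": 2500},
    {"id": 5, "Checking": 1, "Amount": 400},
]


def split_by_checking(rows):
    """Return (subset1, subset2) for Checking in {1, 2} and {3, 4}."""
    subset1 = [r for r in rows if r["Checking"] in (1, 2)]
    subset2 = [r for r in rows if r["Checking"] in (3, 4)]
    return subset1, subset2


def drop_incomplete(rows):
    """One possible cleaning rule (an assumption): drop rows with a missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]


subset1, subset2 = split_by_checking(applicants)
subset1, subset2 = drop_incomplete(subset1), drop_incomplete(subset2)
```

Dropping incomplete rows is only one option; imputing values or treating "missing" as its own category may suit a scorecard better.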
2. For each subset, establish a training set and validation set. Explain:
a. what principle you have used to decide on these;
b. why both training and validation sets are needed;
c. any issues encountered during the splitting exercise.
[5 marks]
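One common principle for task 2 is a stratified random split, which keeps the proportion of goods and bads similar in the training and validation sets. The sketch below is a minimal standard-library version; the 70/30 ratio, the seed, and the `Good` label name are assumptions, not requirements of the brief.

```python
import random


def stratified_split(rows, label_key, train_fraction=0.7, seed=42):
    """Split rows into (train, validation), keeping the label mix similar in both."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for label in sorted({r[label_key] for r in rows}):
        group = [r for r in rows if r[label_key] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


# Hypothetical subset: 100 applicants, 30 of them bad (Good = 0).
subset = [{"id": i, "Good": 0 if i < 30 else 1} for i in range(100)]
train, validation = stratified_split(subset, "Good")
```

With a small subset, a stratum may contain too few bads to split sensibly, which is the kind of issue part (c) asks about.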
3. For each training set choose four variables which are suitable for building a scorecard. For
each training set the variables must include (i) at least one continuous variable before binning;
(ii) at least one categorical variable with more than two categories, so you can see whether
categories can be combined.
Explain the rationale behind your choice of variables (using supporting statistics, e.g.
chi-square). Should you be unable to choose variables satisfying the above criteria, explain the
problem you have encountered and the compromise you have made in the
variable selection.
[10 marks]
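The chi-square statistic mentioned in task 3 can be computed directly from a table of good and bad counts per category. The sketch below is a minimal illustration: the category names and counts are invented, and the statistic is the ordinary Pearson chi-square, which may not be the exact variant taught in the module.

```python
def chi_square(table):
    """Pearson chi-square statistic for rows of [goods, bads] counts per category."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical [goods, bads] counts for a three-category variable.
counts = {"owner": [60, 10], "renter": [30, 20], "other": [30, 20]}
full = chi_square(list(counts.values()))

# Coarse classification: merge 'renter' and 'other', which have the same bad rate here.
merged_row = [a + b for a, b in zip(counts["renter"], counts["other"])]
combined = chi_square([counts["owner"], merged_row])
```

In this made-up example the two merged categories have identical bad rates, so the statistic is unchanged while the degrees of freedom fall from 2 to 1, which is one way to argue that the categories can be combined.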
4. Use the binary variables obtained from the coarse classification in the above exercise to
build two scorecards for each training set, one using linear regression, the other using
logistic regression. Note this means you should have four scorecards in total:
(i) using linear regression for Checking = 1 or 2;
(ii) using logistic regression for Checking = 1 or 2;
(iii) using linear regression for Checking = 3 or 4;
(iv) using logistic regression for Checking = 3 or 4.
Note that the file you submit should include, in the Appendix, a table that gives the binary
variables you used, together with the coefficients for those variables calculated in each
regression.
[15 marks]
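To illustrate the two regressions in task 4, the sketch below fits a linear probability model by least squares and a logistic regression by Newton-Raphson on the same 0/1 dummy variables. It assumes NumPy is available; the data (one three-level variable coded as two dummies) are invented, and the helper names `fit_linear` and `fit_logistic` are hypothetical. A statistics package would normally be used instead.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares on a 0/1 target (linear probability model)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_logistic(X, y, iterations=25):
    """Logistic regression fitted by Newton-Raphson, starting from zero coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-A @ beta))   # fitted probability of 'good'
        weights = p * (1.0 - p)
        gradient = A.T @ (y - p)
        hessian = A.T @ (A * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Hypothetical data: (dummy_B, dummy_C, goods, bads); category A is the baseline.
groups = [(0, 0, 25, 25), (1, 0, 40, 10), (0, 1, 45, 5)]
rows, target = [], []
for b, c, goods, bads in groups:
    rows += [[b, c]] * (goods + bads)
    target += [1] * goods + [0] * bads
X = np.array(rows, dtype=float)
y = np.array(target, dtype=float)

linear_coef = fit_linear(X, y)      # intercept, then one coefficient per dummy
logistic_coef = fit_logistic(X, y)  # same layout, on the log-odds scale
```

Because this toy model is saturated, the linear coefficients reproduce the differences in good rates (0.5, +0.3, +0.4) and the logistic coefficients reproduce the differences in log-odds (0, ln 4, ln 9). With several variables the two scorecards will generally differ.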
5. Derive ROC curves for all scorecards using the validation set applicable to each, showing in
detail how sensitivity and specificity have been calculated. Estimate the Gini coefficient and
KS values for each. Explain and comment on your results.
[15 marks]
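The quantities in task 5 can be derived from the scores and outcomes alone. The sketch below is a minimal standard-library version under stated assumptions: label 1 means good and 0 means bad, an applicant is predicted good when the score is at or above the cut-off, the Gini coefficient is taken as 2 × AUC − 1, and KS is the largest gap between sensitivity and 1 − specificity. The scores and labels are invented.

```python
def roc_points(scores, labels):
    """ROC points (1 - specificity, sensitivity), one per distinct cut-off."""
    goods = sum(labels)
    bads = len(labels) - goods
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
        sensitivity = tp / goods       # share of goods predicted good
        specificity = 1 - fp / bads    # share of bads predicted bad
        points.append((1 - specificity, sensitivity))
    return points


def auc(points):
    """Area under the ROC curve by the trapezium rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini(points):
    return 2 * auc(points) - 1


def ks(points):
    """Largest vertical distance between the ROC curve and the diagonal."""
    return max(y - x for x, y in points)


# Hypothetical validation scores (higher = more likely good) and outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 1, 0, 0]
roc = roc_points(scores, labels)
gini_value, ks_value = gini(roc), ks(roc)
```

For these made-up numbers the area under the curve is 0.8, giving a Gini coefficient of 0.6 and a KS value of about 0.67. Each ROC point comes straight from the sensitivity and specificity at one cut-off, which is the level of detail the task asks to be shown.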