一.背景介绍
我们都知道信贷流程的一种分法是贷前、贷中、贷后,如果应用到现金贷上就可以理解为授信、用信、还款,显然授信是整个过程的重中之重。在此我们对lending club的数据进行分析,考虑到lending club在美国的业务场景主要还是以偿还信用债务为主,这也是得益于有80%的美国人都使用信用卡,所以在授信环节有很多可以学习借鉴之处,而我国的现金贷业务主要还是服务于一些银行征信系统未服务到的客户,这部分客户也是我国现金贷的主要发展人群,从哪些维度能够挖掘出优质客户,是我们接下来要探索的问题,这对于我们现金贷公司搭建业务系统有很大的参照作用。
二.提出问题
什么是优质客户?优质客户有什么特征?如果确定为优质客户,我们从还款意愿和还款能力来分析。
在我国互联网金融现金贷的业务指标体系:
由于lending club数据维度有限,只能从用户指标来分析哪些用户的还款能力强和还款意愿强,从而得出优质客户的特征。
通过上面业务指标,我们可以提出如下几个问题:
Q1:是否收入越高还款能力越强?
Q2:是否身处经济越发达城市还款能力越强?
Q3:工作时间越长还款能力越强?
Q4:近6个月查询信用次数越低还款意愿越强?
Q5:过去2年逾期30日以上次数越低还款意愿越强?
三.数据及数据清洗
1.数据来源
2.理解数据
Lending Club数据集共有13万行,字段145个。需要用到的字段解释如下:
数据非常合理,无需清洗,直接导入数据:
四 .数据分析
Q1:是否收入越高还款能力越强?
SELECT CASE WHEN annual_inc <= 30000 THEN '小于3万'
WHEN annual_inc > 30000 AND annual_inc <= 100000 THEN '3万—10万'
WHEN annual_inc > 100000 AND annual_inc <= 200000 THEN '10万—20万'
WHEN annual_inc > 200000 AND annual_inc <= 300000 THEN '20万—30万'
ELSE '大于30万' END AS '年收入',
COUNT(*) AS '贷款人数',
SUM( CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END ) AS '正常还款人数',
CONCAT(ROUND(SUM(CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END)/COUNT(*)*100),'%') AS '正常占比',
SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END ) AS '逾期还款人数',
CONCAT(ROUND(SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END )/COUNT(*)*100),'%') AS '逾期占比'
FROM `lending club`
GROUP BY CASE WHEN annual_inc <= 30000 THEN '小于3万'
WHEN annual_inc > 30000 AND annual_inc <= 100000 THEN '3万—10万'
WHEN annual_inc > 100000 AND annual_inc <= 200000 THEN '10万—20万'
WHEN annual_inc > 200000 AND annual_inc <= 300000 THEN '20万—30万'
ELSE '大于30万' END
结论:通过视图可知,年收入越高的客群,整体逾期率是相对低的。可以看出,收入的众数以3万-10万为主,这也与美国普通国民年均收入6万左右相符.
Q2:是否身处经济越发达城市还款能力越强?
SELECT addr_state AS '客户城市',
COUNT(*) AS '贷款人数',
SUM( CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END ) AS '正常还款人数',
CONCAT(ROUND(SUM(CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END)/COUNT(*)*100),'%') AS '正常占比',
SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END ) AS '逾期还款人数',
CONCAT(ROUND(SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END )/COUNT(*)*100),'%') AS '逾期占比'
FROM `lending club`
GROUP BY addr_state
ORDER BY COUNT(*) DESC
结论:抽取贷款人数排名前十四的各州,可以得出美国东西岸沿海经济较发达的州客户贷款人数较多,虽然无法得出经济发达的地区客户还款能力越强,但是可以从视图得出,由于经济较发达地区,客户的金融意识强,消费水平较高,所以有比较强烈的贷款意愿,而且从逾期占比的结果来看,经济发达的地区,逾期率也是处于一个相对比较平均的水平,这个结论可以为公司的运营提供帮助。
Q3:工作时间越长还款能力越强?
SELECT emp_length AS '工作年限',
COUNT(*) AS '贷款人数',
SUM( CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END ) AS '正常还款人数',
CONCAT(ROUND(SUM(CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END)/COUNT(*)*100),'%') AS '正常占比',
SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END ) AS '逾期还款人数',
CONCAT(ROUND(SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END )/COUNT(*)*100),'%') AS '逾期占比'
FROM `lending club`
GROUP BY emp_length
结论:从视图可以看出LC公司更愿意把钱借给工作时间超过10年的用户,同时这类用户的逾期率也是最低的,但是总体来讲整个逾期的占比分布也是在6%左右,可以推测LC整个的贷前申请贷中跟踪贷后的催收系统是有效的。
Q4:近6个月查询信用次数越低还款意愿越强?
SELECT inq_last_6mths AS '近6个月查询次数',
COUNT(*) AS '贷款人数',
SUM( CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END ) AS '正常还款人数',
CONCAT(ROUND(SUM(CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END)/COUNT(*)*100),'%') AS '正常占比',
SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END ) AS '逾期还款人数',
CONCAT(ROUND(SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END )/COUNT(*)*100),'%') AS '逾期占比'
FROM `lending club`
GROUP BY inq_last_6mths
结论:查询信用次数越少的客户还款的意愿越强,查询信用超过3次的客户逾期占比达到了13%,由这个指标可以推测查询信用越多的用户可能是手头比较紧,或者急需用钱。
Q5:过去2年逾期30日以上次数越低还款意愿越强?
SELECT delinq_2yrs AS '过去2年逾期30天以上次数',
COUNT(*) AS '贷款人数',
SUM( CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END ) AS '正常还款人数',
CONCAT(ROUND(SUM(CASE WHEN loan_status = 'Fully paid' THEN 1
WHEN loan_status = 'Current' THEN 1
WHEN loan_status = 'In Grace Period' THEN 1
WHEN loan_status = 'Issued' THEN 1
ELSE 0 END)/COUNT(*)*100),'%') AS '正常占比',
SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END ) AS '逾期还款人数',
CONCAT(ROUND(SUM( CASE WHEN loan_status = 'Late (16-30 days)' THEN 1
WHEN loan_status = 'Late (31-120 days)' THEN 1
WHEN loan_status = 'Charged Off' THEN 1
WHEN loan_status = 'Default' THEN 1
ELSE 0 END )/COUNT(*)*100),'%') AS '逾期占比'
FROM `lending club`
GROUP BY delinq_2yrs
结论:逾期次数越少,还款意愿越强。
五.分析总结
通过以上的用户指标维度的分析,可以知道我要寻找的优质客户的特征是,这个客户是中高收入人群,并且有超过10年的工作经验,可以推测出年龄至少30周岁以上,并且身处经济比较发达的城市,还款习惯良好,很少查询征信报告的这样一些特征的用户。