User:Sean3000/sandbox: Difference between revisions
No edit summary |
No edit summary |
||
(12 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
<!-- EDIT BELOW THIS LINE --> |
<!-- EDIT BELOW THIS LINE --> |
||
The Information Value is a method of [[Feature selection]] widely used in credit scoring<ref>Metric Divergence Measures and Information Value in Credit Scoring, |
The Information Value is a method of [[Feature selection]] widely used in credit scoring<ref>[http://www.hindawi.com/journals/jmath/2013/848271/ Metric Divergence Measures and Information Value in Credit Scoring], |
||
Guoping Zeng</ref>. |
Guoping Zeng</ref>. The formula is: |
||
:<math>\mathrm{IV} = \sum_{i=1}^{n}\left(\text{Distr Good}_{i}-\text{Distr Bad}_{i}\right)\times\ln\left(\frac{\text{Distr Good}_{i}}{\text{Distr Bad}_{i}}\right)</math> |
|||
Here ‘Distr Bad’ is the number of defaulting customers in a category divided by the total number of defaulting customers in the portfolio, and ‘Distr Good’ is the number of non-defaulting customers in that category divided by the total number of non-defaulting customers. The sum runs over the ''n'' categories. |
|||
For example, if customers are grouped by age band, the Information Value may be calculated as follows: |
|||
{| class="wikitable" |
|||
|- |
|||
! Age !! Total Count of Customers !! Count of Bads!! Count of Goods !! Distribution of Bads !! Distribution of Goods !! Information Value |
|||
|- |
|||
||<18||2,000||140||1,860||3.65%||3.86%||0.01% |
|||
|- |
|||
||19<25||5,000||960||4,040||25.00%||8.39%||18.14% |
|||
|- |
|||
||26<35||10,000||1,080||8,920||28.12%||18.52%||4.01% |
|||
|- |
|||
||36<50||12,000||900||11,100||23.44%||23.05%||0.01% |
|||
|- |
|||
||51<65||13,000||500||12,500||13.02%||25.96%||8.92% |
|||
|- |
|||
||66<70||7,000||200||6,800||5.21%||14.12%||8.89% |
|||
|- |
|||
||71+||3,000||60||2,940||1.56%||6.10%||6.19% |
|||
|- |
|||
|style="border-top: 1px solid red;"|Total:||52,000||3,840||48,160||100.00%||100.00%|| '''46.17%''' |
|||
|- |
|||
|} |
|||
Thus for the first row ("<18"), the Count of Goods is the Total Count of Customers minus the Count of Bads (2,000 − 140 = 1,860), and the Distribution of Goods is that count divided by the total Count of Goods (1,860 / 48,160 = 3.86%). |
|||
==References== |
==References== |
||
{{Reflist}} |
{{Reflist}} |
||
http://www.hindawi.com/journals/jmath/2013/848271/ |
Latest revision as of 22:28, 2 December 2013
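The Information Value is a method of Feature selection widely used in credit scoring[1]. The formula is:

$$\mathrm{IV} = \sum_{i=1}^{n}\left(\text{Distr Good}_{i}-\text{Distr Bad}_{i}\right)\times\ln\left(\frac{\text{Distr Good}_{i}}{\text{Distr Bad}_{i}}\right)$$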
Here ‘Distr Bad’ is the number of defaulting customers in a category divided by the total number of defaulting customers in the portfolio, and ‘Distr Good’ is the number of non-defaulting customers in that category divided by the total number of non-defaulting customers. The sum runs over the n categories.
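Written out with explicit counts, the two distributions can be expressed as below. The symbols $G_i$ and $B_i$ (the numbers of non-defaulting and defaulting customers in category $i$) are notation introduced here for illustration and do not come from the cited source.

```latex
\text{Distr Good}_{i} = \frac{G_{i}}{\sum_{j=1}^{n} G_{j}},
\qquad
\text{Distr Bad}_{i} = \frac{B_{i}}{\sum_{j=1}^{n} B_{j}}
```

Each distribution sums to 1 over the n categories. The logarithm in the formula is defined only when both counts in a category are non-zero.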
For example, if customers are grouped by age band, the Information Value may be calculated as follows:
Age | Total Count of Customers | Count of Bads | Count of Goods | Distribution of Bads | Distribution of Goods | Information Value |
---|---|---|---|---|---|---|
<18 | 2,000 | 140 | 1,860 | 3.65% | 3.86% | 0.01% |
19<25 | 5,000 | 960 | 4,040 | 25.00% | 8.39% | 18.14% |
26<35 | 10,000 | 1,080 | 8,920 | 28.12% | 18.52% | 4.01% |
36<50 | 12,000 | 900 | 11,100 | 23.44% | 23.05% | 0.01% |
51<65 | 13,000 | 500 | 12,500 | 13.02% | 25.96% | 8.92% |
66<70 | 7,000 | 200 | 6,800 | 5.21% | 14.12% | 8.89% |
71+ | 3,000 | 60 | 2,940 | 1.56% | 6.10% | 6.19% |
Total: | 52,000 | 3,840 | 48,160 | 100.00% | 100.00% | 46.17% |
Thus for the first row ("<18"), the Count of Goods is the Total Count of Customers minus the Count of Bads (2,000 − 140 = 1,860), and the Distribution of Goods is that count divided by the total Count of Goods (1,860 / 48,160 = 3.86%).
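The age-band table can be checked with a short script. The following is a minimal sketch in Python: the counts are copied from the table, while the names `AGE_BANDS` and `information_value` are hypothetical and chosen only for this illustration. It assumes every band contains at least one good and one bad customer, since the logarithm is otherwise undefined.

```python
import math

# Counts per age band, copied from the table: (label, bads, goods).
AGE_BANDS = [
    ("<18", 140, 1860),
    ("19<25", 960, 4040),
    ("26<35", 1080, 8920),
    ("36<50", 900, 11100),
    ("51<65", 500, 12500),
    ("66<70", 200, 6800),
    ("71+", 60, 2940),
]


def information_value(bands):
    """Return (per-band contributions, total IV) for (label, bads, goods) rows.

    Hypothetical helper for illustration. Assumes every band has at least
    one bad and one good customer; a zero count would break the logarithm.
    """
    total_bads = sum(bads for _, bads, _ in bands)
    total_goods = sum(goods for _, _, goods in bands)
    contributions = {}
    for label, bads, goods in bands:
        distr_bad = bads / total_bads      # share of all bads in this band
        distr_good = goods / total_goods   # share of all goods in this band
        contributions[label] = (distr_good - distr_bad) * math.log(distr_good / distr_bad)
    return contributions, sum(contributions.values())


contributions, total_iv = information_value(AGE_BANDS)
for label, value in contributions.items():
    print(f"{label:>6}: {value:.2%}")
print(f" Total: {total_iv:.2%}")
```

Run as written, the total comes to about 46.17%, which agrees with the last row of the table.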