
User:Sean3000/sandbox: Difference between revisions

From Wikipedia, the free encyclopedia
Sean3000 (talk | contribs)
<!-- EDIT BELOW THIS LINE -->


The Information Value (IV) is a [[feature selection]] method widely used in credit scoring.<ref>Guoping Zeng, [http://www.hindawi.com/journals/jmath/2013/848271/ "Metric Divergence Measures and Information Value in Credit Scoring"]</ref> The formula is:

:<math>\mathrm{IV} = \sum_{i=1}^{n}\left(\text{Distr Good}_{i}-\text{Distr Bad}_{i}\right) \times \ln\left(\frac{\text{Distr Good}_{i}}{\text{Distr Bad}_{i}}\right)</math>

where the sum runs over the ''n'' categories of the variable, ‘Distr Bad’ is the number of defaulting customers in a category divided by the total number of defaulting customers in the portfolio, and ‘Distr Good’ is the number of non-defaulting customers in that category divided by the total number of non-defaulting customers.
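
The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name <code>information_value</code> and its list-based interface are assumptions made here, and the sample counts are the goods and bads per age band from the example table on this page. The sketch also assumes every category contains at least one good and one bad customer, since the logarithm is undefined otherwise.

```python
import math


def information_value(goods, bads):
    """Return (per-category contributions, total IV) from raw counts.

    goods[i] and bads[i] are the counts of non-defaulting and defaulting
    customers in category i. Hypothetical helper, for illustration only.
    """
    total_goods = sum(goods)
    total_bads = sum(bads)
    contributions = []
    for g, b in zip(goods, bads):
        distr_good = g / total_goods  # share of all goods in this category
        distr_bad = b / total_bads    # share of all bads in this category
        # Assumes g > 0 and b > 0; a zero count makes the log undefined.
        contributions.append(
            (distr_good - distr_bad) * math.log(distr_good / distr_bad)
        )
    return contributions, sum(contributions)


# Counts per age band, taken from the example table on this page.
goods = [1860, 4040, 8920, 11100, 12500, 6800, 2940]
bads = [140, 960, 1080, 900, 500, 200, 60]

contributions, iv = information_value(goods, bads)
print(round(iv, 4))  # 0.4617, matching the table's total of 46.17%
```

Each term of the sum is non-negative, because the difference and the logarithm always share the same sign, so the total can only grow as categories are added.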

For example, if the category is age band, it may be calculated as follows:

{| class="wikitable"
|-
! Age !! Total Count of Customers !! Count of Bads !! Count of Goods !! Distribution of Bads !! Distribution of Goods !! Information Value
|-
| <18 || 2,000 || 140 || 1,860 || 3.65% || 3.86% || 0.01%
|-
| 19–25 || 5,000 || 960 || 4,040 || 25.00% || 8.39% || 18.14%
|-
| 26–35 || 10,000 || 1,080 || 8,920 || 28.12% || 18.52% || 4.01%
|-
| 36–50 || 12,000 || 900 || 11,100 || 23.44% || 23.05% || 0.01%
|-
| 51–65 || 13,000 || 500 || 12,500 || 13.02% || 25.96% || 8.92%
|-
| 66–70 || 7,000 || 200 || 6,800 || 5.21% || 14.12% || 8.89%
|-
| 71+ || 3,000 || 60 || 2,940 || 1.56% || 6.10% || 6.19%
|-
| style="border-top: 1px solid red;" | Total: || 52,000 || 3,840 || 48,160 || 100.00% || 100.00% || '''46.17%'''
|}
Thus for the first row ("<18"), the Count of Goods is the Total Count of Customers minus the Count of Bads (2,000 − 140 = 1,860), and the Distribution of Goods is the Count of Goods divided by the total Count of Goods (1,860 / 48,160 = 3.86%).
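
As a worked check of the formula for the "<18" row, using only the counts in the table and rounding intermediate values to four decimal places:

:<math>\text{Distr Good}_{1} = \frac{1860}{48160} \approx 0.0386, \qquad \text{Distr Bad}_{1} = \frac{140}{3840} \approx 0.0365</math>

:<math>\left(0.0386 - 0.0365\right) \times \ln\left(\frac{0.0386}{0.0365}\right) \approx 0.0001</math>

That is about 0.01%, the value shown for this row in the Information Value column.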
==References==
{{Reflist}}

Latest revision as of 22:28, 2 December 2013
