Cumulative accuracy profile
{{Multiple issues|
{{notability|date=October 2020}}
{{more citations needed|date=November 2020}}
}}
A '''cumulative accuracy profile''' (CAP) is a concept used in [[data science]] to visualize [[discrimination power]]. The CAP of a model represents the cumulative number of positive outcomes along the ''y''-axis versus the corresponding cumulative number of a classifying parameter along the ''x''-axis. The output is called a CAP curve.<ref>{{Cite web|title=CUMULATIVE ACCURACY PROFILE AND ITS APPLICATION IN CREDIT RISK|url=https://www.linkedin.com/pulse/cumulative-accuracy-profile-its-application-credit-frm-prm-cma-acma|access-date=2020-12-11|website=www.linkedin.com|language=en}}</ref> The CAP is distinct from the [[receiver operating characteristic]] (ROC) curve, which plots the [[true-positive rate]] against the [[false-positive rate]].

CAPs are used in robustness evaluations of classification models.

==Example==

Consider an analyst at a store that sells clothes. The store has a total of 100,000 customers, placed along the horizontal axis. Experience shows that whenever an offer is sent out, approximately 10 percent of customers respond and purchase the product, so the vertical axis runs up to 10% of the total (10,000 buyers). If offers are sent to a randomly chosen subset of customers, purchases grow in proportion to the number of customers contacted: contacting a random sample of 20,000 customers yields about 2,000 purchases. This random selection plots as a straight line whose slope is the 10 percent average response rate.

To improve on random selection, the store can target the customers most likely to respond. Before sending the offer, a past campaign can be examined to record, for each customer, whether they purchased, along with characteristics such as sex, country, age group, and whether they browsed on a mobile device or a computer. Because "purchased" is a binary (yes/no) variable, these factors can be fed into a logistic regression, yielding a model that estimates the likelihood that a customer with given demographic and behavioural characteristics will purchase.

[[File:Cap curve.png|thumb|The CAP curves for the perfect, good and random models predicting the buying customers from a pool of 100,000 individuals.]]

Once the model is built, it can be used to choose which customers receive the offer. Applied to the customer base, the model ranks customers by their estimated probability of purchasing; for example, it might indicate that male customers in a certain age group who browse on mobile devices are the most likely to buy. Offers are then sent to the highest-ranked customers first. Contacting nobody yields zero responses, while contacting the top-ranked 20,000 customers yields far more purchases than the roughly 2,000 expected from a random sample of the same size. With a good model, by the time around 60,000 customers (just over half of the customer base) have been contacted, more than 9,000 of the 10,000 eventual buyers have already responded, and the campaign could stop there. Plotting the cumulative number of purchases against the number of customers contacted, in ranked order, and drawing a line through these points gives the cumulative accuracy profile of the model.
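The ranked-selection procedure above can be sketched in a few lines of Python. This is a minimal illustration using synthetic data and an invented scoring rule (not output from any real campaign): buyers are simply given somewhat higher scores on average, standing in for a fitted logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic store data: 100,000 customers, of whom roughly 10% purchase.
n_customers = 100_000
purchased = rng.random(n_customers) < 0.10

# A model assigns each customer a purchase score.  Here we fake a
# moderately informative score: buyers tend to score higher.
score = rng.normal(loc=purchased.astype(float), scale=1.5)

# CAP curve: sort customers by score (best first) and accumulate purchases.
order = np.argsort(-score)
cum_purchases = np.cumsum(purchased[order])

# x-axis: fraction of customers contacted; y-axis: fraction of buyers reached.
x = np.arange(1, n_customers + 1) / n_customers
y = cum_purchases / purchased.sum()

# Contacting the top 20% of ranked customers reaches well over 20% of buyers,
# whereas a random sample of that size would reach about 20% of them.
print(y[n_customers // 5 - 1])
```

Plotting `y` against `x` gives the model's CAP curve; the random model's curve is the diagonal `y = x`, and the perfect model's curve rises with slope 10 (one over the 10% positive rate) until every buyer is found.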

==Analyzing a CAP==

A cumulative accuracy profile can be used to evaluate a model by comparing the current curve to both the 'perfect' and a randomized curve. A good model will have a CAP between the perfect and random curves; the closer a model is to the perfect CAP, the better it is.

The accuracy ratio (AR) is defined as the ratio of the area between the model CAP and random CAP, and the area between the perfect CAP and random CAP.<ref>{{Citation
 | first = Raffaella
 | last = Calabrese
 | title = The validation of Credit Rating and Scoring Models
 | year = 2009
 | place = Geneva, Switzerland
 | url = http://www.statoo.ch/jss09/presentations/Calabrese.pdf }}</ref> In a successful model, the AR has values between zero and one, and the higher the value is, the stronger the model.

Another indication of a model's strength is given by the cumulative number of positive outcomes at 50% of the classifying parameter. For a successful model, this value should lie between 50% and 100% of the maximum, with a higher percentage for stronger models. In sporadic cases, the accuracy ratio can be negative; in this case, the model is performing worse than the random CAP.
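Both statistics can be computed directly from a CAP curve. The sketch below is an illustrative Python example using a hand-made toy curve (not data from the cited references), for a model that accumulates positives 1.5 times faster than random until all positives are found.

```python
import numpy as np

def accuracy_ratio(x, y, positive_rate):
    """Ratio of the area between the model CAP and the random CAP to the
    area between the perfect CAP and the random CAP."""
    # Trapezoidal area under the model CAP.
    area_model = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))
    area_random = 0.5                          # random CAP is the diagonal y = x
    area_perfect = 1.0 - positive_rate / 2.0   # rises to 1 at x = positive_rate
    return (area_model - area_random) / (area_perfect - area_random)

# Toy CAP curve for a 10% positive rate: positives arrive 1.5x faster
# than random until the model has found all of them.
x = np.linspace(0.0, 1.0, 1001)
y = np.minimum(1.5 * x, 1.0)

ar = accuracy_ratio(x, y, positive_rate=0.10)
share_at_half = y[np.searchsorted(x, 0.5)]  # positives captured at 50% of cases
print(f"AR = {ar:.2f}, positives at 50% = {share_at_half:.0%}")
```

For this toy curve the AR is about 0.37 and 75% of the positives are captured at the halfway point, consistent with a moderately strong model; a curve lying below the diagonal would produce a negative AR.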

==Applications==

The cumulative accuracy profile (CAP) and ROC curve are both commonly used by banks and regulators to analyze the discriminatory ability of rating systems that evaluate credit risks.<ref>{{Citation
 | last1 = Engelmann
 | first1 = Bernd
 | last2 = Hayden
 | first2 = Evelyn
 | last3 = Tasche
 | first3 = Dirk
 | title = Measuring the Discriminative Power of Rating Systems
 | journal = Discussion Paper
 | volume = Series 2: Banking and Financial Supervision
 | issue = 1
 | date = 2003}}</ref><ref>{{Citation
 | last1 = Sobehart
 | first1 = Jorge
 | last2 = Keenan
 | first2 = Sean
 | last3 = Stein
 | first3 = Roger
 | title = Validation methodologies for default risk models
 | journal = Moody's Risk Management Services
 | date = 2000-05-15
 | url = http://www.rogermstein.com/wp-content/uploads/SobehartKeenanStein2000.pdf }}</ref> The CAP is also used by instructional design engineers to assess, retrain and rebuild instructional design models used in constructing courses, and by professors and school authorities for improved decision-making and managing educational resources more efficiently.

== References ==
{{reflist}}