Jump to content

Cumulative accuracy profile: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Ssamktg (talk | contribs)
m I made copy edits to grammar and spelling
m Edited for grammar and added one citation
Line 5: Line 5:
}}
}}


The '''cumulative accuracy profile''' (or CAP) is a concept utilized in [[data science]] to visualize the [[discrimination power]]. The CAP of a model represents the cumulative number of positive outcomes along the ''y''-axis versus the corresponding cumulative number of a classifying parameter along the ''x''-axis. The CAP is distinct from the [[receiver operating characteristic]] (ROC), which plots the [[true-positive rate]] against the [[false-positive rate]]. CAP is used in the performance evaluation of the classification model. It can be used to understand the robustness of the classification model.
The '''cumulative accuracy profile''' (or CAP) is a concept utilized in [[data science]] to visualize the [[discrimination power]]. The CAP of a model represents the cumulative number of positive outcomes along the ''y''-axis versus the corresponding cumulative number of a classifying parameter along the ''x''-axis. The resulting curve is called CAP curve<ref>{{Cite web|title=CUMULATIVE ACCURACY PROFILE AND ITS APPLICATION IN CREDIT RISK|url=https://www.linkedin.com/pulse/cumulative-accuracy-profile-its-application-credit-frm-prm-cma-acma|access-date=2020-12-11|website=www.linkedin.com|language=en}}</ref>. The CAP is distinct from the [[receiver operating characteristic]] (ROC), which plots the [[true-positive rate]] against the [[false-positive rate]]. CAP is used in the performance evaluation of the classification model. It can be used to understand the robustness of the classification model.


==Example==
==Example==
Imagine you are a salesman at a store selling clothes; the store has a total of 100,000 customers, which is placed on the horizontal axis. Based on observations, every time you offer a customer a deal, 10% of them respond and buy the product. This means that 10% of the total (10,000) is placed on the vertical axis. This means we have a proposal that we want to offer. We want to see a line that will represent the random selection. The line's [[slope]] equal to the 10% that we know responds on average to the offer. Now the question is, how do we decide which customers to offer the deal to? First, we must build a model, a customer segmentation model will predict whether or not they will purchase the product. It's a straightforward process. It's the same thing because purchased is also a binary variable, yes or no. We can also run the same experiment, and we can take a group of customers before we send out the offer and then look back and see who purchased, whether male or female, which country were they in, what age predominately, were they browsing on mobile or were they browsing via a computer. We take all of these factors into account, then put them into a logistic regression and get a model that will help us assess the likelihood of certain types of customers purchasing based on their characteristics or the general demographic status and other characteristics.
Imagine you are a salesman at a store selling clothes; the store has a total of 100,000 customers, which is placed on the horizontal axis. Based on observations, every time you offer a customer a deal, 10% of them respond and buy the product. This means that 10% of the total (10,000) is placed on the vertical axis. This means we have a proposal that we want to offer. We want to see a line that will represent the random selection. The line's [[slope]] is equal to the 10% that responds on average to the offer. Now the question is, how do we decide the customers to which we offer the deal? First, we must build a a customer segmentation model to predict whether or not they will purchase the product. It's a straightforward process. It's the same thing because purchased is also a binary variable, yes or no. We can also run the same experiment, and we can take a group of customers before we send out the offer and then look back and see who purchased, whether male or female, which country were they in, what age predominately, were they browsing on mobile or were they browsing via a computer. We take all of these factors into account, then put them into a logistic regression and get a model that will help us assess the likelihood of certain types of customers purchasing based on their characteristics or the general demographic status and other characteristics.
[[File:Cap curve.png|thumb|The CAP Curve for the perfect, good and random model predicting the buying customers from a pool of 100000 individuals.]]
[[File:Cap curve.png|thumb|The CAP Curve for the perfect, good and random model predicting the buying customers from a pool of 100000 individuals.]]



Revision as of 14:27, 11 December 2020

The cumulative accuracy profile (or CAP) is a concept utilized in data science to visualize the discrimination power. The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The resulting curve is called CAP curve[1]. The CAP is distinct from the receiver operating characteristic (ROC), which plots the true-positive rate against the false-positive rate. CAP is used in the performance evaluation of the classification model. It can be used to understand the robustness of the classification model.

Example

Imagine you are a salesman at a store selling clothes; the store has a total of 100,000 customers, which is placed on the horizontal axis. Based on observations, every time you offer a customer a deal, 10% of them respond and buy the product. This means that 10% of the total (10,000) is placed on the vertical axis. This means we have a proposal that we want to offer. We want to see a line that will represent the random selection. The line's slope is equal to the 10% that responds on average to the offer. Now the question is, how do we decide the customers to which we offer the deal? First, we must build a a customer segmentation model to predict whether or not they will purchase the product. It's a straightforward process. It's the same thing because purchased is also a binary variable, yes or no. We can also run the same experiment, and we can take a group of customers before we send out the offer and then look back and see who purchased, whether male or female, which country were they in, what age predominately, were they browsing on mobile or were they browsing via a computer. We take all of these factors into account, then put them into a logistic regression and get a model that will help us assess the likelihood of certain types of customers purchasing based on their characteristics or the general demographic status and other characteristics.

The CAP Curve for the perfect, good and random model predicting the buying customers from a pool of 100000 individuals.

Once this model has been built, we can apply it to a select customer and send the offer to a female customer of a bank whose favorite color is red. They're most likely to leave the bag, and we will have a similar result. Say perhaps a male customer in this certain age group who is most likely to purchase a mobile or something else if it will tell us something or actually rank our Customers. We'll give them the probability of purchasing, and we use the portability to contact your customer; of course, we contact we get zero response. If we contact 20000, we'll probably get a much higher response rate than just 2000 because we're contacted 2000. Our response rate will be higher than 4000, which we get in this random scenario. If our model is good, by the time we're at around 60 thousand or more, we are really getting to that 10000 marks, so we get 10000 people. So now this draws a line through these crosses. So what you see here is called the cumulative accuracy profile of your model.

Analyzing a CAP

CAP can be used to evaluate a model by comparing the curve to the perfect CAP. The maximum number of positive outcomes is achieved directly to the random CAP in which the positive outcomes are distributed equally. A good model will have a CAP between the perfect CAP and the random CAP. The closer a model is to the perfect CAP, the better is.

The accuracy ratio (AR) is defined as the ratio of the area between the model CAP and the random CAP and the area between the perfect CAP and the random CAP.[2] For a successful model, the AR has values between zero and one, with a higher value for a stronger model.

The cumulative number of positive outcomes indicates a model's strength at 50% of the classifying parameter. For a successful model, this value should lie between 50% and 100% of the maximum, with a higher percentage for stronger models.

In sporadic cases, the accuracy ratio can be negative. In this case, the model is performing worse than the random CAP.

Applications

CAP and Receiver operating characteristic (ROC) are both commonly used by banks and regulators to analyze the discriminatory ability of rating systems that evaluate credit risks.[3] [4]

References

  1. ^ "CUMULATIVE ACCURACY PROFILE AND ITS APPLICATION IN CREDIT RISK". www.linkedin.com. Retrieved 2020-12-11.
  2. ^ Calabrese, Raffaella (2009), The validation of Credit Rating and Scoring Models (PDF), Swiss Statistics Meeting, Geneva, Switzerland{{citation}}: CS1 maint: location missing publisher (link)
  3. ^ Engelmann, Bernd; Hayden, Evelyn; Tasche, Dirk (2003), "Measuring the Discriminative Power of Rating Systems", Discussion Paper, Series 2: Banking and Financial Supervision (No 01) {{citation}}: |issue= has extra text (help)
  4. ^ Sobehart, Jorge; Keenan, Sean; Stein, Roger (2000-05-15), "Validation methodologies for default risk models" (PDF), Moody's Risk Management Services