Association rule learning: Difference between revisions
m →AprioriDP: partial spelling and grammar correction |
Humbertorc (talk | contribs) m typo in latex "cup" instead of "cap" (union instead of intersection) in confidence calculation. |
||
(371 intermediate revisions by more than 100 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Method for discovering interesting relations between variables in databases}} |
|||
'''Association rule learning''' is a popular and well researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness.<ref name="piatetsky">Piatetsky-Shapiro, Gregory (1991), ''Discovery, analysis, and presentation of strong rules'', in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., ''Knowledge Discovery in Databases'', AAAI/MIT Press, Cambridge, MA.</ref> Based on the concept of strong rules, Rakesh Agrawal et al.<ref name="mining">{{cite doi|10.1145/170035.170072}}</ref> introduced association rules for discovering regularities between products in large-scale transaction data recorded by [[point-of-sale]] (POS) systems in supermarkets. For example, the rule <math>\{\mathrm{onions, potatoes}\} \Rightarrow \{\mathrm{burger}\}</math> found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional [[pricing]] or [[product placement]]s. In addition to the above example from [[market basket analysis]] association rules are employed today in many application areas including [[Web usage mining]], [[intrusion detection]], [[Continuous production]], and [[bioinformatics]]. As opposed to [[sequence mining]], association rule learning typically does not consider the order of items either within a transaction or across transactions. |
|||
{{Machine learning|Problems}} |
|||
'''Association rule learning''' is a [[rule-based machine learning]] method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.<ref name="piatetsky">Piatetsky-Shapiro, Gregory (1991), ''Discovery, analysis, and presentation of strong rules'', in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., ''Knowledge Discovery in Databases'', AAAI/MIT Press, Cambridge, MA.</ref> In any given transaction with a variety of items, association rules are meant to discover the rules that determine how or why certain items are connected. |
|||
Based on the concept of strong rules, [[Rakesh Agrawal (computer scientist)|Rakesh Agrawal]], [[Tomasz Imieliński]] and Arun Swami<ref name="mining">{{Cite book | last1 = Agrawal | first1 = R. | last2 = Imieliński | first2 = T. | last3 = Swami | first3 = A. | doi = 10.1145/170035.170072 | chapter = Mining association rules between sets of items in large databases | title = Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93 | pages = 207 | year = 1993 | isbn = 978-0897915922 | citeseerx = 10.1.1.40.6984 | s2cid = 490415 }}</ref> introduced association rules for discovering regularities between products in large-scale transaction data recorded by [[point-of-sale]] (POS) systems in supermarkets. For example, the rule <math>\{\mathrm{onions, potatoes}\} \Rightarrow \{\mathrm{burger}\}</math> found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional [[pricing]] or [[product placement]]s. |
|||
In addition to the above example from [[market basket analysis]], association rules are employed today in many application areas including [[Web usage mining]], [[intrusion detection]], [[continuous production]], and [[bioinformatics]]. In contrast with [[sequence mining]], association rule learning typically does not consider the order of items either within a transaction or across transactions. |
|||
The association rule algorithm itself consists of various parameters that can make it difficult for those without some expertise in data mining to execute, with many rules that are arduous to understand.<ref>{{Cite web|last=Garcia|first=Enrique|date=2007|title=Drawbacks and solutions of applying association rule mining in learning management systems|url=https://sci2s.ugr.es/keel/pdf/specific/congreso/3-associationrules-Final.pdf|url-status=live|website=Sci2s|archive-url=https://web.archive.org/web/20091223124403/http://sci2s.ugr.es/keel/pdf/specific/congreso/3-associationrules-Final.pdf |archive-date=2009-12-23 }}</ref> |
|||
== Definition == |
== Definition == |
||
[[File:Association Rule Mining Venn Diagram.png|thumb|A Venn Diagram to show the associations between itemsets X and Y of a dataset. All transactions that contain item X are located in the white, left portion of the circle, while those containing Y are colored red and on the right. Any transaction containing both X and Y are located in the middle and are colored pink. |
|||
Multiple concepts can be used to depict information from this graph. For example, if one takes all of the transactions in the pink section and divided them by the total amount of transactions (transactions containing X (white) + transactions containing Y(red)), the output would be known as the support. An instance of getting the result of a method known as the confidence, one can take all of the transactions in the middle (pink) and divide them by all transactions that contain Y (red and pink). |
|||
In this case, Y is the antecedent and X is the consequent. |
|||
]] |
|||
Following the original definition by Agrawal, Imieliński, Swami<ref name="mining" /> the problem of association rule mining is defined as: |
|||
Let <math>I=\{i_1, i_2,\ldots,i_n\}</math> be a set of {{mvar|n}} binary attributes called ''items''. |
|||
Let <math>D = \{t_1, t_2, \ldots, t_m\}</math> be a set of transactions called the ''database''. |
|||
Each ''transaction'' in {{mvar|D}} has a unique transaction ID and contains a subset of the items in {{mvar|I}}. |
|||
A ''rule'' is defined as an implication of the form: |
|||
:<math>X \Rightarrow Y</math>, where <math>X, Y \subseteq I</math>. |
|||
In Agrawal, Imieliński, Swami<ref name="mining" /> a ''rule'' is defined only between a set and a single item, <math>X \Rightarrow i_j</math> for <math>i_j \in I</math>. |
|||
Every rule is composed by two different sets of items, also known as ''itemsets'', {{mvar|X}} and {{mvar|Y}}, where {{mvar|X}} is called ''antecedent'' or left-hand-side (LHS) and {{mvar|Y}} ''consequent'' or right-hand-side (RHS). The antecedent is that item that can be found in the data while the consequent is the item found when combined with the antecedent. The statement <math>X \Rightarrow Y</math> is often read as ''if {{mvar|X}} then {{mvar|Y}}'', where the antecedent ({{mvar|X}} ) is the ''if'' and the consequent ({{mvar|Y}}) is the ''then''. This simply implies that, in theory, whenever {{mvar|X}} occurs in a dataset, then {{mvar|Y}} will as well. |
|||
== Process == |
|||
Association rules are made by searching data for frequent if-then patterns and by using a certain criterion under Support and Confidence to define what the most important relationships are. Support is the evidence of how frequent an item appears in the data given, as Confidence is defined by how many times the if-then statements are found true. However, there is a third criteria that can be used, it is called Lift and it can be used to compare the expected Confidence and the actual Confidence. Lift will show how many times the if-then statement is expected to be found to be true. |
|||
Association rules are made to calculate from itemsets, which are created by two or more items. If the rules were built from the analyzing from all the possible itemsets from the data then there would be so many rules that they wouldn’t have any meaning. That is why Association rules are typically made from rules that are well represented by the data. |
|||
There are many different data mining techniques you could use to find certain analytics and results, for example, there is Classification analysis, Clustering analysis, and Regression analysis.<ref>{{Cite web|date=2021-11-08|title=Data Mining Techniques: Top 5 to Consider|url=https://www.precisely.com/blog/datagovernance/top-5-data-mining-techniques|access-date=2021-12-10|website=Precisely|language=en-US}}</ref> What technique you should use depends on what you are looking for with your data. Association rules are primarily used to find analytics and a prediction of customer behavior. For Classification analysis, it would most likely be used to question, make decisions, and predict behavior.<ref name=":2">{{Cite web|title=16 Data Mining Techniques: The Complete List - Talend|url=https://www.talend.com/resources/data-mining-techniques/|access-date=2021-12-10|website=Talend - A Leader in Data Integration & Data Integrity|language=en}}</ref> Clustering analysis is primarily used when there are no assumptions made about the likely relationships within the data.<ref name=":2"/> Regression analysis Is used when you want to predict the value of a continuous dependent from a number of independent variables.<ref name=":2"/> |
|||
'''Benefits''' |
|||
There are many benefits of using Association rules like finding the pattern that helps understand the correlations and co-occurrences between data sets. A very good real-world example that uses Association rules would be medicine. Medicine uses Association rules to help diagnose patients. When diagnosing patients there are many variables to consider as many diseases will share similar symptoms. With the use of the Association rules, doctors can determine the conditional probability of an illness by comparing symptom relationships from past cases.<ref>{{Cite web|title=What are Association Rules in Data Mining (Association Rule Mining)?|url=https://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining|access-date=2021-12-10|website=SearchBusinessAnalytics|language=en}}</ref> |
|||
'''Downsides''' |
|||
However, Association rules also lead to many different downsides such as finding the appropriate parameter and threshold settings for the mining algorithm. But there is also the downside of having a large number of discovered rules. The reason is that this does not guarantee that the rules will be found relevant, but it could also cause the algorithm to have low performance. Sometimes the implemented algorithms will contain too many variables and parameters. For someone that doesn’t have a good concept of data mining, this might cause them to have trouble understanding it.<ref>{{Cite web|title=Drawbacks and solutions of applying association rule mining in learning management systems|url=https://www.researchgate.net/publication/289657906|access-date=2021-12-10|website=ResearchGate|language=en}}</ref> |
|||
'''Thresholds'''[[File:FrequentItems.png|thumb|Frequent itemset lattice, where the color of the box indicates how many transactions contain the combination of items. Note that lower levels of the lattice can contain at most the minimum number of their parents' items; e.g. {ac} can have only at most <math>\min(a,c)</math> items. This is called the ''downward-closure property''.<ref name="mining" />]]When using Association rules, you are most likely to only use Support and Confidence. However, this means you have to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Usually, the Association rule generation is split into two different steps that needs to be applied: |
|||
# A minimum Support threshold to find all the frequent itemsets that are in the database. |
|||
# A minimum Confidence threshold to the frequent itemsets found to create rules. |
|||
{| class="wikitable" |
|||
|+Table 1. Example of '''Threshold for''' Support and Confidence. |
|||
! scope="col" | Items |
|||
! scope="col" | Support |
|||
! scope="col" | Confidence |
|||
| rowspan="5" style="border: none; background: none;" | |
|||
! scope="col" | Items |
|||
! scope="col" | Support |
|||
! scope="col" | Confidence |
|||
|- |
|||
| Item A || 30%|| 50% || Item C || 45%|| 55% |
|||
|- |
|||
| Item B || 15%|| 25% || Item A || 30%|| 50% |
|||
|- |
|||
| Item C || 45%|| 55% || Item D || 35%|| 40% |
|||
|- |
|||
| Item D || 35%|| 40% || Item B || 15%|| 25% |
|||
|} |
|||
'''The Support Threshold is 30%, Confidence Threshold is 50%''' |
|||
'''The Table on the left is the original unorganized data and the table on the right is organized by the thresholds. In this case Item C is better than the thresholds for both Support and Confidence which is why it is first. Item A is second because its threshold values are spot on. Item D has met the threshold for Support but not Confidence. Item B has not met the threshold for either Support or Confidence and that is why it is last.''' |
|||
To find all the frequent itemsets in a database is not an easy task since it involves going through all the data to find all possible item combinations from all possible itemsets. The set of possible itemsets is the [[power set]] over {{mvar|I}} and has size <math>2^n-1</math> , of course this means to exclude the empty set which is not considered to be a valid itemset. However, the size of the power set will grow exponentially in the number of item {{mvar|n}} that is within the power set {{mvar|I}}. An efficient search is possible by using the '''''downward-closure property''''' of support<ref name="mining" /><ref>{{cite book|last1=Tan|first1=Pang-Ning|title=Introduction to Data Mining|last2=Michael|first2=Steinbach|last3=Kumar|first3=Vipin|publisher=[[Addison-Wesley]]|year=2005|isbn=978-0-321-32136-7|chapter=Chapter 6. Association Analysis: Basic Concepts and Algorithms|chapter-url=http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf}}</ref> (also called ''anti-monotonicity''<ref name="pei">{{cite book|last1=Jian Pei|title=Proceedings 17th International Conference on Data Engineering|last2=Jiawei Han|last3=Lakshmanan|first3=L.V.S.|year=2001|isbn=978-0-7695-1001-9|pages=433–442|chapter=Mining frequent itemsets with convertible constraints|citeseerx=10.1.1.205.2150|doi=10.1109/ICDE.2001.914856|s2cid=1080975}}</ref>). This would guarantee that a frequent itemset and all its subsets are also frequent and thus will have no infrequent itemsets as a subset of a frequent itemset. Exploiting this property, efficient algorithms (e.g., Apriori<ref name="apriori">Agrawal, Rakesh; and Srikant, Ramakrishnan; [http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf ''Fast algorithms for mining association rules in large databases''] {{Webarchive|url=https://web.archive.org/web/20150225213708/http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf|date=2015-02-25}}, in Bocca, Jorge B.; Jarke, Matthias; and Zaniolo, Carlo; editors, ''Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994'', pages 487-499</ref> and Eclat<ref name="eclat">{{Cite journal|last1=Zaki|first1=M. J.|year=2000|title=Scalable algorithms for association mining|journal=IEEE Transactions on Knowledge and Data Engineering|volume=12|issue=3|pages=372–390|citeseerx=10.1.1.79.9448|doi=10.1109/69.846291}}</ref>) can find all frequent itemsets. |
|||
== Useful Concepts == |
|||
{|class="wikitable" style="float: right; margin-left: 1em;" |
{|class="wikitable" style="float: right; margin-left: 1em;" |
||
|+ Example database with |
|+ Table 2. Example database with 5 transactions and 7 items |
||
|- |
|- |
||
! transaction ID !! milk !! bread !! butter !! beer |
! transaction ID !! milk !! bread !! butter !! beer !! diapers !! eggs !! fruit |
||
|- |
|- |
||
| 1 || 1 || 1 || 0 || 0 |
| 1 || 1 || 1 || 0 || 0 || 0 || 0 || 1 |
||
|- |
|- |
||
| 2 || 0 || 0 || 1 || 0 |
| 2 || 0 || 0 || 1 || 0 || 0 || 1 || 1 |
||
|- |
|- |
||
| 3 || 0 || 0 || 0 || 1 |
| 3 || 0 || 0 || 0 || 1 || 1 || 0 || 0 |
||
|- |
|- |
||
| 4 || 1 || 1 || 1 || 0 |
| 4 || 1 || 1 || 1 || 0 || 0 || 1 || 1 |
||
|- |
|- |
||
| 5 || 0 || 1 || 0 || 0 |
| 5 || 0 || 1 || 0 || 0 || 0 || 0 || 0 |
||
|- |
|- |
||
|} |
|} |
||
To illustrate the concepts, we use a small example from the supermarket domain. Table 2 shows a small database containing the items where, in each entry, the value 1 means the presence of the item in the corresponding transaction, and the value 0 represents the absence of an item in that transaction. The set of items is <math>I= \{\mathrm{milk, bread, butter, beer, diapers, eggs, fruit}\}</math>. |
|||
An example rule for the supermarket could be <math>\{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}</math> meaning that if butter and bread are bought, customers also buy milk. |
|||
Following the original definition by Agrawal et al.<ref name="mining" /> the problem of association rule mining is defined as: Let <math>I=\{i_1, i_2,\ldots,i_n\}</math> be a set of <math>n</math> binary attributes called ''items''. Let <math>D = \{t_1, t_2, \ldots, t_m\}</math> be a set of transactions called the ''database''. Each transaction in <math>D</math> has a unique transaction ID and contains a subset of the items in <math>I</math>. A ''rule'' is defined as an implication of the form <math>X \Rightarrow Y</math> where <math>X, Y \subseteq I</math> and <math>X \cap Y = \emptyset</math>. The sets of items (for short ''itemsets'') <math>X</math> and <math>Y</math> are called ''antecedent'' (left-hand-side or LHS) and ''consequent'' (right-hand-side or RHS) of the rule respectively. |
|||
In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest are used. The best-known constraints are minimum thresholds on support and confidence. |
|||
To illustrate the concepts, we use a small example from the supermarket domain. The set of items is <math>I= \{\mathrm{milk, bread, butter, beer}\}</math> and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be <math>\{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}</math> meaning that if butter and bread are bought, customers also buy milk. |
|||
Let <math>X, Y</math> be itemsets, <math>X \Rightarrow Y</math> an association rule and {{mvar|T}} a set of transactions of a given database. |
|||
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions. |
|||
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant,{{citation needed|date=March 2021}} and datasets often contain thousands or millions of transactions. |
|||
== Useful Concepts == |
|||
To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. |
|||
=== Support === |
|||
* The ''support'' <math>\mathrm{supp}(X)</math> of an itemset <math>X</math> is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset <math>\{\mathrm{milk, bread, butter}\}</math> has a support of <math>1/5=0.2</math> since it occurs in 20% of all transactions (1 out of 5 transactions). |
|||
Support is an indication of how frequently the itemset appears in the dataset. |
|||
In our example, it can be easier to explain support by writing <math>\text{support} = P(A\cap B)= \frac{(\text{number of transactions containing }A\text{ and }B)}\text{ (total number of transactions)} </math> <ref name=":1">{{Cite book|last1=Larose|first1=Daniel T.|last2=Larose|first2=Chantal D.|date=2014-06-23|title=Discovering Knowledge in Data|url=http://dx.doi.org/10.1002/9781118874059|doi=10.1002/9781118874059|isbn=9781118874059}}</ref> where A and B are separate item sets that occur at the same time in a transaction. |
|||
* The ''confidence'' of a rule is defined <math>\mathrm{conf}(X\Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)</math>. For example, the rule <math>\{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}</math> has a confidence of <math>0.2/0.2=1.0</math> in the database, which means that for 100% of the transactions containing butter and bread the rule is correct (100% of the times a customer buys butter and bread, milk is bought as well). Be careful when reading the expression: here supp(X∪Y) means "''support for occurrences of transactions where '''X and Y both appear'''''", not "''support for occurrences of transactions where '''either X or Y appears'''''", the latter interpretation arising because set union is equivalent to [[logical disjunction]]. The argument of <math>\mathrm{supp}()</math> is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive). |
|||
Using Table 2 as an example, the itemset <math>X=\{\mathrm{beer, diapers}\}</math> has a support of {{math|1=1/5=0.2}} since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of ''support of X'' is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).<ref name=":0">{{Cite journal|last=Hahsler|first=Michael|date=2005|title=Introduction to arules – A computational environment for mining association rules and frequent item sets|url=https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf|journal=Journal of Statistical Software|doi=10.18637/jss.v014.i15|doi-access=free|access-date=2016-03-18|archive-date=2019-04-30|archive-url=https://web.archive.org/web/20190430193743/https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf|url-status=dead}}</ref> |
|||
* Confidence can be interpreted as an estimate of the probability <math>P(Y|X)</math>, the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.<ref name="hipp">{{cite doi|10.1145/360402.360421}}</ref> |
|||
Furthermore, the itemset <math>Y=\{\mathrm{milk, bread, butter}\}</math> has a support of {{math|1=1/5=0.2}} as it appears in 20% of all transactions as well. |
|||
* The ''[[lift (data mining)|lift]]'' of a rule is defined as <math> \mathrm{lift}(X\Rightarrow Y) = \frac{ \mathrm{supp}(X \cup Y)}{ \mathrm{supp}(X) \times \mathrm{supp}(Y) } </math> or the ratio of the observed support to that expected if X and Y were [[Independence (probability theory)|independent]]. The rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> has a lift of <math>\frac{0.2}{0.4 \times 0.4} = 1.25 </math>. |
|||
When using antecedents and consequents, it allows a data miner to determine the support of multiple items being bought together in comparison to the whole data set. For example, Table 2 shows that if milk is bought, then bread is bought has a support of 0.4 or 40%. This because in 2 out 5 of the transactions, milk as well as bread are bought. In smaller data sets like this example, it is harder to see a strong correlation when there are few samples, but when the data set grows larger, support can be used to find correlation between two or more products in the supermarket example. |
|||
* The ''conviction'' of a rule is defined as <math> \mathrm{conv}(X\Rightarrow Y) =\frac{ 1 - \mathrm{supp}(Y) }{ 1 - \mathrm{conf}(X\Rightarrow Y)}</math>. The rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> has a conviction of <math>\frac{1 - 0.4}{1 - .5} = 1.2 </math>, and can be interpreted as the ratio of the expected frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were independent divided by the observed frequency of incorrect predictions. In this example, the conviction value of 1.2 shows that the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> would be incorrect 20% more often (1.2 times as often) if the association between X and Y was purely random chance. |
|||
Minimum support thresholds are useful for determining which itemsets are preferred or interesting. |
|||
== Process == |
|||
[[File:FrequentItems.png|thumb|Frequent itemset lattice, where the color of the box indicates how many transactions contain the combination of items. Note that lower levels of the lattice can contain at most the minimum number of their parents' items; e.g. {ac} can have only at most <math>min(a,c)</math> items. This is called the ''downward-closure property''.<ref name="mining" />]] Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split up into two separate steps: |
|||
# First, minimum support is applied to find all ''frequent itemsets'' in a database. |
|||
# Second, these frequent itemsets and the minimum confidence constraint are used to form rules. |
|||
While the second step is straightforward, the first step needs more attention. |
|||
If we set the support threshold to ≥0.4 in Table 3, then the <math>\{\mathrm{milk}\} \Rightarrow \{\mathrm{eggs}\}</math> would be removed since it did not meet the minimum threshold of 0.4. Minimum threshold is used to remove samples where there is not a strong enough support or confidence to deem the sample as important or interesting in the dataset. |
|||
Finding all frequent itemsets in a database is difficult since it involves searching all possible itemsets (item combinations). The set of possible itemsets is the [[power set]] over <math>I</math> and has size <math>2^n-1</math> (excluding the empty set which is not a valid itemset). Although the size of the powerset grows exponentially in the number of items <math>n</math> in <math>I</math>, efficient search is possible using the '''''downward-closure property''''' of support<ref name="mining" /><ref>{{cite book |last1=Tan |first1=Pang-Ning |last2=Michael |first2=Steinbach |last3=Kumar |first3=Vipin |title=Introduction to Data Mining |publisher=[[Addison-Wesley]] |year=2005 |isbn=0-321-32136-7 |chapter=Chapter 6. Association Analysis: Basic Concepts and Algorithms |chapterurl=http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf }}</ref> (also called ''anti-monotonicity''<ref name="pei">Pei, Jian; Han, Jiawei; and Lakshmanan, Laks V. S.; ''Mining frequent itemsets with convertible constraints'', in ''Proceedings of the 17th International Conference on Data Engineering, April 2–6, 2001, Heidelberg, Germany'', 2001, pages 433-442</ref>) which guarantees that for a frequent itemset, all its subsets are also frequent and thus for an infrequent itemset, all its supersets must also be infrequent. Exploiting this property, efficient algorithms (e.g., Apriori<ref name="apriori">Agrawal, Rakesh; and Srikant, Ramakrishnan; [http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf ''Fast algorithms for mining association rules in large databases''], in Bocca, Jorge B.; Jarke, Matthias; and Zaniolo, Carlo; editors, ''Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994'', pages 487-499</ref> and Eclat<ref name="eclat">{{cite doi|10.1109/69.846291}}</ref>) can find all frequent itemsets. |
|||
Another way of finding interesting samples is to find the value of (support)×(confidence); this allows a data miner to see the samples where support and confidence are high enough to be highlighted in the dataset and prompt a closer look at the sample to find more information on the connection between the items. |
|||
==History== |
|||
The concept of association rules was popularised particularly due to the 1993 article of Agrawal et al.,<ref name="mining" /> which has acquired more than 6000 citations according to Google Scholar, as of March 2008, and is thus one of the most cited papers in the Data Mining field. However, it is possible that what is now called "association rules" is similar to what appears in the 1966 paper<ref name="guha_oldest">Hájek, Petr; Havel, Ivan; Chytil, Metoděj; ''The GUHA method of automatic hypotheses determination'', Computing 1 (1966) 293-308</ref> on GUHA, a general data mining method developed by [[Petr Hájek]] et al.<ref name="pospaper">Hájek, Petr; Feglar, Tomas; Rauch, Jan; and Coufal, David; ''The GUHA method, data preprocessing and mining'', Database Support for Data Mining Applications, Springer, 2004, ISBN 978-3-540-22479-2</ref> |
|||
Support can be beneficial for finding the connection between products in comparison to the whole dataset, whereas confidence looks at the connection between one or more items and another item. Below is a table that shows the comparison and contrast between support and support × confidence, using the information from Table 4 to derive the confidence values. |
|||
== Alternative measures of interestingness == |
|||
<!-- would be nice to explain each measure --> |
|||
In addition to confidence, other measures of ''interestingness'' for rules have been proposed. Some popular measures are: |
|||
{| class="wikitable sortable" |
|||
* All-confidence<ref name="allconfidence">Omiecinski, Edward R.; ''Alternative interest measures for mining associations in databases'', IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69, Jan/Feb 2003</ref> |
|||
|+Table 3. Example of Support, and support × confidence |
|||
!if Antecedent then Consequent |
|||
!support |
|||
!support X confidence |
|||
|- |
|||
|if buy milk, then buy bread |
|||
|2/5= 0.4 |
|||
|0.4×1.0= 0.4 |
|||
|- |
|||
|if buy milk, then buy eggs |
|||
|1/5= 0.2 |
|||
|0.2×0.5= 0.1 |
|||
|- |
|||
|if buy bread, then buy fruit |
|||
|2/5= 0.4 |
|||
|0.4×0.66= 0.264 |
|||
|- |
|||
|if buy fruit, then buy eggs |
|||
|2/5= 0.4 |
|||
|0.4×0.66= 0.264 |
|||
|- |
|||
|if buy milk and bread, then buy fruit |
|||
|2/5= 0.4 |
|||
|0.4×1.0= 0.4 |
|||
|} |
|||
The support of {{mvar|X}} with respect to {{mvar|T}} is defined as the proportion of transactions in the dataset which contains the itemset {{mvar|X}}. Denoting a transaction by <math>(i,t)</math> where {{mvar|i}} is the unique identifier of the transaction and {{mvar|t}} is its itemset, the support may be written as: |
|||
* Collective strength<ref name="collectivestrength">Aggarwal, Charu C.; and Yu, Philip S.; ''A new framework for itemset generation'', in ''PODS 98, Symposium on Principles of Database Systems, Seattle, WA, USA, 1998'', pages 18-24</ref> |
|||
:<math>\mathrm{support\,of\,X} = \frac{|\{(i,t) \in T : X \subseteq t \}|}{|T|}</math> |
|||
* Conviction<ref name="brin-dynamic-itemset1">Brin, Sergey; Motwani, Rajeev; Ullman, Jeffrey D.; and Tsur, Shalom; ''Dynamic itemset counting and implication rules for market basket data'', in ''SIGMOD 1997, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1997), Tucson, Arizona, USA, May 1997'', pp. 255-264</ref> |
|||
This notation can be used when defining more complicated datasets where the items and itemsets may not be as easy as our supermarket example above. Other examples of where support can be used is in finding groups of genetic mutations that work collectively to cause a disease, investigating the number of subscribers that respond to upgrade offers, and discovering which products in a drug store are never bought together.<ref name=":1" /> |
|||
* Leverage<ref name="leverage">Piatetsky-Shapiro, Gregory; ''Discovery, analysis, and presentation of strong rules'', Knowledge Discovery in Databases, 1991, pp. 229-248</ref> |
|||
=== Confidence === |
|||
* Lift (originally called interest)<ref name="brin-dynamic-itemset2">Brin, Sergey; Motwani, Rajeev; Ullman, Jeffrey D.; and Tsur, Shalom; ''Dynamic itemset counting and implication rules for market basket data'', in ''SIGMOD 1997, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1997), Tucson, Arizona, USA, May 1997'', pp. 265-276</ref> |
|||
Confidence is the percentage of all transactions satisfying {{mvar|X}} that also satisfy {{mvar|Y}}.<ref>{{Cite web|last=Wong|first=Pak|date=1999|title=Visualizing Association Rules for Text Mining|url=https://neuro.bstu.by/ai/Data-mining/Stock-market/InfoVis1999Association.pdf|url-status=live|website=BSTU Laboratory of Artificial Neural Networks|archive-url=https://web.archive.org/web/20211129082512/https://neuro.bstu.by/ai/Data-mining/Stock-market/InfoVis1999Association.pdf |archive-date=2021-11-29 }}</ref> |
|||
With respect to {{mvar|T}}, the confidence value of an association rule, often denoted as <math>X \Rightarrow Y</math>, is the ratio of transactions containing both {{mvar|X}} and {{mvar|Y}} to the total amount of {{mvar|X}} values present, where {{mvar|X}} is the antecedent and {{mvar|Y}} is the consequent. |
|||
Confidence can also be interpreted as an estimate of the [[conditional probability]] <math>P(E_Y | E_X)</math>, the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.<ref name=":0" /><ref name="hipp">{{Cite journal | last1 = Hipp | first1 = J. | last2 = Güntzer | first2 = U. | last3 = Nakhaeizadeh | first3 = G. | title = Algorithms for association rule mining --- a general survey and comparison | doi = 10.1145/360402.360421 | journal = ACM SIGKDD Explorations Newsletter | volume = 2 | pages = 58–64 | year = 2000 | citeseerx = 10.1.1.38.5305 | s2cid = 9248096 }}</ref> |
|||
It is commonly depicted as: |
|||
:<math>\mathrm{conf}(X \Rightarrow Y) = P(Y | X) = \frac{\mathrm{supp}(X \cap Y)}{ \mathrm{supp}(X) }=\frac{\text{number of transactions containing }X\text{ and }Y}{\text{number of transactions containing }X}</math> |
|||
The equation illustrates that confidence can be computed by calculating the co-occurrence of transactions {{mvar|X}} and {{mvar|Y}} within the dataset in ratio to transactions containing only {{mvar|X}}. This means that the number of transactions in both {{mvar|X}} and {{mvar|Y}} is divided by those just in {{mvar|X}} . |
|||
For example, Table 2 shows the rule <math>\{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}</math> which has a confidence of <math>\frac{1/5}{1/5}=\frac{0.2}{0.2}=1.0</math> in the dataset, which denotes that every time a customer buys butter and bread, they also buy milk. This particular example demonstrates the rule being correct 100% of the time for transactions containing both butter and bread. The rule <math>\{\mathrm{fruit}\} \Rightarrow \{\mathrm{eggs}\}</math>, however, has a confidence of <math>\frac{2/5}{3/5}=\frac{0.4}{0.6}=0.67</math>. This suggests that eggs are bought 67% of the times that fruit is brought. Within this particular dataset, fruit is purchased a total of 3 times, with two of those times consisting of egg purchases. |
|||
For larger datasets, a minimum threshold, or a percentage cutoff, for the confidence can be useful for determining item relationships. When applying this method to some of the data in Table 2, information that does not meet the requirements are removed. Table 4 shows association rule examples where the minimum threshold for confidence is 0.5 (50%). Any data that does not have a confidence of at least 0.5 is omitted. Generating thresholds allow for the association between items to become stronger as the data is further researched by emphasizing those that co-occur the most. The table uses the confidence information from Table 3 to implement the Support × Confidence column, where the relationship between items via their both confidence and support, instead of just one concept, is highlighted. Ranking the rules by Support × Confidence multiples the confidence of a particular rule to its support and is often implemented for a more in-depth understanding of the relationship between the items. |
|||
{| class="wikitable sortable" |
|||
|+Table 4. Example of Confidence and Support × Confidence |
|||
!if Antecedent then Consequent |
|||
!Confidence |
|||
!Support × Confidence |
|||
|- |
|||
|if buy milk, then buy bread |
|||
|{{frac|2|2}} = 1.0 |
|||
|0.4×1.0= 0.4 |
|||
|- |
|||
|if buy milk, then buy eggs |
|||
|{{1/2}} = 0.5 |
|||
|0.2×0.5= 0.1 |
|||
|- |
|||
|if buy bread, then buy fruit |
|||
|{{2/3}} ≈ 0.66 |
|||
|0.4×0.66= 0.264 |
|||
|- |
|||
|if buy fruit, then buy eggs |
|||
|{{2/3}} ≈ 0.66 |
|||
|0.4×0.66= 0.264 |
|||
|- |
|||
|if buy milk and bread, then buy fruit |
|||
|{{frac|2|2}} = 1.0 |
|||
|0.4×1.0= 0.4 |
|||
|} |
|||
Overall, using confidence in association rule mining is great way to bring awareness to data relations. Its greatest benefit is highlighting the relationship between particular items to one another within the set, as it compares co-occurrences of items to the total occurrence of the antecedent in the specific rule. However, confidence is not the optimal method for every concept in association rule mining. The disadvantage of using it is that it does not offer multiple difference outlooks on the associations. Unlike support, for instance, confidence does not provide the perspective of relationships between certain items in comparison to the entire dataset, so while milk and bread, for example, may occur 100% of the time for confidence, it only has a support of 0.4 (40%). This is why it is important to look at other viewpoints, such as Support × Confidence, instead of solely relying on one concept incessantly to define the relationships. |
|||
=== Lift === |
|||
The ''[[lift (data mining)|lift]]'' of a rule is defined as: |
|||
<math> \mathrm{lift}(X\Rightarrow Y) = \frac{ \mathrm{supp}(X \cup Y)}{ \mathrm{supp}(X) \times \mathrm{supp}(Y) } </math> |
|||
or the ratio of the observed support to that expected if X and Y were [[Independence (probability theory)|independent]]. |
|||
For example, the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> has a lift of <math>\frac{0.2}{0.4 \times 0.4} = 1.25 </math>. |
|||
If the rule had a lift of 1, it would imply that the probability of occurrence of the antecedent and that of the consequent are independent of each other. When two events are independent of each other, no rule can be drawn involving those two events. |
|||
If the lift is > 1, that lets us know the degree to which those two occurrences are dependent on one another, and makes those rules potentially useful for predicting the consequent in future data sets. |
|||
If the lift is < 1, that lets us know the items are substitute to each other. This means that presence of one item has negative effect on presence of other item and vice versa. |
|||
The value of lift is that it considers both the support of the rule and the overall data set.<ref name=":0" /> |
|||
[rede] |
|||
=== Conviction === |
|||
The ''conviction'' of a rule is defined as <math> \mathrm{conv}(X\Rightarrow Y) =\frac{ 1 - \mathrm{supp}(Y) }{ 1 - \mathrm{conf}(X\Rightarrow Y)}</math>.<ref name="brin-dynamic-itemset1">{{cite book |doi=10.1145/253260.253325 |chapter=Dynamic itemset counting and implication rules for market basket data |title=Proceedings of the 1997 ACM SIGMOD international conference on Management of data - SIGMOD '97 |pages=255–264 |year=1997 |last1=Brin |first1=Sergey |last2=Motwani |first2=Rajeev |last3=Ullman |first3=Jeffrey D. |last4=Tsur |first4=Shalom |isbn=978-0897919111 |citeseerx=10.1.1.41.6476 |s2cid=15385590 }}</ref> |
|||
For example, the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> has a conviction of <math>\frac{1 - 0.4}{1 - 0.5} = 1.2 </math>, and can be interpreted as the ratio of the expected frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were independent divided by the observed frequency of incorrect predictions. In this example, the conviction value of 1.2 shows that the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> would be incorrect 20% more often (1.2 times as often) if the association between X and Y was purely random chance. |
|||
=== Alternative measures of interestingness === |
|||
In addition to confidence, other measures of ''interestingness'' for rules have been proposed. Some popular measures are: |
|||
* All-confidence<ref name="allconfidence">{{cite journal |doi=10.1109/TKDE.2003.1161582 |title=Alternative interest measures for mining associations in databases |journal=IEEE Transactions on Knowledge and Data Engineering |volume=15 |pages=57–69 |year=2003 |last1=Omiecinski |first1=E.R. |citeseerx=10.1.1.329.5344 |s2cid=18364249 }}</ref> |
|||
* Collective strength<ref name="collectivestrength">{{cite book |doi=10.1145/275487.275490 |chapter=A new framework for itemset generation |title=Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '98 |pages=18–24 |year=1998 |last1=Aggarwal |first1=Charu C. |last2=Yu |first2=Philip S. |isbn=978-0897919968 |citeseerx=10.1.1.24.714 |s2cid=11934586 }}</ref> |
|||
* Leverage<ref name="leverage">Piatetsky-Shapiro, Gregory; ''Discovery, analysis, and presentation of strong rules'', Knowledge Discovery in Databases, 1991, pp. 229-248</ref> |
|||
Several more measures are presented and compared by Tan et al.<ref name="measurescomp">{{cite journal |doi=10.1016/S0306-4379(03)00072-3 |title=Selecting the right objective measure for association analysis |journal=Information Systems |volume=29 |issue=4 |pages=293–313 |year=2004 |last1=Tan |first1=Pang-Ning |last2=Kumar |first2=Vipin |last3=Srivastava |first3=Jaideep |citeseerx=10.1.1.331.4740 }}</ref> and by Hahsler.<ref name="michael.hahsler.net">Michael Hahsler (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. https://mhahsler.github.io/arules/docs/measures</ref> Looking for techniques that can model what the user has known (and using these models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness." |
|||
==History== |
|||
The concept of association rules was popularized particularly due to the 1993 article of Agrawal et al.,<ref name="mining" /> which has acquired more than 23,790 citations according to Google Scholar, as of April 2021, and is thus one of the most cited papers in the Data Mining field. However, what is now called "association rules" is introduced already in the 1966 paper<ref name="guha_oldest">{{cite journal |doi=10.1007/BF02345483 |title=The GUHA method of automatic hypotheses determination |journal=Computing |volume=1 |issue=4 |pages=293–308 |year=1966 |last1=Hájek |first1=P. |last2=Havel |first2=I. |last3=Chytil |first3=M. |s2cid=10511114 }}</ref> on GUHA, a general data mining method developed by [[Petr Hájek]] et al.<ref name="pospaper">{{cite book |doi=10.1007/978-3-540-44497-8_7 |chapter=The GUHA Method, Data Preprocessing and Mining |title=Database Support for Data Mining Applications |volume=2682 |pages=135–153 |series=Lecture Notes in Computer Science |year=2004 |last1=Hájek |first1=Petr |last2=Rauch |first2=Jan |last3=Coufal |first3=David |last4=Feglar |first4=Tomáš |isbn=978-3-540-22479-2 }}</ref> |
|||
An early (circa 1989) use of minimum support and confidence to find all association rules is the Feature Based Modeling framework, which found all rules with <math>\mathrm{supp}(X)</math> and <math>\mathrm{conf}(X \Rightarrow Y)</math> greater than user defined constraints.<ref>{{cite journal|last1=Webb|first1=Geoffrey|title=A Machine Learning Approach to Student Modelling|journal=Proceedings of the Third Australian Joint Conference on Artificial Intelligence (AI 89)|date=1989|pages=195–205}}</ref> |
|||
A definition of these measures can be found [http://michael.hahsler.net/research/association_rules/measures.html here]. Several more measures are presented and compared by Tan et al.<ref name="measurescomp">Tan, Pang-Ning; Kumar, Vipin; and Srivastava, Jaideep; ''Selecting the right objective measure for association analysis'', Information Systems, 29(4):293-313, 2004</ref> Looking for techniques that can model what the user has known (and using these models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness." |
|||
== Statistically sound associations == |
== Statistically sound associations == |
||
One limitation of the standard approach to discovering associations is that by searching massive numbers of possible associations to look for collections of items that appear to be associated, there is a large risk of finding many spurious associations. |
One limitation of the standard approach to discovering associations is that by searching massive numbers of possible associations to look for collections of items that appear to be associated, there is a large risk of finding many spurious associations. These are collections of items that co-occur with unexpected frequency in the data, but only do so by chance. For example, suppose we are considering a collection of 10,000 items and looking for rules containing two items in the left-hand-side and 1 item in the right-hand-side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for independence with a significance level of 0.05 it means there is only a 5% chance of accepting a rule if there is no association. If we assume there are no associations, we should nonetheless expect to find 50,000,000,000 rules. Statistically sound association discovery<ref>{{cite journal |doi=10.1007/s10994-007-5006-x |title=Discovering Significant Patterns |journal=Machine Learning |volume=68 |pages=1–33 |year=2007 |last1=Webb |first1=Geoffrey I. |doi-access=free }}</ref><ref>{{cite journal |doi=10.1145/1297332.1297338 |title=Assessing data mining results via swap randomization |journal=ACM Transactions on Knowledge Discovery from Data |volume=1 |issue=3 |pages=14–es |year=2007 |last1=Gionis |first1=Aristides |last2=Mannila |first2=Heikki |last3=Mielikäinen |first3=Taneli |last4=Tsaparas |first4=Panayiotis |citeseerx=10.1.1.141.2607 |s2cid=52305658 }}</ref> controls this risk, in most cases reducing the risk of finding ''any'' spurious associations to a user-specified significance level. |
||
== Algorithms == |
== Algorithms == |
||
Many algorithms for generating association rules |
Many algorithms for generating association rules have been proposed. |
||
Some well |
Some well-known algorithms are [[Apriori algorithm|Apriori]], Eclat and FP-Growth, but they only do half the job, since they are algorithms for mining frequent itemsets. Another step needs to be done after to generate rules from frequent itemsets found in a database. |
||
=== Apriori algorithm === |
=== Apriori algorithm === |
||
Apriori is given by R. Agrawal and R. Srikant in 1994 for frequent item set mining and association rule learning. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often. The name of the algorithm is Apriori because it uses prior knowledge of frequent itemset properties. |
|||
{{Main|Apriori algorithm}} |
|||
[[File:APriori.png|thumb|357x357px|The control flow diagram for the Apriori algorithm]] |
|||
Apriori<ref name="apriori" /> is the best-known algorithm to mine association rules. It uses a breadth-first search strategy to count the support of itemsets and uses a candidate generation function which exploits the downward closure property of support. |
|||
'''Overview:''' [[Apriori algorithm|Apriori]] uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as ''candidate generation''), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. Apriori uses [[breadth-first search]] and a [[Hash tree (persistent data structure)|Hash tree]] structure to count candidate item sets efficiently. It generates candidate item sets of length from item sets of length . Then it prunes the candidates which have an infrequent sub pattern. According to the downward closure lemma, the candidate set contains all frequent -length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates. |
|||
'''Example:''' Assume that each row is a cancer sample with a certain combination of mutations labeled by a character in the alphabet. For example a row could have {a, c} which means it is affected by mutation 'a' and mutation 'c'. |
|||
{| class="wikitable" |
|||
|+Input Set |
|||
!{a, b} |
|||
!{c, d} |
|||
!{a, d} |
|||
!{a, e} |
|||
!{b, d} |
|||
!{a, b, d} |
|||
!{a, c, d} |
|||
!{a, b, c, d} |
|||
|} |
|||
Now we will generate the frequent item set by counting the number of occurrences of each character. This is also known as finding the support values. Then we will prune the item set by picking a minimum support threshold. For this pass of the algorithm we will pick 3. |
|||
{| class="wikitable" |
|||
|+Support Values |
|||
!a |
|||
!b |
|||
!c |
|||
!d |
|||
|- |
|||
|6 |
|||
|4 |
|||
|3 |
|||
|6 |
|||
|} |
|||
Since all support values are three or above there is no pruning. The frequent item set is {a}, {b}, {c}, and {d}. After this we will repeat the process by counting pairs of mutations in the input set. |
|||
{| class="wikitable" |
|||
|+Support Values |
|||
!{a, b} |
|||
!{a, c} |
|||
!{a, d} |
|||
!{b, c} |
|||
!{b, d} |
|||
!{c, d} |
|||
|- |
|||
|3 |
|||
|2 |
|||
|4 |
|||
|1 |
|||
|3 |
|||
|3 |
|||
|} |
|||
Now we will make our minimum support value 4 so only {a, d} will remain after pruning. Now we will use the frequent item set to make combinations of triplets. We will then repeat the process by counting occurrences of triplets of mutations in the input set. |
|||
{| class="wikitable" |
|||
|+Support Values |
|||
!{a, c, d} |
|||
|- |
|||
|2 |
|||
|} |
|||
Since we only have one item the next set of combinations of quadruplets is empty so the algorithm will stop. |
|||
'''Advantages and Limitations:''' |
|||
Apriori has some limitations. Candidate generation can result in large candidate sets. For example a 10^4 frequent 1-itemset will generate a 10^7 candidate 2-itemset. The algorithm also needs to frequently scan the database, to be specific n+1 scans where n is the length of the longest pattern. Apriori is slower than the Eclat algorithm. However, Apriori performs well compared to Eclat when the dataset is large. This is because in the Eclat algorithm if the dataset is too large the tid-lists become too large for memory. FP-growth outperforms the Apriori and Eclat. This is due to the FP-growth algorithm not having candidate generation or test, using a compact data structure, and only having one database scan.<ref>{{cite arXiv|last=Heaton|first=Jeff|date=2017-01-30|title=Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms|class=cs.DB|eprint=1701.09042}}</ref> |
|||
=== Eclat algorithm === |
=== Eclat algorithm === |
||
Eclat<ref name="eclat" /> (alt. ECLAT, stands for Equivalence Class Transformation) is a [[backtracking]] algorithm, which traverses the frequent itemset lattice graph in a [[depth-first search]] (DFS) fashion. Whereas the [[breadth-first search]] (BFS) traversal used in the Apriori algorithm will end up checking every subset of an itemset before checking it, DFS traversal checks larger itemsets and can save on checking the support of some of its subsets by virtue of the downward-closer property. Furthermore it will almost certainly use less memory as DFS has a lower space complexity than BFS. |
|||
Eclat<ref name="eclat" /> (alt. ECLAT, stands for Equivalence Class Transformation) is a depth-first search algorithm using set intersection. |
|||
To illustrate this, let there be a frequent itemset {a, b, c}. a DFS may check the nodes in the frequent itemset lattice in the following order: {a} → {a, b} → {a, b, c}, at which point it is known that {b}, {c}, {a, c}, {b, c} all satisfy the support constraint by the downward-closure property. BFS would explore each subset of {a, b, c} before finally checking it. As the size of an itemset increases, the number of its subsets undergoes [[combinatorial explosion]]. |
|||
=== FP-growth algorithm === |
|||
It is suitable for both sequential as well as parallel execution with locality-enhancing properties.<ref>{{cite journal |citeseerx=10.1.1.42.3283 |hdl=1802/501 |title=New Algorithms for Fast Discovery of Association Rules |pages=283–286 |year=1997 |first1=Mohammed Javeed |last1=Zaki |first2=Srinivasan |last2=Parthasarathy |first3=Mitsunori |last3=Ogihara |first4=Wei |last4=Li }}</ref><ref>{{cite journal|title=Parallel Algorithms for Discovery of Association Rules|doi=10.1023/A:1009773317876 |year=1997 |last1=Zaki |first1=Mohammed J. |journal=Data Mining and Knowledge Discovery |volume=1 |issue=4 |pages=343–373 |last2=Parthasarathy |first2=Srinivasan |last3=Ogihara |first3=Mitsunori |last4=Li |first4=Wei |s2cid=10038675 }}</ref> |
|||
FP stands for frequent pattern. |
|||
=== FP-growth algorithm === |
|||
In the first pass, the algorithm counts occurrence of items (attribute-value pairs) in the dataset, and stores them to 'header table'. In the second pass, it builds the FP-tree structure by inserting instances. |
|||
Items in each instance have to be sorted by descending order of their frequency in the dataset, so that the tree can be processed quickly. |
|||
Items in each instance that do not meet minimum coverage threshold are discarded. |
|||
If many instances share most frequent items, FP-tree provides high compression close to tree root. |
|||
FP stands for frequent pattern.<ref>{{cite book|last1=Han|title=Proceedings of the 2000 ACM SIGMOD international conference on Management of data |chapter=Mining frequent patterns without candidate generation |date=2000|volume=SIGMOD '00|pages=1–12|doi=10.1145/342009.335372|isbn=978-1581132175|citeseerx=10.1.1.40.4436|s2cid=6059661}}</ref> |
|||
Recursive processing of this compressed version of main dataset grows large item sets directly, instead of generating candidate items and testing them against the entire database. |
|||
Growth starts from the bottom of the header table (having longest branches), by finding all instances matching given condition. |
|||
New tree is created, with counts projected from the original tree corresponding to the set of instances that are conditional on the attribute, with each node getting sum of its children counts. |
|||
Recursive growth ends when no individual items conditional on the attribute meet minimum support threshold, and processing continues on the remaining header items of the original FP-tree. |
|||
In the first pass, the algorithm counts the occurrences of items (attribute-value pairs) in the dataset of transactions, and stores these counts in a 'header table'. In the second pass, it builds the FP-tree structure by inserting transactions into a [[trie]]. |
|||
Once the recursive process has completed, all large item sets with minimum coverage have been found, and association rule creation begins.<ref>Witten, Frank, Hall: Data mining practical machine learning tools and techniques, 3rd edition</ref> |
|||
Items in each transaction have to be sorted by descending order of their frequency in the dataset before being inserted so that the tree can be processed quickly. |
|||
=== Others === |
|||
Items in each transaction that do not meet the minimum support requirement are discarded. |
|||
If many transactions share most frequent items, the FP-tree provides high compression close to tree root. |
|||
Recursive processing of this compressed version of the main dataset grows frequent item sets directly, instead of generating candidate items and testing them against the entire database (as in the apriori algorithm). |
|||
==== AprioriDP ==== |
|||
AprioriDP<ref name="dharmesh2013" /> utilizes the Dynamic Programming concept in Frequent itemset mining. The working principle is to eliminate the candidate generation like FP-tree, but it store support count in specialized data structure instead of tree. |
|||
Growth begins from the bottom of the header table i.e. the item with the smallest support by finding all sorted transactions that end in that item. Call this item <math>I</math>. |
|||
==== Context Based Association Rule Mining Algorithm ==== |
|||
{{Main|Context Based Association Rules}} |
|||
A new conditional tree is created which is the original FP-tree projected onto <math>I</math>. The supports of all nodes in the projected tree are re-counted with each node getting the sum of its children counts. Nodes (and hence subtrees) that do not meet the minimum support are pruned. Recursive growth ends when no individual items conditional on <math>I</math> meet the minimum support threshold. The resulting paths from root to <math>I</math> will be frequent itemsets. After this step, processing continues with the next least-supported header item of the original FP-tree. |
|||
[http://www.sciencedirect.com/science/article/pii/S0950705112002237 CBPNARM] is the newly developed algorithm which is developed in 2013 to mine association rules on the basis of context. It uses context variable on the basis of which the support of an itemset is changed on the basis of which the rules are finally populated to the rule set. |
|||
Once the recursive process has completed, all frequent item sets will have been found, and association rule creation begins.<ref>Witten, Frank, Hall: Data mining practical machine learning tools and techniques, 3rd edition{{page needed|date=January 2019}}</ref> |
|||
==== Node-set-based algorithms ==== |
|||
FIN,<ref name="deng2014" /> PrePost <ref name="deng2012" /> and PPV <ref name="deng2010" /> are three algorithms based on node sets. They use nodes in a coding FP-tree to represent itemsets, and employ a depth-first search strategy to discovery frequent itemsets using "intersection" of node sets. |
|||
=== Others === |
|||
==== ASSOC ==== |
|||
[[GUHA]] is a general method for exploratory data analysis that has theoretical foundations in [[observational calculi]].<ref name="ObservationalCalculi">Rauch, Jan; ''Logical calculi for knowledge discovery in databases'', in ''Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery'', Springer, 1997, pp. 47-57</ref> |
|||
The ASSOC procedure<ref>{{cite book |last=Hájek |first=Petr |author2=Havránek, Tomáš |title=Mechanizing Hypothesis Formation: Mathematical Foundations for a General Theory |publisher=Springer-Verlag |year=1978 |isbn=3-540-08738- |
The ASSOC procedure<ref>{{cite book |last=Hájek |first=Petr |author2=Havránek, Tomáš |title=Mechanizing Hypothesis Formation: Mathematical Foundations for a General Theory |publisher=Springer-Verlag |year=1978 |isbn=978-3-540-08738-0 |url=http://www.cs.cas.cz/hajek/guhabook/ }}</ref> is a GUHA method which mines for generalized association rules using fast [[bitstring]]s operations. The association rules mined by this method are more general than those output by apriori, for example "items" can be connected both with conjunction and disjunctions and the relation between antecedent and consequent of the rule is not restricted to setting minimum support and confidence as in apriori: an arbitrary combination of supported interest measures can be used. |
||
==== OPUS search ==== |
==== OPUS search ==== |
||
OPUS is an efficient algorithm for rule discovery |
OPUS is an efficient algorithm for rule discovery that, in contrast to most alternatives, does not require either monotone or anti-monotone constraints such as minimum support.<ref name=OPUS>Webb, Geoffrey I. (1995); ''OPUS: An Efficient Admissible Algorithm for Unordered Search'', Journal of Artificial Intelligence Research 3, Menlo Park, CA: AAAI Press, pp. 431-465 [http://webarchive.loc.gov/all/20011118141304/http://www.cs.washington.edu/research/jair/abstracts/webb95a.html online access]</ref> Initially used to find rules for a fixed consequent<ref name="OPUS" /><ref name="Bayardo">{{Cite journal |doi=10.1023/A:1009895914772 |last1=Bayardo |first1=Roberto J. Jr. |last2=Agrawal |first2=Rakesh |last3=Gunopulos |first3=Dimitrios |year=2000 |title=Constraint-based rule mining in large, dense databases |journal=Data Mining and Knowledge Discovery |volume=4 |issue=2 |pages=217–240 |s2cid=5120441 }}</ref> it has subsequently been extended to find rules with any item as a consequent.<ref name="webb">{{cite book |doi=10.1145/347090.347112 |chapter=Efficient search for association rules |title=Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '00 |pages=99–107 |year=2000 |last1=Webb |first1=Geoffrey I. |isbn=978-1581132335 |citeseerx=10.1.1.33.1309 |s2cid=5444097 }}</ref> OPUS search is the core technology in the popular Magnum Opus association discovery system. |
||
== Lore == |
== Lore == |
||
A famous story about association rule mining is the "beer and diaper" story. |
A famous story about association rule mining is the "beer and diaper" story. A purported survey of behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers tend also to buy beer. This anecdote became popular as an example of how unexpected association rules might be found from everyday data. There are varying opinions as to how much of the story is true.<ref name="dss">{{Cite web | url=http://www.dssresources.com/newsletters/66.php | title=DSS News: Vol. 3, No. 23}}</ref> Daniel Powers says:<ref name="dss" /> |
||
<blockquote>In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.</blockquote> |
<blockquote>In 1992, Thomas Blischok, manager of a retail consulting group at [[Teradata]], and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.</blockquote> |
||
== Other types of association mining == |
== Other types of association rule mining == |
||
'''Multi-Relation Association Rules (MRAR)''': These are association rules where each item may have several relations. These relations indicate indirect relationships between the entities. Consider the following MRAR where the first item consists of three relations ''live in'', ''nearby'' and ''humid'': “Those who ''live in'' a place which is ''nearby'' a city with ''humid'' climate type and also are ''younger'' than 20 <math>\implies</math> their ''health condition'' is good”. Such association rules can be extracted from RDBMS data or semantic web data.<ref name="MRAR: Mining Multi-Relation Association Rules">Ramezani, Reza, Mohamad Saraee, and Mohammad Ali Nematbakhsh; ''MRAR: Mining Multi-Relation Association Rules'', Journal of Computing and Security, 1, no. 2 (2014)</ref> |
|||
'''[[Context Based Association Rules]]''' is a form of association rule. '''Context Based Association Rules''' claims more accuracy in association rule mining by considering a hidden variable named context variable which changes the final set of association rules depending upon the value of context variables. For example the baskets orientation in market basket analysis reflects an odd pattern in the early days of month.This might be because of abnormal context i.e. salary is drawn at the start of the month <ref name="Context Based Positive and Negative Spatio Temporal Association Rule Mining">Shaheen, M; Shahbaz, M; and Guergachi, A; ''Context Based Positive and Negative Spatio Temporal Association Rule Mining'', Elsvier Knowledge-Based Systems, Jan 2013, pp. 261-273</ref> |
|||
'''[[Contrast set learning]]''' is a form of associative learning. '''Contrast set learners''' use rules that differ meaningfully in their distribution across subsets.<ref name=" |
'''[[Contrast set learning]]''' is a form of associative learning. '''Contrast set learners''' use rules that differ meaningfully in their distribution across subsets.<ref name="webb03">{{cite conference |
||
| author = GI Webb and S. Butler and D. Newlands |
|||
| year = 2003 |
|||
| title = On Detecting Differences Between Groups |
|||
| conference = KDD'03 Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
|||
| url= http://portal.acm.org/citation.cfm?id=956781 |
|||
}} |
|||
</ref><ref name="busy">{{cite journal |doi=10.1109/MC.2003.1244531 |title=Computing practices - Data mining for very busy people |journal=Computer |volume=36 |issue=11 |pages=22–29 |year=2003 |last1=Menzies |first1=T. |last2=Ying Hu }}</ref> |
|||
'''Weighted class learning''' is another form of associative learning |
'''Weighted class learning''' is another form of associative learning where weights may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results. |
||
'''High-order pattern discovery''' |
'''High-order pattern discovery''' facilitates the capture of high-order (polythetic) patterns or event associations that are intrinsic to complex real-world data. |
||
<ref name="discovere">{{cite journal | |
<ref name="discovere">{{cite journal |doi=10.1109/69.649314 |title=High-order pattern discovery from discrete-valued data |journal=IEEE Transactions on Knowledge and Data Engineering |volume=9 |issue=6 |pages=877–893 |year=1997 |last1=Wong |first1=A.K.C. |last2=Yang Wang |citeseerx=10.1.1.189.1704 }}</ref> |
||
'''[[K-optimal pattern discovery]]''' provides an alternative to the standard approach to association rule learning |
'''[[K-optimal pattern discovery]]''' provides an alternative to the standard approach to association rule learning which requires that each pattern appear frequently in the data. |
||
'''Approximate Frequent Itemset''' mining is a relaxed version of Frequent Itemset mining that allows some of the items in some of the rows to be 0.<ref>{{cite book |doi=10.1137/1.9781611972764.36 |chapter=Mining Approximate Frequent Itemsets in the Presence of Noise: Algorithm and Analysis |title=Proceedings of the 2006 SIAM International Conference on Data Mining |pages=407–418 |year=2006 |last1=Liu |first1=Jinze |last2=Paulsen |first2=Susan |last3=Sun |first3=Xing |last4=Wang |first4=Wei |last5=Nobel |first5=Andrew |last6=Prins |first6=Jan |isbn=978-0-89871-611-5 |citeseerx=10.1.1.215.3599 }}</ref> |
|||
'''Generalized Association Rules''' hierarchical taxonomy (concept hierarchy) |
'''Generalized Association Rules''' hierarchical taxonomy (concept hierarchy) |
||
'''Quantitative Association Rules''' categorical and quantitative data |
'''Quantitative Association Rules''' categorical and quantitative data |
||
<ref name="quantminer">{{cite journal |last=Salleb-Aouissi |first=Ansaf |author2=Vrain, Christel|author3= Nortet, Cyril |title=QuantMiner: A Genetic Algorithm for Mining Quantitative Association Rules |journal=International Joint Conference on Artificial Intelligence (IJCAI) |year=2007 |pages=1035–1040 }}</ref> |
|||
'''Interval Data Association Rules''' e.g. partition the age into 5-year-increment ranged |
'''Interval Data Association Rules''' e.g. partition the age into 5-year-increment ranged |
||
'''[[Sequential pattern mining]] ''' discovers subsequences that are common to more than minsup (minimum support threshold) sequences in a sequence database, where minsup is set by the user. A sequence is an ordered list of transactions.<ref name="sequence">Zaki, Mohammed J. (2001); ''SPADE: An Efficient Algorithm for Mining Frequent Sequences'', Machine Learning Journal, 42, pp. 31–60</ref> |
|||
'''Maximal Association Rules''' |
|||
'''Subspace Clustering''', a specific type of [[clustering high-dimensional data]], is in many variants also based on the downward-closure property for specific clustering models.<ref name="ZimekAssent2014">{{cite book|last1=Zimek|first1=Arthur|title=Frequent Pattern Mining|last2=Assent|first2=Ira|last3=Vreeken|first3=Jilles|year=2014|pages=403–423|doi=10.1007/978-3-319-07821-2_16|isbn=978-3-319-07820-5}}</ref> |
|||
'''Sequential pattern mining ''' discovers subsequences that are common to more than minsup sequences in a sequence database, where minsup is set by the user. A sequence is an ordered list of transactions.<ref name="sequence">Zaki, Mohammed J. (2001); ''SPADE: An Efficient Algorithm for Mining Frequent Sequences'', Machine Learning Journal, 42, pp. 31–60</ref> |
|||
'''Warmr''', shipped as part of the ACE data mining suite, allows association rule learning for first order relational rules.<ref>{{cite journal | pmid = 11272703 | volume=15 | issue=2 | title=Warmr: a data mining tool for chemical data. | date=Feb 2001 | journal=J Comput Aided Mol Des | pages=173–81| last1=King | first1=R. D. | last2=Srinivasan | first2=A. | last3=Dehaspe | first3=L. | bibcode=2001JCAMD..15..173K | doi=10.1023/A:1008171016861 | s2cid=3055046 }}</ref> |
|||
'''Sequential Rules''' discovering relationships between items while considering the time ordering. It is generally applied on a sequence database. For example, a sequential rule found in database of sequences of customer transactions can be that customers who bought a computer and CD-Roms, later bought a webcam, with a given confidence and support. |
|||
'''Warmr '''is shipped as part of the ACE data mining suite. It allows association rule learning for first order relational rules.<ref>http://www.ncbi.nlm.nih.gov/pubmed/11272703</ref> |
|||
==See also== |
==See also== |
||
* [[Sequence mining]] |
* [[Sequence mining]] |
||
* [[Production system]] |
* [[Production system (computer science)]] |
||
* [[Learning classifier system]] |
|||
* [[Rule-based machine learning]] |
|||
==References== |
==References== |
||
{{reflist| |
{{reflist|2}} |
||
<ref name="deng2014">Z. H. Deng and S. L. Lv. Fast mining frequent itemsets using Nodesets.[http://www.sciencedirect.com/science/article/pii/S0957417414000463]. Expert Systems with Applications, 41(10): 4505–4512, 2014.</ref> |
|||
<ref name="deng2012">Z. H. Deng, Z. Wang,and J. Jiang. A New Algorithm for Fast Mining Frequent Itemsets Using N-Lists [http://info.scichina.com:8084/sciFe/EN/abstract/abstract508369.shtml]. SCIENCE CHINA Information Sciences, 55 (9): 2008 - 2030, 2012.</ref> |
|||
<ref name="deng2010">Z. H. Deng and Z. Wang. A New Fast Vertical Method for Mining Frequent Patterns [http://www.tandfonline.com/doi/abs/10.1080/18756891.2010.9727736]. International Journal of Computational Intelligence Systems, 3(6): 733 - 744, 2010.</ref> |
|||
<ref name="dharmesh2013">D. Bhalodiya, K. M. Patel and C. Patel. An Efficient way to Find Frequent Pattern with Dynamic Programming Approach [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6780102&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6780102]. NIRMA UNIVERSITY INTERNATIONAL CONFERENCE ON ENGINEERING, NUiCONE-2013, 28-30 NOVEMBER, 2013.</ref> |
|||
}} |
|||
==External links== |
|||
===Bibliographies=== |
===Bibliographies=== |
||
* |
* [http://michael.hahsler.net/research/bib/association_rules/ Annotated Bibliography on Association Rules] {{Webarchive|url=https://web.archive.org/web/20170219091753/http://michael.hahsler.net/research/bib/association_rules/ |date=2017-02-19 }} by M. Hahsler |
||
* [http://www.statsoft.com/textbook/association-rules/ Statsoft Electronic Statistics Textbook: Association Rules] |
|||
{{Prone to spam|date=February 2016}} |
|||
<!-- {{No more links}} |
|||
Please be cautious adding more external links. |
|||
Wikipedia is not a collection of links and should not be used for advertising. |
|||
Excessive or inappropriate links will be removed. |
|||
See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details. |
|||
If there are already suitable links, propose additions or replacements on |
|||
the article's talk page, or submit your link to the relevant category at |
|||
DMOZ (dmoz.org) and link there using {{Dmoz}}. |
|||
--> |
|||
===Implementations=== |
|||
* [http://eric.univ-lyon2.fr/~ricco/sipina.html SIPINA], a free, academic data mining software which includes a model for association rule learning. |
|||
* [http://www.pervasivedatarush.com Pervasive DataRush], data mining platform for big data, includes association rule mining |
|||
* [http://www.KXEN.com KXEN, a commercial Data Mining software] |
|||
* [http://codeding.com/?article=13 Silverlight widget for live demonstration of association rule mining using Apriori algorithm] |
|||
* [[RapidMiner]], a free Java data mining software suite (Community Edition: GNU) |
|||
* [[Orange (software)|Orange]], a free data mining software suite, module [http://www.ailab.si/orange/doc/modules/orngAssoc.htm orngAssoc] |
|||
* [http://ai4r.org Ruby implementation (AI4R)] |
|||
* [http://cran.r-project.org/package=arules arules], a package for mining association rules and frequent itemsets with [[R (programming language)|R]] |
|||
* [http://www.borgelt.net/fpm.html Christian Borgelt's implementation of Apriori, FP-Growth and Eclat] |
|||
* [http://fimi.cs.helsinki.fi/ Frequent Itemset Mining Implementations Repository (FIMI)] |
|||
* [http://adrem.ua.ac.be/~goethals/software/ Frequent pattern mining implementations from Bart Goethals] |
|||
* [http://www.cs.waikato.ac.nz/ml/weka/ Weka], a collection of machine learning algorithms for data mining tasks written in [[Java (programming language)|Java]] |
|||
* [[KNIME]] an open source workflow oriented data preprocessing and analysis platform |
|||
* Zaki, Mohammed J.; [http://www.cs.rpi.edu/~zaki/software/ Data Mining Software] |
|||
* [http://www.giwebb.com Magnum Opus], a system for statistically sound association discovery |
|||
* [http://lispminer.vse.cz LISp Miner], mines for generalized (GUHA) association rules (uses bitstrings, not apriori algorithm) |
|||
* [http://ferda.sourceforge.net Ferda Dataminer], an extensible visual data mining platform, implements GUHA procedures ASSOC and features multirelational data mining |
|||
* [http://www.statsoft.com STATISTICA], commercial statistics software with an Association Rules module |
|||
* [http://www.philippe-fournier-viger.com/spmf/ SPMF], an open-source data mining platform offering more than 48 algorithms for association rule mining, itemset mining and sequential pattern mining. Includes a simple user interface and java source code is distributed under the GPL. |
|||
* [http://www.cs.umb.edu/~laur/ARtool/ ARtool], GPL Java association rule mining application with GUI, offering implementations of multiple algorithms for discovery of frequent patterns and extraction of association rules (includes Apriori and FPgrowth) |
|||
* [http://easyminer.eu EasyMiner], a web-based association rule mining system for interactive mining. Free demo. Based on [http://lispminer.vse.cz LISp Miner] |
|||
{{DEFAULTSORT:Association Rule Learning}} |
{{DEFAULTSORT:Association Rule Learning}} |
Latest revision as of 22:16, 28 November 2024
Part of a series on |
Machine learning and data mining |
---|
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.[1] In any given transaction with a variety of items, association rules are meant to discover the rules that determine how or why certain items are connected.
Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami[2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements.
In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.
The association rule algorithm itself consists of various parameters that can make it difficult for those without some expertise in data mining to execute, with many rules that are arduous to understand.[3]
Definition
[edit]Following the original definition by Agrawal, Imieliński, Swami[2] the problem of association rule mining is defined as:
Let be a set of n binary attributes called items.
Let be a set of transactions called the database.
Each transaction in D has a unique transaction ID and contains a subset of the items in I.
A rule is defined as an implication of the form:
- , where .
In Agrawal, Imieliński, Swami[2] a rule is defined only between a set and a single item, for .
Every rule is composed by two different sets of items, also known as itemsets, X and Y, where X is called antecedent or left-hand-side (LHS) and Y consequent or right-hand-side (RHS). The antecedent is that item that can be found in the data while the consequent is the item found when combined with the antecedent. The statement is often read as if X then Y, where the antecedent (X ) is the if and the consequent (Y) is the then. This simply implies that, in theory, whenever X occurs in a dataset, then Y will as well.
Process
[edit]Association rules are made by searching data for frequent if-then patterns and by using a certain criterion under Support and Confidence to define what the most important relationships are. Support is the evidence of how frequent an item appears in the data given, as Confidence is defined by how many times the if-then statements are found true. However, there is a third criteria that can be used, it is called Lift and it can be used to compare the expected Confidence and the actual Confidence. Lift will show how many times the if-then statement is expected to be found to be true.
Association rules are made to calculate from itemsets, which are created by two or more items. If the rules were built from the analyzing from all the possible itemsets from the data then there would be so many rules that they wouldn’t have any meaning. That is why Association rules are typically made from rules that are well represented by the data.
There are many different data mining techniques you could use to find certain analytics and results, for example, there is Classification analysis, Clustering analysis, and Regression analysis.[4] What technique you should use depends on what you are looking for with your data. Association rules are primarily used to find analytics and a prediction of customer behavior. For Classification analysis, it would most likely be used to question, make decisions, and predict behavior.[5] Clustering analysis is primarily used when there are no assumptions made about the likely relationships within the data.[5] Regression analysis Is used when you want to predict the value of a continuous dependent from a number of independent variables.[5]
Benefits
There are many benefits of using Association rules like finding the pattern that helps understand the correlations and co-occurrences between data sets. A very good real-world example that uses Association rules would be medicine. Medicine uses Association rules to help diagnose patients. When diagnosing patients there are many variables to consider as many diseases will share similar symptoms. With the use of the Association rules, doctors can determine the conditional probability of an illness by comparing symptom relationships from past cases.[6]
Downsides
However, Association rules also lead to many different downsides such as finding the appropriate parameter and threshold settings for the mining algorithm. But there is also the downside of having a large number of discovered rules. The reason is that this does not guarantee that the rules will be found relevant, but it could also cause the algorithm to have low performance. Sometimes the implemented algorithms will contain too many variables and parameters. For someone that doesn’t have a good concept of data mining, this might cause them to have trouble understanding it.[7]
Thresholds
When using Association rules, you are most likely to only use Support and Confidence. However, this means you have to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Usually, the Association rule generation is split into two different steps that needs to be applied:
- A minimum Support threshold to find all the frequent itemsets that are in the database.
- A minimum Confidence threshold to the frequent itemsets found to create rules.
Items | Support | Confidence | Items | Support | Confidence | |
---|---|---|---|---|---|---|
Item A | 30% | 50% | Item C | 45% | 55% | |
Item B | 15% | 25% | Item A | 30% | 50% | |
Item C | 45% | 55% | Item D | 35% | 40% | |
Item D | 35% | 40% | Item B | 15% | 25% |
The Support Threshold is 30%, Confidence Threshold is 50%
The Table on the left is the original unorganized data and the table on the right is organized by the thresholds. In this case Item C is better than the thresholds for both Support and Confidence which is why it is first. Item A is second because its threshold values are spot on. Item D has met the threshold for Support but not Confidence. Item B has not met the threshold for either Support or Confidence and that is why it is last.
To find all the frequent itemsets in a database is not an easy task since it involves going through all the data to find all possible item combinations from all possible itemsets. The set of possible itemsets is the power set over I and has size , of course this means to exclude the empty set which is not considered to be a valid itemset. However, the size of the power set will grow exponentially in the number of item n that is within the power set I. An efficient search is possible by using the downward-closure property of support[2][8] (also called anti-monotonicity[9]). This would guarantee that a frequent itemset and all its subsets are also frequent and thus will have no infrequent itemsets as a subset of a frequent itemset. Exploiting this property, efficient algorithms (e.g., Apriori[10] and Eclat[11]) can find all frequent itemsets.
Useful Concepts
[edit]transaction ID | milk | bread | butter | beer | diapers | eggs | fruit |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
2 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
3 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
4 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
5 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
To illustrate the concepts, we use a small example from the supermarket domain. Table 2 shows a small database containing the items where, in each entry, the value 1 means the presence of the item in the corresponding transaction, and the value 0 represents the absence of an item in that transaction. The set of items is .
An example rule for the supermarket could be meaning that if butter and bread are bought, customers also buy milk.
In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest are used. The best-known constraints are minimum thresholds on support and confidence.
Let be itemsets, an association rule and T a set of transactions of a given database.
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant,[citation needed] and datasets often contain thousands or millions of transactions.
Support
[edit]Support is an indication of how frequently the itemset appears in the dataset.
In our example, it can be easier to explain support by writing [12] where A and B are separate item sets that occur at the same time in a transaction.
Using Table 2 as an example, the itemset has a support of 1/5=0.2 since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of support of X is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).[13]
Furthermore, the itemset has a support of 1/5=0.2 as it appears in 20% of all transactions as well.
When using antecedents and consequents, it allows a data miner to determine the support of multiple items being bought together in comparison to the whole data set. For example, Table 2 shows that if milk is bought, then bread is bought has a support of 0.4 or 40%. This because in 2 out 5 of the transactions, milk as well as bread are bought. In smaller data sets like this example, it is harder to see a strong correlation when there are few samples, but when the data set grows larger, support can be used to find correlation between two or more products in the supermarket example.
Minimum support thresholds are useful for determining which itemsets are preferred or interesting.
If we set the support threshold to ≥0.4 in Table 3, then the would be removed since it did not meet the minimum threshold of 0.4. Minimum threshold is used to remove samples where there is not a strong enough support or confidence to deem the sample as important or interesting in the dataset.
Another way of finding interesting samples is to find the value of (support)×(confidence); this allows a data miner to see the samples where support and confidence are high enough to be highlighted in the dataset and prompt a closer look at the sample to find more information on the connection between the items.
Support can be beneficial for finding the connection between products in comparison to the whole dataset, whereas confidence looks at the connection between one or more items and another item. Below is a table that shows the comparison and contrast between support and support × confidence, using the information from Table 4 to derive the confidence values.
if Antecedent then Consequent | support | support X confidence |
---|---|---|
if buy milk, then buy bread | 2/5= 0.4 | 0.4×1.0= 0.4 |
if buy milk, then buy eggs | 1/5= 0.2 | 0.2×0.5= 0.1 |
if buy bread, then buy fruit | 2/5= 0.4 | 0.4×0.66= 0.264 |
if buy fruit, then buy eggs | 2/5= 0.4 | 0.4×0.66= 0.264 |
if buy milk and bread, then buy fruit | 2/5= 0.4 | 0.4×1.0= 0.4 |
The support of X with respect to T is defined as the proportion of transactions in the dataset which contains the itemset X. Denoting a transaction by where i is the unique identifier of the transaction and t is its itemset, the support may be written as:
This notation can be used when defining more complicated datasets where the items and itemsets may not be as easy as our supermarket example above. Other examples of where support can be used is in finding groups of genetic mutations that work collectively to cause a disease, investigating the number of subscribers that respond to upgrade offers, and discovering which products in a drug store are never bought together.[12]
Confidence
[edit]Confidence is the percentage of all transactions satisfying X that also satisfy Y.[14]
With respect to T, the confidence value of an association rule, often denoted as , is the ratio of transactions containing both X and Y to the total amount of X values present, where X is the antecedent and Y is the consequent.
Confidence can also be interpreted as an estimate of the conditional probability , the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.[13][15]
It is commonly depicted as:
The equation illustrates that confidence can be computed by calculating the co-occurrence of transactions X and Y within the dataset in ratio to transactions containing only X. This means that the number of transactions in both X and Y is divided by those just in X .
For example, Table 2 shows the rule which has a confidence of in the dataset, which denotes that every time a customer buys butter and bread, they also buy milk. This particular example demonstrates the rule being correct 100% of the time for transactions containing both butter and bread. The rule , however, has a confidence of . This suggests that eggs are bought 67% of the times that fruit is brought. Within this particular dataset, fruit is purchased a total of 3 times, with two of those times consisting of egg purchases.
For larger datasets, a minimum threshold, or a percentage cutoff, for the confidence can be useful for determining item relationships. When applying this method to some of the data in Table 2, information that does not meet the requirements are removed. Table 4 shows association rule examples where the minimum threshold for confidence is 0.5 (50%). Any data that does not have a confidence of at least 0.5 is omitted. Generating thresholds allow for the association between items to become stronger as the data is further researched by emphasizing those that co-occur the most. The table uses the confidence information from Table 3 to implement the Support × Confidence column, where the relationship between items via their both confidence and support, instead of just one concept, is highlighted. Ranking the rules by Support × Confidence multiples the confidence of a particular rule to its support and is often implemented for a more in-depth understanding of the relationship between the items.
if Antecedent then Consequent | Confidence | Support × Confidence |
---|---|---|
if buy milk, then buy bread | 2⁄2 = 1.0 | 0.4×1.0= 0.4 |
if buy milk, then buy eggs | 1⁄2 = 0.5 | 0.2×0.5= 0.1 |
if buy bread, then buy fruit | 2⁄3 ≈ 0.66 | 0.4×0.66= 0.264 |
if buy fruit, then buy eggs | 2⁄3 ≈ 0.66 | 0.4×0.66= 0.264 |
if buy milk and bread, then buy fruit | 2⁄2 = 1.0 | 0.4×1.0= 0.4 |
Overall, using confidence in association rule mining is great way to bring awareness to data relations. Its greatest benefit is highlighting the relationship between particular items to one another within the set, as it compares co-occurrences of items to the total occurrence of the antecedent in the specific rule. However, confidence is not the optimal method for every concept in association rule mining. The disadvantage of using it is that it does not offer multiple difference outlooks on the associations. Unlike support, for instance, confidence does not provide the perspective of relationships between certain items in comparison to the entire dataset, so while milk and bread, for example, may occur 100% of the time for confidence, it only has a support of 0.4 (40%). This is why it is important to look at other viewpoints, such as Support × Confidence, instead of solely relying on one concept incessantly to define the relationships.
Lift
[edit]The lift of a rule is defined as:
or the ratio of the observed support to that expected if X and Y were independent.
For example, the rule has a lift of .
If the rule had a lift of 1, it would imply that the probability of occurrence of the antecedent and that of the consequent are independent of each other. When two events are independent of each other, no rule can be drawn involving those two events.
If the lift is > 1, that lets us know the degree to which those two occurrences are dependent on one another, and makes those rules potentially useful for predicting the consequent in future data sets.
If the lift is < 1, that lets us know the items are substitute to each other. This means that presence of one item has negative effect on presence of other item and vice versa.
The value of lift is that it considers both the support of the rule and the overall data set.[13]
[rede]
Conviction
[edit]The conviction of a rule is defined as .[16]
For example, the rule has a conviction of , and can be interpreted as the ratio of the expected frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were independent divided by the observed frequency of incorrect predictions. In this example, the conviction value of 1.2 shows that the rule would be incorrect 20% more often (1.2 times as often) if the association between X and Y was purely random chance.
Alternative measures of interestingness
[edit]In addition to confidence, other measures of interestingness for rules have been proposed. Some popular measures are:
Several more measures are presented and compared by Tan et al.[20] and by Hahsler.[21] Looking for techniques that can model what the user has known (and using these models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness."
History
[edit]The concept of association rules was popularized particularly due to the 1993 article of Agrawal et al.,[2] which has acquired more than 23,790 citations according to Google Scholar, as of April 2021, and is thus one of the most cited papers in the Data Mining field. However, what is now called "association rules" is introduced already in the 1966 paper[22] on GUHA, a general data mining method developed by Petr Hájek et al.[23]
An early (circa 1989) use of minimum support and confidence to find all association rules is the Feature Based Modeling framework, which found all rules with and greater than user defined constraints.[24]
Statistically sound associations
[edit]One limitation of the standard approach to discovering associations is that by searching massive numbers of possible associations to look for collections of items that appear to be associated, there is a large risk of finding many spurious associations. These are collections of items that co-occur with unexpected frequency in the data, but only do so by chance. For example, suppose we are considering a collection of 10,000 items and looking for rules containing two items in the left-hand-side and 1 item in the right-hand-side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for independence with a significance level of 0.05 it means there is only a 5% chance of accepting a rule if there is no association. If we assume there are no associations, we should nonetheless expect to find 50,000,000,000 rules. Statistically sound association discovery[25][26] controls this risk, in most cases reducing the risk of finding any spurious associations to a user-specified significance level.
Algorithms
[edit]Many algorithms for generating association rules have been proposed.
Some well-known algorithms are Apriori, Eclat and FP-Growth, but they only do half the job, since they are algorithms for mining frequent itemsets. Another step needs to be done after to generate rules from frequent itemsets found in a database.
Apriori algorithm
[edit]Apriori is given by R. Agrawal and R. Srikant in 1994 for frequent item set mining and association rule learning. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often. The name of the algorithm is Apriori because it uses prior knowledge of frequent itemset properties.
Overview: Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. Apriori uses breadth-first search and a Hash tree structure to count candidate item sets efficiently. It generates candidate item sets of length from item sets of length . Then it prunes the candidates which have an infrequent sub pattern. According to the downward closure lemma, the candidate set contains all frequent -length item sets. After that, it scans the transaction database to determine frequent item sets among the candidates.
Example: Assume that each row is a cancer sample with a certain combination of mutations labeled by a character in the alphabet. For example a row could have {a, c} which means it is affected by mutation 'a' and mutation 'c'.
{a, b} | {c, d} | {a, d} | {a, e} | {b, d} | {a, b, d} | {a, c, d} | {a, b, c, d} |
---|
Now we will generate the frequent item set by counting the number of occurrences of each character. This is also known as finding the support values. Then we will prune the item set by picking a minimum support threshold. For this pass of the algorithm we will pick 3.
a | b | c | d |
---|---|---|---|
6 | 4 | 3 | 6 |
Since all support values are three or above there is no pruning. The frequent item set is {a}, {b}, {c}, and {d}. After this we will repeat the process by counting pairs of mutations in the input set.
{a, b} | {a, c} | {a, d} | {b, c} | {b, d} | {c, d} |
---|---|---|---|---|---|
3 | 2 | 4 | 1 | 3 | 3 |
Now we will make our minimum support value 4 so only {a, d} will remain after pruning. Now we will use the frequent item set to make combinations of triplets. We will then repeat the process by counting occurrences of triplets of mutations in the input set.
{a, c, d} |
---|
2 |
Since we only have one item the next set of combinations of quadruplets is empty so the algorithm will stop.
Advantages and Limitations:
Apriori has some limitations. Candidate generation can result in large candidate sets. For example a 10^4 frequent 1-itemset will generate a 10^7 candidate 2-itemset. The algorithm also needs to frequently scan the database, to be specific n+1 scans where n is the length of the longest pattern. Apriori is slower than the Eclat algorithm. However, Apriori performs well compared to Eclat when the dataset is large. This is because in the Eclat algorithm if the dataset is too large the tid-lists become too large for memory. FP-growth outperforms the Apriori and Eclat. This is due to the FP-growth algorithm not having candidate generation or test, using a compact data structure, and only having one database scan.[27]
Eclat algorithm
[edit]Eclat[11] (alt. ECLAT, stands for Equivalence Class Transformation) is a backtracking algorithm, which traverses the frequent itemset lattice graph in a depth-first search (DFS) fashion. Whereas the breadth-first search (BFS) traversal used in the Apriori algorithm will end up checking every subset of an itemset before checking it, DFS traversal checks larger itemsets and can save on checking the support of some of its subsets by virtue of the downward-closer property. Furthermore it will almost certainly use less memory as DFS has a lower space complexity than BFS.
To illustrate this, let there be a frequent itemset {a, b, c}. a DFS may check the nodes in the frequent itemset lattice in the following order: {a} → {a, b} → {a, b, c}, at which point it is known that {b}, {c}, {a, c}, {b, c} all satisfy the support constraint by the downward-closure property. BFS would explore each subset of {a, b, c} before finally checking it. As the size of an itemset increases, the number of its subsets undergoes combinatorial explosion.
It is suitable for both sequential as well as parallel execution with locality-enhancing properties.[28][29]
FP-growth algorithm
[edit]FP stands for frequent pattern.[30]
In the first pass, the algorithm counts the occurrences of items (attribute-value pairs) in the dataset of transactions, and stores these counts in a 'header table'. In the second pass, it builds the FP-tree structure by inserting transactions into a trie.
Items in each transaction have to be sorted by descending order of their frequency in the dataset before being inserted so that the tree can be processed quickly. Items in each transaction that do not meet the minimum support requirement are discarded. If many transactions share most frequent items, the FP-tree provides high compression close to tree root.
Recursive processing of this compressed version of the main dataset grows frequent item sets directly, instead of generating candidate items and testing them against the entire database (as in the apriori algorithm).
Growth begins from the bottom of the header table i.e. the item with the smallest support by finding all sorted transactions that end in that item. Call this item .
A new conditional tree is created which is the original FP-tree projected onto . The supports of all nodes in the projected tree are re-counted with each node getting the sum of its children counts. Nodes (and hence subtrees) that do not meet the minimum support are pruned. Recursive growth ends when no individual items conditional on meet the minimum support threshold. The resulting paths from root to will be frequent itemsets. After this step, processing continues with the next least-supported header item of the original FP-tree.
Once the recursive process has completed, all frequent item sets will have been found, and association rule creation begins.[31]
Others
[edit]ASSOC
[edit]The ASSOC procedure[32] is a GUHA method which mines for generalized association rules using fast bitstrings operations. The association rules mined by this method are more general than those output by apriori, for example "items" can be connected both with conjunction and disjunctions and the relation between antecedent and consequent of the rule is not restricted to setting minimum support and confidence as in apriori: an arbitrary combination of supported interest measures can be used.
OPUS search
[edit]OPUS is an efficient algorithm for rule discovery that, in contrast to most alternatives, does not require either monotone or anti-monotone constraints such as minimum support.[33] Initially used to find rules for a fixed consequent[33][34] it has subsequently been extended to find rules with any item as a consequent.[35] OPUS search is the core technology in the popular Magnum Opus association discovery system.
Lore
[edit]A famous story about association rule mining is the "beer and diaper" story. A purported survey of behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers tend also to buy beer. This anecdote became popular as an example of how unexpected association rules might be found from everyday data. There are varying opinions as to how much of the story is true.[36] Daniel Powers says:[36]
In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.
Other types of association rule mining
[edit]Multi-Relation Association Rules (MRAR): These are association rules where each item may have several relations. These relations indicate indirect relationships between the entities. Consider the following MRAR where the first item consists of three relations live in, nearby and humid: “Those who live in a place which is nearby a city with humid climate type and also are younger than 20 their health condition is good”. Such association rules can be extracted from RDBMS data or semantic web data.[37]
Contrast set learning is a form of associative learning. Contrast set learners use rules that differ meaningfully in their distribution across subsets.[38][39]
Weighted class learning is another form of associative learning where weights may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.
High-order pattern discovery facilitates the capture of high-order (polythetic) patterns or event associations that are intrinsic to complex real-world data. [40]
K-optimal pattern discovery provides an alternative to the standard approach to association rule learning which requires that each pattern appear frequently in the data.
Approximate Frequent Itemset mining is a relaxed version of Frequent Itemset mining that allows some of the items in some of the rows to be 0.[41]
Generalized Association Rules hierarchical taxonomy (concept hierarchy)
Quantitative Association Rules categorical and quantitative data
Interval Data Association Rules e.g. partition the age into 5-year-increment ranged
Sequential pattern mining discovers subsequences that are common to more than minsup (minimum support threshold) sequences in a sequence database, where minsup is set by the user. A sequence is an ordered list of transactions.[42]
Subspace Clustering, a specific type of clustering high-dimensional data, is in many variants also based on the downward-closure property for specific clustering models.[43]
Warmr, shipped as part of the ACE data mining suite, allows association rule learning for first order relational rules.[44]
See also
[edit]- Sequence mining
- Production system (computer science)
- Learning classifier system
- Rule-based machine learning
References
[edit]- ^ Piatetsky-Shapiro, Gregory (1991), Discovery, analysis, and presentation of strong rules, in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA.
- ^ a b c d e f Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of items in large databases". Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93. p. 207. CiteSeerX 10.1.1.40.6984. doi:10.1145/170035.170072. ISBN 978-0897915922. S2CID 490415.
- ^ Garcia, Enrique (2007). "Drawbacks and solutions of applying association rule mining in learning management systems" (PDF). Sci2s. Archived (PDF) from the original on 2009-12-23.
- ^ "Data Mining Techniques: Top 5 to Consider". Precisely. 2021-11-08. Retrieved 2021-12-10.
- ^ a b c "16 Data Mining Techniques: The Complete List - Talend". Talend - A Leader in Data Integration & Data Integrity. Retrieved 2021-12-10.
- ^ "What are Association Rules in Data Mining (Association Rule Mining)?". SearchBusinessAnalytics. Retrieved 2021-12-10.
- ^ "Drawbacks and solutions of applying association rule mining in learning management systems". ResearchGate. Retrieved 2021-12-10.
- ^ Tan, Pang-Ning; Michael, Steinbach; Kumar, Vipin (2005). "Chapter 6. Association Analysis: Basic Concepts and Algorithms" (PDF). Introduction to Data Mining. Addison-Wesley. ISBN 978-0-321-32136-7.
- ^ Jian Pei; Jiawei Han; Lakshmanan, L.V.S. (2001). "Mining frequent itemsets with convertible constraints". Proceedings 17th International Conference on Data Engineering. pp. 433–442. CiteSeerX 10.1.1.205.2150. doi:10.1109/ICDE.2001.914856. ISBN 978-0-7695-1001-9. S2CID 1080975.
- ^ Agrawal, Rakesh; and Srikant, Ramakrishnan; Fast algorithms for mining association rules in large databases Archived 2015-02-25 at the Wayback Machine, in Bocca, Jorge B.; Jarke, Matthias; and Zaniolo, Carlo; editors, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994, pages 487-499
- ^ a b Zaki, M. J. (2000). "Scalable algorithms for association mining". IEEE Transactions on Knowledge and Data Engineering. 12 (3): 372–390. CiteSeerX 10.1.1.79.9448. doi:10.1109/69.846291.
- ^ a b Larose, Daniel T.; Larose, Chantal D. (2014-06-23). Discovering Knowledge in Data. doi:10.1002/9781118874059. ISBN 9781118874059.
- ^ a b c Hahsler, Michael (2005). "Introduction to arules – A computational environment for mining association rules and frequent item sets" (PDF). Journal of Statistical Software. doi:10.18637/jss.v014.i15. Archived from the original (PDF) on 2019-04-30. Retrieved 2016-03-18.
- ^ Wong, Pak (1999). "Visualizing Association Rules for Text Mining" (PDF). BSTU Laboratory of Artificial Neural Networks. Archived (PDF) from the original on 2021-11-29.
- ^ Hipp, J.; Güntzer, U.; Nakhaeizadeh, G. (2000). "Algorithms for association rule mining --- a general survey and comparison". ACM SIGKDD Explorations Newsletter. 2: 58–64. CiteSeerX 10.1.1.38.5305. doi:10.1145/360402.360421. S2CID 9248096.
- ^ Brin, Sergey; Motwani, Rajeev; Ullman, Jeffrey D.; Tsur, Shalom (1997). "Dynamic itemset counting and implication rules for market basket data". Proceedings of the 1997 ACM SIGMOD international conference on Management of data - SIGMOD '97. pp. 255–264. CiteSeerX 10.1.1.41.6476. doi:10.1145/253260.253325. ISBN 978-0897919111. S2CID 15385590.
- ^ Omiecinski, E.R. (2003). "Alternative interest measures for mining associations in databases". IEEE Transactions on Knowledge and Data Engineering. 15: 57–69. CiteSeerX 10.1.1.329.5344. doi:10.1109/TKDE.2003.1161582. S2CID 18364249.
- ^ Aggarwal, Charu C.; Yu, Philip S. (1998). "A new framework for itemset generation". Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '98. pp. 18–24. CiteSeerX 10.1.1.24.714. doi:10.1145/275487.275490. ISBN 978-0897919968. S2CID 11934586.
- ^ Piatetsky-Shapiro, Gregory; Discovery, analysis, and presentation of strong rules, Knowledge Discovery in Databases, 1991, pp. 229-248
- ^ Tan, Pang-Ning; Kumar, Vipin; Srivastava, Jaideep (2004). "Selecting the right objective measure for association analysis". Information Systems. 29 (4): 293–313. CiteSeerX 10.1.1.331.4740. doi:10.1016/S0306-4379(03)00072-3.
- ^ Michael Hahsler (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. https://mhahsler.github.io/arules/docs/measures
- ^ Hájek, P.; Havel, I.; Chytil, M. (1966). "The GUHA method of automatic hypotheses determination". Computing. 1 (4): 293–308. doi:10.1007/BF02345483. S2CID 10511114.
- ^ Hájek, Petr; Rauch, Jan; Coufal, David; Feglar, Tomáš (2004). "The GUHA Method, Data Preprocessing and Mining". Database Support for Data Mining Applications. Lecture Notes in Computer Science. Vol. 2682. pp. 135–153. doi:10.1007/978-3-540-44497-8_7. ISBN 978-3-540-22479-2.
- ^ Webb, Geoffrey (1989). "A Machine Learning Approach to Student Modelling". Proceedings of the Third Australian Joint Conference on Artificial Intelligence (AI 89): 195–205.
- ^ Webb, Geoffrey I. (2007). "Discovering Significant Patterns". Machine Learning. 68: 1–33. doi:10.1007/s10994-007-5006-x.
- ^ Gionis, Aristides; Mannila, Heikki; Mielikäinen, Taneli; Tsaparas, Panayiotis (2007). "Assessing data mining results via swap randomization". ACM Transactions on Knowledge Discovery from Data. 1 (3): 14–es. CiteSeerX 10.1.1.141.2607. doi:10.1145/1297332.1297338. S2CID 52305658.
- ^ Heaton, Jeff (2017-01-30). "Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms". arXiv:1701.09042 [cs.DB].
- ^ Zaki, Mohammed Javeed; Parthasarathy, Srinivasan; Ogihara, Mitsunori; Li, Wei (1997). "New Algorithms for Fast Discovery of Association Rules": 283–286. CiteSeerX 10.1.1.42.3283. hdl:1802/501.
{{cite journal}}
: Cite journal requires|journal=
(help) - ^ Zaki, Mohammed J.; Parthasarathy, Srinivasan; Ogihara, Mitsunori; Li, Wei (1997). "Parallel Algorithms for Discovery of Association Rules". Data Mining and Knowledge Discovery. 1 (4): 343–373. doi:10.1023/A:1009773317876. S2CID 10038675.
- ^ Han (2000). "Mining frequent patterns without candidate generation". Proceedings of the 2000 ACM SIGMOD international conference on Management of data. Vol. SIGMOD '00. pp. 1–12. CiteSeerX 10.1.1.40.4436. doi:10.1145/342009.335372. ISBN 978-1581132175. S2CID 6059661.
- ^ Witten, Frank, Hall: Data mining practical machine learning tools and techniques, 3rd edition[page needed]
- ^ Hájek, Petr; Havránek, Tomáš (1978). Mechanizing Hypothesis Formation: Mathematical Foundations for a General Theory. Springer-Verlag. ISBN 978-3-540-08738-0.
- ^ a b Webb, Geoffrey I. (1995); OPUS: An Efficient Admissible Algorithm for Unordered Search, Journal of Artificial Intelligence Research 3, Menlo Park, CA: AAAI Press, pp. 431-465 online access
- ^ Bayardo, Roberto J. Jr.; Agrawal, Rakesh; Gunopulos, Dimitrios (2000). "Constraint-based rule mining in large, dense databases". Data Mining and Knowledge Discovery. 4 (2): 217–240. doi:10.1023/A:1009895914772. S2CID 5120441.
- ^ Webb, Geoffrey I. (2000). "Efficient search for association rules". Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '00. pp. 99–107. CiteSeerX 10.1.1.33.1309. doi:10.1145/347090.347112. ISBN 978-1581132335. S2CID 5444097.
- ^ a b "DSS News: Vol. 3, No. 23".
- ^ Ramezani, Reza, Mohamad Saraee, and Mohammad Ali Nematbakhsh; MRAR: Mining Multi-Relation Association Rules, Journal of Computing and Security, 1, no. 2 (2014)
- ^ GI Webb and S. Butler and D. Newlands (2003). On Detecting Differences Between Groups. KDD'03 Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- ^ Menzies, T.; Ying Hu (2003). "Computing practices - Data mining for very busy people". Computer. 36 (11): 22–29. doi:10.1109/MC.2003.1244531.
- ^ Wong, A.K.C.; Yang Wang (1997). "High-order pattern discovery from discrete-valued data". IEEE Transactions on Knowledge and Data Engineering. 9 (6): 877–893. CiteSeerX 10.1.1.189.1704. doi:10.1109/69.649314.
- ^ Liu, Jinze; Paulsen, Susan; Sun, Xing; Wang, Wei; Nobel, Andrew; Prins, Jan (2006). "Mining Approximate Frequent Itemsets in the Presence of Noise: Algorithm and Analysis". Proceedings of the 2006 SIAM International Conference on Data Mining. pp. 407–418. CiteSeerX 10.1.1.215.3599. doi:10.1137/1.9781611972764.36. ISBN 978-0-89871-611-5.
- ^ Zaki, Mohammed J. (2001); SPADE: An Efficient Algorithm for Mining Frequent Sequences, Machine Learning Journal, 42, pp. 31–60
- ^ Zimek, Arthur; Assent, Ira; Vreeken, Jilles (2014). Frequent Pattern Mining. pp. 403–423. doi:10.1007/978-3-319-07821-2_16. ISBN 978-3-319-07820-5.
- ^ King, R. D.; Srinivasan, A.; Dehaspe, L. (Feb 2001). "Warmr: a data mining tool for chemical data". J Comput Aided Mol Des. 15 (2): 173–81. Bibcode:2001JCAMD..15..173K. doi:10.1023/A:1008171016861. PMID 11272703. S2CID 3055046.
Bibliographies
[edit]- Annotated Bibliography on Association Rules Archived 2017-02-19 at the Wayback Machine by M. Hahsler