Consensus clustering
Revision as of 23:01, 20 February 2009
Clustering is the assignment of objects into groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters. Similarity is often assessed with a distance measure. Clustering is a common technique for statistical data analysis and is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
Consensus clustering has emerged as an important elaboration of the classical clustering problem. Consensus clustering, also called aggregation of clustering (or partitions), refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete.
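The median partition formulation can be illustrated with a small sketch. The distance used here, the number of object pairs that one clustering groups together and the other separates, is one common choice and is an assumption of this example. So is the simple "pick the best input" heuristic, which only approximates the median partition and is not an exact solver. The function names are illustrative.

```python
from itertools import combinations


def pair_disagreements(a, b):
    """Count object pairs that one labeling co-clusters and the other separates."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def total_disagreement(candidate, clusterings):
    """Objective of the median partition problem for one candidate clustering."""
    return sum(pair_disagreements(candidate, c) for c in clusterings)


def best_of_inputs(clusterings):
    """Heuristic: return the input clustering with the lowest total disagreement."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))


# Four input clusterings of five objects, given as cluster labels per object.
# The last one is the same partition as the second, with the labels swapped.
inputs = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
consensus = best_of_inputs(inputs)
print(consensus, total_disagreement(consensus, inputs))
```

Because the distance only looks at whether pairs are grouped together, it does not depend on how the clusters are labeled. Searching over all possible partitions instead of only the inputs would give the true median partition, but that search is what makes the problem hard.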
Why Consensus Clustering?
- There are potential shortcomings for each of the known clustering techniques.
- Interpretation of the results is difficult in some cases.
- When the number of clusters is not known in advance, choosing it becomes difficult.
- The results can be extremely sensitive to the initial settings.
- Some algorithms can never undo an earlier decision, so an early mistake is carried through to the final result.
- Iterative descent clustering methods, such as the SOM and K-Means clustering, circumvent some of the shortcomings of Hierarchical clustering by providing univocally defined clusters and cluster boundaries. However, they lack the intuitive and visual appeal of Hierarchical clustering, and the number of clusters must be chosen a priori.
- An extremely important issue in cluster analysis is the validation of the clustering results, that is, how to gain confidence in the significance of the clusters provided by the clustering technique (cluster numbers and cluster assignments). Lacking an external objective criterion (the equivalent of a known class label in supervised learning), this validation becomes somewhat elusive.
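One device often used to examine sensitivity to initial settings and to gain confidence in cluster assignments is a co-association (or consensus) matrix. It is not described in this article and is shown here only as a hedged sketch. Each entry is the fraction of runs in which two objects landed in the same cluster. The labelings below are made-up stand-ins for repeated runs of one algorithm, and the function name is illustrative.

```python
def co_association(clusterings):
    """Fraction of clusterings in which each pair of objects shares a cluster."""
    n = len(clusterings[0])
    runs = len(clusterings)
    return [
        [sum(c[i] == c[j] for c in clusterings) / runs for j in range(n)]
        for i in range(n)
    ]


# Hypothetical labelings of four objects from three runs.
# The second run is the same partition as the first, with the labels swapped.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
matrix = co_association(runs)
for row in matrix:
    print(["%.2f" % value for value in row])
```

Entries close to 0 or 1 indicate pairs on which the runs agree. Intermediate values mark assignments that change from run to run and deserve less confidence.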
Further reading
- Andrey Goder and Vladimir Filkov. "Consensus Clustering Algorithms: Comparison and Refinement" (PDF). 2008 Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments (ALENEX), San Francisco, January 19, 2008. Society for Industrial and Applied Mathematics.
- Tao Li and Chris Ding. "Weighted Consensus Clustering" (PDF). Proceedings of the 2008 SIAM International Conference on Data Mining, Atlanta, April 24–26, 2008. Society for Industrial and Applied Mathematics.