Minimum redundancy feature selection

From Wikipedia, the free encyclopedia

Revision as of 20:09, 19 May 2011

Minimum redundancy feature selection is an algorithm frequently used to identify characteristics of genes and phenotypes accurately and to narrow down their relevance. It is usually described together with relevant feature selection as Minimum Redundancy Maximum Relevance (mRMR).[1] (Note: the cited PDF is only an announcement for a related lecture.)

Feature selection, one of the basic problems in pattern recognition and machine learning, identifies subsets of data that are relevant to the parameters used; this is normally called Maximum Relevance. These subsets often contain material that is relevant but redundant, and mRMR attempts to address this problem by removing the redundant subsets. mRMR has applications in many areas, such as cancer diagnosis and speech recognition.

Features can be selected in many different ways. One scheme is to select the features that correlate most strongly with the classification variable. This has been called maximum-relevance selection. Many heuristic algorithms can be used, such as sequential forward, backward, or floating selection.
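
As a minimal sketch of maximum-relevance selection, the following Python ranks features by their absolute Pearson correlation with the class variable and keeps the top k. The function name `max_relevance`, the choice of Pearson correlation as the relevance measure, and the toy data are assumptions made for illustration; they are not taken from any reference implementation.

```python
import numpy as np

def max_relevance(X, y, k):
    """Return indices of the k features most correlated with y.

    Relevance is measured here (an assumption of this sketch) as the
    absolute Pearson correlation; constant columns are not handled.
    """
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    # Sort by descending relevance and keep the first k indices.
    order = np.argsort(-relevance)
    return [int(j) for j in order[:k]]

# Hypothetical toy data: a and b are uncorrelated, c is uncorrelated with both.
a = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
b = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
c = a * b
y = 2 * a + b                                # class variable depends on a and b
X = np.column_stack([a, a + 0.2 * c, b])     # feature 1 is a noisy copy of feature 0

print(max_relevance(X, y, 2))
```

On this toy data the two top-ranked features are near-duplicates of each other, which is the kind of redundancy that pure relevance ranking does not take into account.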

Alternatively, features can be selected to be mutually far apart from each other while still having "high" correlation with the classification variable. This scheme, termed Minimum Redundancy Maximum Relevance (mRMR) selection, has been found to be more powerful than maximum-relevance selection.
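
One way to sketch this scheme is a greedy forward search that, at each step, adds the feature with the highest relevance minus its average redundancy with the features already chosen. The names `abs_corr` and `mrmr_select`, the use of absolute Pearson correlation for both relevance and redundancy, the difference-based score, and the toy data are all assumptions of this illustration rather than a definitive implementation.

```python
import numpy as np

def abs_corr(u, v):
    """Absolute Pearson correlation of two 1-D arrays (assumes neither is constant)."""
    return abs(np.corrcoef(u, v)[0, 1])

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: maximize relevance minus mean redundancy."""
    n_features = X.shape[1]
    k = min(k, n_features)
    relevance = np.array([abs_corr(X[:, j], y) for j in range(n_features)])
    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Mean correlation with the already selected features.
            redundancy = np.mean([abs_corr(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Hypothetical toy data: feature 1 is a noisy copy of feature 0,
# feature 2 is less relevant on its own but not redundant.
a = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
b = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
c = a * b
y = 2 * a + b
X = np.column_stack([a, a + 0.2 * c, b])

print(mrmr_select(X, y, 2))
```

Here the second pick is feature 2 rather than the near-duplicate feature 1: the duplicate is highly relevant, but its redundancy with feature 0 outweighs that relevance in the score.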

As a special case, the "correlation" can be replaced by the statistical dependency between variables, which mutual information can quantify. In this case, it has been shown that mRMR is an approximation to maximizing the dependency between the joint distribution of the selected features and the classification variable.
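
The mutual-information form of the criterion can be written out as follows. The symbols S (selected feature set), c (classification variable), D (relevance), R (redundancy), and Φ (combined score) are notation assumed here, following the convention commonly used for this criterion, for example in the Peng, Long, and Ding paper listed in the references. The difference D − R is one common way to combine the two terms; a quotient form is also used.

```latex
\begin{align*}
  % Mutual information between two discrete variables
  I(x; y) &= \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)} \\
  % Relevance: mean mutual information between each selected feature and the class
  D(S, c) &= \frac{1}{|S|}\sum_{x_i \in S} I(x_i; c) \\
  % Redundancy: mean pairwise mutual information among selected features
  R(S) &= \frac{1}{|S|^{2}}\sum_{x_i,\, x_j \in S} I(x_i; x_j) \\
  % mRMR objective (difference form)
  \Phi(D, R) &= D - R, \qquad S^{*} = \arg\max_{S}\, \Phi(D, R)
\end{align*}
```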

Studies have tried different measures of redundancy and relevance. A 2010 study compared several measures in the context of biomedical images.[2]

References

  1. ^ Ding, Chris; Peng, Hanchuan. "Minimum Redundancy Feature Selection and Extraction" (PDF Lecture). Ischool at Drexel. http://www.ischool.drexel.edu/ieeebibm/bibm07/ChrisDing-bibm07tutor.pdf Retrieved 8 August 2010.
  2. ^ Auffarth, B., Lopez, M., Cerquides, J. (2010). "Comparison of redundancy and relevance measures for feature selection in tissue classification of CT images". Advances in Data Mining. Applications and Theoretical Aspects. Springer. pp. 248–262. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.170.1528
  • Peng, H.C., Long, F., and Ding, C., "Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, pp. 1226–1238, 2005.
  • Chris Ding and Hanchuan Peng, "Minimum Redundancy Feature Selection from Microarray Gene Expression Data". 2nd IEEE Computer Society Bioinformatics Conference (CSB 2003), 11–14 August 2003, Stanford, CA, USA, pp. 523–529.
  • mRMR Janelia