Similarity learning: Difference between revisions

Content deleted Content added

Inline

Latest revision as of 06:07, 21 December 2024

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification.

Learning setup

There are four common setups for similarity and metric distance learning.

Regression similarity learning: In this setup, pairs of objects are given $(x_{i}^{1},x_{i}^{2})$ together with a measure of their similarity $y_{i}\in R$ . The goal is to learn a function that approximates $f(x_{i}^{1},x_{i}^{2})\sim y_{i}$ for every new labeled triplet example $(x_{i}^{1},x_{i}^{2},y_{i})$ . This is typically achieved by minimizing a regularized loss $\min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)$ .
Classification similarity learning: Given are pairs of similar objects $(x_{i},x_{i}^{+})$ and non similar objects $(x_{i},x_{i}^{-})$ . An equivalent formulation is that every pair $(x_{i}^{1},x_{i}^{2})$ is given together with a binary label $y_{i}\in \{0,1\}$ that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not.
Ranking similarity learning: Given are triplets of objects $(x_{i},x_{i}^{+},x_{i}^{-})$ whose relative similarity obey a predefined order: $x_{i}$ is known to be more similar to $x_{i}^{+}$ than to $x_{i}^{-}$ . The goal is to learn a function $f$ such that for any new triplet of objects $(x,x^{+},x^{-})$ , it obeys $f(x,x^{+})>f(x,x^{-})$ (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications.^[1]
Locality sensitive hashing (LSH)^[2]: Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.^[3]

A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function $f_{W}(x,z)=x^{T}Wz$ . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing.

Metric learning

Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric.

When the objects $x_{i}$ are vectors in $R^{d}$ , then any matrix $W$ in the symmetric positive semi-definite cone $S_{+}^{d}$ defines a distance pseudo-metric of the space of x through the form $D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})$ . When $W$ is a symmetric positive definite matrix, $D_{W}$ is a metric. Moreover, as any symmetric positive semi-definite matrix $W\in S_{+}^{d}$ can be decomposed as $W=L^{\top }L$ where $L\in R^{e\times d}$ and $e\geq rank(W)$ , the distance function $D_{W}$ can be rewritten equivalently $D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}$ . The distance $D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}$ corresponds to the Euclidean distance between the transformed feature vectors $x_{1}'=Lx_{1}$ and $x_{2}'=Lx_{2}$ .

Many formulations for metric learning have been proposed.^[4]^[5] Some well-known approaches for metric learning include learning from relative comparisons,^[6] which is based on the triplet loss, large margin nearest neighbor,^[7] and information theoretic metric learning (ITML).^[8]

In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance.

Applications

Similarity learning is used in information retrieval for learning to rank, in face verification or face identification,^[9]^[10] and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.^[11]

Scalability

Metric and similarity learning naively scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form $f_{W}(x,z)=x^{T}Wz$ . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL,^[12] and with COMET.^[13]

Software

metric-learn^[14] is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn.^[15]

OpenMetricLearning^[16] is a Python framework to train and validate the models producing high-quality embeddings.

Further information

For further information on this topic, see the surveys on metric and similarity learning by Bellet et al.^[4] and Kulis.^[5]

References

^ Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of Image Similarity Through Ranking" (PDF). Journal of Machine Learning Research. 11: 1109–1135.
^ Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. "Similarity search in high dimensions via hashing." VLDB. Vol. 99. No. 6. 1999.
^ Rajaraman, A.; Ullman, J. (2010). "Mining of Massive Datasets, Ch. 3".
^ ^a ^b Bellet, A.; Habrard, A.; Sebban, M. (2013). "A Survey on Metric Learning for Feature Vectors and Structured Data". arXiv:1306.6709 [cs.LG].
^ ^a ^b Kulis, B. (2012). "Metric Learning: A Survey". Foundations and Trends in Machine Learning. 5 (4): 287–364. doi:10.1561/2200000019.
^ Schultz, M.; Joachims, T. (2004). "Learning a distance metric from relative comparisons" (PDF). Advances in Neural Information Processing Systems. 16: 41–48.
^ Weinberger, K. Q.; Blitzer, J. C.; Saul, L. K. (2006). "Distance Metric Learning for Large Margin Nearest Neighbor Classification" (PDF). Advances in Neural Information Processing Systems. 18: 1473–1480.
^ Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; Dhillon, I. S. (2007). "Information-theoretic metric learning". International Conference in Machine Learning (ICML): 209–216.
^ Guillaumin, M.; Verbeek, J.; Schmid, C. (2009). "Is that you? Metric learning approaches for face identification" (PDF). IEEE International Conference on Computer Vision (ICCV).
^ Mignon, A.; Jurie, F. (2012). "PCCA: A new approach for distance learning from sparse pairwise constraints" (PDF). IEEE Conference on Computer Vision and Pattern Recognition.
^ Xing, E. P.; Ng, A. Y.; Jordan, M. I.; Russell, S. (2002). "Distance Metric Learning, with Application to Clustering with Side-information" (PDF). Advances in Neural Information Processing Systems. 15: 505–512.
^ Liu; Bellet; Sha (2015). "Similarity Learning for High-Dimensional Sparse Data" (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS). arXiv:1411.2374. Bibcode:2014arXiv1411.2374L.
^ Atzmon; Shalit; Chechik (2015). "Learning Sparse Metrics, One Feature at a Time" (PDF). J. Mach. Learn. Research (JMLR).
^ "Scikit-learn-contrib/Metric-learn". GitHub.
^ Vazelhes; Carey; Tang; Vauquier; Bellet (2020). "metric-learn: Metric Learning Algorithms in Python" (PDF). J. Mach. Learn. Research (JMLR). arXiv:1908.04710.
^ "OML-Team/Open-metric-learning". GitHub.

[1] Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of Image Similarity Through Ranking" (PDF). Journal of Machine Learning Research. 11: 1109–1135.

[2] Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. "Similarity search in high dimensions via hashing." VLDB. Vol. 99. No. 6. 1999.

[3] Rajaraman, A.; Ullman, J. (2010). "Mining of Massive Datasets, Ch. 3".

[survey-4] Bellet, A.; Habrard, A.; Sebban, M. (2013). "A Survey on Metric Learning for Feature Vectors and Structured Data". arXiv:1306.6709 [cs.LG].

[survey2-5] Kulis, B. (2012). "Metric Learning: A Survey". Foundations and Trends in Machine Learning. 5 (4): 287–364. doi:10.1561/2200000019.

[SchultzJoachims-6] Schultz, M.; Joachims, T. (2004). "Learning a distance metric from relative comparisons" (PDF). Advances in Neural Information Processing Systems. 16: 41–48.

[LMNN-7] Weinberger, K. Q.; Blitzer, J. C.; Saul, L. K. (2006). "Distance Metric Learning for Large Margin Nearest Neighbor Classification" (PDF). Advances in Neural Information Processing Systems. 18: 1473–1480.

[ITML-8] Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; Dhillon, I. S. (2007). "Information-theoretic metric learning". International Conference in Machine Learning (ICML): 209–216.

[GUILLAUMIN-9] Guillaumin, M.; Verbeek, J.; Schmid, C. (2009). "Is that you? Metric learning approaches for face identification" (PDF). IEEE International Conference on Computer Vision (ICCV).

[MIGNON-10] Mignon, A.; Jurie, F. (2012). "PCCA: A new approach for distance learning from sparse pairwise constraints" (PDF). IEEE Conference on Computer Vision and Pattern Recognition.

[XING-11] Xing, E. P.; Ng, A. Y.; Jordan, M. I.; Russell, S. (2002). "Distance Metric Learning, with Application to Clustering with Side-information" (PDF). Advances in Neural Information Processing Systems. 15: 505–512.

[Liu-12] Liu; Bellet; Sha (2015). "Similarity Learning for High-Dimensional Sparse Data" (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS). arXiv:1411.2374. Bibcode:2014arXiv1411.2374L.

[13] Atzmon; Shalit; Chechik (2015). "Learning Sparse Metrics, One Feature at a Time" (PDF). J. Mach. Learn. Research (JMLR).

[14] "Scikit-learn-contrib/Metric-learn". GitHub.

[15] Vazelhes; Carey; Tang; Vauquier; Bellet (2020). "metric-learn: Metric Learning Algorithms in Python" (PDF). J. Mach. Learn. Research (JMLR). arXiv:1908.04710.

[16] "OML-Team/Open-metric-learning". GitHub.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

@@ Line 1: / Line 1: @@
+{{Short description|Supervised learning of a similarity function}}
-'''Similarity learning''' is an area of supervised [[machine learning]] in [[artificial intelligence]]. It is closely related to [[regression (machine learning)|regression]] and [[classification in machine learning|classification]], but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in [[ranking]], in [[recommendation systems]],
+'''Similarity learning''' is an area of [[Supervised learning|supervised machine learning]] in [[artificial intelligence]]. It is closely related to [[regression (machine learning)|regression]] and [[classification in machine learning|classification]], but the goal is to learn a [[similarity function]] that measures how [[Similarity (philosophy)|similar]] or related two objects are. It has applications in [[ranking]], in [[recommendation systems]], visual identity tracking, face verification, and speaker verification.
-visual identity tracking, face verification, and speaker verification.
 == Learning setup ==
@@ Line 11: / Line 11: @@
 : Given are pairs of similar objects <math>(x_i, x_i^+) </math> and non similar objects <math>(x_i, x_i^-)</math>. An equivalent formulation is that every pair <math>(x_i^1, x_i^2)</math> is given together with a binary label <math>y_i \in \{0,1\}</math> that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not.
 ; ''Ranking similarity learning''
-: Given are triplets of objects <math>(x_i, x_i^+, x_i^-)</math> whose relative similarity obey a predefined order: <math>x_i</math> is known to be more similar to <math>x_i^+</math> than to <math>x_i^-</math>. The goal is to learn a function <math>f</math> such that for any new triplet of objects <math>(x, x^+, x^-)</math>, it obeys <math>f(x, x^+) > f(x, x^-)</math> ([[constrastive learning]]). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact [[Similarity measure|measure of similarity]], one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. <ref>{{cite journal| last1 = Chechik | first1 = G. | last2 = Sharma | first2 = V. | last3 = Shalit | first3 = U. | last4 = Bengio | first4 = S. | title=Large Scale Online Learning of Image Similarity Through Ranking|journal=Journal of Machine Learning Research|year=2010|volume=11|pages=1109–1135|url=http://www.jmlr.org/papers/volume11/chechik10a/chechik10a.pdf}}</ref>
+: Given are triplets of objects <math>(x_i, x_i^+, x_i^-)</math> whose relative similarity obey a predefined order: <math>x_i</math> is known to be more similar to <math>x_i^+</math> than to <math>x_i^-</math>. The goal is to learn a function <math>f</math> such that for any new triplet of objects <math>(x, x^+, x^-)</math>, it obeys <math>f(x, x^+) > f(x, x^-)</math> ([[contrastive learning]]). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact [[Similarity measure|measure of similarity]], one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications.<ref>{{cite journal| last1 = Chechik | first1 = G. | last2 = Sharma | first2 = V. | last3 = Shalit | first3 = U. | last4 = Bengio | first4 = S. | title=Large Scale Online Learning of Image Similarity Through Ranking|journal=Journal of Machine Learning Research|year=2010|volume=11|pages=1109–1135|url=http://www.jmlr.org/papers/volume11/chechik10a/chechik10a.pdf}}</ref>
 ; [[Locality sensitive hashing]] (LSH)<ref>Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. "Similarity search in high dimensions via hashing." VLDB. Vol. 99. No. 6. 1999.</ref>
-: [[Hash Function|hashes]] input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.<ref>{{cite web
+: [[Hash function|Hashes]] input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in [[nearest neighbor search]] on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.<ref>{{cite web
 | first1 = A.|last1=Rajaraman |first2= J.|last2=Ullman|author2-link=Jeffrey Ullman
 | url = http://infolab.stanford.edu/~ullman/mmds.html
@@ Line 20: / Line 20: @@
 }}</ref>
-A common approach for learning similarity, is to model the similarity function as a [[bilinear form]]. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function <math> f_W(x, z)  = x^T W z </math>. When data is abundant, a common approach is to learn a [[siamese network]] - A deep network model with parameter sharing.
+A common approach for learning similarity is to model the similarity function as a [[bilinear form]]. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function <math> f_W(x, z)  = x^T W z </math>. When data is abundant, a common approach is to learn a [[siamese network]] – a deep network model with parameter sharing.
 == Metric learning ==
@@ Line 26: / Line 26: @@
 Similarity learning is closely related to ''distance metric learning''. Metric learning is the task of learning a distance function over objects. A [[Metric (mathematics)|metric]] or [[distance function]] has to obey four axioms: [[non-negative|non-negativity]], [[identity of indiscernibles]], [[symmetry]] and [[subadditivity]] (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric.
-When the objects <math>x_i</math> are vectors in <math>R^d</math>, then any matrix <math>W</math> in the symmetric positive semi-definite cone <math>S_+^d</math> defines a distance pseudo-metric of the space of x through the form <math>D_W(x_1, x_2)^2 = (x_1-x_2)^{\top} W (x_1-x_2)</math>. When <math>W</math> is a symmetric positive definite matrix, <math>D_W</math> is a metric. Moreover, as any symmetric positive semi-definite matrix <math>W \in S_+^d</math> can be decomposed as <math>W = L^{\top}L</math> where <math>L \in R^{e \times d}</math> and <math>e \geq rank(W)</math>, the distance function <math>D_W</math> can be rewritten equivalently <math>D_W(x_1, x_2)^2 = (x_1-x_2)^{\top} L^{\top}L (x_1-x_2) = \| L (x_1-x_2) \|_2^2</math>. The distance <math>D_W(x_1, x_2)^2=\| x_1' - x_2' \|_2^2</math> corresponds to the Euclidean distance between the projected [[feature vector]]s <math>x_1'= Lx_1</math> and <math>x_2'= Lx_2</math>.
+When the objects <math>x_i</math> are vectors in <math>R^d</math>, then any matrix <math>W</math> in the symmetric positive semi-definite cone <math>S_+^d</math> defines a distance pseudo-metric of the space of x through the form <math>D_W(x_1, x_2)^2 = (x_1-x_2)^{\top} W (x_1-x_2)</math>. When <math>W</math> is a symmetric positive definite matrix, <math>D_W</math> is a metric. Moreover, as any symmetric positive semi-definite matrix <math>W \in S_+^d</math> can be decomposed as <math>W = L^{\top}L</math> where <math>L \in R^{e \times d}</math> and <math>e \geq rank(W)</math>, the distance function <math>D_W</math> can be rewritten equivalently <math>D_W(x_1, x_2)^2 = (x_1-x_2)^{\top} L^{\top}L (x_1-x_2) = \| L (x_1-x_2) \|_2^2</math>. The distance <math>D_W(x_1, x_2)^2=\| x_1' - x_2' \|_2^2</math> corresponds to the Euclidean distance between the transformed [[feature vector]]s <math>x_1'= Lx_1</math> and <math>x_2'= Lx_2</math>.
-Some well-known approaches for metric learning include [[Large margin nearest neighbor]],<ref name=LMNN>{{cite journal| last1 = Weinberger | first1 = K. Q. | last2 = Blitzer | first2 = J. C. | last3 = Saul | first3 = L. K. | title=Distance Metric Learning for Large Margin Nearest Neighbor Classification|journal=Advances in Neural Information Processing Systems |volume=18|year=2006|pages=1473–1480|url=http://books.nips.cc/papers/files/nips18/NIPS2005_0265.pdf}}</ref> Information theoretic metric learning (ITML).<ref name=ITML>{{cite journal | last1 = Davis | first1 = J. V. | last2 = Kulis | first2 = B. | last3 = Jain | first3 = P. | last4 = Sra | first4 = S. | last5 = Dhillon | first5 = I. S. | title=Information-theoretic metric learning | journal=International Conference in Machine Learning (ICML) | year=2007 | pages=209–216 | url=http://www.cs.utexas.edu/users/pjain/itml/}}</ref>
+Many formulations for metric learning have been proposed.<ref name="survey">{{cite arXiv | last1 = Bellet | first1 = A. | last2 = Habrard | first2 = A. | last3 = Sebban | first3 = M. |eprint=1306.6709 |class=cs.LG |title=A Survey on Metric Learning for Feature Vectors and Structured Data |year=2013}}</ref><ref name="survey2">{{cite journal| last = Kulis | first = B.| title=Metric Learning: A Survey | journal=Foundations and Trends in Machine Learning | volume = 5| issue = 4| pages = 287–364| year=2012 | url=https://www.nowpublishers.com/article/Details/MAL-019| doi = 10.1561/2200000019}}</ref> Some well-known approaches for metric learning include learning from relative comparisons,<ref name="SchultzJoachims">{{cite journal| last1 = Schultz | first1 = M. | last2 = Joachims | first2 = T. | title=Learning a distance metric from relative comparisons|journal=Advances in Neural Information Processing Systems |volume=16|year=2004|pages=41–48|url=https://papers.nips.cc/paper/2366-learning-a-distance-metric-from-relative-comparisons.pdf}}</ref> which is based on the [[triplet loss]], [[large margin nearest neighbor]],<ref name="LMNN">{{cite journal| last1 = Weinberger | first1 = K. Q. | last2 = Blitzer | first2 = J. C. | last3 = Saul | first3 = L. K. | title=Distance Metric Learning for Large Margin Nearest Neighbor Classification|journal=Advances in Neural Information Processing Systems |volume=18|year=2006|pages=1473–1480|url=http://books.nips.cc/papers/files/nips18/NIPS2005_0265.pdf}}</ref> and information theoretic metric learning (ITML).<ref name="ITML">{{cite journal | last1 = Davis | first1 = J. V. | last2 = Kulis | first2 = B. | last3 = Jain | first3 = P. | last4 = Sra | first4 = S. | last5 = Dhillon | first5 = I. S. | title=Information-theoretic metric learning | journal=International Conference in Machine Learning (ICML) | year=2007 | pages=209–216 | url=http://www.cs.utexas.edu/users/pjain/itml/}}</ref>
 In [[statistics]], the [[covariance]] matrix of the data is sometimes used to define a distance metric called [[Mahalanobis distance]].
 == Applications ==
-Similarity learning is used in information retrieval for [[learning to rank]], in face verification or face identification,<ref name=GUILLAUMIN>{{cite journal| last1 = Guillaumin | first1 = M.  | last2 = Verbeek | first2 = J. | last3 = Schmid | first3 = C. | title=Is that you? Metric learning approaches for face identification|url=http://hal.inria.fr/docs/00/58/50/36/PDF/verbeek09iccv2.pdf|journal=IEEE International Conference on Computer Vision (ICCV)|year=2009}}</ref><ref name=MIGNON>{{cite journal| last1 = Mignon | first1 = A. | last2 = Jurie | first2 = F. | title=PCCA: A new approach for distance learning from sparse pairwise constraints|journal=IEEE Conference on Computer Vision and Pattern Recognition|year=2012|url=http://hal.archives-ouvertes.fr/docs/00/80/60/07/PDF/12_cvpr_ldca.pdf}}</ref> and in [[recommendation systems]]. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as [[cluster analysis|clustering]], which groups together close or similar objects. It also includes supervised approaches like [[K-nearest neighbor algorithm]] which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.<ref name=XING>{{cite journal| last1 = Xing | first1 = E. P. | last2 = Ng | first2 = A. Y. | last3 = Jordan | first3 = M. I. | last4 = Russell | first4 = S. | title=Distance Metric Learning, with Application to Clustering with Side-information | journal=Advances in Neural Information Processing Systems |volume=15 | year=2002| pages = 505–512 | url=https://ai.stanford.edu/~ang/papers/nips02-metric.pdf}}</ref>
+Similarity learning is used in information retrieval for [[learning to rank]], in face verification or face identification,<ref name=GUILLAUMIN>{{cite journal| last1 = Guillaumin | first1 = M.  | last2 = Verbeek | first2 = J. | last3 = Schmid | first3 = C. | title=Is that you? Metric learning approaches for face identification|url=http://hal.inria.fr/docs/00/58/50/36/PDF/verbeek09iccv2.pdf|journal=IEEE International Conference on Computer Vision (ICCV)|year=2009}}</ref><ref name=MIGNON>{{cite journal| last1 = Mignon | first1 = A. | last2 = Jurie | first2 = F. | title=PCCA: A new approach for distance learning from sparse pairwise constraints|journal=IEEE Conference on Computer Vision and Pattern Recognition|year=2012|url=http://hal.archives-ouvertes.fr/docs/00/80/60/07/PDF/12_cvpr_ldca.pdf}}</ref> and in [[recommendation systems]]. Also, many machine learning approaches rely on some metric. This includes [[unsupervised learning]] such as [[cluster analysis|clustering]], which groups together close or similar objects. It also includes supervised approaches like [[K-nearest neighbor algorithm]] which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.<ref name=XING>{{cite journal| last1 = Xing | first1 = E. P. | last2 = Ng | first2 = A. Y. | last3 = Jordan | first3 = M. I. | last4 = Russell | first4 = S. | title=Distance Metric Learning, with Application to Clustering with Side-information | journal=Advances in Neural Information Processing Systems |volume=15 | year=2002| pages = 505–512 | url=https://ai.stanford.edu/~ang/papers/nips02-metric.pdf}}</ref>
 == Scalability ==
 Metric and similarity learning naively scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form <math> f_W(x, z)  = x^T W z </math>. Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL,<ref name=Liu>{{Cite journal| last1=Liu | last2=Bellet | last3=Sha| title=Similarity Learning for High-Dimensional Sparse Data|year=2015|journal=International Conference on Artificial Intelligence and Statistics (AISTATS)| arxiv=1411.2374 | bibcode=2014arXiv1411.2374L |url=http://jmlr.org/proceedings/papers/v38/liu15.pdf}}</ref> and with COMET.<ref>{{Cite journal | last1=Atzmon | last2=Shalit | last3=Chechik | title=Learning Sparse Metrics, One Feature at a Time | journal=J. Mach. Learn. Research (JMLR)|year=2015|url=http://jmlr.org/proceedings/papers/v44/atzmon2015.pdf}}</ref>
+== Software ==
+*metric-learn<ref>{{cite web | url=https://github.com/scikit-learn-contrib/metric-learn | title=Scikit-learn-contrib/Metric-learn | website=[[GitHub]] }}</ref> is a [[free software]] Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with [[scikit-learn]].<ref>{{Cite journal | last1=Vazelhes | last2=Carey | last3=Tang | last4=Vauquier | last5=Bellet | title=metric-learn: Metric Learning Algorithms in Python | journal=J. Mach. Learn. Research (JMLR)|year=2020| arxiv=1908.04710 |url=https://www.jmlr.org/papers/volume21/19-678/19-678.pdf}}</ref>
+* OpenMetricLearning<ref>{{cite web | url=https://github.com/OML-Team/open-metric-learning | title=OML-Team/Open-metric-learning | website=[[GitHub]] }}</ref> is a [[Python (programming language)|Python]] framework to train and validate the models producing high-quality embeddings.
+== Further information ==
+For further information on this topic, see the surveys on metric and similarity learning by Bellet et al.<ref name=survey/> and Kulis.<ref name=survey2/>
 ==See also==
-*[[Learning to rank]]
+*[[Kernel method]]
 *[[Latent semantic analysis]]
+*[[Learning to rank]]
-== Further reading ==
-For further information on this topic, see the surveys on metric and similarity learning by Bellet et al.<ref name=survey>{{cite arXiv | last1 = Bellet | first1 = A. | last2 = Habrard | first2 = A. | last3 = Sebban | first3 = M. |eprint=1306.6709 |class=cs.LG |title=A Survey on Metric Learning for Feature Vectors and Structured Data |year=2013}}</ref> and Kulis.<ref name=survey2>{{cite journal| last = Kulis | first = B.| title=Metric Learning: A Survey | journal=Foundations and Trends in Machine Learning | volume = 5| issue = 4| pages = 287–364| year=2012 | url=https://www.nowpublishers.com/article/Details/MAL-019| doi = 10.1561/2200000019}}</ref>
 == References ==
@@ Line 49: / Line 57: @@
 [[Category:Machine learning]]
+[[Category:Semantic relations]]