Mean reciprocal rank

From Wikipedia, the free encyclopedia
Latest revision as of 15:42, 12 April 2024

The mean reciprocal rank is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer: 1 for first place, 1/2 for second place, 1/3 for third place and so on. The mean reciprocal rank is the average of the reciprocal ranks of results for a sample of queries Q:[1][2]

    MRR = (1/|Q|) Σ_{i=1}^{|Q|} 1/rank_i,

where rank_i refers to the rank position of the first relevant document for the i-th query.

The reciprocal value of the mean reciprocal rank corresponds to the harmonic mean of the ranks.
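The definition above can be sketched directly in Python. This is an illustrative implementation, not from the article; the function name and the convention of using `None` for a query with no correct answer are assumptions.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over queries, where each rank is the 1-based
    position of the first correct answer for that query.

    A query with no correct answer is passed as None and, by the
    convention assumed here, contributes a reciprocal rank of 0.
    """
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

# First correct answers at ranks 3, 2 and 1 (the article's example):
print(mean_reciprocal_rank([3, 2, 1]))  # 11/18, about 0.611
```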

Example


Suppose we have the following three queries for a system that tries to translate English words to their plurals. In each case, the system makes three guesses, with the first one being the one it thinks is most likely correct:

Query   Proposed results        Correct response   Rank   Reciprocal rank
cat     catten, cati, cats      cats               3      1/3
torus   torii, tori, toruses    tori               2      1/2
virus   viruses, virii, viri    viruses            1      1

Given those three samples, we could calculate the mean reciprocal rank as (1/3 + 1/2 + 1)/3 = 11/18, or approximately 0.61.

If none of the proposed results are correct, the reciprocal rank is 0.[1] Note that only the rank of the first relevant answer is considered; any further relevant answers are ignored. If users are also interested in further relevant items, mean average precision is a potential alternative metric.
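The first-relevant-answer behaviour can be illustrated with a short sketch. The function name and inputs below are illustrative assumptions, not from the article:

```python
def reciprocal_rank(results, relevant):
    """Return 1/rank of the first relevant result in an ordered list,
    or 0 if no result is relevant. Relevant items appearing after the
    first one do not change the score, matching MRR's definition."""
    for rank, item in enumerate(results, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

print(reciprocal_rank(["torii", "tori", "toruses"], {"tori"}))  # 0.5
print(reciprocal_rank(["catten", "cati"], {"cats"}))            # 0.0
```

Note that a list with relevant items at ranks 2 and 3 scores the same 1/2 as a list with a single relevant item at rank 2, which is why mean average precision may be preferable when all relevant items matter.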

See also


References

  1. E.M. Voorhees (1999). "The TREC-8 Question Answering Track Report" (PDF). Proceedings of the 8th Text Retrieval Conference. pp. 77–82. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication500-246.pdf
  2. D. R. Radev; H. Qi; H. Wu; W. Fan (2002). "Evaluating web-based question answering systems" (PDF). Proceedings of LREC. http://www.lrec-conf.org/proceedings/lrec2002/pdf/301.pdf