Adversarial information retrieval
Revision as of 06:03, 19 February 2008
Adversarial information retrieval (adversarial IR) is a topic in information retrieval that addresses tasks such as gathering, indexing, filtering, retrieving and ranking information from collections wherein a subset has been manipulated maliciously. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which covers techniques employed to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, and web content filtering are also studied as adversarial IR topics, although they are not spamdexing in the strict sense.[1]
The name stems from the fact that there are two sides with opposing goals. For instance, the relationship between the owner of a Web site trying to rank high on a search engine and the search engine administrator is an adversarial relationship in a zero-sum game. Every undeserved gain in ranking by the Web site is a loss of precision for the search engine.
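The last point can be illustrated with a small sketch. It assumes that "precision" is measured as precision at k (the fraction of the top k results that are relevant); the article does not specify a measure, and the function name, document labels, and rankings below are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Return the fraction of the top-k ranked documents that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


# Hypothetical data: three relevant pages and one spam page.
relevant = {"a", "b", "c"}

# Ranking before manipulation: the spam page sits below the relevant pages.
honest_ranking = ["a", "b", "c", "spam", "d"]

# Ranking after the spam page has gained an undeserved top position.
manipulated_ranking = ["spam", "a", "b", "c", "d"]

before = precision_at_k(honest_ranking, relevant, 3)
after = precision_at_k(manipulated_ranking, relevant, 3)
print(before, after)
```

In this toy setting, the spam page's move into the top three displaces a relevant page, so precision at 3 falls from 3/3 to 2/3.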
Topics
Topics related to Web spam (spamdexing):
- Link spam
- Keyword spamming
- Cloaking
- Malicious tagging
- Spam related to blogs, including comment spam, splogs, and ping spam
Other topics:
- Click fraud detection
- Reverse engineering of a search engine's ranking algorithm
- Web content filtering
- Advertisement blocking
- Stealth crawling
History
The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at Alta Vista) during the Web plenary session at the TREC-9 conference.[2]
References
- [1] B. Davison, M. Najork, and T. Converse (2006), SIGIR Workshop Report: Adversarial Information Retrieval on the Web (AIRWeb 2006)
- [2] D. Hawking and N. Craswell (2004), Very Large Scale Retrieval and Web Search (Preprint version)
External links
- AIRWeb: series of workshops on Adversarial Information Retrieval on the Web
- Web Spam Challenge: competition for researchers on Web Spam Detection
- Web Spam Datasets: datasets for research on Web Spam Detection