Adversarial information retrieval: Difference between revisions

Revision as of 21:05, 30 November 2011

Adversarial information retrieval (adversarial IR) is a topic in information retrieval concerned with strategies for working with a data source, some portion of which has been maliciously manipulated. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which covers techniques employed to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, and web content filtering may also be considered forms of adversarial data manipulation.^[1]

Activities intended to poison the supply of useful data make search engines less useful for users. If search engines become more exclusionary in response, they risk becoming more like directories and less dynamic.

Topics

Topics related to Web spam (spamdexing):

Link spam
Keyword spamming
Cloaking
Malicious tagging
Spam related to blogs, including comment spam, splogs, and ping spam
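
As an illustration of what detecting one of these forms might involve, the following is a minimal sketch of a keyword-spamming heuristic. It is not drawn from the cited sources. The function names `top_term_share` and `looks_keyword_stuffed` and the threshold of 0.3 are hypothetical choices made for this example. The sketch flags a text when a single term accounts for an unusually large share of its words.

```python
import re
from collections import Counter


def top_term_share(text: str) -> float:
    """Return the fraction of word tokens taken up by the most frequent term."""
    # Lowercase and split into alphanumeric tokens (a deliberately crude tokenizer).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    _, count = Counter(tokens).most_common(1)[0]
    return count / len(tokens)


def looks_keyword_stuffed(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose most frequent term meets the threshold share.

    The default threshold is an arbitrary illustrative value, not a
    published or tuned figure.
    """
    return top_term_share(text) >= threshold


if __name__ == "__main__":
    print(looks_keyword_stuffed("cheap pills cheap pills cheap cheap"))
    print(looks_keyword_stuffed("Search engines rank pages by relevance"))
```

A single-feature rule like this is easy to evade and would misfire on very short texts, so it is best read as a toy example of the detection task rather than a practical spam filter.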

History

The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at Alta Vista) during the Web plenary session at the TREC-9 conference.^[2]

References

[1] B. Davison, M. Najork, and T. Converse (2006), SIGIR Workshop Report: Adversarial Information Retrieval on the Web (AIRWeb 2006)
[2] D. Hawking and N. Craswell (2004), Very Large Scale Retrieval and Web Search (Preprint version)

External links

AIRWeb: series of workshops on Adversarial Information Retrieval on the Web
Web Spam Challenge: competition for researchers on Web Spam Detection
Web Spam Datasets: datasets for research on Web Spam Detection
