Adversarial information retrieval: Difference between revisions

From Wikipedia, the free encyclopedia
m History: copyedit, clarity edits, MOS implementation, and/or AWB general fixes using AWB
 
(16 intermediate revisions by 12 users not shown)
Line 1: Line 1:
{{short description|Information retrieval strategies in datasets}}
'''Adversarial information retrieval''' ('''adversarial IR''') is a topic in [[information retrieval]] related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
'''Adversarial information retrieval''' ('''adversarial IR''') is a topic in [[information retrieval]] related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.


On the Web, the predominant form of such manipulation is [[spamdexing|search engine spamming]] (also known as spamdexing), which involves employing various techniques to disrupt the activity of [[web search engines]], usually for financial gain. Examples of spamdexing are [[Google bomb|link-bombing]], [[comment spam|comment]] or [[referrer spam]], [[spam blog]]s (splogs), malicious tagging. Reverse engineering of [[ranking function|ranking algorithms]], [[Ad filtering|advertisement blocking]], and [[web content filtering]] may also be considered forms of adversarial data manipulation.<ref>B. Davison, M. Najork, and T. Converse (2006), [http://www.acm.org/sigs/sigir/forum/2006D/2006d_sigirforum_davison.pdf SIGIR Worksheet Report: Adversarial Information Retrieval on the Web (AIRWeb 2006)]</ref>
On the Web, the predominant form of such manipulation is [[spamdexing|search engine spamming]] (also known as spamdexing), which involves employing various techniques to disrupt the activity of [[web search engines]], usually for financial gain. Examples of spamdexing are [[Google bomb|link-bombing]], [[comment spam (disambiguation)|comment]] or [[referrer spam]], [[spam blog]]s (splogs), malicious tagging. [[Reverse engineering]] of [[ranking function|ranking algorithms]], [[click fraud]],<ref>Jansen, B. J. (2007) [https://faculty.ist.psu.edu/jjansen/academic/jansen_click_fraud.pdf Click fraud]. IEEE Computer. 40(7), 85-86.</ref> and [[web content filtering]] may also be considered forms of adversarial [[data manipulation]].<ref>B. Davison, M. Najork, and T. Converse (2006), [https://web.archive.org/web/20090320173324/http://www.acm.org/sigs/sigir/forum/2006D/2006d_sigirforum_davison.pdf SIGIR Worksheet Report: Adversarial Information Retrieval on the Web (AIRWeb 2006)]</ref>

Activities intended to poison the supply of useful data make search engines less useful for users. If search engines are more exclusionary they risk becoming more like directories and less dynamic.


== Topics ==
== Topics ==
Topics related to Web spam (spamdexing):
Topics related to Web spam (spamdexing):

* [[Link spam]]
* [[Link spam]]
* [[Keyword spamming]]
* [[Keyword spamming]]
* [[Cloaking]]
* [[Cloaking]]
* Malicious tagging
* Malicious tagging
* Spam related to blogs, including [[comment spam]], [[spam blog|splogs]], and [[sping|ping spam]]
* Spam related to blogs, including [[spam in blogs|comment spam]], [[spam blog|splogs]], and [[sping|ping spam]]


Other topics:
Other topics:

* [[Click fraud]] detection
* [[Click fraud]] detection
* Reverse engineering of a [[search engine]]'s [[ranking]] algorithm
* Reverse engineering of [[search engine]]'s [[ranking]] algorithm
* Web [[content filtering]]
* Web [[content filtering]]
* [[Ad filtering|Advertisement blocking]]
* [[Ad filtering|Advertisement blocking]]
* Stealth [[web crawling|crawling]]
* Stealth [[web crawling|crawling]]
*[[Troll (Internet)]]
* Malicious tagging or voting in [[social networks]]
* Malicious tagging or voting in [[social networks]]
* [[Astroturfing]]
* [[Sockpuppetry]]


== History ==
== History ==
The term "adversarial information retrieval" was first coined in 2000 by [[Andrei Broder]] (then Chief Scientist at [[Alta Vista]]) during the Web plenary session at the [[Text Retrieval Conference|TREC]]-9 conference.<ref>D. Hawking and N. Craswell (2004), [http://es.csiro.au/pubs/trecbook_for_website.pdf Very Large Scale Retrieval and Web Search (Preprint version)] {{Webarchive|url=https://web.archive.org/web/20070829092407/http://es.csiro.au/pubs/trecbook_for_website.pdf |date=2007-08-29 }}</ref>

The term "adversarial information retrieval" was first coined in 2000 by [[Andrei Broder]] (then Chief Scientist at [[Alta Vista]]) during the Web plenary session at the [[Text Retrieval Conference|TREC]]-9 conference.<ref>D. Hawking and N. Craswell (2004), [http://es.csiro.au/pubs/trecbook_for_website.pdf Very Large Scale Retrieval and Web Search (Preprint version)]</ref>


== See also ==
== See also ==
*[[Artificial intelligence content detection]]

*[[Information retrieval]]
*[[Spamdexing]]
*[[Spamdexing]]
*[[Information retrieval]]


== References ==
== References ==
{{reflist|1}}
{{reflist}}


== External links ==
== External links ==
*[http://airweb.cse.lehigh.edu/ AIRWeb]: series of workshops on Adversarial Information Retrieval on the Web
*[http://airweb.cse.lehigh.edu/ AIRWeb]: series of workshops on Adversarial Information Retrieval on the Web
*[http://webspam.lip6.fr/ Web Spam Challenge]: competition for researchers on Web Spam Detection
*[http://webspam.lip6.fr/ Web Spam Challenge]: competition for researchers on Web Spam Detection
*[http://barcelona.research.yahoo.net/webspam/ Web Spam Datasets]: datasets for research on Web Spam Detection
*[https://web.archive.org/web/20100217125910/http://barcelona.research.yahoo.net/webspam/ Web Spam Datasets]: datasets for research on Web Spam Detection


{{DEFAULTSORT:Adversarial Information Retrieval}}
{{DEFAULTSORT:Adversarial Information Retrieval}}
[[Category:Information retrieval]]
[[Category:Information retrieval genres]]
[[Category:Internet fraud]]
[[Category:Internet fraud]]
[[Category:Searching]]

Latest revision as of 00:49, 16 November 2023

Adversarial information retrieval (adversarial IR) is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which uses various techniques to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, click fraud,[1] and web content filtering may also be considered forms of adversarial data manipulation.[2]

Topics

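Topics related to Web spam (spamdexing):

  • Link spam
  • Keyword spamming
  • Cloaking
  • Malicious tagging
  • Spam related to blogs, including comment spam, splogs, and ping spam

Other topics:

  • Click fraud detection
  • Reverse engineering of a search engine's ranking algorithm
  • Web content filtering
  • Advertisement blocking
  • Stealth crawling
  • Troll (Internet)
  • Malicious tagging or voting in social networks
  • Astroturfing
  • Sockpuppetry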

History

The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at Alta Vista) during the Web plenary session at the TREC-9 conference.[3]

See also

  • Artificial intelligence content detection
  • Information retrieval
  • Spamdexing

References

  1. ^ Jansen, B. J. (2007) Click fraud. IEEE Computer. 40(7), 85-86.
  2. ^ B. Davison, M. Najork, and T. Converse (2006), SIGIR Workshop Report: Adversarial Information Retrieval on the Web (AIRWeb 2006)
  3. ^ D. Hawking and N. Craswell (2004), Very Large Scale Retrieval and Web Search (Preprint version) Archived 2007-08-29 at the Wayback Machine

External links

  • AIRWeb: series of workshops on Adversarial Information Retrieval on the Web
  • Web Spam Challenge: competition for researchers on Web Spam Detection
  • Web Spam Datasets: datasets for research on Web Spam Detection