
Search engine (computing): Difference between revisions

From Wikipedia, the free encyclopedia
How search engines work: Improved the NLS section to better the flow, and added a fact template

Revision as of 16:55, 5 September 2007

A search engine is an information retrieval system designed to help find information stored on a computer system. Search engines help to minimize the time required to find information and the amount of information that must be consulted, akin to other techniques for managing information overload.

The most popular form of search engine is the Web search engine, which searches for information on the public World Wide Web. Other kinds include enterprise search engines, which search intranets, personal search engines, and mobile search engines.

How search engines work

Querying

Search engines provide an interface to a group of items that enables users to specify criteria about an item of interest and have the engine find the matching items within the group.

In the most popular form of search, items are documents or web pages and the criteria are words or concepts that the documents may contain.[1]

A search engine user can express a query in several varieties of syntax. Some are formalized and require a strict logical and algebraic syntax. Others are less strict and allow a less precisely defined query. One form of less-restricted query syntax is referred to as Natural Language Search (NLS), a term typically used to describe web search engines that apply natural language processing of some form. For example, instead of one or two words, a query could consist of an English sentence or paragraph. A natural language search engine then parses the query into words and evaluates searches for those words. This relieves the user of having to formulate a specific query in a restrictive, and sometimes difficult to learn, syntax. A second definition of natural language search refers to how the search engine performs indexing, independent of the query syntax.
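
As a rough illustration of parsing a sentence-style query into words, the Python sketch below lowercases the query, splits it into words, and discards a few common function words. The function name parse_query and the small stop-word list are assumptions made for this example only; the article does not specify how any particular engine parses queries.

```python
import re

# Hypothetical stop-word list for illustration; real engines use far larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "what", "who", "for", "to"}

def parse_query(query):
    """Split a free-text query into lowercase search terms, dropping stop words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [word for word in words if word not in STOP_WORDS]

terms = parse_query("Who is the author of Moby Dick?")
print(terms)  # ['author', 'moby', 'dick']
```

Each remaining term could then be evaluated as a separate word search, as the paragraph above describes.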

Traditional search engines tend to use a non-linguistic model of language. The hypothesis is that natural language search will provide better results, that is, results that more accurately and efficiently support a user's need.[citation needed]


Ranking

A Boolean search for an item within a group of items returns either the exact matching item or nothing. This is a strict search method: the desired item and the actual item must be exactly equal. In practice, it is often more useful to apply a laxer measure of similarity between the desired item(s) and the items in the group being searched.

For example, instead of finding only the exact book in a library, a library search engine may return a list of 'similar' books, with the exact book listed first.
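
The difference between an exact match and a laxer similarity match can be sketched in Python as follows. The catalogue, the function names and the 0.5 threshold are all made up for this example, and string similarity from the standard library's difflib.SequenceMatcher is only one possible measure, not one the article prescribes.

```python
from difflib import SequenceMatcher

# Made-up catalogue of book titles, for illustration only.
TITLES = ["Moby Dick", "Moby Duck", "Billy Budd", "White Jacket"]

def exact_search(wanted, titles):
    """Return only titles that are exactly equal to the wanted title."""
    return [title for title in titles if title == wanted]

def similar_search(wanted, titles, threshold=0.5):
    """Return titles whose similarity ratio meets the threshold, most similar first."""
    scored = [
        (SequenceMatcher(None, wanted.lower(), title.lower()).ratio(), title)
        for title in titles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for ratio, title in scored if ratio >= threshold]

print(exact_search("Moby Dik", TITLES))    # [] because no title is exactly equal
print(similar_search("Moby Dick", TITLES)) # the exact title comes first
```

An exact search for a misspelled title returns nothing, while the similarity search still lists near matches, with an exact match (ratio 1.0) at the top.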

The items that meet the criteria specified by the query are typically sorted, or ranked, so as to place the most 'relevant' items first. Placing the most relevant items first reduces the time users need to determine whether one or more of the resulting items are sufficiently similar to the query. Users of Web search engines have come to expect that the further down the list of matching items they browse, the less relevant the items become.
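
A minimal Python sketch of ranking follows. It assumes a deliberately crude relevance score, the number of times the query terms occur in a document; the article does not name a scoring method, and the documents and function names here are hypothetical.

```python
def score(query_terms, text):
    """Crude relevance score: total occurrences of the query terms in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query_terms)

def rank(query_terms, documents):
    """Return the names of matching documents, highest score first."""
    scored = [(score(query_terms, text), name) for name, text in documents.items()]
    matching = [(points, name) for points, name in scored if points > 0]
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [name for points, name in matching]

# Hypothetical documents for illustration.
DOCS = {
    "doc_a": "search engines rank results",
    "doc_b": "a search engine ranks search results by relevance to the search",
    "doc_c": "cooking with garlic",
}

print(rank(["search", "results"], DOCS))  # ['doc_b', 'doc_a']
```

Documents that score zero are dropped, and the rest are listed in decreasing order of score, so relevance falls as the user reads down the list.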

Indexing

To provide a set of matching items quickly, a search engine will typically collect information, or metadata, about the group of items under consideration beforehand. For example, a library search engine may determine the author of each book automatically and add the author name to a description of each book. Users can then search for books by the author's name. Other metadata in this example might include the book title, the number of pages in the book, the date it was published, and so forth.
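
The library example can be sketched in Python as a list of metadata records that are collected ahead of time and then searched by author. The record fields mirror the metadata named above; the class name, function name and catalogue entries are placeholders invented for this example.

```python
from dataclasses import dataclass

@dataclass
class BookMetadata:
    """Metadata gathered about one book before any query is made."""
    title: str
    author: str
    pages: int
    year_published: int

# Placeholder catalogue; the titles, authors and numbers are invented.
CATALOGUE = [
    BookMetadata("First Example Book", "Author One", 200, 1990),
    BookMetadata("Second Example Book", "Author Two", 350, 2001),
    BookMetadata("Third Example Book", "Author One", 120, 2005),
]

def find_by_author(catalogue, author):
    """Return the titles of all books whose author matches, ignoring case."""
    wanted = author.strip().lower()
    return [book.title for book in catalogue if book.author.lower() == wanted]

print(find_by_author(CATALOGUE, "author one"))
# ['First Example Book', 'Third Example Book']
```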

The metadata collected about each item is typically stored on a computer in the form of an index. The index typically requires less computer storage than the items themselves and provides a way for the search engine to calculate the relevance, or similarity, between the query and the set of items.
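
One common way to organize such an index is an inverted index, which maps each word to the set of items that contain it. The article does not say which index structure is meant, so the Python sketch below is an assumption, with hypothetical function names and documents.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def lookup(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical documents for illustration.
DOCUMENTS = {
    1: "search engines index documents",
    2: "libraries index books",
    3: "search the library catalogue",
}

index = build_index(DOCUMENTS)
print(lookup(index, "index"))         # {1, 2}
print(lookup(index, "search index"))  # {1}
```

Because the index is built once, beforehand, a query only needs a few set lookups instead of a scan through every item.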

References

  1. ^ Voorhees, E.M. Natural Language Processing and Information Retrieval (http://www.nist.gov/itl/iad/894.02/works/papers/nlp_ir.ps). National Institute of Standards and Technology. March 2000.

See also