Web crawling

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 62.254.64.12 (talk) at 03:07, 17 January 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Web crawling is a fundamental procedure of the Internet in which software designed for the purpose extracts information from web pages. The first web crawlers were search engines, whose sole job is to jump from link to link across web pages; "crawling" refers to extracting information from each page as the crawler goes along. This information is normally the title and description of the page, along with key phrases found in it. The extracted information is passed to a global database index, which can then be used to search the content of web pages. Search engines coined the term "web crawling" in that context. These days, "web scraping" means the same thing, but is more likely to be used in the context of extracting specific information from a web page on a regular basis.
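The link-following loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular search engine works: the names `PageParser`, `crawl`, and `PAGES` are hypothetical, the "web" is a small in-memory dictionary standing in for real HTTP requests, and the "index" is a plain dictionary holding each page's title and description.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the title, meta description, and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch` maps a URL to HTML text, or None if unavailable."""
    index = {}                  # url -> extracted title and description
    queue = deque([start_url])  # pages waiting to be visited
    seen = {start_url}          # avoids visiting the same URL twice
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue            # skip pages that cannot be fetched
        parser = PageParser()
        parser.feed(html)
        parser.close()
        index[url] = {
            "title": parser.title.strip(),
            "description": parser.description,
        }
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Hypothetical two-page "web" used in place of real network access.
PAGES = {
    "http://example.test/": (
        '<html><head><title>Home</title>'
        '<meta name="description" content="Start page"></head>'
        '<body><a href="/about">About</a><a href="/missing">Gone</a></body></html>'
    ),
    "http://example.test/about": '<title>About</title><a href="/">Home</a>',
}

index = crawl("http://example.test/", PAGES.get)
```

Passing the fetch function as a parameter keeps the sketch independent of any network library; a real crawler would substitute an HTTP client and would normally also honour politeness rules such as rate limits.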