Book scanning: Difference between revisions

From Wikipedia, the free encyclopedia
removed extra word

Revision as of 17:09, 21 August 2007

Book scanning is the process of converting physical books into electronic books (e-books) via image scanning. This is much less time-intensive than re-typing all of the text; before scanning became feasible, re-typing was generally the only option. To turn a physical book into a text e-book, its pages must be scanned and then processed with optical character recognition (OCR) or a similar method that converts the page images into text. Alternatively, the book can be stored as images in a format such as DjVu, Portable Document Format (PDF) or Tagged Image File Format (TIFF).

Commercial book scanners

Sketch of a typical manual book scanner

Commercial book scanners differ from ordinary scanners: they usually consist of a high-quality digital camera with light sources on either side, mounted on a frame that gives a person or machine easy access to turn the pages of the book. Some models (e.g. the BookDrive DIY scanner) use V-shaped book cradles, which support the book's spine and also center the book automatically.

The advantage of this type of scanner is that it is much faster than a traditional overhead scanner. It is also much less expensive: prices of traditional overhead scanners normally start at USD 10,000.

Book scanning by organisations on a large scale

Projects like Project Gutenberg, Google Book Search, and the Open Content Alliance scan books on a large scale.

One of the main challenges for these projects is the sheer volume of books to be scanned, expected to be in the tens of millions. All of these must be scanned and then made searchable online so that the public can use them as a universal library. Currently, large organizations rely on three main approaches: outsourcing, scanning in-house with commercial book scanners, and scanning in-house with robotic scanning solutions.

With outsourcing, books are often shipped to low-cost locations such as India or China to be scanned. Alternatively, for reasons of convenience, safety and improving technology, many organizations choose to scan in-house, using either overhead scanners, which are time-consuming, or digital camera-based scanning solutions, which are substantially faster; the latter method is employed by the Internet Archive as well as Google. Less popular methods include using robots to turn the pages, and cutting off the book's spine and feeding the loose pages through a scanner with automatic page-feeding capability.

Once a book is scanned, its text is either entered manually or extracted via OCR, which is another major cost of book scanning projects.

Because of copyright issues, most scanned books are those that are out of copyright; however, Google Book Search is known to scan books still protected by copyright unless the publisher specifically excludes them.

See also