Deduping: Difference between revisions

Revision as of 10:24, 1 October 2006

Deduping means removing duplicate entries from a set. It is a common task when integrating multiple databases or merging datasets. When merging bibliographic data, for example, you have to compare several values in each entry or record to determine whether you have duplicates and how many. These values include ISSN, ISBN, titles, contributors (authors, editors, publishers), place of publication, frequency, page count, and publication date(s). How easy the task is depends on the quality of your data. For example, if your practice is not consistent, some records without standard numbers (ISSN, ISBN, etc.) may be duplicates of records for the same works that do have standard numbers. When data quality is an issue, one way to think about deduping is to look for records whose metadata is not exactly the same but was intended to be the same (and would be the same if you had data quality standards).
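The comparison described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not a standard algorithm: the field names (isbn, issn, title, year), the helper functions, and the sample records are all hypothetical. Two records are treated as duplicates if they share a standard number, or if their normalized title and year agree, which lets a record without a standard number match one that has it.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def record_keys(record):
    """Build the match keys for one record (field names are assumptions)."""
    keys = []
    for field in ("isbn", "issn"):
        value = record.get(field)
        if value:
            # Compare standard numbers without hyphens or spaces.
            keys.append((field, re.sub(r"[^0-9Xx]", "", value).upper()))
    title = record.get("title")
    if title:
        # Fallback key for records that lack a standard number.
        keys.append(("title_year", normalize(title), record.get("year")))
    return keys


def dedupe(records):
    """Keep the first record among those that share any match key."""
    seen = {}      # match key -> first record that carried it
    unique = []
    for record in records:
        keys = record_keys(record)
        match = next((seen[k] for k in keys if k in seen), None)
        if match is None:
            unique.append(record)
            match = record
        for k in keys:
            seen.setdefault(k, match)
    return unique


# Made-up sample data, for illustration only.
sample = [
    {"title": "An Example-Title", "year": 2001, "isbn": "978-0-00-000000-2"},
    {"title": "an example title", "year": 2001},            # no standard number
    {"title": "An Example Title!", "year": 2001, "isbn": "9780000000002"},
    {"title": "A Different Work", "year": 1999},
]
unique = dedupe(sample)
```

Keeping the first record of each group is an arbitrary choice; a real merge would more likely combine fields from all duplicates, and a title-and-year match is loose enough that it may need manual review.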

Example: "Mark, do you typically dedupe these lists I send you?"

This computing article is a stub. You can help Wikipedia by expanding it.

External links

Dedupe Software http://www.winpure.com/
Deduplication Software http://www.helpit.com
Project Dedupe http://dedupe.sourceforge.net

Dedupe requires an 'e' at the end. Dedup is incorrect.

@@ Line 8: / Line 8: @@
 * Dedupe Software http://www.winpure.com/
+* Deduplication Software http://www.helpit.com
 * Project Dedupe http://dedupe.sourceforge.net