Deduping
Revision as of 10:24, 1 October 2006
Deduping means removing duplicate entries from a set. It is a common task when integrating multiple databases or merging datasets. When merging bibliographic data, for example, you have to compare several values belonging to each entry or record to determine whether you have duplicates and how many. These values include ISSN, ISBN, titles, contributors (authors, editors, publishers), place of publication, frequency, page count, and publication date(s). How easy the task is depends on the quality of your data. For example, if your practice is not consistent, some records without standard numbers (ISSN, ISBN, etc.) may be duplicates of records for the same works that do have standard numbers. When data quality is an issue, one way to think about deduping is to look for records whose metadata is not exactly the same but was intended to be the same (and would be the same if you had data quality standards).
Example: "Mark, do you typically dedupe these lists I send you?"
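The record comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive method: the field names (`title`, `year`, `isbn`, `issn`), the helper names, the matching rule, and the sample records are all made up for this example. The rule is to treat two records as duplicates when their normalized standard numbers match, or when their normalized title and year match and at least one of them has no standard number.

```python
import re

def normalize_id(value):
    """Reduce an ISBN/ISSN to digits and 'X' so '0-123-45678-9' == '0123456789'."""
    if not value:
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", value).upper()
    return cleaned or None

def fallback_key(record):
    """Build a (title, year) key for records that lack a standard number."""
    title = re.sub(r"[^a-z0-9 ]", " ", record.get("title", "").lower())
    title = " ".join(title.split())  # collapse repeated whitespace
    if not title:
        return None  # nothing usable to compare on
    return (title, record.get("year"))

def dedupe(records):
    """Return the records with duplicates dropped, keeping the first occurrence."""
    kept = []
    ids_seen = set()
    ids_by_key = {}  # fallback key -> standard numbers (or None) of kept records
    for rec in records:
        std = normalize_id(rec.get("isbn") or rec.get("issn"))
        key = fallback_key(rec)
        # Same standard number as an earlier record: duplicate.
        if std is not None and std in ids_seen:
            continue
        # Same title/year as an earlier record, and one side has no number.
        prior = ids_by_key.get(key, []) if key is not None else []
        if prior and (std is None or None in prior):
            continue
        kept.append(rec)
        if std is not None:
            ids_seen.add(std)
        if key is not None:
            ids_by_key.setdefault(key, []).append(std)
    return kept

# Made-up sample data; the standard numbers are placeholders, not real ones.
records = [
    {"title": "The Art of Cataloging", "year": 1999, "isbn": "0-123-45678-9"},
    {"title": "The art of cataloging.", "year": 1999},  # no ISBN
    {"title": "Art of Cataloging, The", "year": 1999, "isbn": "0123456789"},
    {"title": "Example Journal", "year": 2001, "issn": "1234-5678"},
]
unique = dedupe(records)
for rec in unique:
    print(rec["title"])
```

With this sample data, the second record is dropped because its title and year match the first and it has no ISBN, and the third is dropped because its ISBN matches the first once hyphens are removed. Real data usually needs looser matching (for example, tolerance for typos in titles), which this sketch does not attempt.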
External links
- Dedupe Software http://www.winpure.com/
- Deduplication Software http://www.helpit.com
- Project Dedupe http://dedupe.sourceforge.net
Dedupe requires an 'e' at the end. Dedup is incorrect.