
DBM (computing): Difference between revisions

From Wikipedia, the free encyclopedia
(30 intermediate revisions by 18 users not shown)
Line 1: Line 1:
{{Short description|Key-value database management system}}
{{refimprove|date=April 2018}}
{{About|the family of database engines||DBM (disambiguation){{!}}DBM}}
{{About|the family of database engines||DBM (disambiguation){{!}}DBM}}
{{Use mdy dates|date=June 2016}}
{{refimprove|date=January 2022}}
In computing, a '''DBM''' is a [[Library (computing)|library]] and [[file format]] providing fast, single-keyed access to data. A [[key-value database]] from the original [[Unix]], ''dbm'' is an early example of a [[NoSQL]] system.<ref name="2007-kew-apache"/><ref name="2001-hazel-exim"/><ref name="2001-ladd-odonell-xhtml"/>
In computing, a '''DBM''' is a [[library (computing)|library]] and [[file format]] providing fast, single-keyed access to data. A [[key-value database]] from the original [[Unix]], ''dbm'' is an early example of a [[NoSQL]] system.<ref name="2007-kew-apache"/><ref name="2001-hazel-exim"/><ref name="2001-ladd-odonell-xhtml"/>


==History==
==History==
The original ''dbm'' library and file format was a simple [[database engine]], originally written by [[Ken Thompson]] and released by [[AT&T]] in 1979. The name is a [[three letter acronym]] for ''DataBase Manager'', and can also refer to the family of database engines with APIs and features derived from the original ''dbm''.
The original ''dbm'' library and file format was a simple [[database engine]], originally written by [[Ken Thompson]] and released by [[AT&T]] in 1979. The name is a [[three-letter acronym]] for ''DataBase Manager'', and can also refer to the family of database engines with APIs and features derived from the original ''dbm''.


The ''dbm'' library stores arbitrary data by use of a single key (a [[primary key]]) in fixed-size [[bucket (computing)|buckets]] and uses [[hash function|hashing]] techniques to enable fast retrieval of the data by key.
The ''dbm'' library stores arbitrary data by use of a single key (a [[primary key]]) in fixed-size buckets and uses [[hash function|hashing]] techniques to enable fast retrieval of the data by key.


The hashing scheme used is a form of [[extendible hashing]], so that the hashing scheme expands as new buckets are added to the database, meaning that, when nearly empty, the database starts with one bucket, which is then split when it becomes full. The two resulting child buckets will themselves split when they become full, so the database grows as keys are added.
The hashing scheme used is a form of [[extendible hashing]], so that the hashing scheme expands as new buckets are added to the database, meaning that, when nearly empty, the database starts with one bucket, which is then split when it becomes full. The two resulting child buckets will themselves split when they become full, so the database grows as keys are added.
Line 16: Line 16:
The original AT&T ''dbm'' library has been replaced by its many successor implementations. Notable examples include:<ref name="2001-ladd-odonell-xhtml"/>
The original AT&T ''dbm'' library has been replaced by its many successor implementations. Notable examples include:<ref name="2001-ladd-odonell-xhtml"/>
* ''ndbm'' ("new dbm"), based on the original dbm with some new features.
* ''ndbm'' ("new dbm"), based on the original dbm with some new features.
* ''gdbm'' ("GNU dbm"), [[GNU]] rewrite of the library implementing ''ndbm'' features and its own interface.<ref>{{cite web |title=GDBM |url=https://www.gnu.org.ua/software/gdbm/ |website=www.gnu.org.ua}}</ref>
* [https://www.gnu.org.ua/software/gdbm/gdbm.html GDBM] ("GNU dbm"), [[GNU]] rewrite of the library implementing ''ndbm'' features and its own interface. Also provides new features like crash tolerance for guaranteeing data consistency.<ref>{{cite web |title=Crash Tolerance |website=GDBM manual |url=https://www.gnu.org.ua/software/gdbm/manual/Crash-Tolerance.html |access-date=3 October 2021}}</ref><ref>{{cite web |title=Crashproofing the Original NoSQL Key-Value Store |url=https://queue.acm.org/detail.cfm?id=3487353 |access-date=3 October 2021}}</ref>
* ''sdbm'' ("small dbm"), a [[public domain]] rewrite of ''dbm''. It is a part of the standard distributions for [[Perl]] and [[Ruby_(programming_language)|Ruby]].<ref>{{cite web |last1=yigit |first1=ozan |title=sdbm.bun |url=http://www.cse.yorku.ca/~oz/sdbm.bun |website=cse.yorku.ca |accessdate=8 May 2019}}</ref><ref>{{cite web |title=class SDBM |url=https://docs.ruby-lang.org/en/2.4.0/SDBM.html |website=Documentation for Ruby 2.4.0 |quote=Note that Ruby comes with the source code for SDBM, while the DBM and GDBM standard libraries rely on external libraries and headers.}}</ref>
* ''sdbm'' ("small dbm"), a [[public domain]] rewrite of ''dbm''. It is a part of the standard distribution for [[Perl]] and is available as an external library for [[Ruby_(programming_language)|Ruby]].<ref>{{cite web |last1=yigit |first1=ozan |title=sdbm.bun |website=cse.yorku.ca |url=http://www.cse.yorku.ca/~oz/sdbm.bun |access-date=8 May 2019}}</ref><ref>{{cite web |title=Ruby SDBM library |website=SDBM on Github |url=https://github.com/ruby/sdbm |quote=Note that Ruby used to ship SDBM in the standard distribution up until version 2.7, after which it was made available only as an external library, similarly to the DBM and GDBM libraries, removed from the standard library in Ruby 3.1.}}</ref>
* qdbm, a higher-performance ''dbm'' employing many of the same techniques as Tokyo/Kyoto Cabinet. Written by the same author before they moved on to the cabinets.<ref>{{cite web |title=QDBM: Quick Database Manager |url=https://fallabs.com/qdbm/ |website=fallabs.com |date=2006}}</ref>
* ''qdbm'' ("Quick Database Manager"), a higher-performance ''dbm'' employing many of the same techniques as Tokyo/Kyoto Cabinet. Written by the same author before they moved on to the cabinets.<ref>{{cite web |date=2006 |title=QDBM: Quick Database Manager |website=fallabs.com |url=https://fallabs.com/qdbm/ |access-date=2020-02-27 |archive-date=2020-02-27 |archive-url=https://web.archive.org/web/20200227064151/https://fallabs.com/qdbm/ |url-status=dead }}</ref>
* tdb, a simple database used by [[Samba (software)|Samba]] that supports multiple writers. Has a gdbm-based API.<ref>{{cite web |title=tdb: Main Page |url=https://tdb.samba.org/ |website=tdb.samba.org}}</ref>
* ''tdb'' ("Trivial Database"), a simple database used by [[Samba (software)|Samba]] that supports multiple writers. Has a gdbm-based API.<ref>{{cite web |title=tdb: Main Page |website=tdb.samba.org |url=https://tdb.samba.org/}}</ref>
* [[Berkeley DB]], 1991 replacement of ndbm by [[Sleepycat Software]] (now [[Oracle Corporation|Oracle]]) created to get around the AT&T Unix copyright on [[Berkeley Software Distribution|BSD]]. It features many extensions like parallelism, transactional control, hashing, and B tree storage.
* [[Berkeley DB]], 1991 replacement of ndbm by [[Sleepycat Software]] (now [[Oracle Corporation|Oracle]]) created to get around the AT&T Unix copyright on [[Berkeley Software Distribution|BSD]]. It features many extensions like parallelism, transactional control, hashing, and B-tree storage.
* [[Lightning Memory-Mapped Database|LMDB]]: [[copy-on-write]] [[memory-mapped file|memory-mapped]] [[B+ tree]] implementation in [[C (programming language)|C]] with a Berkeley-style API.
* [[Lightning Memory-Mapped Database|LMDB]]: [[copy-on-write]] [[memory-mapped file|memory-mapped]] [[B+ tree]] implementation in [[C (programming language)|C]] with a Berkeley-style API.


The following databases are dbm-inspired, but they do not directly provide a dbm interface, even though it would be trivial to wrap one:
The following databases are dbm-inspired, but they do not directly provide a dbm interface, even though it would be trivial to wrap one:
* [[Cdb (software)|cdb]] ("constant database"), database by [[Daniel J. Bernstein]], database files can only be created and read, but never be modified
* [[Tokyo Cabinet and Kyoto Cabinet]]: [[C (programming language)|C]] and [[C++]] implementations employing [[hash table]], [[B+ tree]], or fixed-length [[array data structure|array]] structures by FAL Labs. Has improvements to parallelism.<ref> {{ cite web | url = http://fallabs.com/tokyocabinet/spex-ja.html | title = Tokyo Cabinet第1版基本仕様書 | access-date = 25 May 2019 | date = 5 August 2010 | website = Fall Labs | language = ja | trans-title = Fundamental Specifications of Tokyo Cabinet Version 1 | quote = Tokyo CabinetはGDBMやQDBMの後継として次の点を目標として開発されました。これらの目標は達成されており、Tokyo Cabinetは従来のDBMを置き換える製品だと言えます。 | format = html | archive-url = https://web.archive.org/web/20181028124047/http://fallabs.com/tokyocabinet/spex-ja.html | archive-date = 28 October 2018 | df = dmy-all }} </ref>
* [[Tkrzw]], an Apache 2.0 licensed successor to Kyoto Cabinet and Tokyo Cabinet
* [[WiredTiger]]: database with traditional row-oriented and column-oriented storage.
* [[WiredTiger]]: database with traditional row-oriented and column-oriented storage.


== Availability ==
== Availability ==
As of 2001, the ''ndbm'' implementation of DBM was standard on Solaris and IRIX, whereas ''gdbm'' is ubiquitous on [[Linux]]. The Berkeley DB implementations were standard on some free operating systems.<ref name="2001-hazel-exim"/><ref name=fuzz/> After a change of licensing of the BDB to [[GNU AGPL]] in 2013, projects like [[Debian]] have moved to LMDB.<ref name=deb>{{cite mailing list | url=https://lists.debian.org/debian-devel/2014/06/msg00338.html | title=New project goal: Get rid of Berkeley DB (post jessie) | mailinglist=debian-devel | date=June 19, 2014 | author=Ondřej Surý |publisher=[[Debian]]}}</ref>
As of 2001, the ''ndbm'' implementation of DBM was standard on Solaris and IRIX, whereas ''gdbm'' is ubiquitous on [[Linux]]. The Berkeley DB implementations were standard on some free operating systems.<ref name="2001-hazel-exim"/><ref name=fuzz/> After a change of licensing of the Berkeley DB to [[GNU AGPL]] in 2013, projects like [[Debian]] have moved to LMDB.<ref name=deb>{{cite mailing list |last=Surý |first=Ondřej |date=19 June 2014 |title=New project goal: Get rid of Berkeley DB (post jessie) |mailing-list=debian-devel |publisher=[[Debian]] |url=https://lists.debian.org/debian-devel/2014/06/msg00338.html}}</ref>


== Reliability ==
== Reliability ==
A 2018 [[american fuzzy lop (fuzzer)|AFL]] [[fuzzing]] test against many DBM-family databases exposed many problems in implementations when it comes to corrupt or invalid database files. Only [[cdb (software)|''freecdb'']] by [[Daniel J. Bernstein]] showed no crashes. The authors of gdbm, tdb, and lmdb were prompt to respond. Berkeley DB fell behind due to the sheer amount of other issues;<ref name=fuzz>{{cite web |last1=Debroux |first1=Lionel |title=oss-security - Fun with DBM-type databases... |url=https://www.openwall.com/lists/oss-security/2018/06/17/1 |website=openwall.com |date=16 Jun 2018}}</ref> the fixes would be irrelevant to OSS users anyways due to the licensing change locking them back on an old version.<ref name=deb/>
A 2018 [[american fuzzy lop (fuzzer)|AFL]] [[fuzzing]] test against many DBM-family databases exposed many problems in implementations when it comes to corrupt or invalid database files. Only [[cdb (software)|''freecdb'']] by [[Daniel J. Bernstein]] showed no crashes. The authors of gdbm, tdb, and lmdb were prompt to respond. Berkeley DB fell behind due to the sheer amount of other issues;<ref name=fuzz>{{cite web |last=Debroux |first=Lionel |date=16 Jun 2018 |title=oss-security - Fun with DBM-type databases... |website=openwall.com |url=https://www.openwall.com/lists/oss-security/2018/06/17/1}}</ref> the fixes would be irrelevant to open-source software users due to the licensing change locking them back on an old version.<ref name=deb/>


== See also ==
== See also ==
Line 37: Line 38:
* [[Flat file database]]
* [[Flat file database]]
* [[ISAM]]
* [[ISAM]]
* [[Key-value database]]
* [[Key–value database]]
* [[Mobile database]]
* [[Mobile database]]
* [[NoSQL]]
* [[NoSQL]]
Line 51: Line 52:
==Bibliography==
==Bibliography==
{{refbegin}}
{{refbegin}}
*{{cite book |ref=harv |year=2001 |last=Hazel |first=Philip |publisher=O'Reilly |title=Exim: The Mail Transfer Agent |url=https://books.google.co.uk/books?id=AcLDAQAAQBAJ&pg=PT500}}
*{{cite book |last=Hazel |first=Philip |year=2001 |title=Exim: The Mail Transfer Agent |publisher=O'Reilly |url=https://books.google.com/books?id=AcLDAQAAQBAJ&pg=PT500}}
*{{cite book |ref=harv |year=2001 |first1=Eric |last1=Ladd |first2=Jim |last2=O'Donnell |publisher=Que |title=Using XHTML, XML and Java 2: Platinum Edition |isbn=9780789724731 |url=https://books.google.co.uk/books?id=NLc_TQiWvo4C&pg=PA823}}
*{{cite book |last1=Ladd |first1=Eric |last2=O'Donnell |first2=Jim |year=2001 |title=Using XHTML, XML and Java 2: Platinum Edition |publisher=Que |isbn=9780789724731 |url=https://books.google.com/books?id=NLc_TQiWvo4C&pg=PA823}}
*{{cite book |ref=harv |year=2007 |first=Nick |last=Kew |publisher=Prentice Hall Professional |title=The Apache Modules Book: Application Development with Apache |isbn=9780132704502 |url=https://books.google.co.uk/books?id=HTo_AmTpQPMC&pg=PA80}}
*{{cite book |last=Kew |first=Nick |year=2007 |title=The Apache Modules Book: Application Development with Apache |publisher=Prentice Hall Professional |isbn=9780132704502 |url=https://books.google.com/books?id=HTo_AmTpQPMC&pg=PA80}}
{{refend}}
{{refend}}
* [https://apr.apache.org/docs/apr-util/0.9/group__APR__Util__DBM__SDBM.html SDBM library] @[[The Apache Software Foundation|Apache]]
*[https://apr.apache.org/docs/apr-util/0.9/group__APR__Util__DBM__SDBM.html SDBM library] @[[The Apache Software Foundation|Apache]]
* {{cite | url=https://books.google.com/books?id=vvuzDziOMeMC&pg=PT270 | title=Beginning Linux Programming | first1=Neil | last1=Matthew | first2=Richard | last2=Stones | publisher=Wiley | date=2008 | chapter=Databases | ref=none}}
*{{cite book |last1=Matthew |first1=Neil |last2=Stones |first2=Richard |year=2008 |title=Beginning Linux Programming |chapter=Databases |publisher=Wiley |url=https://books.google.com/books?id=vvuzDziOMeMC&pg=PT270}}
* {{cite web|author=Olson, Bostic & Seltzer|title=Berkeley DB|url=http://www.usenix.org/events/usenix99/full_papers/olson/olson.pdf}}
*{{cite web |last1=Olson |first1=Michael A. |last2=Bostic |first2=Keith |last3=Seltzer |first3=Margo |year=1999 |title=Berkeley DB |work=Proceedings of the FREENIX Track:1999 USENIX Annual Technical Conference |url=http://www.usenix.org/events/usenix99/full_papers/olson/olson.pdf}}


[[Category:Database engines]]
[[Category:Database engines]]

Latest revision as of 13:27, 21 August 2024

In computing, a DBM is a library and file format providing fast, single-keyed access to data. A key-value database from the original Unix, dbm is an early example of a NoSQL system.[1][2][3]

History


The original dbm library and file format was a simple database engine written by Ken Thompson and released by AT&T in 1979. The name is a three-letter acronym for DataBase Manager and can also refer to the family of database engines with APIs and features derived from the original dbm.

The dbm library stores arbitrary data under a single key (a primary key) in fixed-size buckets and uses hashing techniques to enable fast retrieval of the data by key.

The hashing scheme is a form of extendible hashing, in which the scheme expands as new buckets are added to the database. A nearly empty database starts with one bucket, which is split when it becomes full. The two resulting child buckets themselves split when they become full, so the database grows as keys are added.
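The splitting behaviour described above can be illustrated with a small in-memory sketch. It is not the original dbm code or its on-disk format: the class names, the bucket capacity, and the use of CRC-32 as the hash function are all assumptions chosen for illustration.

```python
import zlib


class Bucket:
    """A fixed-capacity bucket (hypothetical stand-in for a dbm page)."""

    def __init__(self, depth, capacity):
        self.depth = depth        # number of low hash bits shared by all keys here
        self.capacity = capacity  # maximum number of entries before a split
        self.items = {}


class ExtendibleHash:
    """In-memory sketch of extendible hashing; names are illustrative only."""

    def __init__(self, bucket_capacity=4):
        self.capacity = bucket_capacity
        self.global_depth = 0
        # The directory always has 2**global_depth slots; it starts with one bucket.
        self.directory = [Bucket(0, bucket_capacity)]

    @staticmethod
    def _hash(key):
        # Assumption: CRC-32 of the bytes key serves as the hash function.
        return zlib.crc32(key)

    def _bucket_for(self, key):
        index = self._hash(key) & ((1 << self.global_depth) - 1)
        return self.directory[index]

    def get(self, key, default=None):
        return self._bucket_for(key).items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self._bucket_for(key)
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.depth >= 32:
            raise OverflowError("too many keys share the same 32-bit hash")
        if bucket.depth == self.global_depth:
            # Double the directory; slots i and i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.depth
        low = Bucket(bucket.depth + 1, self.capacity)
        high = Bucket(bucket.depth + 1, self.capacity)
        # Redistribute entries between the two child buckets by the next hash bit.
        for k, v in bucket.items.items():
            (high if self._hash(k) & bit else low).items[k] = v
        # Repoint every directory slot that referenced the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                self.directory[i] = high if i & bit else low


if __name__ == "__main__":
    db = ExtendibleHash(bucket_capacity=4)
    for i in range(100):
        db.put(f"key-{i}".encode(), f"value-{i}".encode())
    print(db.get(b"key-7"), len(db.directory))
```

Only the bucket that overflows is split, so growth is incremental: existing keys in other buckets are never rehashed.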

The dbm library and its derivatives are pre-relational databases: they manage associative arrays, implemented as on-disk hash tables. For high-speed storage accessed by key, they can be more practical than a relational database, as they do not require the overhead of connecting and preparing queries. The trade-off is that they can generally be opened for writing by only a single process at a time. An agent daemon can handle requests from multiple processes, but it introduces IPC overhead.
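As an illustration of the associative-array model, the following sketch uses Python's standard-library dbm package, which is not discussed in this article and is chosen here only as a convenient example. It uses the pure-Python dbm.dumb backend so that it runs without external libraries; the file name and the stored records are made up.

```python
import dbm.dumb
import os
import tempfile


def demo(path):
    """Store two records, then reopen the database and look them up by key."""
    # Flag "c" opens the database for reading and writing, creating it if needed.
    with dbm.dumb.open(path, "c") as db:
        db[b"alice"] = b"alice@example.org"
        db[b"bob"] = b"bob@example.org"

    # Flag "r" reopens the same files read-only; no connection or query
    # preparation is involved, only a keyed lookup.
    with dbm.dumb.open(path, "r") as db:
        return db[b"alice"], b"carol" in db, sorted(db.keys())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "users")))
```

Keys and values are plain byte strings, and the database object behaves like a dictionary backed by files on disk. The generic dbm.open function in the same package selects among the backends available on the system, such as dbm.gnu or dbm.ndbm.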

Implementations


The original AT&T dbm library has been replaced by its many successor implementations. Notable examples include:[3]

  • ndbm ("new dbm"), based on the original dbm with some new features.
  • GDBM ("GNU dbm"), a GNU rewrite of the library that implements ndbm features and its own interface. It also provides new features such as crash tolerance for guaranteeing data consistency.[4][5]
  • sdbm ("small dbm"), a public domain rewrite of dbm. It is a part of the standard distribution for Perl and is available as an external library for Ruby.[6][7]
  • qdbm ("Quick Database Manager"), a higher-performance dbm employing many of the same techniques as Tokyo/Kyoto Cabinet. It was written by the same author before they moved on to the cabinets.[8]
  • tdb ("Trivial Database"), a simple database used by Samba that supports multiple writers. It has a gdbm-based API.[9]
  • Berkeley DB, a 1991 replacement of ndbm by Sleepycat Software (now Oracle), created to get around the AT&T Unix copyright on BSD. It features many extensions such as parallelism, transactional control, hashing, and B-tree storage.
  • LMDB, a copy-on-write memory-mapped B+ tree implementation in C with a Berkeley-style API.

The following databases are dbm-inspired but do not directly provide a dbm interface, even though writing a dbm wrapper around them would be trivial:

  • cdb ("constant database"), a database by Daniel J. Bernstein whose files can only be created and read, never modified.
  • Tkrzw, an Apache 2.0 licensed successor to Kyoto Cabinet and Tokyo Cabinet.
  • WiredTiger, a database with traditional row-oriented and column-oriented storage.

Availability


As of 2001, the ndbm implementation of DBM was standard on Solaris and IRIX, whereas gdbm is ubiquitous on Linux. The Berkeley DB implementations were standard on some free operating systems.[2][10] After Berkeley DB was relicensed under the GNU AGPL in 2013, projects such as Debian have moved to LMDB.[11]

Reliability


A 2018 AFL fuzzing test against many DBM-family databases exposed many problems in how the implementations handle corrupt or invalid database files. Only freecdb by Daniel J. Bernstein showed no crashes. The authors of gdbm, tdb, and lmdb responded promptly. Berkeley DB fell behind due to the sheer number of other issues;[10] the fixes would in any case be irrelevant to open-source software users, since the licensing change keeps them locked to an old version.[11]

See also


References

  1. ^ Kew 2007, p. 80: "DBMs have been with us since the early days of computing, when the need for fast keyed lookups was recognized. The original DBM is a UNIX-based library and file format for fast, highly-scalable keyed access to data. It was followed (in order) by NDBM ('new DBM'), GDBM ('GNU DBM'), and the Berkeley DB. This last is by far the most advanced, and the only DBM under active development today. Nevertheless, all of the DBMs from NDBM onward provide the same core functionality used by most programs, including Apache. A minimal-implementation SDBM is also bundled with APR, and is available to applications along with the other DBMs.
    Although NDBM is now old - like the city named New Town ('Neapolis') by the Greeks in about 600BC and still called Naples today - it remains the baseline DBM. NDBM was used by early Apache modules such as the Apache 1.x versions of mod_auth_dbm and mod_rewrite. Both GDBM and Berkeley DB provide NDBM emulations, and Linux distributions ship with one or other of these emulations in place of the 'real' NDBM, which is excluded for licensing reasons. Unfortunately, the various file formats are totally incompatible, and there are subtle differences in behaviour concerning database locking. These issues led a steady stream of Linux users to report problems with DBMs in Apache 1.x."
  2. ^ a b Hazel 2001, p. 500: "The most common [single-key] format is called DBM. Most modern versions of Unix have a DBM library installed as standard, though this is not true of some older systems. The two most common DBM libraries are ndbm (standard on Solaris and IRIX) and Berkeley DB Version 2 or 3 (standard on several free operating systems). Exim supports both of these, as well as the older Berkeley DB Version 1, gdbm, and tdb."
  3. ^ a b Ladd & O'Donnell 2001, pp. 823–824: "Most UNIX systems have some kind of DBM database. DBM is a set of library routines that manages data files consisting of key and value pairs. The DBM routines control how users enter and retrieve information from the database. Although it isn't the most powerful mechanism for storing information, using DBM is a faster method of retrieving information than using a flat file. Because most UNIX sites use one of the DBM libraries, the tools you need to store your information in a DBM database are readily available.
    Almost as many flavors of the DBM libraries exist as there are UNIX systems. Although most of these libraries are compatible with each other, they all basically work the same way...
    A list follows of some of the most popular DBM libraries available:
    • DBM - DBM stores the database in two files. The first has the extension .Pag and contains the bitmap. The second, which has the extension .Dir, contains the data.
    • NDBM - NDBM is much like DBM but with a few additional features; it was written to provide better storage and retrieval methods. Also, NDBM enables you to open many databases, unlike DBM, in which you are allowed to have only one database open within your script. Like DBM, NDBM stores its information in two files using the extensions .Pag and .Dir.
    • SDBM - SDBM comes with the Perl archive, which has been ported to many platforms. Therefore, you can use DBM databases as long as a version of Perl exists for your computer. SDBM was written to match the functions provided with NDBM, so portability of code shouldn't be a problem. Perl is available on just about all popular platforms.
    • GDBM - GDBM is the GNU version of the DBM family of database routines. GDBM also enables you to cache data, reducing the time that it takes to write to the database. The database has no size limit; its size depends completely on your system's resources. GDBM database files have the extension .Db. Unlike DBM and NDBM, both of which use two files, GDBM only uses one file.
    • Berkeley db - The Berkeley db expands on the original DBM routines significantly. The Berkeley db uses hashed tables the same as the other DBM databases, but the library also can create databases based on a sorted balanced binary tree (BTREE) and store information with a record line number (RECNO). The method that you use depends completely on how you want to store and retrieve the information from a database. Berkeley db creates only one file, which has no extension."
  4. ^ "Crash Tolerance". GDBM manual. Retrieved 3 October 2021.
  5. ^ "Crashproofing the Original NoSQL Key-Value Store". Retrieved 3 October 2021.
  6. ^ Yigit, Ozan. "sdbm.bun". cse.yorku.ca. Retrieved 8 May 2019.
  7. ^ "Ruby SDBM library". SDBM on Github. Note that Ruby used to ship SDBM in the standard distribution up until version 2.7, after which it was made available only as an external library, similarly to the DBM and GDBM libraries, removed from the standard library in Ruby 3.1.
  8. ^ "QDBM: Quick Database Manager". fallabs.com. 2006. Archived from the original on 2020-02-27. Retrieved 2020-02-27.
  9. ^ "tdb: Main Page". tdb.samba.org.
  10. ^ a b Debroux, Lionel (16 Jun 2018). "oss-security - Fun with DBM-type databases..." openwall.com.
  11. ^ a b Surý, Ondřej (19 June 2014). "New project goal: Get rid of Berkeley DB (post jessie)". debian-devel (Mailing list). Debian.

Bibliography

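  • Hazel, Philip (2001). Exim: The Mail Transfer Agent. O'Reilly.
  • Ladd, Eric; O'Donnell, Jim (2001). Using XHTML, XML and Java 2: Platinum Edition. Que. ISBN 9780789724731.
  • Kew, Nick (2007). The Apache Modules Book: Application Development with Apache. Prentice Hall Professional. ISBN 9780132704502.
  • SDBM library @Apache
  • Matthew, Neil; Stones, Richard (2008). "Databases". Beginning Linux Programming. Wiley.
  • Olson, Michael A.; Bostic, Keith; Seltzer, Margo (1999). "Berkeley DB". Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference.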