Apache CouchDB

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 81.178.31.210 (talk) at 15:06, 22 July 2010 (Open Source Projects: clarify to Open source components). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Apache CouchDB
Original author(s): Damien Katz, Jan Lehnardt, Noah Slater, Christopher Lenz, J. Chris Anderson
Developer(s): Apache Software Foundation
Initial release: 2005
Preview release: 1.0 / July 14, 2010 (2010-07-14)
Repository
Written in: Erlang
Operating system: Cross-platform
Available in: English
Type: Document-oriented database
License: Apache License 2.0
Website: http://couchdb.apache.org/

Apache CouchDB, commonly referred to as CouchDB, is a free and open source document-oriented database written in the Erlang programming language. It is a NoSQL solution designed for local replication and to scale vertically along a wide range of devices. CouchDB is supported by commercial enterprises Couchio and Cloudant.

History

In April 2005, Damien Katz (now founder and CEO of Couchio) posted on his blog about a new database engine he was working on. Details were sparse at this early stage, but he did share that it would be a "storage system for a large scale object database" and that it would be called CouchDB (Couch is an acronym for cluster of unreliable commodity hardware).[1] His objectives were for it to become the database of the Internet and for it to be designed from the ground up to serve web applications. CouchDB was originally written in C++, but the project moved to the Erlang OTP platform for its emphasis on fault tolerance. He self-funded the project for almost two years and released it as an open source project under the GNU General Public License.

In February 2008, it became an Apache Incubator project and the license was changed from the GPL to the Apache License.[2] In November 2008, it graduated to a top-level project alongside the likes of the Apache HTTP Server, Tomcat and Ant.[3]

It is now maintained at the Apache Software Foundation, with Katz working on it as the lead developer. It quickly drew the attention of IBM, which backed the project, allowing Katz (who previously worked for IBM on Lotus Notes) to work on it full-time. CouchDB gained a handful of contributors over the last year and has a steadily growing community that helps out with support, testing, bug reports and general discussion about all things CouchDB.

Design

CouchDB is most similar to other document stores like MongoDB and Lotus Notes. It is not a relational database management system. Instead of storing data in rows and columns, the database manages a collection of JSON documents. The documents in a collection need not share a schema, but they remain queryable via views. Views are defined with aggregate functions and filters, and are computed in parallel, much like MapReduce.

Views are generally stored in the database and their indexes updated continuously, although queries may introduce temporary views. CouchDB supports a view system using external socket servers and a JSON-based protocol.[4] As a consequence, view servers have been developed in a series of languages.
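To make the view idea concrete, the following is a minimal sketch in plain JavaScript that imitates how a map function and a reduce function could build a view over documents that do not share a schema. It runs outside CouchDB and is not its implementation: the sample documents, the `buildView` helper, and the convention of passing `emit` as a parameter are all assumptions made for illustration.

```javascript
// Sample schema-less documents (hypothetical data for illustration).
const docs = [
  { _id: "a", type: "page", title: "Home", words: 120 },
  { _id: "b", type: "page", title: "About", words: 80 },
  { _id: "c", type: "comment", text: "Nice wiki" }, // no "words" field
];

// Map step: inspect one document and emit zero or more key/value pairs.
// Documents that do not match the filter simply emit nothing.
function mapPageWords(doc, emit) {
  if (doc.type === "page") {
    emit(doc.title, doc.words);
  }
}

// Reduce step: aggregate the emitted values into a single result.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical helper: run the map over every document, sort the
// resulting rows by key, then apply the reduce to the values.
function buildView(documents, mapFn, reduceFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  return { rows, reduced: reduceFn(rows.map((r) => r.value)) };
}

const view = buildView(docs, mapPageWords, sum);
console.log(view.rows.map((r) => r.key));
console.log(view.reduced);
```

Because each document is mapped independently of the others, the map step is the part that lends itself to parallel computation and to incremental index updates when a single document changes.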

CouchDB exposes a RESTful HTTP API, and a large number of pre-written clients are available. Additionally, a plugin architecture allows different programming languages to be used for the view server, such as JavaScript (the default), PHP, Ruby, Python and Erlang. Support for other languages can be added easily. CouchDB's design and philosophy borrow heavily from Web architecture and its concepts of resources, methods and representations, as the following quotation summarizes.

Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.

— Jacob Kaplan-Moss, Django Developer [5]

Despite its low version number of 0.11, it is already in use in many software projects and web sites,[6] including Ubuntu, where it is used to synchronize address and bookmark data.[7] Since version 0.11, CouchDB has supported the CommonJS Modules specification.[8]

Features

  • Document Storage: CouchDB stores documents in their entirety. You can think of a document as one or more field/value pairs expressed as JSON. Field values can be simple things like strings, numbers, or dates. But you can also use ordered lists and associative maps. Every document in a CouchDB database has a unique id and there is no required document schema.
  • ACID Semantics: Like many relational database engines, CouchDB provides ACID semantics[citation needed]. It does this by implementing a form of Multi-Version Concurrency Control (MVCC) not unlike InnoDB or Oracle. That means CouchDB can handle a high volume of concurrent readers and writers without conflict.
  • Map/Reduce Views and Indexes: To provide some structure to the data stored in CouchDB, you can develop views that are similar to their relational database counterparts. In CouchDB, each view is constructed by a JavaScript function (server-side JavaScript by using CommonJS and SpiderMonkey) that acts as the Map half of a MapReduce operation. The function takes a document and emits zero or more key/value pairs derived from it. The logic in your JavaScript functions can be arbitrarily complex. Since computing a view over a large database can be an expensive operation, CouchDB can index views and keep those indexes updated as documents are added, removed, or updated. This provides a powerful indexing mechanism that gives you more control than most databases offer.
  • Distributed Architecture with Replication: CouchDB was designed with bi-directional replication (or synchronization) and off-line operation in mind. That means multiple replicas can have their own copies of the same data, modify it, and then sync those changes at a later time. The biggest gotcha typically associated with this level of flexibility is conflicts.
  • REST API: CouchDB treats all stored items (there are others besides documents) as a resource. All items have a unique URI that gets exposed via HTTP. REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources. HTTP is widely understood, interoperable, scalable and proven technology. A lot of tools, software and hardware, are available to do all sorts of things with HTTP like caching, proxying and load balancing.
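The MVCC and conflict points in the list above can be illustrated with a small in-memory sketch in which every write must name the revision it was based on. This is a simplified model, not CouchDB's storage engine: the `TinyStore` class is hypothetical, and the integer revision counter is an assumption that stands in for CouchDB's real revision identifiers.

```javascript
// Hypothetical in-memory store illustrating revision-checked updates.
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Readers receive a copy of the latest committed revision.
  get(id) {
    const entry = this.docs.get(id);
    return entry ? { _id: id, _rev: entry.rev, ...entry.body } : null;
  }

  // A write succeeds only if it names the revision it was based on;
  // a write based on a stale revision is reported as a conflict.
  put(doc) {
    const { _id, _rev, ...body } = doc;
    const current = this.docs.get(_id);
    const currentRev = current ? current.rev : undefined;
    if (_rev !== currentRev) {
      return { ok: false, error: "conflict" };
    }
    const nextRev = (currentRev || 0) + 1; // simplified revision counter
    this.docs.set(_id, { rev: nextRev, body });
    return { ok: true, id: _id, rev: nextRev };
  }
}

const store = new TinyStore();
const created = store.put({ _id: "home", title: "Home" });

// Two clients read the same revision, then both try to write.
const copyA = store.get("home");
const copyB = store.get("home");
const first = store.put({ ...copyA, title: "Home (edited by A)" });
const second = store.put({ ...copyB, title: "Home (edited by B)" });

console.log(created, first, second);
```

In this sketch the second writer is told about the conflict instead of silently overwriting the first edit; it would need to re-read the document and retry. Conflicts between replicas that were edited independently follow the same idea, although how they are detected and resolved during replication is not modeled here.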

Examples

CouchDB exposes its functionality through RESTful HTTP methods (e.g., POST, GET, PUT or DELETE). The examples below use cURL, a lightweight command-line tool, to interact with a CouchDB server:

curl http://127.0.0.1:5984/

The CouchDB server processes the HTTP request and returns a response in JSON such as the following:

{"couchdb":"Welcome","version":"0.11.0"}

This is not terribly useful, but it illustrates nicely the way of interacting with CouchDB. Creating a database is simple—just issue the following command:

curl -X PUT http://127.0.0.1:5984/wiki

CouchDB will reply with the following message, if the database does not exist:

{"ok":true}

or, with a different response message, if the database already exists:

{"error":"file_exists","reason":"The database could not be created, the file already exists."}

The command below retrieves information about the database:

curl -X GET http://127.0.0.1:5984/wiki

The server replies with the following JSON message:

{"db_name":"wiki","doc_count":0,"doc_del_count":0,"update_seq":0,
 "purge_seq":0,"compact_running":false,"disk_size":79,
 "instance_start_time":"1272453873691070","disk_format_version":5}

The following command will remove the database and its contents:

curl -X DELETE http://127.0.0.1:5984/wiki

CouchDB will reply with the following message:

{"ok":true}
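The curl commands above operate on a whole database. As a rough sketch of how the same pattern extends to individual documents, the JavaScript below builds request descriptions for the four CRUD operations without sending anything. The `documentRequest` helper, the `FrontPage` document id, and the `1-abc` revision string are hypothetical names chosen for illustration; the base URL and the `wiki` database come from the examples above.

```javascript
// Base URL and database name are taken from the curl examples above.
const BASE_URL = "http://127.0.0.1:5984";

// Hypothetical helper: describe the HTTP request for a document operation
// without sending it, so the mapping from CRUD to HTTP stays visible.
function documentRequest(operation, db, docId, doc = {}, rev = undefined) {
  const url = `${BASE_URL}/${encodeURIComponent(db)}/${encodeURIComponent(docId)}`;
  switch (operation) {
    case "create":
      // Creating a document under a known id uses PUT, like creating a database.
      return { method: "PUT", url, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url };
    case "update":
      // An update carries the revision it is based on.
      return { method: "PUT", url, body: JSON.stringify({ ...doc, _rev: rev }) };
    case "delete":
      // A delete names the revision being removed as a query parameter.
      return { method: "DELETE", url: `${url}?rev=${encodeURIComponent(rev)}` };
    default:
      throw new Error(`unknown operation: ${operation}`);
  }
}

const createReq = documentRequest("create", "wiki", "FrontPage", { title: "Front page" });
const readReq = documentRequest("read", "wiki", "FrontPage");
const deleteReq = documentRequest("delete", "wiki", "FrontPage", {}, "1-abc");
console.log(createReq);
console.log(readReq);
console.log(deleteReq);
```

Each descriptor could be handed to any HTTP client, or translated by hand into a curl command with `-X` set to the `method` field.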

Open source components

The following existing open source projects are used in CouchDB.

| Component | Description | License |
|---|---|---|
| SpiderMonkey | SpiderMonkey is a code name for the first ever JavaScript engine, written by Brendan Eich at Netscape Communications, later released as open source and now maintained by the Mozilla Foundation. | MPL/GPL/LGPL tri-license |
| jQuery | jQuery is a lightweight cross-browser JavaScript library that emphasizes interaction between JavaScript and HTML. | Dual license: GPL and MIT |
| ICU | International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization and software globalization. ICU is widely portable to many operating systems and environments. | MIT License |
| OpenSSL | OpenSSL is an open source implementation of the SSL and TLS protocols. The core library (written in the C programming language) implements the basic cryptographic functions and provides various utility functions. | Apache-like unique |
| Erlang | Erlang is a general-purpose concurrent programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing. | Modified MPL |

References

  1. ^ Lennon, Joe (2009-03-31). "Exploring CouchDB". IBM. Retrieved 2009-03-31.
  2. ^ Apache mailing list announcement on mail-archives.apache.org
  3. ^ Re: Proposed Resolution: Establish CouchDB TLP on mail-archives.apache.org
  4. ^ View Server Documentation on wiki.apache.org
  5. ^ A Different Way to Model Your Data
  6. ^ CouchDB in the wild A list of software projects and websites using CouchDB
  7. ^ Email from Elliot Murphy (Canonical) to the CouchDB-Devel list
  8. ^ http://wiki.apache.org/couchdb/CommonJS_Modules

Bibliography

  • Erlang eXchange 2008: