Jump to content

Temporal database

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Vk2010 (talk | contribs) at 18:12, 11 March 2012 (Implementations in databases). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A temporal database is a database with built-in time aspects, for example a temporal data model and a temporal version of Structured Query Language (SQL).

More specifically the temporal aspects usually include valid-time and transaction-time. These attributes go together to form bitemporal data.

  • Valid time denotes the time period during which a fact is true with respect to the real world.
  • Transaction time is the time period during which a fact is stored in the database.
  • Bitemporal data combines both Valid and Transaction Time.

Note that these two time periods do not have to be the same for a single fact. Imagine that we come up with a temporal database storing data about the 18th century. The valid time of these facts is somewhere between 1701 and 1800, whereas the transaction time starts when we insert the facts into the database, for example, January 21, 1998.

It is possible to have timelines other than Valid Time and Transaction Time, such as Decision Time, in the database. In that case the database would be called a multitemporal database as opposed to a bitemporal database. However, this approach introduces additional complexities such as dealing with the validity of (foreign) keys.

History

The history of temporal databases is synchronous with the history of databases itself. With the development of SQL and its attendant use in reallife applications, people realized that when they added date columns to key fields, some issues arose. The basic issue is this: if you have a primary key and some attributes in the table, and you add a date to the primary key to track historical changes, you can suddenly give out the original key over and over again. Deletes get different meaning. And so forth. In 1992, this issue was recognized but standard database theory was not yet up to resolving this issue, and neither was the then newly formalized SQL-92 standard.

Richard Snodgrass proposed in 1992 that temporal extensions to SQL be developed by the temporal database community. In response to this proposal, a virtual committee was formed to design extensions to the 1992 edition of the SQL standard (ANSI X3.135.-1992 and ISO/IEC 9075:1992); those extensions, known as TSQL2, were developed during 1993 by this committee meeting only via email.[1] In late 1993,Snodgrass first presented this work to the group responsible for the American National Standard for Database Language SQL, ANSI Technical Committee X3H2 (now known as NCITS H2). The preliminary language specification appeared in the March 1994 ACM SIGMOD Record. Based on responses to that specification, changes were made to the language, and the definitive version of the TSQL2 Language Specification was published in September, 1994 [2]

An attempt was made to incorporate parts of TSQL2 into the new SQL standard SQL:1999, called SQL3. Parts of TSQL2 were included in a new substandard of SQL3, ISO/IEC 9075-7, called SQL/Temporal.[1] However, the ISO project responsible for temporal support was canceled near the end of 2001.

The ideas and concepts described in the TSQL2 specification, such as Valid Time, Transaction Time and Bitemporal tables, have all found their way into the general literature on temporal databases since then. In 2002 Chris Date, Hugh Darwen and Nikos Lorentzo presented in their book Temporal Data & the Relational Model a treatment of the topic that includes many of the terms introduced by TSQL2 but also introduces the Sixth normal form to solve some of the issues.

Example

For illustration, we will take data from the following short biography of a fictional man John Doe. John Doe was born on April 3, 1975 in the Kids Hospital of Medicine County, as son of Jack Doe and Jane Doe who lived in Smallville. Jack Doe proudly registered the birth of his first-born on April 4, 1975 at the Smallville City Hall. John grew up as a joyful boy, turned out to be a brilliant student and graduated with honors in 1993. After graduation he went to live on his own in Bigtown. Although he moved out on August 26, 1994, he forgot to register the change of address officially. It was only at the turn of the seasons that his mother reminded him that he had to register, which he did a few days later on December 27, 1994. Although John had a promising future, his story ends tragically. John Doe was accidentally hit by a truck on April 1, 2001. The coroner reported his date of death on the very same day.

Using a Standard Database

In order to store the life of John Doe in a (non-temporal) database table we use this table Person (Name, Address). In order to simplify we define Name as the primary key of Person.

John's father officially reported birth on April 4, 1975. This means that a Smallville official inserted the following entry in the database on this date: Person (John Doe, Smallville) Note that the date itself is not stored in the database.

After graduation John moves out, but forgets to register his new address. John's entry in the database is not changed until December 27, 1994, when he finally enters Bigtown's city hall. A Bigtown official updates his address in the database. The Person table now contains Person (John Doe, Bigtown) Note that the information of John living in Smallville has been overwritten. There is no way to retrieve that information from the database. Any official accessing the database on December 28, 1994 would be told that John lives in Bigtown. More technically: if a computer scientist ran the query SELECT ADDRESS FROM PERSON WHERE NAME='John Doe' on December 26, 1994, the result would be: Smallville. Running the same query 2 days later would result in Bigtown.

Until his death the database would state that he lived in Bigtown. On April 1, 2001 the coroner deletes the John Doe entry from the database. Running the above query would return no result at all.

Date What happened in the real world Database Action What the database shows
April 3, 1975 John is born Nothing There is no person called John Doe
April 4, 1975 John's father officially reports John's birth Inserted:Person(John Doe, Smallville) John Doe lives in Smallville
August 26, 1994 After graduation, John moves to Bigtown, but forgets to register his new address Nothing John Doe lives in Smallville
December 26, 1994 Nothing Nothing John Doe lives in Smallville
December 27, 1994 John registers his new address Updated:Person(John Doe, Bigtown) John Doe lives in Bigtown
April 1, 2001 John dies Deleted:Person(John Doe) There is no person called John Doe

Bitemporal Relations

A bi-temporal relation contains both valid and transaction time. This is good because it provides both temporal rollback and historical information. Temporal rollback (e.g.: "In 1992, where did the database believe John lived?") is provided by the transaction time. Historical information (e.g.: "Where did John live in 1992?") can be derived from valid time. The answers to these example questions may not be identical - the database may have been altered since 1992, causing the queries to produce different results.

Valid Time

Valid time is the time for which a fact is true in the real world. In the example above, the Person table gets two extra fields, Valid-From and Valid-To, specifying when a person's address was valid in the real world. On April 4, 1975 John's father proudly registered his son's birth. An official will then insert a new entry to the database stating that John lives in Smallville from April, 3rd. Notice that although the data was inserted on the 4th, the database states that the information is valid since the 3rd. The official does not yet know if or when John will ever move to a better place so in the database the Valid-To is filled with infinity (∞). Resulting in this entry in the database:

Person(John Doe, Smallville, 3-Apr-1975, ∞).

December 27, 1994 John reports his new address in Bigtown where he has been living since August 26, 1994. The Bigtown official does not change the address of the current entry of John Doe in the database. He adds a new one:

Person (John Doe, Big Town, 26-Aug-1994, ∞).

The original entry Person (John Doe, Smallville, 3-Apr-1975, ∞) is then updated (not removed!). Since it is now known that John stopped living in Smallville on August 26, 1994 the Valid-To entry can be filled in. The database now contains two entries for John Doe

Person(John Doe, Smallville, 3-Apr-1975, 26-Aug-1994).
Person(John Doe, Bigtown, 26-Aug-1994, ∞).

When John dies the database is once more updated. The current entry will be updated stating that John does not live in the Bigtown any longer. No new entry is being added because officials never report hell as a new address. The database now looks like this

Person(John Doe, Smallville, 3-Apr-1975, 26-Aug-1994).
Person(John Doe, Bigtown, 26-Aug-1994, 1-Apr-2001).

Transaction Time

Transaction time is the time a transaction was made. This enables queries that show the state of the database at a given time. Two more fields are added to the Person table: Transaction-From and Transaction-To. Transaction-From is the time a transaction was made, and Transaction-To is the time that the transaction was superseded (or infinity if it has not yet been superseded).

What happens if the person's address as stored in the database is incorrect? Suppose an official accidentally entered the wrong address or date? Or, suppose the person lied about their address for some reason. Upon discovery of the fact, the officials go back and update the database.

For example, from 1-Jun-1995 to 3-Sep-2000 John Doe moved to Beachy. But, to avoid paying Beachy's exorbitant residence tax, he never reported it to the authorities. Later, it is discovered on 2-Feb-2001, during a tax investigation that he was in fact in Beachy during these dates, so they update the database as follows:

Person(John Doe, Bigtown, 26-Aug-1994, 1-Jun-1995).
Person(John Doe, Beachy, 1-Jun-1995, 3-Sep-2000).
Person(John Doe, Bigtown, 3-Sep-2000, 1-Apr-2001).

So the existing record about John living in Bigtown is split into two separate records and a new record is inserted recording his residence in Beachy.

However, this leaves no record that the database ever claimed that he lived in Bigtown during 1-Jun-1995 to 3-Sep-2000. Which might be important for say auditing reasons (or to use as evidence in the official's tax investigation.) This is where transaction time comes in. We record in each record when it was entered and when it was superseded. Thus we get something like this:

Person(John Doe, Smallville, 3-Apr-1975,  ∞,           4-Apr-1975,  27-Dec-1994).
Person(John Doe, Smallville, 3-Apr-1975,  26-Aug-1994, 27-Dec-1994, ∞          ).
Person(John Doe, Bigtown,    26-Aug-1994, ∞,           27-Dec-1994, 2-Feb-2001 ).
Person(John Doe, Bigtown,    26-Aug-1994, 1-Jun-1995,  2-Feb-2001,  ∞          ).
Person(John Doe, Beachy,     1-Jun-1995,  3-Sep-2000,  2-Feb-2001,  ∞          ).
Person(John Doe, Bigtown,    3-Sep-2000,  ∞,           2-Feb-2001,  1-Apr-2001 ).
Person(John Doe, Bigtown,    3-Sep-2000,  1-Apr-2001,  1-Apr-2001,  ∞          ).

So we record not only changes in what happened at different times, but also changes in what was officially recorded at different times.

A particularly challenging issue is the support of temporal queries in a transaction time database under evolving schema. In order to achieve perfect archival quality it is of key importance to store the data under the schema version under which they firstly appeared. However even the most simple temporal query rewriting the history of an attribute value would be required to be manually rewritten under each of the schema versions, potentially hundreds as in the case of MediaWiki [1] This process would be particularly taxing for users. A common solution is to provide automatic query rewriting.[3][4]

Implementations in databases

The following implementations implement a bitemporal database in a relational database management system (RDBMS).

  • Oracle Workspace Manager Workspace Manager, a feature of Oracle Database, enables application developers and DBAs to manage current, proposed and historical versions of data in the same database. The latest version complies with TSQL2.
  • TimeDB TimeDB is a free temporal relational DBMS by TimeConsult. It runs as a frontend to Oracle that accepts TSQL2 statements and generates SQL92 statements.
  • PostgreSQL PostgreSQL has an open-source contributed package that can be installed in the database to manage temporal data. The function reference is here.
  • Teradata version 13.10 and Teradata version 14 has temporal features built into the database.
  • Anchor Modeling emulates temporal features and automates the implementation in databases that lack support.
  • IBM DB2 10 for z/OS has temporal features built into the database.

See also

References

  1. ^ a b Snodgrass, 1999, p. 9
  2. ^ Richard T. Snodgrass. "TSQL2 Temporal Query Language". www.cs.arizona.edu. Computer Science Department of the University of Arizona. Retrieved 14 July 2009.
  3. ^ Hyun J. Moon, Carlo A. Curino, Alin Deutsch, C.-Y. Hou, and Carlo Zaniolo (2008). Managing and querying transaction-time databases under schema evolution. Very Large Data Base VLDB.{{cite conference}}: CS1 maint: multiple names: authors list (link)
  4. ^ Hyun J. Moon, Carlo A. Curino, and Carlo Zaniolo (2010). Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas. SIGMOD.{{cite conference}}: CS1 maint: multiple names: authors list (link)
  • C.J. Date, Hugh Darwen, Nikos Lorentzos (2002). Temporal Data & the Relational Model, First Edition (The Morgan Kaufmann Series in Data Management Systems); Morgan Kaufmann; 1st edition; 422 pages. ISBN 1-55860-855-9.
  • Joe Celko (2005). Joe Celko's SQL for Smarties: Advanced SQL Programming (The Morgan Kaufmann Series in Data Management); Morgan Kaufmann; 3rd edition; 808 pages. ISBN 0-12-369379-9.—Chapters 4 and 29 in particular discuss temporal issues.
  • Snodgrass, Richard T. (1999). Template:PDFlink (Morgan Kaufmann Series in Data Management Systems); Morgan Kaufmann; 504 pages; ISBN 1-55860-436-7

Further reading