Jump to content

Third normal form: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
See also: Added 4NF, 5NF, 6NF
Tags: Mobile edit Mobile web edit
 
(105 intermediate revisions by 72 users not shown)
Line 1: Line 1:
{{Short description|Normalizing a database design to reduce the duplication of data and ensure referential integrity}}
'''Third normal form''' is a [[Database normalization#Normal forms|normal form]] that is used in [[Database normalisation|normalizing]] a [[database]] design to reduce the duplication of data and ensure [[referential integrity]] by ensuring that (1) the entity is in [[second normal form]], and (2) all the attributes in a table are determined only by the [[candidate key|candidate keys]] of that table and not by any non-prime attributes. 3NF was designed to improve data base processing while minimizing storage costs. 3NF data modeling was ideal for online transaction processing (OLTP) applications with heavy order entry type of needs. <ref>[http://www.techopedia.com/definition/22561/third-normal-form-3nf "What is Third Normal Form?"] Cory Janssen, Technopedia, retrieved 24 April 2014</ref>
'''Third normal form''' ('''3NF''') is a [[database schema]] design approach for [[relational database]]s which uses [[Database normalization|normalizing]] principles to reduce the duplication of data, avoid [[data anomaly|data anomalies]], ensure [[referential integrity]], and simplify [[data management]]. It was defined in 1971 by [[Edgar F. Codd]], an English computer scientist who invented the [[relational model]] for [[database]] management.


A [[Relation (database)|database relation]] (e.g. a [[Table (database)|database table]]) is said to meet third normal form standards if all the attributes (e.g. [[Column (database)|database columns]]) are [[Functional dependency|functionally dependent]] on solely a [[Candidate key|key]], except the case of functional dependency whose right hand side is a prime attribute (an attribute which is strictly included into some key). Codd defined this as a relation in [[second normal form]] where all non-prime attributes depend only on the [[candidate key]]s and do not have a [[transitive dependency]] on another key.<ref>Codd, E. F. "Further Normalization of the Data Base Relational Model", p. 34.</ref>
==Definition of third normal form==
The third normal form (3NF) is a [[Database normalization#Normal forms|normal form]] used in [[database normalization]]. 3NF was originally defined by [[E.F. Codd]] in 1971.<ref name="Codd">Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems," New York City, May 24th–25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin (ed.), ''Data Base Systems: Courant Computer Science Symposia Series 6''. Prentice-Hall, 1972.</ref> Codd's definition states that a table is in 3NF [[if and only if]] both of the following conditions hold:


A hypothetical example of a failure to meet third normal form would be a hospital database having a table of patients which included a column for the telephone number of their doctor. (The phone number is dependent on the doctor, rather than the patient, thus would be better stored in a table of doctors.) The negative outcome of such a design is that a doctor's number will be duplicated in the database if they have multiple patients, thus increasing both the chance of input error and the cost and risk of updating that number should it change (compared to a third normal form-compliant data model that only stores a doctor's number once on a doctor table).
* The [[Relation (database)|relation]] R (table) is in [[second normal form]] (2NF)
* Every non-prime attribute of R is non-transitively dependent on every key of R.


Codd later realized that 3NF did not eliminate all undesirable data anomalies and developed a stronger version to address this in 1974, known as [[Boyce–Codd normal form]].
A ''non-prime attribute'' of R is an attribute that does not belong to any [[candidate key]] of R.<ref name="Codd2">Codd, p. 43.</ref> A [[transitive dependency]] is a [[functional dependency]] in which ''X'' → ''Z'' (''X'' determines ''Z'') indirectly, by virtue of ''X'' → ''Y'' and ''Y'' → ''Z'' (where it is not the case that ''Y'' → ''X'').<ref>Codd, p. 45&ndash;46.</ref>


==Definition of third normal form==
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies ''X'' → ''A'', at least one of the following conditions holds:
The third normal form (3NF) is a [[Database normalization#Normal forms|normal form]] used in [[database normalization]]. 3NF was originally defined by [[E. F. Codd]] in 1971.<ref name="Codd">Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), ''Data Base Systems: Courant Computer Science Symposia Series 6''. Prentice-Hall, 1972.</ref>


Codd's definition states that a table is in 3NF [[if and only if]] both of the following conditions hold:
* ''X'' contains ''A'' (that is, ''X'' → ''A'' is trivial functional dependency), or
* The [[Relation (database)|relation]] R (table) is in [[second normal form]] (2NF).
* ''X'' is a [[superkey]], or
* No non-prime attribute of R is transitively dependent on the primary key.
* Every element of ''A''-''X'', the set difference between A and X, is a ''prime attribute'' (i.e., each attribute in ''A''-''X'' is contained in some [[candidate key]])<ref name="Zaniolo">Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ''ACM Transactions on Database Systems'' 7(3), September 1982.</ref><ref>Abraham Silberschatz, Henry F. Korth, S. Sudarshan, ''[http://www.db-book.com/ Database System Concepts] (5th edition), p. 276-277</ref>{{Request quotation|date=November 2010}}


A ''non-prime attribute'' of R is an attribute that does not belong to any [[candidate key]] of R.<ref name="Codd2">Codd, p. 43.</ref> A [[transitive dependency]] is a [[functional dependency]] in which ''X'' → ''Z'' (''X'' determines ''Z'') indirectly, by virtue of ''X'' → ''Y'' and ''Y'' → ''Z'' (where it is not the case that ''Y'' → ''X'').<ref>Codd, p. 45–46.</ref>
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent [[Boyce–Codd normal form]] (BCNF). BCNF simply eliminates the third alternative ("Every element of A-X, the set difference between A and X, is a prime attribute").


A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if for each of its functional dependencies ''X'' → ''Y'', at least one of the following conditions holds:<ref name="Zaniolo">Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata". ''ACM Transactions on Database Systems'' 7(3), September 1982.</ref><ref>[[Abraham Silberschatz]], [[Henry F. Korth]], S. Sudarshan, ''[http://www.db-book.com/ Database System Concepts]'' (5th edition), p. 276–277.</ref>{{Request quotation|date=November 2010}}
=="Nothing but the key"==
* ''X'' contains ''Y'' (that is, ''Y'' is a subset of ''X'', meaning ''X'' → ''Y'' is a trivial functional dependency),
An approximation of Codd's definition of 3NF, paralleling the traditional [[sworn testimony|pledge]] to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key."<ref name="Kent">Kent, William. [http://www.bkent.net/Doc/simple5.htm "A Simple Guide to Five Normal Forms in Relational Database Theory"], ''Communications of the ACM'' 26 (2), Feb. 1983, pp. 120–125.</ref> A common variation supplements this definition with the oath: "so help me [[Edgar F. Codd|Codd]]".<ref name="Diehr">The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr, George. ''Database Management'' (Scott, Foresman, 1989), p. 331.</ref>
* ''X'' is a [[superkey]],
* every element of ''Y'' \ ''X'', the [[Complement (set theory)#Relative complement|set difference]] between Y and X, is a ''prime attribute'' (i.e., each attribute in ''Y'' \ ''X'' is contained in some [[candidate key]]).


To rephrase Zaniolo's definition more simply, the relation is in 3NF if and only if for every non-trivial functional dependency X → Y, X is a superkey or ''Y'' \ ''X'' consists of prime attributes. Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent [[Boyce–Codd normal form]] (BCNF). BCNF simply eliminates the third alternative ("Every element of ''Y'' \ ''X'', the set difference between ''Y'' and ''X'', is a prime attribute.").
Requiring existence of "the key" ensures that the table is in [[First normal form|1NF]]; requiring that non-key attributes be dependent on "the whole key" ensures [[Second normal form|2NF]]; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF. While this phrase is a useful mnemonic, the fact that it only mentions a single key means it defines some necessary but not sufficient conditions to satisfy the 2nd and 3rd Normal Forms. Both 2NF and 3NF are concerned equally with ''all'' candidate keys of a table and not just any one key.

=="Nothing but the key"==
An approximation of Codd's definition of 3NF, paralleling the traditional [[sworn testimony|oath]] to give true evidence in a court of law, was given by Bill Kent: "[every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key".<ref name="Kent">Kent, William. [http://www.bkent.net/Doc/simple5.htm "A Simple Guide to Five Normal Forms in Relational Database Theory"], ''Communications of the ACM'' 26 (2), Feb. 1983, pp. 120–125.</ref> A common variation supplements this definition with the oath "so help me [[Edgar F. Codd|Codd]]".<ref name="Diehr">The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr, George. ''Database Management'' (Scott, Foresman, 1989), p. 331.</ref>


Requiring existence of "the key" ensures that the table is in [[First normal form|1NF]]; requiring that non-key attributes be dependent on "the whole key" ensures [[Second normal form|2NF]]; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF. While this phrase is a useful mnemonic, the fact that it only mentions a single key means it defines some [[Necessity and sufficiency|necessary but not sufficient conditions]] to satisfy the 2nd and 3rd normal forms. Both 2NF and 3NF are concerned equally with ''all'' candidate keys of a table and not just any one key.
[[Christopher J. Date|Chris Date]] refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight adaptation it may serve as a definition of the slightly stronger [[Boyce–Codd normal form]]: "Each attribute must represent a fact about the key, the whole key, and nothing but the key."<ref name="DateIntro">Date, C.J. ''An Introduction to Database Systems'' (7th ed.) (Addison Wesley, 2000), p. 379.</ref> The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that ''non-key'' attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (It should be noted here that this rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)


[[Christopher J. Date|Chris Date]] refers to Kent's summary as "an intuitively attractive characterization" of 3NF and notes that with slight adaptation it may serve as a definition of the slightly stronger [[Boyce–Codd normal form]]: "Each attribute must represent a fact about the key, the whole key, and nothing but the key."<ref name="DateIntro">Date, C. J. ''An Introduction to Database Systems'' (7th ed.) (Addison Wesley, 2000), p. 379.</ref> The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that ''non-key'' attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (This rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)
An example of a 2NF table that fails to meet the requirements of 3NF is:


An example of a table that fails to meet the requirements of 3NF is:
{| class="wikitable"
{| class="wikitable"
|+ Tournament Winners
|+ Tournament winners
! <u>Tournament</u> !! <u>Year</u> !! Winner !! Winner Date of Birth
! <u>Tournament</u> !! <u>Year</u> !! Winner !! Winner's date of birth
|-
|-
|Indiana Invitational||1998||Al Fredrickson||21 July 1975
|Indiana Invitational||1998||Al Fredrickson||21 July 1975
Line 39: Line 45:
|}
|}


Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the [[composite key]] {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.


The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.
The breach of 3NF occurs because the non-prime attribute (Winner's date of birth) is transitively dependent on the candidate key {Tournament, Year} through the non-prime attribute Winner. The fact that Winner's date of birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.


In order to express the same facts without violating 3NF, it is necessary to split the table into two:
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
{| style="border-spacing:2em 0; margin-left:-2em"

{|
| valign="top" |
| valign="top" |
{| class="wikitable"
{| class="wikitable"
|+ Tournament Winners
|+ Tournament winners
! <u>Tournament</u> !! <u>Year</u> !! Winner
! <u>Tournament</u> !! <u>Year</u> !! Winner
|-
|-
Line 61: Line 66:
| valign="top" |
| valign="top" |
{| class="wikitable"
{| class="wikitable"
|+ Winner Dates of Birth
|+ Winner's dates of birth
! <u>Winner</u> !! Date of Birth
! <u>Winner</u> !! Date of birth
|-
|-
|Chip Masterson||14 March 1977
|Chip Masterson||14 March 1977
Line 72: Line 77:
|}
|}


Update anomalies cannot occur in these tables.{{Clarify|date=March 2015}}
Update anomalies cannot occur in these tables, because unlike before, Winner is now a candidate key in the second table, thus allowing only one value for Date of birth for each Winner.

== Computation ==
A relation can always be decomposed in third normal form, that is, the relation R is rewritten to [[relational projection|projections]] R<sub>1</sub>, ..., R<sub>n</sub> whose [[natural join|join]] is equal to the original relation. Further, this decomposition does not lose any [[functional dependencies|functional dependency]], in the sense that every functional dependency on R can be derived from the functional dependencies that hold on the projections R<sub>1</sub>, ..., R<sub>n</sub>. What is more, such a decomposition can be computed in [[polynomial time]].<ref>[[Serge Abiteboul]], Richard B. Hull, [[Victor Vianu]]: Foundations of Databases. Addison-Wesley, 1995. http://webdam.inria.fr/Alice/ {{ISBN|0201537710}}. Theorem 11.2.14.</ref>

To decompose a relation into 3NF from 2NF, break the table into the [[canonical cover]] functional dependencies, then create a relation for every candidate key of the original relation which was not already a subset of a relation in the decomposition.<ref>{{Cite web |last=Hammo |first=Bassam |title=Decomposition, 3NF, BCNF |url=https://faculty.ksu.edu.sa/sites/default/files/E-%20Decomposition.pdf |archive-url=https://web.archive.org/web/20230315013047/https://faculty.ksu.edu.sa/sites/default/files/E-%20Decomposition.pdf |archive-date=2023-03-15 |url-status=live }}</ref>


==Derivation of Zaniolo's conditions==
==Equivalence of the Codd and Zaniolo definitions of 3NF==
The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, is proven in the following way: Let X → A be a nontrivial [[functional dependency|FD]] (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X.
The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, can be shown to be equivalent to the Codd definition in the following way: Let X → A be a nontrivial [[functional dependency|FD]] (i.e. one where X does not contain A) and let A be a non-prime attribute. Also let Y be a candidate key of R. Then Y → X. Therefore, A is not transitively dependent on Y if there is a functional dependency X → Y iff X is a superkey of R.


==Normalization beyond 3NF==
==Normalization beyond 3NF==
Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of [[Boyce–Codd normal form]] (BCNF) or, if they meet BCNF, fall short of the higher normal forms [[Fourth normal form|4NF]] or [[Fifth normal form|5NF]].
Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of [[Boyce–Codd normal form]] (BCNF) or, if they meet BCNF, fall short of the higher normal forms [[Fourth normal form|4NF]] or [[Fifth normal form|5NF]].


==Considerations for Use in Reporting Environments==
==Considerations for use in reporting environments==
While 3NF was ideal for machine processing, the segmented nature of the data model was difficult to consume by a human user. Analytics via query, reporting, and dashboards required a different type of data model that supported analysis such as trend lines, period-to-date calculations (month-to-date, quarter-to-date, year-to-date), cumulative calculations, basic statistics (average, standard deviation, moving averages) and previous period comparisons (year ago, month ago, week ago) e.g. [[dimensional modeling]] and beyond dimensional modeling, flattening of stars via [[Hadoop]] and [[data science]].<ref>[http://roelantvos.com/blog/?p=740].</ref><ref>[https://infocus.emc.com/william_schmarzo/hadoop-data-modeling-lessons-vin-diesel/].</ref>
While 3NF was ideal for machine processing, the segmented nature of the data model can be difficult to intuitively consume by a human user. [[Analytics]] via query, reporting, and dashboards were often facilitated by a different type of data model that provided pre-calculated analysis such as trend lines, period-to-date calculations (month-to-date, quarter-to-date, year-to-date), cumulative calculations, basic statistics (average, standard deviation, moving averages) and previous period comparisons (year ago, month ago, week ago) e.g. [[dimensional modeling]] and beyond dimensional modeling, flattening of stars via [[Hadoop]] and [[data science]].<ref>{{cite news |title=Comparisons between Data Warehouse modelling techniques – Roelant Vos |newspaper=Roelant Vos |date=12 February 2013 |url=http://roelantvos.com/blog/?p=740 |access-date=5 March 2018}}</ref><ref>{{cite web |date=23 September 2014 |title=Hadoop Data Modeling Lessons {{!}} EMC |website=InFocus Blog {{!}} Dell EMC Services |url=https://infocus.dellemc.com/william_schmarzo/hadoop-data-modeling-lessons-vin-diesel/|access-date=5 March 2018}}</ref> Hadley Wickham's "tidy data" framework is 3NF, with "the constraints framed in statistical language".<ref>{{Cite journal |last=Wickham |first=Hadley |date=2014-09-12 |title=Tidy Data |url=https://doi.org/10.18637/jss.v059.i10 |journal=Journal of Statistical Software |language=en |volume=59 |pages=1–23 |doi=10.18637/jss.v059.i10 |issn=1548-7660|doi-access=free }}</ref>


==See also==
==See also==
*[[Attribute-value system]]
*[[Attribute-value system]]
*[[First normal form]] (1NF)
*[[Second normal form]] (2NF)
*[[Fourth normal form]] (4NF)
*[[Fifth normal form]] (5NF)
*[[Sixth normal form]] (6NF)


==References==
==References==
Line 91: Line 106:
==Further reading==
==Further reading==
{{Refbegin}}
{{Refbegin}}
*Date, C. J. (1999), ''[http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html An Introduction to Database Systems]'' (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
*Date, C. J. (1999), ''[https://web.archive.org/web/20050404010227/http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html An Introduction to Database Systems]'' (8th ed.). Addison-Wesley Longman. {{ISBN|0-321-19784-4}}.
*Kent, W. (1983) ''[http://www.bkent.net/Doc/simple5.htm A Simple Guide to Five Normal Forms in Relational Database Theory]'', Communications of the ACM, vol. 26, pp.&nbsp;120–126
*Kent, W. (1983) ''[http://www.bkent.net/Doc/simple5.htm A Simple Guide to Five Normal Forms in Relational Database Theory]'', Communications of the ACM, vol. 26, pp.&nbsp;120–126
{{Refend}}
{{Refend}}
Line 99: Line 114:
*[http://databases.about.com/od/specificproducts/a/normalization.htm Database Normalization Basics] by Mike Chapple (About.com)
*[http://databases.about.com/od/specificproducts/a/normalization.htm Database Normalization Basics] by Mike Chapple (About.com)
*[http://mikehillyer.com/articles/an-introduction-to-database-normalization/ An Introduction to Database Normalization] by Mike Hillyer.
*[http://mikehillyer.com/articles/an-introduction-to-database-normalization/ An Introduction to Database Normalization] by Mike Hillyer.
*[http://phlonx.com/resources/nf3/ A tutorial on the first 3 normal forms] by Fred Coulson
*[https://phlonx.com/resources/nf3/ A tutorial on the first 3 normal forms] by Fred Coulson
*[http://support.microsoft.com/kb/283878 Description of the database normalization basics] by Microsoft
*[http://support.microsoft.com/kb/283878 Description of the database normalization basics] by Microsoft
*[http://exploredatabase.blogspot.in/2014/02/third-normal-form-3nf-with-example.html Third Normal Form with Simple Examples] by exploreDatabase
*[http://exploredatabase.com/2014/02/third-normal-form-3nf-with-example.html Third Normal Form with Simple Examples] by exploreDatabase


{{Database normalization}}
{{Database normalization}}

Latest revision as of 00:18, 22 December 2024

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

A database relation (e.g. a database table) is said to meet third normal form standards if all the attributes (e.g. database columns) are functionally dependent on solely a key, except the case of functional dependency whose right hand side is a prime attribute (an attribute which is strictly included into some key). Codd defined this as a relation in second normal form where all non-prime attributes depend only on the candidate keys and do not have a transitive dependency on another key.[1]

A hypothetical example of a failure to meet third normal form would be a hospital database having a table of patients which included a column for the telephone number of their doctor. (The phone number is dependent on the doctor, rather than the patient, thus would be better stored in a table of doctors.) The negative outcome of such a design is that a doctor's number will be duplicated in the database if they have multiple patients, thus increasing both the chance of input error and the cost and risk of updating that number should it change (compared to a third normal form-compliant data model that only stores a doctor's number once on a doctor table).

Codd later realized that 3NF did not eliminate all undesirable data anomalies and developed a stronger version to address this in 1974, known as Boyce–Codd normal form.

Definition of third normal form

[edit]

The third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E. F. Codd in 1971.[2]

Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:

A non-prime attribute of R is an attribute that does not belong to any candidate key of R.[3] A transitive dependency is a functional dependency in which XZ (X determines Z) indirectly, by virtue of XY and YZ (where it is not the case that YX).[4]

A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if for each of its functional dependencies XY, at least one of the following conditions holds:[5][6][need quotation to verify]

  • X contains Y (that is, Y is a subset of X, meaning XY is a trivial functional dependency),
  • X is a superkey,
  • every element of Y \ X, the set difference between Y and X, is a prime attribute (i.e., each attribute in Y \ X is contained in some candidate key).

To rephrase Zaniolo's definition more simply, the relation is in 3NF if and only if for every non-trivial functional dependency X → Y, X is a superkey or Y \ X consists of prime attributes. Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative ("Every element of Y \ X, the set difference between Y and X, is a prime attribute.").

"Nothing but the key"

[edit]

An approximation of Codd's definition of 3NF, paralleling the traditional oath to give true evidence in a court of law, was given by Bill Kent: "[every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key".[7] A common variation supplements this definition with the oath "so help me Codd".[8]

Requiring existence of "the key" ensures that the table is in 1NF; requiring that non-key attributes be dependent on "the whole key" ensures 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF. While this phrase is a useful mnemonic, the fact that it only mentions a single key means it defines some necessary but not sufficient conditions to satisfy the 2nd and 3rd normal forms. Both 2NF and 3NF are concerned equally with all candidate keys of a table and not just any one key.

Chris Date refers to Kent's summary as "an intuitively attractive characterization" of 3NF and notes that with slight adaptation it may serve as a definition of the slightly stronger Boyce–Codd normal form: "Each attribute must represent a fact about the key, the whole key, and nothing but the key."[9] The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that non-key attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (This rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)

An example of a table that fails to meet the requirements of 3NF is:

Tournament winners
Tournament Year Winner Winner's date of birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.

The breach of 3NF occurs because the non-prime attribute (Winner's date of birth) is transitively dependent on the candidate key {Tournament, Year} through the non-prime attribute Winner. The fact that Winner's date of birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into two:

Tournament winners
Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
Winner's dates of birth
Winner Date of birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968

Update anomalies cannot occur in these tables, because unlike before, Winner is now a candidate key in the second table, thus allowing only one value for Date of birth for each Winner.

Computation

[edit]

A relation can always be decomposed in third normal form, that is, the relation R is rewritten to projections R1, ..., Rn whose join is equal to the original relation. Further, this decomposition does not lose any functional dependency, in the sense that every functional dependency on R can be derived from the functional dependencies that hold on the projections R1, ..., Rn. What is more, such a decomposition can be computed in polynomial time.[10]

To decompose a relation into 3NF from 2NF, break the table into the canonical cover functional dependencies, then create a relation for every candidate key of the original relation which was not already a subset of a relation in the decomposition.[11]

Equivalence of the Codd and Zaniolo definitions of 3NF

[edit]

The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, can be shown to be equivalent to the Codd definition in the following way: Let X → A be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-prime attribute. Also let Y be a candidate key of R. Then Y → X. Therefore, A is not transitively dependent on Y if there is a functional dependency X → Y iff X is a superkey of R.

Normalization beyond 3NF

[edit]

Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NF or 5NF.

Considerations for use in reporting environments

[edit]

While 3NF was ideal for machine processing, the segmented nature of the data model can be difficult to intuitively consume by a human user. Analytics via query, reporting, and dashboards were often facilitated by a different type of data model that provided pre-calculated analysis such as trend lines, period-to-date calculations (month-to-date, quarter-to-date, year-to-date), cumulative calculations, basic statistics (average, standard deviation, moving averages) and previous period comparisons (year ago, month ago, week ago) e.g. dimensional modeling and beyond dimensional modeling, flattening of stars via Hadoop and data science.[12][13] Hadley Wickham's "tidy data" framework is 3NF, with "the constraints framed in statistical language".[14]

See also

[edit]

References

[edit]
  1. ^ Codd, E. F. "Further Normalization of the Data Base Relational Model", p. 34.
  2. ^ Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
  3. ^ Codd, p. 43.
  4. ^ Codd, p. 45–46.
  5. ^ Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata". ACM Transactions on Database Systems 7(3), September 1982.
  6. ^ Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts (5th edition), p. 276–277.
  7. ^ Kent, William. "A Simple Guide to Five Normal Forms in Relational Database Theory", Communications of the ACM 26 (2), Feb. 1983, pp. 120–125.
  8. ^ The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr, George. Database Management (Scott, Foresman, 1989), p. 331.
  9. ^ Date, C. J. An Introduction to Database Systems (7th ed.) (Addison Wesley, 2000), p. 379.
  10. ^ Serge Abiteboul, Richard B. Hull, Victor Vianu: Foundations of Databases. Addison-Wesley, 1995. http://webdam.inria.fr/Alice/ ISBN 0201537710. Theorem 11.2.14.
  11. ^ Hammo, Bassam. "Decomposition, 3NF, BCNF" (PDF). Archived (PDF) from the original on 2023-03-15.
  12. ^ "Comparisons between Data Warehouse modelling techniques – Roelant Vos". Roelant Vos. 12 February 2013. Retrieved 5 March 2018.
  13. ^ "Hadoop Data Modeling Lessons | EMC". InFocus Blog | Dell EMC Services. 23 September 2014. Retrieved 5 March 2018.
  14. ^ Wickham, Hadley (2014-09-12). "Tidy Data". Journal of Statistical Software. 59: 1–23. doi:10.18637/jss.v059.i10. ISSN 1548-7660.

Further reading

[edit]
[edit]