Database normalization
Revision as of 21:51, 19 July 2007
Database normalization is a technique for designing relational database tables. Tables can be normalized to various degrees. Database theory describes a table's degree of normalization in terms of normal forms. A table in a given normal form must also satisfy the rules of all lower normal forms.
More highly normalized tables reduce data duplication and opportunities for various kinds of logical inconsistencies that could lead to loss of integrity of the database. They can simplify development, maintenance, and expandability of the database. Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. As a result, more highly normalized tables are typically used for databases involving many isolated transactions (such as an automatic teller system), while less normalized tables are used for read-mostly information (such as reports).
Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is worth considering to what extent the representation is normalized.
Problems addressed by normalization
A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:
- The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an unnormalized "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully—if, that is, the employee's address is updated on some records but not others—then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
- There are circumstances in which certain facts cannot be recorded at all. In the above example, if it is the case that Employee Address is held only in the "Employees' Skills" table, then we cannot record the address of an employee whose skills are not yet known. This phenomenon is known as an insertion anomaly.
- There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. For example, suppose a table has the attributes Student ID, Course ID, and Lecturer ID (a given student is enrolled in a given course, which is taught by a given lecturer). If in the early stages of enrolment the number of students on the course temporarily drops to zero, then the last of the records referencing that course must be deleted—meaning, as a side-effect, that the table no longer tells us which lecturer has been assigned to teach the course. This phenomenon is known as a deletion anomaly.
Ideally, a relational database table should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is called normalization.
Normalization typically involves decomposing an unnormalized table into two or more tables that, were they to be combined (joined), would convey exactly the same information as the original table.
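The update anomaly and its repair by decomposition can be sketched in a few lines of Python. This is only an illustration: the employee IDs, addresses and skills are invented sample values, and the helper name `addresses_of` is hypothetical.

```python
# Unnormalized "Employees' Skills" rows: (employee_id, address, skill).
# All sample values are invented for illustration.
employees_skills = [
    (426, "87 Sycamore Grove", "Typing"),
    (426, "87 Sycamore Grove", "Shorthand"),
    (519, "94 Chestnut Street", "Public Speaking"),
]

def addresses_of(rows, employee_id):
    """Return the set of distinct addresses recorded for one employee."""
    return {addr for emp, addr, _ in rows if emp == employee_id}

# A partial update: only the first of the employee's records is changed.
partially_updated = list(employees_skills)
partially_updated[0] = (426, "23 Elm Road", "Typing")
conflicting = addresses_of(partially_updated, 426)  # two answers: update anomaly

# Decomposition: one table for addresses, one for skills.
employees = {emp: addr for emp, addr, _ in employees_skills}
skills = {(emp, skill) for emp, _, skill in employees_skills}

# Joining the two tables on employee_id reproduces the original rows.
rejoined = sorted((emp, employees[emp], skill) for emp, skill in skills)
```

In the decomposed design an address is stored once per employee, so a change of address touches exactly one entry, and an employee with no known skills can still have an address recorded.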
Background to normalization: definitions
- Functional dependency: Attribute B has a functional dependency on attribute A if, for each value of attribute A, there is exactly one value of attribute B. For example, Employee Address has a functional dependency on Employee ID, because exactly one Employee Address value corresponds to each Employee ID value. An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain.
- Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.
- Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is a) functionally dependent on X, and b) not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, for it is also dependent on {Employee ID}.
- Transitive dependency: A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.
- Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows: see the Multivalued Dependency article for a rigorous definition.
- Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
- Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.
- Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.
- Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.
- Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a candidate key which the database designer has designated for this purpose.
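Several of these definitions can be tried out against sample data. The following Python sketch uses invented rows and hypothetical helper names (`fd_holds`, `is_superkey`, `candidate_keys`). A sample can only refute a dependency, never prove it, since the real dependencies come from knowledge of the problem domain.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if no two sample rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_superkey(rows, attrs):
    """True if no two sample rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, attributes):
    """Minimal superkeys of the sample, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_superkey(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

ATTRS = ("employee_id", "address", "skill")
# Invented sample rows; employees 426 and 622 share an address.
rows = [
    {"employee_id": 426, "address": "87 Sycamore Grove", "skill": "Typing"},
    {"employee_id": 426, "address": "87 Sycamore Grove", "skill": "Shorthand"},
    {"employee_id": 519, "address": "94 Chestnut Street", "skill": "Typing"},
    {"employee_id": 622, "address": "87 Sycamore Grove", "skill": "Typing"},
]
keys = candidate_keys(rows, ATTRS)
non_prime = [a for a in ATTRS if not any(a in k for k in keys)]
```

For this sample the only candidate key found is (employee_id, skill), and address is the only non-prime attribute, which agrees with the "Employees' Skills" examples in the definitions.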
History
Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form:
There is, in fact, a very simple elimination[1] procedure which we shall call normalization. Through decomposition non-simple domains are replaced by "domains whose elements are atomic (non-decomposable) values."
— Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks[2]
In his paper, Edgar F. Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.
Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to such inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.
Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.
Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These normal forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole key and nothing but the key". The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many relationships among attributes. Sixth normal form (6NF) incorporates considerations relevant to temporal databases.
First normal form
- The criteria for first normal form (1NF) are:
- A table must be guaranteed not to have any duplicate records; therefore it must have at least one candidate key.
- Every column must be atomic, i.e. single-valued with respect to its datatype. In other words, a column may represent exactly one member from its domain. For example, a date column carrying two dates is a 1NF violation. On the other hand, a datatype may be arbitrarily complex. Therefore, a hypothetical date-range datatype might indeed carry two dates (or rather, one date range) without violating 1NF.
- Sometimes this second requirement is expressed as "there may not be repeating groups", leading to some prevalent misconceptions. The first misconception is that 1NF precludes a series of columns repeating the same domain. The second misconception is that 1NF does not allow embedded lists. These are perhaps examples of poor design, but not necessarily 1NF violations:
Recipe ID | Ingredient 1 | Ingredient 2 | Ingredient 3 |
---|---|---|---|
1 | Flour | Eggs | Milk |
2 | Parsley | Sage | Rosemary |
3 | Flour | Eggs | Milk |
Recipe ID | Ingredients |
---|---|
1 | flour,eggs,milk |
2 | parsley,sage,rosemary |
3 | flour,eggs,milk |
- Relational databases are incapable of such things, but here is a depiction of a true 1NF violation nonetheless:
Recipe ID | Ingredient 1 | Ingredient 2 | Ingredient 3 |
---|---|---|---|
1 3 | Flour | Eggs | Milk |
2 | Parsley | Sage | Rosemary |
Second normal form
- The criteria for second normal form (2NF) are:
- The table must be in 1NF.
- None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies. For example, in an "Employees' Skills" table whose attributes are Employee ID, Employee Address, and Skill, the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Address depends on only one of those attributes – namely, Employee ID – the table is not in 2NF.
- Note that if none of a 1NF table's candidate keys are composite – i.e. every candidate key consists of just one attribute – then we can say immediately that the table is in 2NF.
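The 2NF test can be mechanized once the functional dependencies are written down. The sketch below is one possible approach, not a standard API: it computes attribute closures and reports every non-prime attribute that depends on a proper subset of a candidate key. The function names and the way dependencies are encoded (pairs of Python sets) are assumptions made for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs, given fds as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(candidate_keys, non_prime, fds):
    """Pairs (part_of_key, attribute): a non-prime attribute determined by part of a key."""
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in sorted(non_prime & closure(part, fds)):
                    found.append((part, attr))
    return found

# The "Employees' Skills" example: Employee ID -> Employee Address.
fds = [({"Employee ID"}, {"Employee Address"})]
keys = [{"Employee ID", "Skill"}]
violations = partial_dependencies(keys, {"Employee Address"}, fds)
```

An empty result would mean that no part-key dependency was found among the listed dependencies. Here the single reported pair shows why the table is not in 2NF.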
Third normal form
- The criteria for third normal form (3NF) are:
- The table must be in 2NF.
- Every non-prime attribute of the table must be non-transitively dependent on every candidate key. A violation of 3NF would mean that at least one non-prime attribute is only indirectly dependent (transitively dependent) on a candidate key. For example, consider a "Departments" table whose attributes are Department ID, Department Name, Manager ID, and Manager Hire Date; and suppose that each manager can manage one or more departments. {Department ID} is a candidate key. Although Manager Hire Date is functionally dependent on the candidate key {Department ID}, this is only because Manager Hire Date depends on Manager ID, which in turn depends on Department ID. This transitive dependency means the table is not in 3NF.
Boyce-Codd normal form
- The criteria for Boyce-Codd normal form (BCNF) are:
- The table must be in 3NF.
- Every non-trivial functional dependency must be a dependency on a candidate key.
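The 3NF and BCNF criteria can be checked together for the "Departments" example given under third normal form. The sketch below relies on a commonly used equivalent formulation, which is an assumption relative to the wording above: a dependency X → A is acceptable to BCNF only if X is a superkey, and acceptable to 3NF if X is a superkey or A is a prime attribute. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs, given fds as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations(all_attrs, fds, prime_attrs):
    """List (determinant, attribute, which normal forms the dependency breaks)."""
    report = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= all_attrs:
            continue  # determinant is a superkey: allowed by both 3NF and BCNF
        for attr in sorted(rhs - lhs):
            breaks = "BCNF only" if attr in prime_attrs else "3NF and BCNF"
            report.append((tuple(sorted(lhs)), attr, breaks))
    return report

ALL = {"Department ID", "Department Name", "Manager ID", "Manager Hire Date"}
fds = [
    ({"Department ID"}, {"Department Name", "Manager ID"}),
    ({"Manager ID"}, {"Manager Hire Date"}),
]
report = violations(ALL, fds, prime_attrs={"Department ID"})
```

The report singles out Manager ID → Manager Hire Date: Manager ID is not a superkey and Manager Hire Date is non-prime, so the dependency breaks both 3NF and BCNF.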
Fourth normal form
- The criteria for fourth normal form (4NF) are:
- The table must be in BCNF.
- There must be no non-trivial multivalued dependencies on something other than a candidate key. A BCNF table is said to be in 4NF if and only if all of its multivalued dependencies are functional dependencies.
Fifth normal form
- The criteria for fifth normal form (5NF, also known as PJ/NF) are:
- The table must be in 4NF.
- There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.
Domain/key normal form
- Domain/key normal form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.
Sixth normal form
As of 2005, this normal form had only recently been proposed: the sixth normal form (6NF) was defined when extending the relational model to take the temporal dimension into account. Most SQL technologies as of 2005 do not take this work into account, and most temporal extensions to SQL are not relational. See the work by Date, Darwen and Lorentzos[3] for a relational temporal extension, see [4] for further discussion of temporal aggregation in SQL, or see TSQL2 for a different approach.
Example of the process
The following example illustrates how a database designer might employ his knowledge of the normal forms to make progressive improvements to an initially unnormalized database design. The example is somewhat contrived: in practice, few designs lend themselves to being normalized in strict stages in which the HNF increases at each stage.
The database in the example captures information about the suppliers with which various companies' divisions have relationships – more specifically, it captures information about the types of parts which each division of each company sources from its suppliers.
Starting point
Information has been presented initially in a way that does not even meet 1NF. Every record is for a particular Company/Division combination: for each of these combinations, repeating groups of part- and supplier-related information occur. 1NF does not permit repeating groups.
Company | Company Founder | Company Logo | Division | Part Type | Supplier | Supplier Country | Supplier Continent |
---|---|---|---|---|---|---|---|
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Spring; Pendulum; Spring; Toothed Wheel | Tensile Globodynamics; Tensile Globodynamics; Pieza de Acero; Pieza de Acero | USA; USA; Mexico; Mexico | N. Amer.; N. Amer.; N. Amer.; N. Amer. |
Allied Clock and Watch | Horace Washington | Sundial | Watches | Quartz Crystal; Tuning Fork; Battery | Microflux; Microflux; Dakota Electrics | Belgium; Belgium; USA | Europe; Europe; N. Amer. |
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Flywheel; Axle; Axle; Mechanical Arm | Wheels 4 Less; Wheels 4 Less; TransEuropa; TransEuropa | USA; USA; Italy; Italy | N. Amer.; N. Amer.; Europe; Europe |
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Artificial Brain; Artificial Brain; Metal Housing; Backplate | Prometheus Labs; Frankenstein Labs; Pieza de Acero; Pieza de Acero | Luxembourg; Germany; Mexico; Mexico | Europe; Europe; N. Amer.; N. Amer. |
1NF
We eliminate the repeating groups by ensuring that each group appears on its own record. The unique identifier for a record is now {Company, Division, Part Type, Supplier}.
Company | Company Founder | Company Logo | Division | Part Type | Supplier | Supplier Country | Supplier Continent |
---|---|---|---|---|---|---|---|
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Spring | Tensile Globodynamics | USA | N. Amer. |
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Pendulum | Tensile Globodynamics | USA | N. Amer. |
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Spring | Pieza de Acero | Mexico | N. Amer. |
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Toothed Wheel | Pieza de Acero | Mexico | N. Amer. |
Allied Clock and Watch | Horace Washington | Sundial | Watches | Quartz Crystal | Microflux | Belgium | Europe |
Allied Clock and Watch | Horace Washington | Sundial | Watches | Tuning Fork | Microflux | Belgium | Europe |
Allied Clock and Watch | Horace Washington | Sundial | Watches | Battery | Dakota Electrics | USA | N. Amer. |
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Flywheel | Wheels 4 Less | USA | N. Amer. |
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Axle | Wheels 4 Less | USA | N. Amer. |
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Axle | TransEuropa | Italy | Europe |
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Mechanical Arm | TransEuropa | Italy | Europe |
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Artificial Brain | Prometheus Labs | Luxembourg | Europe |
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Artificial Brain | Frankenstein Labs | Germany | Europe |
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Metal Housing | Pieza de Acero | Mexico | N. Amer. |
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Backplate | Pieza de Acero | Mexico | N. Amer. |
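The step from the starting point to 1NF amounts to flattening each record's repeating groups into separate records. As a rough sketch, assuming the repeating groups of the first starting-point record are held as parallel Python lists (a representation chosen only for this example, with a hypothetical helper `flatten`):

```python
# The Clocks record from the starting point, with repeating groups as parallel lists.
record = {
    "Company": "Allied Clock and Watch",
    "Company Founder": "Horace Washington",
    "Company Logo": "Sundial",
    "Division": "Clocks",
    "Part Type": ["Spring", "Pendulum", "Spring", "Toothed Wheel"],
    "Supplier": ["Tensile Globodynamics", "Tensile Globodynamics",
                 "Pieza de Acero", "Pieza de Acero"],
    "Supplier Country": ["USA", "USA", "Mexico", "Mexico"],
    "Supplier Continent": ["N. Amer.", "N. Amer.", "N. Amer.", "N. Amer."],
}
GROUP = ("Part Type", "Supplier", "Supplier Country", "Supplier Continent")

def flatten(record, group_columns):
    """Emit one flat row per position in the repeating group."""
    scalars = {k: v for k, v in record.items() if k not in group_columns}
    rows = []
    for values in zip(*(record[c] for c in group_columns)):
        row = dict(scalars)                      # copy the single-valued columns
        row.update(zip(group_columns, values))   # add one group member
        rows.append(row)
    return rows

rows_1nf = flatten(record, GROUP)
# Every flat row should have a distinct {Company, Division, Part Type, Supplier}.
identifiers = {(r["Company"], r["Division"], r["Part Type"], r["Supplier"])
               for r in rows_1nf}
```

The four resulting rows correspond to the first four records of the 1NF table, and each has a distinct identifier.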
2NF
One problem with the design at this stage is that Company Founder and Company Logo details for a given company may appear redundantly on more than one record; so may the Supplier Countries and Continents for a given supplier. These phenomena arise from the part-key dependencies of a) the Company Founder and Company Logo attributes on Company, and b) the Supplier Country and Supplier Continent attributes on Supplier. 2NF does not permit part-key dependencies. We correct the problem by splitting out the Company Founder and Company Logo details into their own table, called Companies, as well as splitting out the Supplier Country and Supplier Continent details into their own table, called Suppliers.
Company | Division | Part Type | Supplier |
---|---|---|---|
Allied Clock and Watch | Clocks | Spring | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Pendulum | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Spring | Pieza de Acero |
Allied Clock and Watch | Clocks | Toothed Wheel | Pieza de Acero |
Allied Clock and Watch | Watches | Quartz Crystal | Microflux |
Allied Clock and Watch | Watches | Tuning Fork | Microflux |
Allied Clock and Watch | Watches | Battery | Dakota Electrics |
Global Robot | Industrial Robots | Flywheel | Wheels 4 Less |
Global Robot | Industrial Robots | Axle | Wheels 4 Less |
Global Robot | Industrial Robots | Axle | TransEuropa |
Global Robot | Industrial Robots | Mechanical Arm | TransEuropa |
Global Robot | Domestic Robots | Artificial Brain | Prometheus Labs |
Global Robot | Domestic Robots | Artificial Brain | Frankenstein Labs |
Global Robot | Domestic Robots | Metal Housing | Pieza de Acero |
Global Robot | Domestic Robots | Backplate | Pieza de Acero |
Company | Company Founder | Company Logo |
---|---|---|
Allied Clock and Watch | Horace Washington | Sundial |
Global Robot | Nils Neumann | Gearbox |
Supplier | Supplier Country | Supplier Continent |
---|---|---|
Tensile Globodynamics | USA | N. Amer. |
Pieza de Acero | Mexico | N. Amer. |
Microflux | Belgium | Europe |
Dakota Electrics | USA | N. Amer. |
Wheels 4 Less | USA | N. Amer. |
TransEuropa | Italy | Europe |
Prometheus Labs | Luxembourg | Europe |
Frankenstein Labs | Germany | Europe |
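The 2NF split is a pair of projections, and joining the projections back together should give the 1NF rows again. The following Python sketch checks this on four rows taken from the 1NF table; the helper name `project` and the tuple-based representation are assumptions made for the example.

```python
COLUMNS = ("Company", "Company Founder", "Company Logo", "Division",
           "Part Type", "Supplier", "Supplier Country", "Supplier Continent")

# Four rows copied from the 1NF table.
sample = [
    ("Allied Clock and Watch", "Horace Washington", "Sundial", "Clocks",
     "Spring", "Tensile Globodynamics", "USA", "N. Amer."),
    ("Allied Clock and Watch", "Horace Washington", "Sundial", "Clocks",
     "Spring", "Pieza de Acero", "Mexico", "N. Amer."),
    ("Allied Clock and Watch", "Horace Washington", "Sundial", "Watches",
     "Quartz Crystal", "Microflux", "Belgium", "Europe"),
    ("Global Robot", "Nils Neumann", "Gearbox", "Industrial Robots",
     "Flywheel", "Wheels 4 Less", "USA", "N. Amer."),
]
rows = [dict(zip(COLUMNS, values)) for values in sample]

def project(rows, columns):
    """Distinct tuples of the given columns, in first-seen order."""
    seen = []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in seen:
            seen.append(item)
    return seen

parts = project(rows, ("Company", "Division", "Part Type", "Supplier"))
companies = project(rows, ("Company", "Company Founder", "Company Logo"))
suppliers = project(rows, ("Supplier", "Supplier Country", "Supplier Continent"))

# Rejoin: look up company and supplier details by their key.
company_info = {c[0]: c for c in companies}
supplier_info = {s[0]: s for s in suppliers}
rejoined = [
    (co, *company_info[co][1:], div, part, sup, *supplier_info[sup][1:])
    for co, div, part, sup in parts
]
```

Founder and logo details now appear once per company rather than once per part-supplier combination, and the rejoined rows match the sample exactly.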
3NF and BCNF
There is still, however, redundancy in the design. The Supplier Continent for a given Supplier Country may appear redundantly on more than one record. This phenomenon arises from the dependency of non-key attribute Supplier Continent on non-key attribute Supplier Country, and means that the design does not conform to 3NF. To achieve 3NF (and, while we are at it, BCNF), we create a separate Countries table which tells us which continent a country belongs to.
Company | Division | Part Type | Supplier |
---|---|---|---|
Allied Clock and Watch | Clocks | Spring | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Pendulum | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Spring | Pieza de Acero |
Allied Clock and Watch | Clocks | Toothed Wheel | Pieza de Acero |
Allied Clock and Watch | Watches | Quartz Crystal | Microflux |
Allied Clock and Watch | Watches | Tuning Fork | Microflux |
Allied Clock and Watch | Watches | Battery | Dakota Electrics |
Global Robot | Industrial Robots | Flywheel | Wheels 4 Less |
Global Robot | Industrial Robots | Axle | Wheels 4 Less |
Global Robot | Industrial Robots | Axle | TransEuropa |
Global Robot | Industrial Robots | Mechanical Arm | TransEuropa |
Global Robot | Domestic Robots | Artificial Brain | Prometheus Labs |
Global Robot | Domestic Robots | Artificial Brain | Frankenstein Labs |
Global Robot | Domestic Robots | Metal Housing | Pieza de Acero |
Global Robot | Domestic Robots | Backplate | Pieza de Acero |
Supplier | Supplier Country |
---|---|
Tensile Globodynamics | USA |
Pieza de Acero | Mexico |
Microflux | Belgium |
Dakota Electrics | USA |
Wheels 4 Less | USA |
TransEuropa | Italy |
Prometheus Labs | Luxembourg |
Frankenstein Labs | Germany |
Company | Company Founder | Company Logo |
---|---|---|
Allied Clock and Watch | Horace Washington | Sundial |
Global Robot | Nils Neumann | Gearbox |
Country | Continent |
---|---|
USA | N. Amer. |
Mexico | N. Amer. |
Belgium | Europe |
Germany | Europe |
Italy | Europe |
Luxembourg | Europe |
4NF
What happens if a company has more than one founder or more than one logo? (Let us assume for the sake of the example that both of these things may happen.) One way of handling the situation would be to alter the primary key of our Companies table to {Company, Company Founder, Company Logo}. Representing multiple founders and multiple logos then becomes possible, but at the price of redundancy:
Company | Company Founder | Company Logo |
---|---|---|
Allied Clock and Watch | Horace Washington | Sundial |
Global Robot | Nils Neumann | Gearbox |
International Broom | Gareth Patterson | Whirlwind |
International Broom | Sandra Patterson | Whirlwind |
International Broom | Gareth Patterson | Sweeper |
International Broom | Sandra Patterson | Sweeper |
This type of redundancy reflects the fact that the design does not conform to 4NF. We correct the design by separating facts about founders from facts about logos.
Company | Division | Part Type | Supplier |
---|---|---|---|
Allied Clock and Watch | Clocks | Spring | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Pendulum | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Spring | Pieza de Acero |
Allied Clock and Watch | Clocks | Toothed Wheel | Pieza de Acero |
Allied Clock and Watch | Watches | Quartz Crystal | Microflux |
Allied Clock and Watch | Watches | Tuning Fork | Microflux |
Allied Clock and Watch | Watches | Battery | Dakota Electrics |
Global Robot | Industrial Robots | Flywheel | Wheels 4 Less |
Global Robot | Industrial Robots | Axle | Wheels 4 Less |
Global Robot | Industrial Robots | Axle | TransEuropa |
Global Robot | Industrial Robots | Mechanical Arm | TransEuropa |
Global Robot | Domestic Robots | Artificial Brain | Prometheus Labs |
Global Robot | Domestic Robots | Artificial Brain | Frankenstein Labs |
Global Robot | Domestic Robots | Metal Housing | Pieza de Acero |
Global Robot | Domestic Robots | Backplate | Pieza de Acero |
Company |
---|
Allied Clock and Watch |
Global Robot |
International Broom |
Company | Company Logo |
---|---|
Allied Clock and Watch | Sundial |
Global Robot | Gearbox |
International Broom | Whirlwind |
International Broom | Sweeper |
Company | Company Founder |
---|---|
Allied Clock and Watch | Horace Washington |
Global Robot | Nils Neumann |
International Broom | Gareth Patterson |
International Broom | Sandra Patterson |
Supplier | Supplier Country |
---|---|
Tensile Globodynamics | USA |
Pieza de Acero | Mexico |
Microflux | Belgium |
Dakota Electrics | USA |
Wheels 4 Less | USA |
TransEuropa | Italy |
Prometheus Labs | Luxembourg |
Frankenstein Labs | Germany |
Country | Continent |
---|---|
USA | N. Amer. |
Mexico | N. Amer. |
Belgium | Europe |
Germany | Europe |
Italy | Europe |
Luxembourg | Europe |
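The redundancy in the three-column Companies table comes from a multivalued dependency: a company's founders and its logos are independent of each other, so every founder must be paired with every logo. A minimal Python check of that condition, with a hypothetical helper `mvd_holds` and rows held as tuples, might look like this:

```python
# The three-column Companies table from the 4NF discussion.
companies = [
    ("Allied Clock and Watch", "Horace Washington", "Sundial"),
    ("Global Robot", "Nils Neumann", "Gearbox"),
    ("International Broom", "Gareth Patterson", "Whirlwind"),
    ("International Broom", "Sandra Patterson", "Whirlwind"),
    ("International Broom", "Gareth Patterson", "Sweeper"),
    ("International Broom", "Sandra Patterson", "Sweeper"),
]

def mvd_holds(rows, x, y, z):
    """Check X ->> Y on column indexes: rows agreeing on X pair every Y with every Z."""
    present = {(r[x], r[y], r[z]) for r in rows}
    return all(
        (a[x], a[y], b[z]) in present
        for a in rows for b in rows if a[x] == b[x]
    )

holds = mvd_holds(companies, 0, 1, 2)             # Company ->> Company Founder
holds_if_row_lost = mvd_holds(companies[:-1], 0, 1, 2)

# The 4NF design stores the two independent facts separately.
founders = {(c, f) for c, f, _ in companies}
logos = {(c, l) for c, _, l in companies}
rejoined = {(c, f, l) for c, f in founders for c2, l in logos if c == c2}
```

Dropping a single International Broom row breaks the dependency, which is the kind of inconsistency the combined table invites. The two separate tables hold four rows each and still rejoin to the original six.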
5NF
We know that the Clocks division of Allied Clock and Watch relies upon its suppliers to provide springs, pendulums, and toothed wheels. We also know that the Clocks division deals with suppliers Tensile Globodynamics and Pieza de Acero. Let us suppose for the sake of the example that the following rule applies: if a supplier that a division deals with offers a part that the division needs, the division will always purchase it. If, for example, Tensile Globodynamics start producing Toothed Wheels, then Allied Clock and Watch will start purchasing them. This rule leads to redundancy in our design as it stands, causing it to fall short of 5NF. We correct the design by recording part-types-by-company-division separately from suppliers-by-company-division, and adding a further table that provides information as to which suppliers offer which parts.
Company | Division | Part Type |
---|---|---|
Allied Clock and Watch | Clocks | Spring |
Allied Clock and Watch | Clocks | Pendulum |
Allied Clock and Watch | Clocks | Toothed Wheel |
Allied Clock and Watch | Watches | Quartz Crystal |
Allied Clock and Watch | Watches | Tuning Fork |
Allied Clock and Watch | Watches | Battery |
Global Robot | Industrial Robots | Flywheel |
Global Robot | Industrial Robots | Axle |
Global Robot | Industrial Robots | Mechanical Arm |
Global Robot | Domestic Robots | Artificial Brain |
Global Robot | Domestic Robots | Metal Housing |
Global Robot | Domestic Robots | Backplate |
Company | Division | Supplier |
---|---|---|
Allied Clock and Watch | Clocks | Tensile Globodynamics |
Allied Clock and Watch | Clocks | Pieza de Acero |
Allied Clock and Watch | Watches | Microflux |
Allied Clock and Watch | Watches | Dakota Electrics |
Global Robot | Industrial Robots | Wheels 4 Less |
Global Robot | Industrial Robots | TransEuropa |
Global Robot | Domestic Robots | Prometheus Labs |
Global Robot | Domestic Robots | Frankenstein Labs |
Global Robot | Domestic Robots | Pieza de Acero |
Part Type | Supplier |
---|---|
Spring | Tensile Globodynamics |
Pendulum | Tensile Globodynamics |
Spring | Pieza de Acero |
Toothed Wheel | Pieza de Acero |
Quartz Crystal | Microflux |
Tuning Fork | Microflux |
Battery | Dakota Electrics |
Flywheel | Wheels 4 Less |
Axle | Wheels 4 Less |
Axle | TransEuropa |
Mechanical Arm | TransEuropa |
Artificial Brain | Prometheus Labs |
Artificial Brain | Frankenstein Labs |
Metal Housing | Pieza de Acero |
Backplate | Pieza de Acero |
Company | Company Logo |
---|---|
Allied Clock and Watch | Sundial |
Global Robot | Gearbox |
International Broom | Whirlwind |
International Broom | Sweeper |
Company | Company Founder |
---|---|
Allied Clock and Watch | Horace Washington |
Global Robot | Nils Neumann |
International Broom | Gareth Patterson |
International Broom | Sandra Patterson |
Supplier | Supplier Country |
---|---|
Tensile Globodynamics | USA |
Pieza de Acero | Mexico |
Microflux | Belgium |
Dakota Electrics | USA |
Wheels 4 Less | USA |
TransEuropa | Italy |
Prometheus Labs | Luxembourg |
Frankenstein Labs | Germany |
Country | Continent |
---|---|
USA | N. Amer. |
Mexico | N. Amer. |
Belgium | Europe |
Germany | Europe |
Italy | Europe |
Luxembourg | Europe |
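Under the stated purchasing rule, the four-column table is exactly the join of the three smaller tables. The sketch below illustrates this for the Allied Clock and Watch rows only (a subset chosen to keep the example short); the function name `join3` is hypothetical.

```python
A = "Allied Clock and Watch"

division_parts = {
    (A, "Clocks", "Spring"), (A, "Clocks", "Pendulum"), (A, "Clocks", "Toothed Wheel"),
    (A, "Watches", "Quartz Crystal"), (A, "Watches", "Tuning Fork"), (A, "Watches", "Battery"),
}
division_suppliers = {
    (A, "Clocks", "Tensile Globodynamics"), (A, "Clocks", "Pieza de Acero"),
    (A, "Watches", "Microflux"), (A, "Watches", "Dakota Electrics"),
}
supplier_parts = {
    ("Spring", "Tensile Globodynamics"), ("Pendulum", "Tensile Globodynamics"),
    ("Spring", "Pieza de Acero"), ("Toothed Wheel", "Pieza de Acero"),
    ("Quartz Crystal", "Microflux"), ("Tuning Fork", "Microflux"),
    ("Battery", "Dakota Electrics"),
}

def join3(dp, ds, sp):
    """A division buys part p from supplier s if it needs p, deals with s, and s offers p."""
    return {
        (c, d, p, s)
        for (c, d, p) in dp
        for (c2, d2, s) in ds
        if (c, d) == (c2, d2) and (p, s) in sp
    }

purchases = join3(division_parts, division_suppliers, supplier_parts)

# If Tensile Globodynamics starts producing Toothed Wheels, one new fact suffices.
updated = join3(division_parts, division_suppliers,
                supplier_parts | {("Toothed Wheel", "Tensile Globodynamics")})
new_rows = updated - purchases
```

The join yields the seven Allied Clock and Watch rows of the four-column table. Recording the single new supplier-part fact produces the new purchase row automatically, whereas the four-column design would need a separate row for every affected division.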
Denormalization
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. It has never been proven whether the denormalization itself provides any increase in performance, or whether the concurrent removal of data constraints is what increases the performance. The need for denormalization has waned as computers and RDBMS software have become more powerful.
Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response is crucial.
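For a look-up-only use of the kind described above, denormalization can be as simple as precomputing a join. The following Python sketch reuses a few suppliers from the earlier example; the dictionary representation and the name `denormalize` are assumptions made for illustration.

```python
# Normalized tables: supplier -> country, and country -> continent.
supplier_country = {
    "Microflux": "Belgium",
    "Dakota Electrics": "USA",
    "TransEuropa": "Italy",
}
country_continent = {"Belgium": "Europe", "USA": "N. Amer.", "Italy": "Europe"}

def denormalize(supplier_country, country_continent):
    """Precompute one read-only row per supplier by joining the two tables."""
    return {
        supplier: (country, country_continent[country])
        for supplier, country in supplier_country.items()
    }

lookup = denormalize(supplier_country, country_continent)
```

A reader of `lookup` needs one access instead of two, at the price of repeating each continent for every supplier in that country. The copy therefore has to be rebuilt whenever the underlying tables change.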
Non-first normal form (NF²)
In recognition that denormalization can be deliberate and (dubiously) useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension is a (non-optimal) way of implementing hierarchies in relations. Some theoreticians have dubbed this practitioner-developed method "First Ab-normal Form". Codd defined a relational database as using relations, so any table not in 1NF could not be considered to be relational.
Consider the following table:
Person | Favorite Colors |
---|---|
Bob | blue, red |
Jane | green, yellow, red |
Assume that a person has several favorite colors. The favorite colors then form a set of colors, which the given table models directly.
To transform this NF² table into 1NF, an "unnest" operator is required, which extends the relational algebra of the higher normal forms. The reverse operator is called "nest". "Nest" is not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse of "nest". A further constraint is that the operators be bijective, which is covered by the Partitioned Normal Form (PNF).
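The two operators can be sketched in Python for the favorite-colors table. This is an informal illustration rather than a definition of the NF² algebra: a nested table is modeled here as a set of (person, frozenset of colors) pairs, and the function names simply mirror the operator names.

```python
def unnest(nested):
    """Turn (person, set_of_colors) rows into atomic (person, color) rows."""
    return {(person, color) for person, colors in nested for color in colors}

def nest(flat):
    """Group atomic (person, color) rows into one (person, set_of_colors) row per person."""
    groups = {}
    for person, color in flat:
        groups.setdefault(person, set()).add(color)
    return {(person, frozenset(colors)) for person, colors in groups.items()}

favorites = {
    ("Bob", frozenset({"blue", "red"})),
    ("Jane", frozenset({"green", "yellow", "red"})),
}
flat = unnest(favorites)

# A nested table in which Person does not identify a row.
split = {("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))}
```

Unnesting the table gives five atomic rows, and nesting them again restores it because each person appears in exactly one nested row. For `split`, where Bob appears twice, nesting the unnested rows merges the two rows into one, so the round trip does not return the original table.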
Further reading
- Litt's Tips: Normalization
- Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
- Zimyani, E. (2006), Temporal Aggregates and Temporal Universal Quantification in Standard SQL. ACM SIGMOD Record, Vol. 35, No. 2, June 2006.
- Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120-125
- Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings
- H.-J. Schek, P.Pistor Data Structures for an Integrated Data Base Management and Information Retrieval System
References
- ^ His term eliminate is misleading, as nothing is "lost" in normalization. He probably described eliminate in a mathematical sense to mean elimination of complexity.
- ^ Codd, Edgar F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387.
- ^ DBDebunk
- ^ Zimyani
See also
- Aspect (computer science)
- Cross-cutting concern
- Inheritance semantics
- Functional normalization
- Orthogonalization
- Refactoring
- Business rules
External links
- Database Normalization Basics by Mike Chapple (About.com)
- Database Normalization Intro, Part 2
- An Introduction to Database Normalization by Mike Hillyer.
- Normalization by ITS, University of Texas.
- Rules of Data Normalization by Data Model.org
- A tutorial on the first 3 normal forms by Fred Coulson
- Free PDF poster available by Marc Rettig
- Description of the database normalization basics by Microsoft