
When designing tables an important concept is the Functional Determination of columns in each table. It is this which is the basis for the definitions of Normal Forms. The process of Normalization involves a number of Normal Forms, and you'll find as simple as possible (but no more so) descriptions of these in Normalization.zip in my Dropbox public databases folder at:
Concentrate on the first three normal forms to start with. Normalization to Third Normal Form (3NF) requires all non-key columns to be functionally determined solely by the whole of the table's primary key.
To illustrate this let's take a look at the physical model for my Inventory demo in the same Dropbox folder:
Firstly, consider the related tables Customers, Cities CountiesOrRegions, and Countries. You'll see that Customers contains a foreign key CityID numeric column (not the city name, as city names can legitimately be duplicated, so cannot be used as keys), which references the primary key of the Cities table. The CityID column in Customers fits the definition of Third Normal Form as it is determined by the primary key CustomerID, i.e. once we know the CustomerID value we know which city the customer is in.
Let's consider what would be the case if we were to include a CountyOrRegionID column in the Customers table. The table would not now be normalized to Third Normal Form because, once we know which city a customer is in we automatically know which County or Region they are in, i.e. CountyOrRegionID is functionally determined by CityID. The table is therefore not normalized to Third Normal Form because CountyOrRegionID is not solely determined by the table's key, CustomerID. In the jargon it is transitively dependant on the key. This could result in an update anomaly if the customer moved to another city, but we leave the CountyOrRegionID value in the row unchanged. An example would be my move from Cannock in Staffordshire to Newport in Shropshire a couple of years ago. If the table were not normalized it would be possible for it to incorrectly show me living in Newport in Staffordshire.
Let's now look at the OrderDetails and Products tables in the model. This table models the many-to-many relationship type between Orders and Products by resolving it into two one-to-many relationship types. The Products table contains a UnitPrice column which is functionally determined solely by the ProductID primary key column, so the table is normalized to Third Normal Form. However, ProductDetails also has a UnitPrice column. At first sight it looks like the table is not normalized to Third Normal Form because UnitPrice is dependant on only one half of the primary key (a composite of two columns in this case). Surely we can dispose of the UnitPrice column in OrderDetails therefore? But what would then happen if the UnitPrice value in Products is changed? All earlier invoices would now show the new price, which means they would be incorrect. By having a UnitPrice column in OrderDetails, on the other hand, whenever we insert a new row into this table we can assign the current unit price value from Products to the column, and the value will remain the same whatever changes are subsequently made to the value in Products. The auditors will now be happy! Both tables are now in Third Normal Form, because in products UnitPrice is solely determined by ProductID, and in OrderDetails it is determined by the combination of OrderID and ProductID, i.e. by the whole of the primary key.
I've concentrated on Third Normal Form in this example because experience has shown that if a table is normalized to Third Normal Form, it will in the vast majority of cases also be normalized to the higher normal forms. This is not to say that the higher normal forms are unimportant, and once you have a good grasp of normalization to Third Normal Form (3NF), you should familiarise yourself with the higher normal forms described in my demo.