Tags:concept database normalization functionaldependencies Status:🟩

Normalization

Summary

Normalization organizes data in a relational database to reduce redundancy and improve data integrity. It minimizes anomalies during data operations and enhances efficiency. The process involves evaluating relations against four normal forms and decomposing those that do not meet the criteria.

Details

Normalization is the process of organizing data in a relational database to reduce redundancy and improve data integrity. By structuring data into related tables, it minimizes anomalies during data operations and enhances the efficiency and maintainability of the database.

Goals of normalization The primary goal of normalization is to ensure that no anomalies occur during data insertion, deletion, or update. This is achieved by analyzing the given relations based on their functional dependencies and candidate keys. As a result, the risk of inconsistent updates is reduced, and storage space is utilized efficiently.

Levels of understanding At the logical level, normalization helps users easily understand the meaning of data and formulate correct queries. At the implementation level, it ensures that the database operates effectively without redundancy.

Normal forms and decomposition There are four main normal forms in normalization, with higher normal forms having stricter constraints. The normalization procedure involves evaluating relations against these normal form tests sequentially. Relations that do not meet the requirements are decomposed into smaller, more manageable relations.

Informal Normalization Guidelines

These are some guidelines to go by when normalizing tables.

The name of the relation should be meaningful.
Attribute types from multiple entity types should not be combined in a single relation.
Avoid excessive amounts of NULL values in a relation. Maybe most records don’t have a value for a specific attribute type, hence NULL will be used. Instead split it up into relationship table, so the records that holds a value for the specific attribute type can get associated.

1NF

Rules:

All attributes are of atomic types. No lists or sets as attribute type. There could be 2 ways to solve this:

Separate rows: Create a new record with the same user and different email. However this would create 2 records with the same id which might be a problem if its a unique key.
New table: Decompose the email into a new table that contains id and mail. Then its possible to associate the id with name, age etc. in one table and the id with email in another table.

2NF

Rules:

Is on 1NF
Non-key attributes fully (not partial) depend on the whole candidate key(s).


The keys are `manufacturer` and `model`. Manufacturer $\to$ country.
We have a non key attribute `country`. It depends on the manufacturer but that is just a part of the key, so it depends only partially on the key.

3NF

Rules:

Is on 2NF
No non-prime attribute type of R is transitively dependent on the primary key.


A position determines the salary. Hence its redundant to write the salary in the same table as the employees. If the salary changes for a position it has to be changed in every attribute who has that position.

BCNF

Rules:

Is on 3NF
Every determinant must be a candidate key

Old	New	New

course_id → room (A course determines the room it’s taught in).
instructor → room (An instructor determines the room they teach in). In this case, even though both course_id and instructor determine room, neither is a candidate key—since the combination of course_id and instructor uniquely identifies rows. This violates BCNF because we have a determinant (instructor) that isn’t a candidate key.

To fix this we need to break the table into two tables that eliminate the problem. Now, each determinant (both course_id and instructor) is a candidate key in its respective table, ensuring the tables are in BCNF.

Guide to normalize

Identify ‘bad’ functional dependencies that break BCNF/3NF.
If there are such, then decompose tables into two tables.
Check again and repeat for sub-tables.
We don’t decompose from 3NF to BCNF.

Practical decomposition

Consider relation R and (important) functional dependency $X \to Y$ that violates 3NF/BCNF.
- Decompose $R$ into $R_{1}$ and $R_{2}$ where
- $R_{1} = R - Y$ (everything but Y = the right side)
- $R_{2} = X Y$ (the whole FD = both left and right side)
This has the following properties
- $R_{2}$ is (normally) in BCNF
- Joining $R_{1}$ and $R_{2}$ (with = on all X attributes) yields R
Example: Person(ID, Name, ZIP, City) and ZIP $\to$ City
- X = ZIP, Y = City
- $r_{1}$ = R-Y = Person(ID, Name, ZIP)
- $r_{2}$ = XY = ZIP(ZIP, City)

Normalization dependency diagram

This diagram can be used to see which normal form the table is on. There are attributes ABCDE. A and B are the keys. The lowest NF that applies is the one that counts.