Database Normalization Explained: 1NF, 2NF, 3NF
Database normalization is a systematic approach to organizing data within a relational database. The primary goals are to reduce data redundancy (storing the same data in multiple places) and to improve data integrity. Normalization involves dividing large tables into smaller, less redundant tables and defining relationships between them.
The process is guided by a series of rules known as Normal Forms (NF). While higher normal forms exist, achieving the Third Normal Form (3NF) is the standard for most transactional databases (OLTP).
First Normal Form (1NF)
To satisfy 1NF, a table must meet the following criteria:
- Atomic Values: Each column must hold indivisible (atomic) values. No arrays, lists, or comma-separated values in a single cell.
- Unique Column Names: Each column in a table must have a unique name.
- No Repeating Groups: You should not have multiple columns representing the same attribute (e.g., Phone1, Phone2, Phone3).
Fixing 1NF Violations: If a user has multiple phone numbers, create a separate 'user_phones' table with a foreign key linking back to the 'users' table, rather than storing them in a single column or multiple numbered columns.
Second Normal Form (2NF)
To achieve 2NF, a table must:
- Be in 1NF.
- Have no Partial Dependency. This means all non-key attributes must depend on the entire primary key.
Context: This only applies if a table has a composite primary key (a primary key made of two or more columns). If a column depends on only one part of the composite key, it violates 2NF.
Fixing 2NF Violations: Move the partially dependent columns into a new table where the part of the composite key they depend on becomes the sole primary key.
Third Normal Form (3NF)
To achieve 3NF, a table must:
- Be in 2NF.
- Have no Transitive Dependency. This means a non-key column must not depend on another non-key column. All non-key columns must depend only on the primary key.
Example: Consider a 'orders' table containing 'customer_id', 'customer_name', and 'customer_address'. 'customer_name' and 'customer_address' depend on 'customer_id', not the 'order_id' (the primary key).
Fixing 3NF Violations: Extract the transitively dependent columns into a new table (e.g., a 'customers' table) and leave the foreign key ('customer_id') in the original table.
Denormalization
While normalization is crucial for data integrity and reducing update anomalies, strict adherence can sometimes lead to complex queries requiring numerous JOINs, which can degrade read performance. In analytical databases (OLAP) or high-read systems, engineers may intentionally denormalize data, accepting some redundancy in exchange for faster read times. Understanding normalization is a prerequisite to knowing when and how to safely denormalize.