Relational Databases

Last updated on 2025-05-13 | Edit this page

Estimated time: 13 minutes

Overview

Questions

  • What is a relational database?
  • How do keys work?
  • What is a normal form?

Objectives

  • understand the differences of normal forms
  • normalize data in tables

In order to approach the topic as a whole, we first want to look at and understand the individual components in detail. As already mentioned, so-called keys are used to link tables in relational databases and thus make data storage and analysis more efficient. What exactly this means is explained below.

Tables (Relations)


Tables are a fundamental tool for organizing and processing data in a structured way. They consist of rows and columns, much like what you’re used to seeing in spreadsheet programs like Excel. Each column represents a specific attribute or characteristic—such as “Name,” “Date of Birth,” or “Email Address.” Each row represents a single data entry, showing the actual values for those attributes. More precisely, each column would represent a person and contain information about that person. This concept is transferable to all types of objects.

Name Date of Birth
Thomas Eakins 25.7.1844
Bill Traylor 1.4.1854

Tables provide a clear and easy-to-understand structure that works well for both humans and computers. They allow large amounts of information to be stored in an organized way, making it easy to search, sort, and analyze the data.

The widespread success of tables comes from their simplicity, clarity, and versatility. They can be adapted to many different types of data and serve as an ideal starting point for everything from simple lists to complex data analysis.

Primary Key and Foreign Key


To structure and tidy up tables a little better, so-called keys are used in relational data modelling. What exactly this is used for will be explained in the next step. First of all, we want to understand what this concept actually means, as it is essential for the entire modelling process. These keys are used to create relationships between tables in the database and guarantee that each entry in a table can be uniquely referenced. If we think back to the tables for a moment, each column represents a data record, in our example a person. If we want to make this person uniquely referenceable, we can assign an ID. If this ID is unique and only exists once in the entire data record, it is referred to as the primary key.

Name Date of Birth ID (Primary Key)
Thomas Eakins 25.7.1844 1
Bill Traylor 1.4.1854 2

If we now want to collect more information about that some person, that we do not want to store in the same table, we can create a new table and refer to the person with the Primary Key. If we use this key in the new table we talk about a foreign key. This key is therefore used to create a relationship between the two tables, or between two entries in the tables. This allows information to be split into different tables in order to structure them better and more clearly without losing information. The advantages of this and how it is used in relational data modelling need to be clarified next.

Titel Artist (Foreign Key)
The Artists Wife and His Setter Dof 1
Blue Construction, Organge Figures 2

Key Points

  • The primary Key must never be empty
  • The primary Key must be unique
  • A foreign key must link to an existing Primary Key

Normal Forms


As the amount of information in a table grows, it’s important to keep the data well-structured and manageable. This is where the concept of normal forms comes in - a systematic approach to organizing data so that it remains clear, consistent, and easy to extend.

Normal forms build upon concepts you already know: tables (relations), primary keys, and foreign keys. They help avoid common issues in database design, such as:

  • Data redundancy: storing the same information multiple times
  • Anomalies: unwanted side effects when inserting, updating, or deleting data
  • Lack of clarity: tables that are hard to maintain or expand

Imagine you want to add new features in the future — new fields, relationships, or more advanced queries. If your tables are well-structured, you can do that with minimal effort. In a simple “all in one table” it can be very difficult to apply changes. Normal forms help create a clean and flexible foundation for your data.

Another key advantage: normalized data makes it easier to work with relationships between tables. The result is a model that is both logically sound and technically efficient.

Key Points

  • Avoid duplication of data
  • Clear separation of responsibilities (each table has a specific role)
  • Fewer errors and inconsistencies
  • Easier to maintain and extend the database

First normal form

To introduce the concept of the First Normal Form, we want to start by exploring the public collection of the Metropolitan Museum of Art (The MET) in New York. Our goal is to create a curated dataset of artworks that include dogs, forming the foundation of a small, themed collection.

We began by extracting a selection of entries that were tagged with “dog”. Here is a part of the raw table we compiled:

Title Artist Accession Number On View Location
The Artist’s Wife and His Setter Dog Thomas Eakins (American, Philadelphia, Pennsylvania 1844–1916) 23.139 The American Wing, 764
The Champion Single Sculls (Max Schmitt in a Single Scull) Thomas Eakins (American, Philadelphia, Pennsylvania 1844–1916) 34.92 The American Wing, 763
A Gorge in the Mountains (Kauterskill Clove) Sanford Robinson Gifford (Greenfield, NY 1823–1880 NY) 15.30.62 The American Wing, 761
Marie Emilie Coignet de Courson (1716–1806) with a Dog Jean Honoré Fragonard (French, Grasse 1732–1806 Paris) 37.118 European Paintings, 631
Blue Construction, Orange Figures Bill Traylor 2015.756 The Met Fifth Avenue in Gallery 773
Christ Carrying the Cross, with the Crucifixion Gerard David (Netherlandish, Oudewater ca. 1455–1523 Bruges) 1975.1.119A-B Robert Lehman Collection, 953

At first glance, this table seems to contain the basic information we need: title, artist, museum reference number, and viewing location. But let’s take a closer look. What happens if we want to search for all works by the same artist? How do we deal with artworks that appear to be listed more than once? What exactly is stored in the “On View Location” column — is it a department, a gallery number, or both? And what if we wanted to sort or filter artworks by gallery alone?

These questions reveal some issues with the structure of the table. The artist information includes names along with dates and places of birth and death, all mixed in a single field. Some entries are repeated, and the viewing location combines multiple details in a single cell. In this form, the data may be good enough for reading — but it’s far from ideal if we want to work with it programmatically, search through it efficiently, or extend it later.

This is where the concept of First Normal Form (1NF) comes into play. It provides a first step toward cleaning and structuring the data. The core idea is simple: each field in a table should hold one single, atomic value. There should be no lists, no combined information, no embedded structures. Each piece of data belongs in its own place.

To bring our MET dog collection into First Normal Form, we first need to separate combined data into individual fields. That means splitting up the location into department and gallery number, simplifying the artist field to only include the name, and making sure there are no duplicate records. After applying these changes, our structured table looks like this:

Title Artist Name Artist Nationality Artist Birthplace Artist Birth–Death Years Accession Number Department Room Number
The Artist’s Wife and His Setter Dog Thomas Eakins American Philadelphia, Pennsylvania 1844–1916 23.139 The American Wing 764
The Champion Single Sculls (Max Schmitt in a Single Scull) Thomas Eakins American Philadelphia, Pennsylvania 1844–1916 34.92 The American Wing 763
A Gorge in the Mountains (Kauterskill Clove) Sanford Robinson Gifford American Greenfield, New York 1823–1880 15.30.62 The American Wing 761
Marie Emilie Coignet de Courson (1716–1806) with a Dog Jean Honoré Fragonard French Grasse 1732–1806 37.118 European Paintings 631
Blue Construction, Orange Figures Bill Traylor American Benton, Alabama 1853/54–1949 2015.756 The Met Fifth Avenue / The American Wing 773
Christ Carrying the Cross, with the Crucifixion Gerard David Netherlandish Oudewater ca. 1455–1523 1975.1.119A-B Robert Lehman Collection 953

This version is now in First Normal Form. Each field contains just one value, and combined information has been clearly separated. This not only makes the table easier to understand, but also opens the door to further normalization in the next steps — where we’ll learn how to reduce redundancy and connect related data across multiple tables.

Second normal form

Third normal form