Why a blog on Relational Database Management System (RDBMS)? Isn’t there enough said about this type of database? Yes, there is. This post is to call out a few aspects of the relational model that I want to refer to in context of RDBMSs for subsequent discussions. In addition, it re-establishes terminology.
Normalization
Briefly, normalization attempts to remove data duplication and data dependencies so that an update of a data item has to be done only once for the whole data set to be consistent again (http://en.wikipedia.org/wiki/Database_normalization).
Universal Relation
The Universal Relation is mostly an assumption that states: it is possible to store all data in one big, wide table. For a discussion, see Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X.
In a way, the universal relation is the ‘opposite’ of a perfectly normalized relational schema. It also assumes that the semantically same attributes are named the same way in different relations. As a consequence, many columns might have null values. But for the thought experiment, this is perfectly valid.
Part-Of Relationship
The part-of relationship consists between an owning object and an owned one whereby the owned object’s life cycle is tightly bound to the owning one: if the owning object is deleted, the owned one will be deleted, too. The owned object exists in context of the owning object and does not have a life on its own.
In the relational model there is a choice how this can be modeled. In principle, the owned objects can be in their own relation, separate from the owning objects in their own relation. In addition, there is a foreign key relationship from the owning to the owned relationship.
Alternatively, the owned objects can be in the same relation as the owning objects. Both would be in the same row. This is basically the case when the owned object is a scalar domain value. In this case it is almost automatically in the same relation as the owning data.
De-Normalization
De-normalization is about combining data that would otherwise be in separate relations when being fully normalized (http://en.wikipedia.org/wiki/Denormalization). This introduces redundancy and dependencies in the model for the sake of access improvement and access speed.
In a sense, the universal relation is the ultimate form of de-normalization.
Schema
In the relational database management system world the data are stored as rows in tables. A table provides a fixed structure whereby every column defines one and only one data type. Any row stored in a table, therefore, must comply to the types of the columns. The only exception is ‘null’ since ‘null’ can be of any data type. The significance is that tables are well-defined in terms of data types and rows have to comply to the table definition. This makes it a strictly typed system: the set of all table definitions constitute the schema.
Schema changes are possible in most RDBMS implementations. It is possible to add, to change or to delete columns. One consequence is that all rows in the table have to change so that their values comply to the new column definitions. Adding and removing tables is possible, too. The ‘hard’ part about changing a table is that all rows have to be changed accordingly; it is not possible to only have the change apply to future rows or when rows are updated.
Summary
This blog recalled a few important concepts from the relational model and relational database management systems. These will be referred to and used in future blogs.