Is there an equivalent to the Universal Relation in the document-oriented database world? There is: the Universal Collection.
‘In relational databases, the universal relation assumption states that one can place all data attributes into a (possibly very wide) table, which may then be decomposed into smaller tables as needed.’ (from: http://en.wikipedia.org/wiki/Universal_relation_assumption).
There is a bit more behind that statement; for example, there is an assumption that the same type of data are stored in a column with the same name. E.g., if there is the concept of a ‘street name’, then all street names will be in a single column, probably named ‘street name’.
Another assumption is that if a data set does not have data for all columns of the Universal Relation (and hardly any does), then the value in the column is ‘null’. So in general the Universal Relation is very sparse and has a lot of null values in its columns.
A comprehensive discussion can be found in http://www.informatik.uni-trier.de/~ley/db/books/dbtext/ullman89.html (for further detailed exploration).
NoSQL Database Equivalent: Universal Collection
What is the equivalent of a Universal Relation in a document-oriented database? One organization scheme of document-oriented databases are collections of documents. In such a case a first approximation is that all documents are stored in the same collection, a Universal Collection.
The second step towards a Universal Collection is that all property names of the documents that contain the same type of data are named the same. This is in general possible as the naming is scoped by documents and sub-collections in documents.
In contrast to the Universal Relation, a document in a Universal Collection only has to have the properties that have values. Other properties do not have to be added with a ‘null’ value, they can simply be left out as the document model does not have a fixed schema; each document can only contain the properties it really requires. If this approach is taken, the Universal Collection is not sparse at all in comparison with the Universal Relation, but very compact.
Since the document model does not enforce a strictly typed schema it is possible that the same property is of different data types (in the sense of the JSON model). So it is possible that a property called ‘address’ can be a ‘string’ in one document and a sub-collection consisting of several strings in another document with both containing perfect address data. In contrast to the relational model, this is valid (and fine) in a document model (if at all, it causes problems during processing, but not from a data model perspective).
On this level, it is certainly possible to define the equivalent of an Universal Relation in a document model: the Universal Collection. This is interesting as later on this will serve as a starting point for normalization.
In reality I have seen projects that actually store all documents in a single collection as it made it easier to query the system in comparison of distributing or partitioning documents across several collections. The question is, of course, are the same types of data stored in properties with the same property name to ensure the semantic equivalence or not. This topic will re-appear in a later blog post about document model definition.