Relational Data in a Document-oriented NoSQL Database (Part 2): Schema

The ‘schema-less-ness’ of document oriented databases is touted as a major plus and advantage for these systems. Why is that?

Relational Table vs. Document Collection

Terminology-wise, relational tables are in the domain of relational database management systems. Tables contain data in form of rows. Document collections are in the domain of some NoSQL databases that support the document structure (e.g., MongoDB). Document collections contain data in discrete documents. Document collections are used as the basis for the following discussion.

One or Several Schemas?

Upfront, document oriented databases have some form of schema built-in. First, there is the concept of collections. Before any document can be stored, a collection must be in place. Second, data to be stored in collections must be documents complying to a document data structure, in many cases this is JSON.

These constraints means that each document is structured according to JSON. If a given document is seen as an instance of a document schema, then each document actually complies to that schema. That schema is implicit as it is not externalized and represented separately. If each document in a collection is different, then there are as many (implicit) schemas as there are documents. If all documents have the same internal structure, then all comply to the same implicit schema.

In the general case, for a given collection, there can be as many implicit schemas as there are documents. In the minimal case, there is no schema (if the collection is empty) or one schema if all documents comply to the same implicit schema. In contrast, in the relational tables, all rows always comply to the table definition.

Schema Enforcement

A relation enforces the structure of its rows. In contrast, a collections does not enforce the structure of its documents. As long as a document is in a consistent data type (e.g. JSON), it can be stored in any collection.

In a given collection it is possible that all documents have the same internal structure. In this case they would all follow the same schema. In the extreme case, each document has its own structure not shared by any other document and that then means that each document has its own schema. A collection does not enforce the schema of its documents.

If document schemas have to be enforced, it can only be done outside the document database, either by the code that writes documents to collections (database inbound) or by the code that reads documents from collections (database outbound). In the inbound case, this ensures that all documents are actually of a given structure. In the outbound case, documents that do not comply will never be processed or they will be changed on the fly in order to comply. Alternatively, the reading ocde can throw an exception if it finds a non-compliant document and then the document can be processed in order to make it compliant.

Querying and Programming Model

In a relational world database queries can assume that all rows in a table are of the same structure and of the appropriate type. The programs accessing the database and retrieving tuples can assume the same. From a programming model perspective there is no variation as the structure is fixed and therefore predictable.

Accessing a document-oriented database is different as neither the queries nor the accessing programs can assume any particular document structure (as the database systems does not enforce any structure). Assumptions can only be made if a fixed document schema or a fixed set of variations is enforced elsewhere by convention. This requires that the programming model is extremely aware of the potential variety and has to understand how to deal with this variety.

Discussion

With all these dynamic possibilities, it is therefore no problem to store relational data into a document-oriented database management system. A document-oriented database management system can deal with structured data, even though it cannot enforce the structure. Even in the case that relations change over time can be handled by a document-oriented database system as it allows documents changing their shape over time.

Coming back to the initial question: The ‘schema-less-ness’ of document oriented databases is touted as a major plus and advantage for these systems. Why is that? The answer might lie in the ease that allows to store documents with different implicit schemas in the same collection.

Big caveat: Storing documents of different schemas is ‘easy’. However, that pushes the complexity of dealing with documents of different implicit schemas to the programming model: it has to be able to deal with the variation. And for the most part, I believe this is uncharted territory as there are no programming language constructs that allow to characterize variation during processing.

Advertisements

One thought on “Relational Data in a Document-oriented NoSQL Database (Part 2): Schema

  1. Pingback: Relational Data in a Document-oriented NoSQL Database: Overview | r e a l – p r o g r a m m e r

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s