The notion of a ‘schema-free database’ keeps coming up, most recently in a meetup I attended a few days ago. Some rationalization follows divided up into the categories of ‘document’ and ‘database’.
While a generalization is easily possible, the context here will be JSON and MongoDB as these are two practical implementations that are available and often used as examples of a ‘schema-free database’. Those provide a nice constraint technology set as an example, while the principles apply to a whole range of other technology, of course.
A JSON document, in short: document, follows a set of construction principles outlined here: http://www.json.org/. This is a rather informal grammar that defines how a valid JSON document is constructed. There are no data type generators and so new data types cannot be introduced; therefore, every document is constructed from the fixed set of types enumerated on that web page.
An attempt has been made to create a more formal mechanism to define a schema for JSON documents: http://json-schema.org/. This approach provides a formal language to describe the schema of a JSON document explicitly.
The json-schema approach combined with the fixed set of types available to create a JSON document means that every JSON document can be described explicitly using json-schema without exceptions. This in turns means that every JSON document has at least an implicit schema, unless it is additionally made explicit with e.g. json-schema.
Therefore, JSON documents have a schema, an implicit one and optionally an explicit one. Depending on the particular schema definition language approach itself, a document might match more than one schema, but that is left for a separate discussion.
Set of Document Schemas
Given a set of JSON documents it is now possible to characterize their relationship to schemas. In the ‘best’ case, all documents follow the same schema; in the ‘worst’ case, each document follows its own schema. And there are cases in-between where a subset of the documents validates against a schema, and another subset against another schema. Depending on the design, one JSON document might validate against different schemas.
The relationship between documents and schemas is n:m in general.
In context of a database, there are a few interesting questions in this context:
- Does the database understand a document representation (e.g. JSON)?
- Does the database enforce a document representation?
- Does the database enforce a schema?
- Does the database understand a schema?
Let’s answer these questions for MongoDB specifically:
- MongoDB understands several document representations, JSON and BSON (http://bsonspec.org/).
- MongoDB enforces a partial schema. Each document must have a property called “_id”; if the document being inserted does not have such a property, one is automatically added.
- MongoDB does not understand an explicit schema as it does not provide for a mechanism to load a schema definition language.
MongoDB, however, understands implicit schemas as MongoDB does allow to e.g. create an index on any property of documents. So MongoDB recognizes properties.
Furthermore, MongoDB supports aggregation functions and supports e.g. the sum of properties across documents (https://realprogrammer.wordpress.com/2012/11/04/null-undefined-nan-and-missing-property-goto-considered-harmful-part-2/). So it is data type aware and implements operators (e.g. sum) on those.
Conclusion: The Notion of ‘Schema’ is Changing
This rather brief discussion clearly rationalizes that the label ‘schema-free database’ is not applicable to technologies such as those discussed in this blog (JSON/BSON, MongoDB).
Hence these technologies are not an example of ‘schema-free database’, to the contrary: they demonstrate that the notion of ‘schema’ can have a wider and more flexible interpretation then what relational databases bring forward.