SQL for JSON and Schema Support (Part 3): Intermezzo 1 – MongoDB’s $jsonschema

MongoDB introduced support for JSON Schema through $jsonschema. Let’s explore this new functionality a bit in this blog.

$jsonschema

The functionality is introduced here: https://docs.mongodb.com/master/reference/operator/query/jsonSchema/#op._S_jsonSchema It states “$jsonSchema can be used in a document validator, which enforces that inserted or updated documents are valid against the schema.”

A first item to note is that this approach is supporting BSON types (http://bsonspec.org/, https://docs.mongodb.com/master/reference/operator/query/type/), not just JSON structures (https://www.json.org/), using a specific property “bsonType” that is not part of the JSON Schema standard (http://json-schema.org/).

A second observation is that the schema specification is inline with the collection creation and cannot refer to a separate JSON schema file or JSON object representing a JSON schema.

JSON Schema Validator Example

Let’s use the example of the first blog in this series, create a schema for it and use that as a constraint for the “orders” collection. Then documents are added to the collection (and there seem to be errors as well). For reference the version used is: MongoDB server version: 3.6.0.

> mongo
> use schema_exploration
> db.createCollection("orders", {
  "validator": {
   "$jsonSchema": {
    "bsonType": "object",
    "required": ["orderId", "orderDate", "orderLineItems"],
    "properties": {
     "orderId": {
      "bsonType": "int",
      "description": "Order Identifier: must be of 
                     type int and is required"
     },
     "orderDate": {
      "bsonType": "date",
      "description": "Order Date: must be of 
                     type date and is required"
     },
     "orderLineItems": {
      "bsonType": "array",
      "items": {
       "bsonType": "string"
      },
      "description": "Order Line Items: must be of 
                     type array and is required"
     }
    }
   }
  }
 })
{ "ok" : 1 }

A quick note: “bsonType” can be used in all levels in order to refer to BSON types, not just on the top level.

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": [{
   "itemId": 55,
   "numberOrdered": 30
  }, {
   "itemId": 56,
   "numberOrdered": 31
  }]
 })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Along the way I ran into a validation issue as I constraint the array elements to strings, rather than objects, as used in the example of the first blog in this series. So I made a schema definition mistake.

To note is that the response on the shell does not indicate what the problem was making debugging hard, especially when large and complex schemas are to be debugged.

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": ["a", "b"]
 })
WriteResult({
 "nInserted": 1
})

Once I realized the mistake I made, I inserted a document complying to the schema in order to make sure I identified the issue correctly.

JSON Schema Validator Update

Obviously, after defining a wrong schema, the correct schema should be used as validator.

This is the correct schema:

{
 "bsonType": "object",
 "required": ["orderId", "orderDate", "orderLineItems"],
 "properties": {
  "orderId": {
   "bsonType": "int",
   "description": "Order Identifier: must be of 
                  type int and is required"
  },
  "orderDate": {
   "bsonType": "date",
   "description": "Order Date: must be of 
                  type date and is required"
  },
  "orderLineItems": {
   "bsonType": "array",
   "items": {
    "bsonType": "object",
    "properties": {
     "itemId": {
      "bsonType": "int"
     },
     "numberOrdered": {
      "bsonType": "int"
     }
    }
   },
   "description": "Order Line Items: must be of 
                  type array and is required"
   }
  }
 }

And this is the command to update the validator:

> db.runCommand({
  "collMod": "orders",
  "validator": {
   "$jsonSchema": {
    "bsonType": "object",
    "required": ["orderId", "orderDate", "orderLineItems"],
    "properties": {
     "orderId": {
      "bsonType": "int",
      "description": "Order Identifier: must be of 
                     type int and is required"
     },
     "orderDate": {
      "bsonType": "date",
      "description": "Order Date: must be of 
                     type date and is required"
     },
     "orderLineItems": {
      "bsonType": "array",
      "items": {
       "bsonType": "object",
       "properties": {
        "itemId": {
         "bsonType": "int"
        },
        "numberOrdered": {
         "bsonType": "int"
        }
       }
      },
      "description": "Order Line Items: must be of 
                     type array and is required"
     }
    }
   }
  },
  "validationLevel": "strict"
 })
{ "ok" : 1 }

Some background on the command used is here: https://docs.mongodb.com/master/reference/command/collMod/.

Following is an attempt to add one more of the (now mismatching) documents:

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": ["a", "b"]
 })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

As it should be, the insert fails.

And here the insert of a now correct document:

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": [{
   "itemId": NumberInt(55),
   "numberOrdered": NumberInt(20)
  }, {
   "itemId": NumberInt(56),
   "numberOrdered": NumberInt(21)
  }]
 });
WriteResult({
 "nInserted": 1
})

Collection Inconsistency: Mismatch of Schema and Documents

There is an interesting issue appearing at this point. The new schema does not match all existing documents in the collection. Or the other way around: the collection now contains documents that do not match that schema.

> db.orders.find()
{
 "_id": ObjectId("5a2022c3fb460d15db9ec73e"),
 "orderId": 1,
 "orderDate": ISODate("2017-09-30T00:00:00Z"),
 "orderLineItems": ["a", "b"]
} {
 "_id": ObjectId("5a202322fb460d15db9ec741"),
 "orderId": 1,
 "orderDate": ISODate("2017-09-30T00:00:00Z"),
 "orderLineItems": [{
  "itemId": 55,
  "numberOrdered": 20
 }, {
  "itemId": 56,
  "numberOrdered": 21
 }]
}

MongoDB did not flag that there are documents in the collection that will not match the new schema (even though the validation level strict was used).

Adding a validation action with value of “error” does not change the situation, either.

Implication to Semantics

Given that the schema of a collection can be changed at any time, and given that MongoDB does not fail the schema update based on mismatching documents already in the collection, examining the schema is insufficient to understand the structure of the documents in a collection.

So a collection with a schema does not ensure that all documents in that collection are schema compliant. It rather insures that from the point in time the schema was added or updated documents will have to comply. Previous documents in the collections are not affected.

Summary

The notion of “schema” in context of MongoDB is very different from the notion of “schema” in context of relational database management systems. In MongoDB the documents in the collection do not have to comply to the schema; they only do have to comply at time of insertion.

There will be more exploration coming up in the next blog on this topic in order to further understand the semantics of “schema” in context of MongoDB.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON and Schema Support (Part 2): Where does the “Interesting” Code go?

The previous blog found that the “generic” indirect representation of JSON data is one way of supporting “schema-free” JSON objects or documents. Where does the “interesting” functional code live?

Indirect Representation

To recap, the indirect representation is a set of classes, functions, etc. (depending on programming language) that can manage JSON objects or JSON documents. All or most languages have libraries supporting JSON manipulation. For example, Jackson is such a library for Java.

These JSON libraries can manage any valid JSON structure, and they do not require a schema or the JSON objects being homogeneous. Two JSON objects representing the same concept like an order with different attributes (as shown in the previous blog) can be managed by such JSON libraries.

Structural Manipulation

Structural manipulation of JSON objects supports the addition, update or deletion of properties (members) as well as JSON array elements. Property values can be replaced, for example, a JSON string with a JSON object.

Through structural manipulation it is possible to change a JSON object as needed, when e.g. new details appear in form of additional properties.

Structural manipulation was demonstrated in a database context in the last blog: properties were added through the update statement. The same is possible in the indirect representation libraries in the various programming languages.

Computation

Structural manipulation is not the only code that is required as structural manipulation does not allow to compute any specific application semantics. For example, in context of orders, the total value of not yet shipped orders might be a value that needs to be computed.

In a database context this would be an aggregation query that sums up the amount of all orders that do not have the status of shipped.

In context of a programming language it would require a function that iterates through all orders and, like in the database aggregation approach, adds up the sum of those orders that have not shipped yet.

It probably would be implemented as a set of cooperating functions, like

DollarAmount getValueOfOrdersNotShipped(JSONArray orders)
boolean hasOrderShipped(JSONObject order)
DollarAmount getValueOfOrder(JSONObject order)

JSONArray as well as JSONObject are an example of an indirect representation holding order data as a JSON structure.

Note: of course, in the absence of a schema (which is assumed here), there is no assurance that the JSONArray or the JSONObject contain only orders or that the orders are homogeneous in structure. There has to be “trust” that this is indeed the case.

If validation is desired, and if no schema is available, then the only alternative is validating values in one or more JSON object properties. For example, order identifiers might be of a specific structure that uniquely identifies an identifier being an order identifier. This would require trust that the algorithms creating identifiers are correct.

Separation of Manipulation and Computation

The JSON libraries supporting the indirect representation are separate from the functional code (like the summing up of order values). The software architecture and design has to structure this separation and ideally ensures that all functions concerned with orders are “close” from a code structure or software architecture perspective.

There might be functions that can be reused across different concepts (like orders, returns, shipments, etc.), and they can be refactored out, of course, as in “normal” functional code.

Given the above rationalization, how does the absence of a schema come into the picture?

Implication of Schema Free JSON Objects

Since there is no schema, JSON objects can have a different structure even though they represent the same concepts. In context of orders,  let’s look at two use cases:

  • An order does not have a shipping status
  • An order does have a value but in a variety of data types

In a world without schema these are possible use cases and the functional code needs to check for those.

Addressing the first use case can be accomplished by checking for existence. Code can check if a property is present and react accordingly. In the above example, the code designer can choose to have hasOrderShipped() return false or throw an error in case there is no shipping status.

The second use case can be addressed by checking for the type of the value of the order. If possible, value transformations can be implemented in getValueOfOrder(), e.g., string to number; if it is not possible to transform, an error can be thrown.

Summary

In a schema free JSON context there are several aspects from a code perspective: functional code implementing application semantics is separate from the code that manages the structure of JSON objects. That separation must be carefully managed from an architectural perspective.

The functional code must anticipate non-homogeneous JSON objects and check for variation in order to be able to implement the functionality accurately.

But wait, there is more:-) The next blog will venture into more nuances.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.