SQL for JSON and Schema Support (Part 5): Intermezzo 3 – MongoDB’s $jsonschema

The previous blog discussed MongoDB’s $jsonschema behavior with a strict validation level. Let’s look at the moderate validation level in this blog.

Example

As usual, first, let’s create a collection and add a few JSON documents to it. Afterwards a schema validation is added with the moderate setting (the following is based on MongoDB version 3.6.1).

> mongo
> use moderate_exploration

Initially, before adding a schema, two JSON objects are inserted that are not compliant with the schema that is going to be added afterwards. The reason is that we need non-compliant JSON objects to discuss the moderate level later.

> db.orders.insert({
   "orderId": 1,
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": 55,
    "numberOrdered": 20
    }, {
    "itemId": 56,
    "numberOrdered": 21
   }],
   "specialInstructions": "Drop of in front, 
                           not back of location"
  })
WriteResult({ "nInserted" : 1 })
> db.orders.insert({
   "orderId": 2,
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": 55,
    "numberOrdered": 40
    }, {
    "itemId": 56,
    "numberOrdered": 41
   }],
   "preferredColor": "red"
  })
WriteResult({ "nInserted" : 1 })

Now the schema is added:

> db.runCommand({ 
   "collMod": "orders",
   "validator": {  
    "$jsonSchema": {   
      "bsonType": "object",
       "required": ["orderId", "orderDate", "orderLineItems"],
       "properties": {
        "orderId": { 
         "bsonType": "int",
         "description": "Order Identifier: must be of 
                         type int and is required"
        },
        "orderDate": { 
         "bsonType": "date",
         "description": "Order Date: must be of 
                         type date and is required"
        },
        "orderLineItems": { 
         "bsonType": "array",
         "items": {  
          "bsonType": "object",
          "properties": {   
           "itemId": {    
           "bsonType": "int"   
           },
           "numberOrdered": {    
           "bsonType": "int"   
           }  
          } 
         },
         "description": "Order Line Items: must be of 
                         type array and is required"
      }   
     }  
    } 
   },
   "validationLevel": "moderate",
   "validationAction": "error"
  })
{ "ok" : 1 }

After the schema is added, two more JSON objects are inserted, this time being schema compliant.

> db.orders.insert({
   "orderId": NumberInt(3),
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": NumberInt(55),
    "numberOrdered": NumberInt(60)
    }, {
    "itemId": NumberInt(56),
    "numberOrdered": NumberInt(61)
   }]
  })
WriteResult({ "nInserted" : 1 })
> db.orders.insert({
   "orderId": NumberInt(4),
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": NumberInt(55),
    "numberOrdered": NumberInt(80)
    }, {
    "itemId": NumberInt(56),
    "numberOrdered": NumberInt(81)
   }]
  })
WriteResult({ "nInserted" : 1 })

At this point the created collection is governed by a schema, and contains four JSON documents, two are compliant with the schema (orderId 3 and 4), and two are not compliant (orderId 1 and 2).

Analysis

The MongoDB documentation states for “moderate”: “Apply validation rules to inserts and to updates on existing valid documents. Do not apply rules to updates on existing invalid documents.” (https://docs.mongodb.com/manual/reference/command/collMod/#validationLevel).

Let’s explore now the behavior of the moderate validation level.

First, let’s try to insert a non-compliant JSON document. The insert will fail as expected:

> db.orders.insert({
   "orderId": 5,
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": 55,
    "numberOrdered": 40
    }, {
    "itemId": 56,
    "numberOrdered": 41
   }],
   "preferredColor": "red"
  })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Second, let’s try to update a compliant JSON document that already exists in the collection in a non-compliant way:

> db.orders.update({  
   "orderId": NumberInt(3) 
   }, {  
   "$set": {   
    "orderDate": "2018-01-09"  
   } 
  })

As expected the update fails:

WriteResult({
 "nMatched" : 0,
 "nUpserted" : 0,
 "nModified" : 0,
 "writeError" : {
  "code" : 121,
  "errmsg" : "Document failed validation"
 }
})

Third, let’s try to update a non-compliant JSON document

> db.orders.update({  
   "orderId": NumberInt(1) 
   }, {  
   "$set": {   
    "orderDate": "2018-01-10"  
   } 
  })

As per the above explanation of moderate this should work and indeed it does:

WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})

Bypassing Validation

With the correct permission (https://docs.mongodb.com/manual/reference/privilege-actions/#bypassDocumentValidation) it is possible to bypass document validation.

This allows for the situation that e.g. a collection is governed by a new schema, however, existing application code might have to continue to insert or to update documents with a structure that violates the new schema as the logic cannot be adjusted to the new schema quickly enough (including transforming the non-compliant to compliant JSON documents).

Summary

The brief analysis of MongoDB wrt. document validation in context of JSON schemas added to collections in the last three blogs showed that while schema supervision is possible, it is not as strict as in relational database management systems.

Basically, if a schema is present, a user cannot infer that all documents in that collection comply to that schema. A schema related to a collection can be changed, and existing documents that would violate the new schema on insert will not be discarded from the collection. Furthermore, properties that are not covered by the schema can be added and changed freely.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Advertisement

SQL for JSON and Schema Support (Part 4): Intermezzo 2 – MongoDB’s $jsonschema

After some initial exploration in the previous blog, more aspects on MongoDB’s $jsonschema are looked at in the following.

Example

First, let’s create a collection as follows. It is governed by a schema, and validation is in the strictest setting (the following is based on MongoDB version 3.6.0).

> mongo
> use more_exploration
> db.createCollection("orders", {
  "validator": {
   "$jsonSchema": {
    "bsonType": "object",
    "required": ["orderId", "orderDate", "orderLineItems"],
    "properties": {
     "orderId": {
      "bsonType": "int",
      "description": "Order Identifier: must be of 
                     type int and is required"
     },
     "orderDate": {
      "bsonType": "date",
      "description": "Order Date: must be of 
                     type date and is required"
     },
     "orderLineItems": {
      "bsonType": "array",
      "items": {
       "bsonType": "object",
       "properties": {
        "itemId": {
         "bsonType": "int"
        },
        "numberOrdered": {
         "bsonType": "int"
        }
       }
      },
      "description": "Order Line Items: must be of 
                     type array and is required"
     }
    }
   }
  },
  "validationLevel": "strict",
  "validationAction": "error"
 })
{ "ok" : 1 }

The two documents from the example outlined in the initial blog of series are added next.

> db.orders.insert({
   "orderId": NumberInt(1),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": [{
     "itemId": NumberInt(55),
     "numberOrdered": NumberInt(20)
    },
    {
     "itemId": NumberInt(56),
     "numberOrdered": NumberInt(21)
    }
   ]
  })
WriteResult({ "nInserted" : 1 })
> db.orders.insert({
   "orderId": NumberInt(2),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": [{
     "itemId": NumberInt(55),
     "numberOrdered": NumberInt(30)
    },
    {
     "itemId": NumberInt(56),
     "numberOrdered": NumberInt(31)
    }
   ]
  })
WriteResult({ "nInserted" : 1 })

Insert Strictness and Partial Schema Coverage

The validator is in place on the collection “orders”. This can be verified with the command

> db.getCollectionInfos({name: "orders"})

Now let’s try and add a document that has additional properties in addition to those that comply with the schema as follows:

> db.orders.insert({
   "orderId": NumberInt(3),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": [{
     "itemId": NumberInt(55),
     "numberOrdered": NumberInt(40)
    },
    {
     "itemId": NumberInt(56),
     "numberOrdered": NumberInt(41)
    }
   ],
   "preferredColor": "red"
  })
WriteResult({ "nInserted" : 1 })

It appears that as long as the schema is satisfied, additional properties can be inserted. So the schema is not completely covering the object to be inserted, but only those properties that are defined in the schema (validator). It is a partial schema coverage.

Here is the counter example: the value of the property “orderLineItems” is not in compliance, and so the insertion fails:

> db.orders.insert({
   "orderId": NumberInt(4),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": ["b", "g"],
   "preferredColor": "red"
  })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Update Strictness and Partial Schema Coverage

The following updates an existing document:

> db.orders.update({
   "orderId": NumberInt(2)
  }, {
   "$set": {
    "orderDate": new Date("2017-10-01")
   }
  })
WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})

In part 1 of this blog series the order with identifier 1 was updated to add a property “specialInstructions”. This is not schema compliant, however, the update is possible as it does not violate that part of the document that is covered by the schema.

> db.orders.update({
   "orderId": NumberInt(1)
   }, {
   "$set": {
    "specialInstructions": "Drop of in front, 
                           not back of location"
   }
  })
WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})

Partial schema coverage applies to update as well, not just to inserts.

An example of a non-compliant update is the following:

> db.orders.update({
   "orderId": NumberInt(2)
  }, {
   "$set": {
    "orderDate": "2017-09-30"
   }
  })
WriteResult({
 "nMatched": 0,
 "nUpserted": 0,
 "nModified": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Summary

MongoDB supports partial schema coverage in strict mode, meaning, properties defined in the schema must match the schema, however, properties not specified in the schema can be added or modified without rejection.

This means (again) that examining the JSON schema validator of a MongoDB collection only indicates properties common to all documents, but not the complete set of properties of all documents.

The next blog examines the non-strict validation setting of a JSON schema validator in MongoDB.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Schema-free Database (Part 1): An Oxymoron

The notion of a ‘schema-free database’ keeps coming up, most recently in a meetup I attended a few days ago. Some rationalization follows divided up into the categories of ‘document’ and ‘database’.

While a generalization is easily possible, the context here will be JSON and MongoDB as these are two practical implementations that are available and often used as examples of a ‘schema-free database’. Those provide a nice constraint technology set as an example, while the principles apply to a whole range of other technology, of course.

Document

A JSON document, in short: document, follows a set of construction principles outlined here: http://www.json.org/. This is a rather informal grammar that defines how a valid JSON document is constructed. There are no data type generators and so new data types cannot be introduced; therefore, every document is constructed from the fixed set of types enumerated on that web page.

Document Schema

An attempt has been made to create a more formal mechanism to define a schema for JSON documents: http://json-schema.org/. This approach provides a formal language to describe the schema of a JSON document explicitly.

The json-schema approach combined with the fixed set of types available to create a JSON document means that every JSON document can be described explicitly using json-schema without exceptions. This in turns means that every JSON document has at least an implicit schema, unless it is additionally made explicit with e.g. json-schema.

Therefore, JSON documents have a schema, an implicit one and optionally an explicit one. Depending on the particular schema definition language approach itself, a document might match more than one schema, but that is left for a separate discussion.

Set of Document Schemas

Given a set of JSON documents it is now possible to characterize their relationship to schemas. In the ‘best’ case, all documents follow the same schema; in the ‘worst’ case, each document follows its own schema. And there are cases in-between where a subset of the documents validates against a schema, and another subset against another schema. Depending on the design, one JSON document might validate against different schemas.

The relationship between documents and schemas is n:m in general.

Database

In context of a database, there are a few interesting questions in this context:

  • Does the database understand a document representation (e.g. JSON)?
  • Does the database enforce a document representation?

And:

  • Does the database enforce a schema?
  • Does the database understand a schema?

Let’s answer these questions for MongoDB specifically:

And:

  • MongoDB enforces a partial schema. Each document must have a property called “_id”; if the document being inserted does not have such a property, one is automatically added.
  • MongoDB does not understand an explicit schema as it does not provide for a mechanism to load a schema definition language.

MongoDB, however, understands implicit schemas as MongoDB does allow to e.g. create an index on any property of documents. So MongoDB recognizes properties.

Furthermore, MongoDB supports aggregation functions and supports e.g. the sum of properties across documents (https://realprogrammer.wordpress.com/2012/11/04/null-undefined-nan-and-missing-property-goto-considered-harmful-part-2/). So it is data type aware and implements operators (e.g. sum) on those.

Conclusion: The Notion of ‘Schema’ is Changing

This rather brief discussion clearly rationalizes that the label ‘schema-free database’ is not applicable to technologies such as those discussed in this blog (JSON/BSON, MongoDB).

Hence these technologies are not an example of ‘schema-free database’, to the contrary: they demonstrate that the notion of ‘schema’ can have a wider and more flexible interpretation then what relational databases bring forward.