SQL for JSON and Schema Support (Part 5): Intermezzo 3 – MongoDB’s $jsonschema

The previous blog discussed MongoDB’s $jsonschema behavior with a strict validation level. Let’s look at the moderate validation level in this blog.

Example

As usual, first, let’s create a collection and add a few JSON documents to it. Afterwards a schema validation is added with the moderate setting (the following is based on MongoDB version 3.6.1).

> mongo
> use moderate_exploration

Initially, before adding a schema, two JSON objects are inserted that are not compliant with the schema that is going to be added afterwards. The reason is that we need non-compliant JSON objects to discuss the moderate level later.

> db.orders.insert({
   "orderId": 1,
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": 55,
    "numberOrdered": 20
    }, {
    "itemId": 56,
    "numberOrdered": 21
   }],
   "specialInstructions": "Drop of in front, 
                           not back of location"
  })
WriteResult({ "nInserted" : 1 })
> db.orders.insert({
   "orderId": 2,
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": 55,
    "numberOrdered": 40
    }, {
    "itemId": 56,
    "numberOrdered": 41
   }],
   "preferredColor": "red"
  })
WriteResult({ "nInserted" : 1 })

Now the schema is added:

> db.runCommand({ 
   "collMod": "orders",
   "validator": {  
    "$jsonSchema": {   
      "bsonType": "object",
       "required": ["orderId", "orderDate", "orderLineItems"],
       "properties": {
        "orderId": { 
         "bsonType": "int",
         "description": "Order Identifier: must be of 
                         type int and is required"
        },
        "orderDate": { 
         "bsonType": "date",
         "description": "Order Date: must be of 
                         type date and is required"
        },
        "orderLineItems": { 
         "bsonType": "array",
         "items": {  
          "bsonType": "object",
          "properties": {   
           "itemId": {    
           "bsonType": "int"   
           },
           "numberOrdered": {    
           "bsonType": "int"   
           }  
          } 
         },
         "description": "Order Line Items: must be of 
                         type array and is required"
      }   
     }  
    } 
   },
   "validationLevel": "moderate",
   "validationAction": "error"
  })
{ "ok" : 1 }

After the schema is added, two more JSON objects are inserted, this time being schema compliant.

> db.orders.insert({
   "orderId": NumberInt(3),
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": NumberInt(55),
    "numberOrdered": NumberInt(60)
    }, {
    "itemId": NumberInt(56),
    "numberOrdered": NumberInt(61)
   }]
  })
WriteResult({ "nInserted" : 1 })
> db.orders.insert({
   "orderId": NumberInt(4),
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": NumberInt(55),
    "numberOrdered": NumberInt(80)
    }, {
    "itemId": NumberInt(56),
    "numberOrdered": NumberInt(81)
   }]
  })
WriteResult({ "nInserted" : 1 })

At this point the created collection is governed by a schema, and contains four JSON documents, two are compliant with the schema (orderId 3 and 4), and two are not compliant (orderId 1 and 2).

Analysis

The MongoDB documentation states for “moderate”: “Apply validation rules to inserts and to updates on existing valid documents. Do not apply rules to updates on existing invalid documents.” (https://docs.mongodb.com/manual/reference/command/collMod/#validationLevel).

Let’s explore now the behavior of the moderate validation level.

First, let’s try to insert a non-compliant JSON document. The insert will fail as expected:

> db.orders.insert({
   "orderId": 5,
   "orderDate": ISODate("2017-09-30T00:00:00Z"),
   "orderLineItems": [{
    "itemId": 55,
    "numberOrdered": 40
    }, {
    "itemId": 56,
    "numberOrdered": 41
   }],
   "preferredColor": "red"
  })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Second, let’s try to update a compliant JSON document that already exists in the collection in a non-compliant way:

> db.orders.update({  
   "orderId": NumberInt(3) 
   }, {  
   "$set": {   
    "orderDate": "2018-01-09"  
   } 
  })

As expected the update fails:

WriteResult({
 "nMatched" : 0,
 "nUpserted" : 0,
 "nModified" : 0,
 "writeError" : {
  "code" : 121,
  "errmsg" : "Document failed validation"
 }
})

Third, let’s try to update a non-compliant JSON document

> db.orders.update({  
   "orderId": NumberInt(1) 
   }, {  
   "$set": {   
    "orderDate": "2018-01-10"  
   } 
  })

As per the above explanation of moderate this should work and indeed it does:

WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})

Bypassing Validation

With the correct permission (https://docs.mongodb.com/manual/reference/privilege-actions/#bypassDocumentValidation) it is possible to bypass document validation.

This allows for the situation that e.g. a collection is governed by a new schema, however, existing application code might have to continue to insert or to update documents with a structure that violates the new schema as the logic cannot be adjusted to the new schema quickly enough (including transforming the non-compliant to compliant JSON documents).

Summary

The brief analysis of MongoDB wrt. document validation in context of JSON schemas added to collections in the last three blogs showed that while schema supervision is possible, it is not as strict as in relational database management systems.

Basically, if a schema is present, a user cannot infer that all documents in that collection comply to that schema. A schema related to a collection can be changed, and existing documents that would violate the new schema on insert will not be discarded from the collection. Furthermore, properties that are not covered by the schema can be added and changed freely.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Advertisement

SQL for JSON and Schema Support (Part 4): Intermezzo 2 – MongoDB’s $jsonschema

After some initial exploration in the previous blog, more aspects on MongoDB’s $jsonschema are looked at in the following.

Example

First, let’s create a collection as follows. It is governed by a schema, and validation is in the strictest setting (the following is based on MongoDB version 3.6.0).

> mongo
> use more_exploration
> db.createCollection("orders", {
  "validator": {
   "$jsonSchema": {
    "bsonType": "object",
    "required": ["orderId", "orderDate", "orderLineItems"],
    "properties": {
     "orderId": {
      "bsonType": "int",
      "description": "Order Identifier: must be of 
                     type int and is required"
     },
     "orderDate": {
      "bsonType": "date",
      "description": "Order Date: must be of 
                     type date and is required"
     },
     "orderLineItems": {
      "bsonType": "array",
      "items": {
       "bsonType": "object",
       "properties": {
        "itemId": {
         "bsonType": "int"
        },
        "numberOrdered": {
         "bsonType": "int"
        }
       }
      },
      "description": "Order Line Items: must be of 
                     type array and is required"
     }
    }
   }
  },
  "validationLevel": "strict",
  "validationAction": "error"
 })
{ "ok" : 1 }

The two documents from the example outlined in the initial blog of series are added next.

> db.orders.insert({
   "orderId": NumberInt(1),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": [{
     "itemId": NumberInt(55),
     "numberOrdered": NumberInt(20)
    },
    {
     "itemId": NumberInt(56),
     "numberOrdered": NumberInt(21)
    }
   ]
  })
WriteResult({ "nInserted" : 1 })
> db.orders.insert({
   "orderId": NumberInt(2),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": [{
     "itemId": NumberInt(55),
     "numberOrdered": NumberInt(30)
    },
    {
     "itemId": NumberInt(56),
     "numberOrdered": NumberInt(31)
    }
   ]
  })
WriteResult({ "nInserted" : 1 })

Insert Strictness and Partial Schema Coverage

The validator is in place on the collection “orders”. This can be verified with the command

> db.getCollectionInfos({name: "orders"})

Now let’s try and add a document that has additional properties in addition to those that comply with the schema as follows:

> db.orders.insert({
   "orderId": NumberInt(3),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": [{
     "itemId": NumberInt(55),
     "numberOrdered": NumberInt(40)
    },
    {
     "itemId": NumberInt(56),
     "numberOrdered": NumberInt(41)
    }
   ],
   "preferredColor": "red"
  })
WriteResult({ "nInserted" : 1 })

It appears that as long as the schema is satisfied, additional properties can be inserted. So the schema is not completely covering the object to be inserted, but only those properties that are defined in the schema (validator). It is a partial schema coverage.

Here is the counter example: the value of the property “orderLineItems” is not in compliance, and so the insertion fails:

> db.orders.insert({
   "orderId": NumberInt(4),
   "orderDate": new Date("2017-09-30"),
   "orderLineItems": ["b", "g"],
   "preferredColor": "red"
  })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Update Strictness and Partial Schema Coverage

The following updates an existing document:

> db.orders.update({
   "orderId": NumberInt(2)
  }, {
   "$set": {
    "orderDate": new Date("2017-10-01")
   }
  })
WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})

In part 1 of this blog series the order with identifier 1 was updated to add a property “specialInstructions”. This is not schema compliant, however, the update is possible as it does not violate that part of the document that is covered by the schema.

> db.orders.update({
   "orderId": NumberInt(1)
   }, {
   "$set": {
    "specialInstructions": "Drop of in front, 
                           not back of location"
   }
  })
WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})

Partial schema coverage applies to update as well, not just to inserts.

An example of a non-compliant update is the following:

> db.orders.update({
   "orderId": NumberInt(2)
  }, {
   "$set": {
    "orderDate": "2017-09-30"
   }
  })
WriteResult({
 "nMatched": 0,
 "nUpserted": 0,
 "nModified": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Summary

MongoDB supports partial schema coverage in strict mode, meaning, properties defined in the schema must match the schema, however, properties not specified in the schema can be added or modified without rejection.

This means (again) that examining the JSON schema validator of a MongoDB collection only indicates properties common to all documents, but not the complete set of properties of all documents.

The next blog examines the non-strict validation setting of a JSON schema validator in MongoDB.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON and Schema Support (Part 3): Intermezzo 1 – MongoDB’s $jsonschema

MongoDB introduced support for JSON Schema through $jsonschema. Let’s explore this new functionality a bit in this blog.

$jsonschema

The functionality is introduced here: https://docs.mongodb.com/master/reference/operator/query/jsonSchema/#op._S_jsonSchema It states “$jsonSchema can be used in a document validator, which enforces that inserted or updated documents are valid against the schema.”

A first item to note is that this approach is supporting BSON types (http://bsonspec.org/, https://docs.mongodb.com/master/reference/operator/query/type/), not just JSON structures (https://www.json.org/), using a specific property “bsonType” that is not part of the JSON Schema standard (http://json-schema.org/).

A second observation is that the schema specification is inline with the collection creation and cannot refer to a separate JSON schema file or JSON object representing a JSON schema.

JSON Schema Validator Example

Let’s use the example of the first blog in this series, create a schema for it and use that as a constraint for the “orders” collection. Then documents are added to the collection (and there seem to be errors as well). For reference the version used is: MongoDB server version: 3.6.0.

> mongo
> use schema_exploration
> db.createCollection("orders", {
  "validator": {
   "$jsonSchema": {
    "bsonType": "object",
    "required": ["orderId", "orderDate", "orderLineItems"],
    "properties": {
     "orderId": {
      "bsonType": "int",
      "description": "Order Identifier: must be of 
                     type int and is required"
     },
     "orderDate": {
      "bsonType": "date",
      "description": "Order Date: must be of 
                     type date and is required"
     },
     "orderLineItems": {
      "bsonType": "array",
      "items": {
       "bsonType": "string"
      },
      "description": "Order Line Items: must be of 
                     type array and is required"
     }
    }
   }
  }
 })
{ "ok" : 1 }

A quick note: “bsonType” can be used in all levels in order to refer to BSON types, not just on the top level.

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": [{
   "itemId": 55,
   "numberOrdered": 30
  }, {
   "itemId": 56,
   "numberOrdered": 31
  }]
 })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

Along the way I ran into a validation issue as I constraint the array elements to strings, rather than objects, as used in the example of the first blog in this series. So I made a schema definition mistake.

To note is that the response on the shell does not indicate what the problem was making debugging hard, especially when large and complex schemas are to be debugged.

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": ["a", "b"]
 })
WriteResult({
 "nInserted": 1
})

Once I realized the mistake I made, I inserted a document complying to the schema in order to make sure I identified the issue correctly.

JSON Schema Validator Update

Obviously, after defining a wrong schema, the correct schema should be used as validator.

This is the correct schema:

{
 "bsonType": "object",
 "required": ["orderId", "orderDate", "orderLineItems"],
 "properties": {
  "orderId": {
   "bsonType": "int",
   "description": "Order Identifier: must be of 
                  type int and is required"
  },
  "orderDate": {
   "bsonType": "date",
   "description": "Order Date: must be of 
                  type date and is required"
  },
  "orderLineItems": {
   "bsonType": "array",
   "items": {
    "bsonType": "object",
    "properties": {
     "itemId": {
      "bsonType": "int"
     },
     "numberOrdered": {
      "bsonType": "int"
     }
    }
   },
   "description": "Order Line Items: must be of 
                  type array and is required"
   }
  }
 }

And this is the command to update the validator:

> db.runCommand({
  "collMod": "orders",
  "validator": {
   "$jsonSchema": {
    "bsonType": "object",
    "required": ["orderId", "orderDate", "orderLineItems"],
    "properties": {
     "orderId": {
      "bsonType": "int",
      "description": "Order Identifier: must be of 
                     type int and is required"
     },
     "orderDate": {
      "bsonType": "date",
      "description": "Order Date: must be of 
                     type date and is required"
     },
     "orderLineItems": {
      "bsonType": "array",
      "items": {
       "bsonType": "object",
       "properties": {
        "itemId": {
         "bsonType": "int"
        },
        "numberOrdered": {
         "bsonType": "int"
        }
       }
      },
      "description": "Order Line Items: must be of 
                     type array and is required"
     }
    }
   }
  },
  "validationLevel": "strict"
 })
{ "ok" : 1 }

Some background on the command used is here: https://docs.mongodb.com/master/reference/command/collMod/.

Following is an attempt to add one more of the (now mismatching) documents:

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": ["a", "b"]
 })
WriteResult({
 "nInserted": 0,
 "writeError": {
  "code": 121,
  "errmsg": "Document failed validation"
 }
})

As it should be, the insert fails.

And here the insert of a now correct document:

> db.orders.insert({
  "orderId": NumberInt(1),
  "orderDate": new Date("2017-09-30"),
  "orderLineItems": [{
   "itemId": NumberInt(55),
   "numberOrdered": NumberInt(20)
  }, {
   "itemId": NumberInt(56),
   "numberOrdered": NumberInt(21)
  }]
 });
WriteResult({
 "nInserted": 1
})

Collection Inconsistency: Mismatch of Schema and Documents

There is an interesting issue appearing at this point. The new schema does not match all existing documents in the collection. Or the other way around: the collection now contains documents that do not match that schema.

> db.orders.find()
{
 "_id": ObjectId("5a2022c3fb460d15db9ec73e"),
 "orderId": 1,
 "orderDate": ISODate("2017-09-30T00:00:00Z"),
 "orderLineItems": ["a", "b"]
} {
 "_id": ObjectId("5a202322fb460d15db9ec741"),
 "orderId": 1,
 "orderDate": ISODate("2017-09-30T00:00:00Z"),
 "orderLineItems": [{
  "itemId": 55,
  "numberOrdered": 20
 }, {
  "itemId": 56,
  "numberOrdered": 21
 }]
}

MongoDB did not flag that there are documents in the collection that will not match the new schema (even though the validation level strict was used).

Adding a validation action with value of “error” does not change the situation, either.

Implication to Semantics

Given that the schema of a collection can be changed at any time, and given that MongoDB does not fail the schema update based on mismatching documents already in the collection, examining the schema is insufficient to understand the structure of the documents in a collection.

So a collection with a schema does not ensure that all documents in that collection are schema compliant. It rather insures that from the point in time the schema was added or updated documents will have to comply. Previous documents in the collections are not affected.

Summary

The notion of “schema” in context of MongoDB is very different from the notion of “schema” in context of relational database management systems. In MongoDB the documents in the collection do not have to comply to the schema; they only do have to comply at time of insertion.

There will be more exploration coming up in the next blog on this topic in order to further understand the semantics of “schema” in context of MongoDB.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.