SQL for JSON and Schema Support (Part 7): Partial Schemas for JSON Documents

This blog concludes the series on JSON schema support in context of databases with a proposal: partial schemas.

Black and White: Rigorous Schema or No Schema At All

So far the discussion around JSON schema support in databases was a black and white (or a all-or-nothing) discussion, aka, a JSON document can either change at any time or it must comply completely with a rigorous schema (structure, data types and possibly values) at any time.

The any-change-at-any-time situation is the “wild west” in terms of structure and data type variation in the sense that code accessing JSON documents has to “protect” itself by e.g. means of assertions. Code has to assume that any structure can be present and therefore has to introspect each document at every access in order to determine if it can process it. Complex error handling is necessary if the code cannot process a specific JSON document (as discussed in an earlier blog).

More than All-Or-Nothing Approach: Rationale for Reconsideration

The ability to change JSON documents randomly is portrait as a desirable and positive feature by many JSON supporting databases, however, from an engineering perspective this “randomness” is adding significant engineering and coding complexity (and significant effort to quality assurance as well).

Are random (“dynamic”) document structures really the 80% case in real situations? Or is in reality a more nuanced situation at hand?

Many documents implement the content and state of business data (functional data). In most cases there are basic data sets to be supported. For example, purchase orders must have an overall order value field, health records must have a unique record identifier including some basic patient data, user accounts must have a unique account identifier and at least an account login name. There is usually a core set of attributes that represents the commonly agreed set of business data and those are mandatory as well in order for the business data to be meaningful.

Each (business) domain has its own commonly agreed core semantics and therefore commonly agreed data attributes. Code should be able to assume that the core semantics is complied to and that the core attributes are present in every JSON document representing the domain semantics.

So maybe it is not it really all-or-nothing in terms of the dynamic structure of JSON documents all the time. Maybe it would be really useful to be able to specify a schema that applies to (only) a part of a JSON document at any time without enforcing complete compliance: partial schemas.

Partial Schema Rationalization

If a JSON document must be partially compliant, then one part of it must comply to a given partial schema, and it can have any form outside of the part governed by the partial schema.

For example, a partial schema for a purchase order could enforce that a purchase order must have an purchase order identifier, a buyer identifier, a set of line items, and a purchase order amount at any point in time. A purchase order could contain additional supplemental data that are not governed by the schema (the dynamic part).

With a rigorous partial schema in place code is guaranteed that the attributes defined by the schema are present. This makes the executions of assertions unnecessary and code can be sure that all documents are compliant with the partial schema.

Optionally a partial schema could define that some attributes must not always be present, but in case they are present, they have to comply to a given schema.

For example, if the full specification of a buyer is provided, not just the buyer identifier, then it must contain the buyer’s legal name, the legally registered address as well as a phone number that reaches the buyer. Similar for a shipping address: if one is provided, it has to be a full shipping address denoting a physical location.

With such a schema in place, code can check if specific optional items are present, and if they are, their schema is guaranteed, meaning, these items comply with their definitions in the partial schema.

Finally, a partial schema does not govern any structure outside the mandatory or optional items and it is possible for a JSON document to contain additional data that is not enforced in terms of presence or structure.

For example, items not enforced by a schema could be supporting documentation for line items like drawings, notes to shippers asking for specific handling, or a trustworthiness rating of a buyer in a purchase order. A user account could have a history of logins or a health record could contain the list of doctors that were involved in treatments over time.

For data items that are not enforced, code would have to use assertions in order to establish their presence (and their data types).

Partial Schema Specification

How would a partial schema specification look like? In principle, it could be an extension what has been discussed in earlier blogs of this series with a different semantics. First, any schema would be considered partial, meaning, a JSON document can contain data items that are not contained in the schema in addition to those specified by the partial schema.

Second, each specification in the schema has a “mandatory” and an “optional” designation defining the mandatory items, as well as the optional items. The following schema shows the possible extensions. There are two designations, “mandatory” and “optional” that indicate what the partial schema governs.

{
  "type": "object",
  "mandatory": [
    "orderId",
    "orderDate",
    "orderLineItems"
  ],
  "optional": [
    "buyerId"
  ],
  "properties": {
    "orderId": {
      "type": "number",
      "description": "Order Identifier: must be of 
                      type number and is mandatory"
    },
    "orderDate": {
      "type": "string",
      "description": "Order Date: must be of 
                      type string and is mandatory"
    },
    "orderLineItems": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "itemId": {
            "type": "number"
          },
          "numberOrdered": {
            "type": "number"
          }
        }
      },
      "description": "Order Line Items: must be of 
                      type array and is mandatory"
    },
    "buyer": {
      "type": "number",
      "description": "Buyer Identifier: must be of 
                      type number and is optional"
    }
  }
}

The above schema enforces the structure of JSON documents in part and specifies mandatory as well as optional attributes. There are three mandatory attributes and one optional attribute. Such a schema could be added to an identifiable collection of JSON documents ensuring thereby their partial compliance.

A change in a partial schema cannot take place without modifying the JSON documents as well since the JSON documents must be compliant to a partial schema at any time. Again, code is assured compliance of JSON documents with the attributes defined by the partial schema and code therefore does not have to protect itself.

Summary

This conclusion of the blog series on JSON schema support in databases provided a proposal for partial schemas. Partial schemas support structure and data type guarantees for code as well as support the freedom for dynamic schema changes outside the part of a JSON document that has to be compliant with a partial schema.

Following this approach

  • Code is guaranteed that JSON documents comply to a partial schema
  • An application has the freedom to add additional data dynamically not governed by the partial schema
  • Over time, as dynamic structures in a JSON document stabilize, those can be moved into the stable partial schema part of a JSON document.

Partial schemas are a win-win for both, databases supporting JSON documents as well as code accessing these databases.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Advertisement