SQL for JSON Rationalization Part 11: JSON Specific Predicates

JSON SQL cannot assume collections containing only homogeneous documents; instead, it must support schemaless documents. This requires JSON specific predicates that are discussed in this blog.

Example Collection

The documents of the following collection “predColl” are used in this blog as examples.

{"a":"b"}
{"a":{"c":1,"d":2},"e":[77,{"x":"eightyeight"}]}
{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

Definition: Path

A (complete) path in a JSON document is the list of property names and array indexes from the root all the way to a leaf separated by “.”. For example, a complete path is

e.[1].x

A partial path in a JSON document is the list of property names and array indexes from the root to an intermediary property (that is not a leaf property) separated by “.”. For example, a partial path is

e.[1]

It is possible that a path is a complete path in one document and a partial path in another document. For example, a path that is complete and partial at the same time is

a

Each path leads up to a property that is of a specific JSON type. For example,

  • the type of e.[1].x is JSON_STRING in one of the example documents and JSON_NUMBER in another one.
  • the type of e.[1] is JSON_OBJECT in both cases.

Document: Set of Paths

Based on the above definition, a JSON document can be represented as the set of its paths.

The document

{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

can be represented as

a
a.c
a.d
e.[0]
e.[1]
e.[1].x

The corresponding types are

JSON_OBJECT
JSON_NUMBER
JSON_NUMBER
JSON_STRING
JSON_OBJECT
JSON_NUMBER

Schemaless Documents / Schema-per-Document

Traditionally, databases enforced the definition of a schema and many continue to do so. This means that data can only be stored successfully in a database if the data complies with the schema at that point in time.

Some database systems do not enforce a schema and as a consequence the data is not constrained by a schema. In context of JSON documents, if a document is not constrained by a schema, then it can have any valid structure (in terms of the rules the JSON standard defines).

These documents are termed “schemaless” documents. In reality there is a schema, an implicit  (intentional) one, for each document, termed “schema-per-document”.

In concrete terms “schemaless” in context of a database system means:

  • Any pair of documents within a collection of a database can potentially have different sets of paths
  • The same (complete or partial) path in two different documents can lead to a value of a different JSON type

As a consequence, queries must be able to test for the existence of paths as well as for the existence of a specific JSON type as these cannot be assumed (in contrast to a database that enforces schemas).

JSON Specific Predicates: exists_path and is_of_type

Two predicates are needed in order to check for the existence of paths or JSON types of values.

  • exists_path <path>
  • <path> is_of_type <JSON type>

The query

select {*} from predColl where exists_path a.d

returns

{"a":{"c":1,"d":2},"e":[77,{"x":"eightyeight"}]}
{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

The query

select {*} from predColl where e.[1].x is_of_type JSON_number

returns

{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

Negation

Negation of the predicates is useful to determine if there are document that do not have specific paths or paths do not refer to properties of specific JSON types.

select {*} from predColl where not exists_path e.[1].x

returns

{"a":"b"}

The query

select {*} from predColl where not a is_of_type JSON_object

returns

{"a":"b"}

Definition: (Partial) Homogeneity

Queries using the predicates exists_path or is_of_type can be used to determine if all documents have the required or expected paths or if there are documents that are missing specific paths or have paths leading to unexpected JSON types.

With these predicates it is possible to determine if the documents in a collection are homogeneous wrt. given set of paths.

Dynamic Schema Check

A developer can now implement first checking path existence or JSON type compliance before executing business logic related queries. If the required paths or JSON types are missing in specific documents, appropriate error handling can be implemented for these.

From a different viewpoint, a developer now has the tools to dynamically check or select all documents complying to a schema that is “imagined” at the type of querying.

Summary

While schemaless documents are convenient from a development perspective, this requires in general the ability to check for the existence or absence of paths as well as JSON type compliance.

The predicates exists_path and is_of_type provide the necessary querying tools in order to test for the variations in schemaless JSON documents.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Advertisements