SQL for JSON Rationalization Part 12: SERVER-831

There is an almost infamous bug in context of MongoDB, called SERVER-831. This blog puts SERVER-831 in context of JSON SQL.

Upfront

In order to not loose the context, the bug and one of the use cases were captured as PDFs and are attached here:

Anatomy of SERVER-831

Server 831 has two main aspects: a selection combined with an update. In the following only the selection is discussed; the update part follows down the road when update in context of JSON SQL will be introduced.

Basically, the selection selects based on the value of array elements without knowing the precise array index where the values are located. The array index is not part of the selection criteria, only the content of array elements.

Data Set

These are the documents in the collection server831 used for illustration in this blog:

{"a":[5,4,3,2,1]}
{"a":[5,10,15,20,25]}
{"a":[1,2,3,4,5]}
{"a":[{"_id":7},{"_id":8}]}
{"a":[{"_id":8},{"_id":7}]}
{"a":[null,[0,0,7],null]}
{"a":[true,false],"b":true}
{"a":[true,false],"b":false}
{"a":[true,[null],false]}

Predicating Array Elements by Array Index

So far it was possible to create a selection based on the array index. For example, the following query returns all documents that contain an array “a” where the value at index 0 is 5.

select {*} from server831 where a.[0] = 5

returns

{"a":[5,4,3,2,1]}
{"a":[5,10,15,20,25]}

Predicating Array Elements by Content

Sometimes there is the situation where the array index is unknown and the client wants to retrieve all documents where at least one array element has a specific value. Referring to every array element is denoted as [*] in the JSON SQL syntax (instead of referring to a specific array index).

For example, the following query selects all documents where at least one of the array elements of “a” has the value 5.

select {*} from server831 where a.[*] = 5

results in

{"a":[5,4,3,2,1]}
{"a":[5,10,15,20,25]}
{"a":[1,2,3,4,5]}

More interesting queries are the following examples that reach into the value(s) of array elements:

select {*} from server831 where a.[*]._id = 7

returns

{"a":[{"_id":7},{"_id":8}]}
{"a":[{"_id":8},{"_id":7}]}

Not only scalar literals can be used, but non-scalar literals as well.

select {*} from server831 where a.[*] = [0,0,7]

returns

{"a":[null,[0,0,7],null]}

Selection does not have to be based on literals:

select {*} from server831 where a.[*] = b

returns

{"a":[true,false],"b":true}
{"a":[true,false],"b":false}

The latter allows comparison of values inside documents.

Any number of levels can be referenced, for example

select {*} from server831 where a.[*].[*] = null

returns

{"a":[true,[null],false]}

SERVER-831 Use Case

The document provided in one of the use cases in SERVER-831 is as follows (with cleaned up syntax errors):

{ "_id": 1, 
  "name": "Dave Gahan", 
  "medications": [ 
  { "_id": 23, 
    "name": "Dilaudid", 
    "type": "Rx", 
    "prescriptions": [ 
    { "_id": 13, 
      "quantity": 60, 
      "started": "2009-01-01" }, 
    { "_id": 77, 
      "quantity": 45, 
      "started": "2009-02-01" } ] }, 
  { "_id": 41, 
    "name": "Oxycodone", 
    "type": "Rx" } ]}

The corresponding query in form of JSON SQL is (assuming the data is stored in collection uc831):

select {*} 
from uc831 
where _id = 1 
      and medications.[*]._id = 23 
      and medications.[*].prescriptions.[*]._id = 77

Summary

Selection based on array element values without the knowledge of the array index is an extremely powerful query feature of JSON SQL and is most likely useful in many different use cases.

The syntax of JSON SQL had to be modified only minimally without any restriction or loss of query expressiveness.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Advertisement

SQL for JSON Rationalization Part 11: JSON Specific Predicates

JSON SQL cannot assume collections containing only homogeneous documents; instead, it must support schemaless documents. This requires JSON specific predicates that are discussed in this blog.

Example Collection

The documents of the following collection “predColl” are used in this blog as examples.

{"a":"b"}
{"a":{"c":1,"d":2},"e":[77,{"x":"eightyeight"}]}
{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

Definition: Path

A (complete) path in a JSON document is the list of property names and array indexes from the root all the way to a leaf separated by “.”. For example, a complete path is

e.[1].x

A partial path in a JSON document is the list of property names and array indexes from the root to an intermediary property (that is not a leaf property) separated by “.”. For example, a partial path is

e.[1]

It is possible that a path is a complete path in one document and a partial path in another document. For example, a path that is complete and partial at the same time is

a

Each path leads up to a property that is of a specific JSON type. For example,

  • the type of e.[1].x is JSON_STRING in one of the example documents and JSON_NUMBER in another one.
  • the type of e.[1] is JSON_OBJECT in both cases.

Document: Set of Paths

Based on the above definition, a JSON document can be represented as the set of its paths.

The document

{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

can be represented as

a
a.c
a.d
e.[0]
e.[1]
e.[1].x

The corresponding types are

JSON_OBJECT
JSON_NUMBER
JSON_NUMBER
JSON_STRING
JSON_OBJECT
JSON_NUMBER

Schemaless Documents / Schema-per-Document

Traditionally, databases enforced the definition of a schema and many continue to do so. This means that data can only be stored successfully in a database if the data complies with the schema at that point in time.

Some database systems do not enforce a schema and as a consequence the data is not constrained by a schema. In context of JSON documents, if a document is not constrained by a schema, then it can have any valid structure (in terms of the rules the JSON standard defines).

These documents are termed “schemaless” documents. In reality there is a schema, an implicit  (intentional) one, for each document, termed “schema-per-document”.

In concrete terms “schemaless” in context of a database system means:

  • Any pair of documents within a collection of a database can potentially have different sets of paths
  • The same (complete or partial) path in two different documents can lead to a value of a different JSON type

As a consequence, queries must be able to test for the existence of paths as well as for the existence of a specific JSON type as these cannot be assumed (in contrast to a database that enforces schemas).

JSON Specific Predicates: exists_path and is_of_type

Two predicates are needed in order to check for the existence of paths or JSON types of values.

  • exists_path <path>
  • <path> is_of_type <JSON type>

The query

select {*} from predColl where exists_path a.d

returns

{"a":{"c":1,"d":2},"e":[77,{"x":"eightyeight"}]}
{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

The query

select {*} from predColl where e.[1].x is_of_type JSON_number

returns

{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

Negation

Negation of the predicates is useful to determine if there are document that do not have specific paths or paths do not refer to properties of specific JSON types.

select {*} from predColl where not exists_path e.[1].x

returns

{"a":"b"}

The query

select {*} from predColl where not a is_of_type JSON_object

returns

{"a":"b"}

Definition: (Partial) Homogeneity

Queries using the predicates exists_path or is_of_type can be used to determine if all documents have the required or expected paths or if there are documents that are missing specific paths or have paths leading to unexpected JSON types.

With these predicates it is possible to determine if the documents in a collection are homogeneous wrt. given set of paths.

Dynamic Schema Check

A developer can now implement first checking path existence or JSON type compliance before executing business logic related queries. If the required paths or JSON types are missing in specific documents, appropriate error handling can be implemented for these.

From a different viewpoint, a developer now has the tools to dynamically check or select all documents complying to a schema that is “imagined” at the type of querying.

Summary

While schemaless documents are convenient from a development perspective, this requires in general the ability to check for the existence or absence of paths as well as JSON type compliance.

The predicates exists_path and is_of_type provide the necessary querying tools in order to test for the variations in schemaless JSON documents.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 9: Restriction – Arrays

This installment reviews restriction in JSON SQL based on JSON array literals (all other JSON types except JSON array have been discussed in previous blogs).

JSON Object Notation

JSON SQL follows the JSON object notation as defined in the JSON standard. An empty JSON array is denoted as [] and a non-empty JSON array has one or more comma separated values, including JSON array.

A JSON array literal is either an empty JSON array or a non-empty JSON array. A JSON array literal is not enclosed in quotes. The only JSON literal enclosed in quotes is JSON string. If a JSON array is enclosed in quotes then it is not a JSON array, but a JSON string.

Sample Document Set

The following document set is used in this blog and the documents are stored in a collection called “arrayColl”.

select {*} from arrayColl

results in

{"one":[{"a":1},{"b":2}]}
{"one":"[{\"a\": 1}, {\"b\": 2}]"}
{"three":[{"b":[{"c":null},{"d":true}]}]}
{"four":[{"x":8,"y":9}]}
{"five":[]}

Restriction based on JSON Object Literal

Starting with the empty JSON array literal, the following two queries product the same result.

select {*} from arrayColl where five = []

and

select {*} from arrayColl where [] = five

result in

{"five":[]}

In the following, queries show the JSON array literal on the right side of the operator, however, it can be on either side.

Operators = And <>

The operators = and <> are defined for a JSON array literal. JSON SQL regards two JSON arrays as equal if both have the same (equal) values in the same order; and not equal otherwise.

The query (restriction using JSON array literal)

select {*} from arrayColl where one = [{"a": 1}, {"b": 2}]

returns

{"one":[{"a":1},{"b":2}]}

The query (restriction using JSON string literal(!))

select {*} from arrayColl where one = '[{"a": 1}, {"b": 2}]'

returns

{"one":"[{\"a\": 1}, {\"b\": 2}]"}

A restriction can reach into the JSON array as well using the path notation. The query

select {*}
from arrayColl
where three.[0].b = [{"c": null}, {"d": true}]

returns

{"three":[{"b":[{"c":null},{"d":true}]}]}

Operators <, >, <= And >=

The operators <, >, <= and >= could be defined recursively for convenience with some restrictions. For example, a JSON array could be considered less than another JSON array if both have the same values and if the corresponding values are less than another.

However, JSON true, JSON false and JSON null would not be able to participate in the operator <, >, <= or >=, only JSON object, JSON array, JSON number and JSON string.

Those four operators are currently not directly implemented in JSON SQL since it is possible to achieve the same by writing a complex conjunctive restriction (details on this approach will be discussed in a subsequent blog as well as strategies of what to do if any of JSON true, JSON false or JSON null are present).

Canonical Interpretation

The order of the values inside a JSON array is significant, but not within a JSON object. The query

select {*} from arrayColl where four = [{"y": 9, "x": 8}]

therefore results in

{"four":[{"x":8,"y":9}]}

Summary

Restriction by JSON array is provided by JSON SQL without problem and the syntax extends the Relational SQL syntax naturally.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 8: Restriction – Objects

This installment reviews restriction in JSON SQL based on JSON object literals (all other JSON types except JSON array have been discussed in previous blogs).

JSON Object Notation

JSON SQL follows the JSON object notation as defined in the JSON standard. An empty JSON object is denoted as {} and a non-empty JSON object has one or more comma separated pairs (a pair is a tuple of string and JSON type separated by a colon – also referred to as property).

A JSON object literal is either an empty JSON object or a non-empty JSON object. A JSON object literal is not enclosed in quotes. The only JSON literal enclosed in quotes is JSON string. If a JSON object is enclosed in quotes then it is not a JSON object, but a JSON string.

Sample Document Set

The following document set is used in this blog and the documents are stored in a collection called “objectColl”.

select {*} from objectColl

results in

{"one": {"a": 1}}
{"one": "{\"a\": 1}"}
{"three": {"b": {"c": null}}}
{"four": {"x": 8, "y": 9}}
{"five": {}}

Restriction based on JSON Object Literal

Starting with the empty JSON object literal, the following two queries product the same result.

select {*} from objectColl where five = {}

and

select {*} from objectColl where {} = five

result in

{"five": {}}

In the following, queries show the JSON object literal on the right side of the operator, however, it can be on either side.

Operators = And <>

The operators = and <> are defined for a JSON object literal. JSON SQL regards two JSON objects as equal if both have the same pairs (recursively), in any order; and not equal otherwise.

The query (restriction using JSON object literal)

select {*} from objectColl where one = {"a": 1}

returns

{"one": {"a": 1}}

The query (restriction using JSON string literal(!))

select {*} from objectColl where one = '{"a": 1}'

returns

{"one": "{\"a\": 1}"}

A restriction can reach into the JSON object as well using the path notation. The query

select {*} from object Coll where three.b = {"c": null}

returns

{"three": {"b": {"c": null}}}

Operators <, >, <= And >=

The operators <, >, <= and >= could be defined recursively for convenience with some restrictions. For example, a JSON object could be considered less than another JSON object if both have the same pairs and if the values of the corresponding pairs are less than another.

However, JSON true, JSON false and JSON null would not be able to participate in the operator <, >, <= or >=, only JSON object, JSON array, JSON number and JSON string.

Those four operators are currently not directly implemented in JSON SQL since it is possible to achieve the same by writing a complex conjunctive restriction (details on this approach will be discussed in a subsequent blog as well as strategies of what to do if any of JSON true, JSON false or JSON null are present).

Canonical Interpretation

The order of the pairs inside a JSON object is not significant (according to the JSON standard). The query

select {*} from objectColl where four = {"y": 9, "x": 8}

therefore results in

{"four": {"x": 8, "y": 9}}

Summary

Restriction by JSON object is provided by JSON SQL without problem and the syntax extends the Relational SQL syntax naturally.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 7: Restriction – True, False and Null

The last blog (Part 6) introduced the general notion of restriction and focused on JSON String and JSON Number. This blog will extend the discussion covering JSON true, JSON false and JSON null.

Sample Data Set

The following set of documents stored in the collection “boolcoll” is used in this blog:

select {*} from boolcoll

returns

{"a":true}
{"a":false}
{"true":false}
{"true":"null"}

Textual Representation

Even though the JSON standard defines that JSON true, JSON false and JSON null are all lowercase, in context of JSON SQL all combinations of upper and lower case characters are permitted:

select {*} from boolcoll where a = TruE

returns

{"a":true}

Operators

Not all of the default operators are defined on JSON true, JSON false or JSON null. Defined are = and <>, undefined are <, >, <= and >=. If one of the undefined operators is used in conjunction with JSON true, JSON false or JSON null, a semantic query analysis error is returned before the query is executed.

Restriction Syntax

As in the last blog, the property name can be on either side of the operator. The following two queries return the same result:

select {*} from boolcoll where a <> false

and

select {*}  from boolcoll where false <> a

return

{"a":true}

Execution Semantics

If the specified property is present, the restriction is evaluated and the document is added to the result set if it fulfills the restriction. If the property is not present, no evaluation and consequently no inclusion into the result set takes place.

select {*} from boolcoll where null = null

returns all documents as null = null is always true.

Disambiguation

As shown in the example data set for this blog, it is possible that a property name is “true”, “false” or “null”. So far only the short form of property names was used with JSON SQL queries, i.e., property names without being enclosed in double quotes (contrary to the JSON standard definition).

However, as soon as property names can be the same as keywords like JSON true, JSON false or JSON null, disambiguation has to take place. This is accomplished by following the JSON standard: enclosing the property name in double quotes. For example,

select {*} from boolcoll where "true" = false

returns all documents with a property name of “true” that has the value JSON false.

{"true":false}

Since Relational SQL does not use double quotes, there cannot be any confusion:

Along the same lines,

select {*} from boolcoll where "true" = 'null'

returns all documents where the property “true” has the value ‘null’ (string).

{"true":"null"}

This is not ambiguous, either, as Relational SQL uses single quotes to denote String literals.

Needless to say that double quotes can be used outside a disambiguation. For example, one of the above queries could be specified as

select {*} from boolcoll where "a" = TruE

returning

{"a":true}

Double quotes can be used in the projection clause as well.

select "a" from boolcoll where "a" = tRUe

returns

|a                        |
+-------------------------+
|true                     |

Summary

JSON true, JSON false and JSON null can be used in JSON SQL queries without restriction and in a well-defined way. Disambiguation is not interfering with either the syntax as defined by the JSON standard, or the regular Relational SQL syntax. Great!

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 6: Restriction – General Discussion

After discussing projection, selection is up next in the blog series on SQL for JSON. This first blog on selection focuses on the scalar JSON types Number and String.

Demo Data

As usual, we start with demo data. The collection for this blog is called “selcoll” (for SelectionCollection) and contains the following documents:

select {*} from selcoll

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

Selection based on Literals

Selection is following the regular Relational SQL syntax and is straightforward. For this discussion only single predicates are shown, not (complex) Boolean expressions of predicates. Boolean expression of predicates follow the usual semantics and do not require a lot of discussion.

select {*} from selcoll where a.b = 25

This JSON SQL statement selects all documents from selcoll that have a property “a” and a property “b” within a sub-document of “a” with the value of 25.

The result is

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

The following selection has the same semantics and returns the same result:

select {*} from selcoll where 25 = a.b

A selection based on a String literal follows the same syntax:

select {*} from selcoll where c.[1] = 'foobar'

This returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}

And the following selection returns the same result:

select {*} from selcoll where 'foobar' = c.[1]

In this context a note is in order. JSON uses double quotes as string delimiter, not single quotes, as SQL does. In order to stay as near as possible to Relational SQL, single quotes are used and transformed into double quotes by the underlying implementation.

Selection based on Value Comparison

It is possible to relate two different values within a document as well (aka, not a self-join that would related values of different documents – this will be discussed in a later blog).

select d.e as de, c.[0] as c0 from selcoll where d.e = c.[0]

This query selects all documents that have the same value in d.e and c.[0]. As added benefit the query projects to those two values as well.

The result is

|de                       |c0                       |
+-------------------------+-------------------------+
|"foo"                    |"foo"                    |
|"foo2"                   |"foo2"                   |

Any path can be related to any other path without restriction.

While in this blog only numbers and strings are discussed, the above discussed types of restrictions will work for all JSON data types in general, including true, false, null, objects and arrays (discussed in subsequent blogs).

Operations

The usual operators are defined: <, >, <>, =, >=, and <=. The semantics of these is defined for Number and String (Relational SQL semantics is taken). For the other JSON types they will have to be defined as the other JSON types do not have a corresponding Relational SQL domain.

Beyond these operators more “interesting” operations are required. For example

select {*} from selcoll where c contains 'foobar'

whereby “c” refers to a JSON array (and possibly a JSON object). This predicate would be true if there is an element in “c” that is of type String and the value of that element is “foobar”. There is a whole set of interesting operations that will be discussed at some point later as well.

Semantics

As implicitly demonstrated above, a JSON document is only in the result set if (a) the path to the value as specified in the JSON SQL query is present and (b) the value is the value as indicated in the selection clause in JSON SQL.

If the path does not exist or the value does not have a matching value, no result is returned for that document (and not the empty document itself).

There is no implicit type transformation implemented. This means that a Number literal only matches number values, and a String literal only matches string values.

Syntax Twists

Syntax has always a twist, especially if different languages are combined. In this case one of the twists is the single quote. A single quote within a string is represented as two single quotes in Relational SQL. JSON SQL does not have that requirement since strings are delimited by double quotes in JSON and a single quote is treated as regular character. The reverse situation exists also: double quotes have to be escaped within a string in JSON, but not in Relational SQL.

The query (double quote, not escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba"r'

returns

{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

And the query (two single quotes, escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba''r'

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}

Summary

Restriction (or selection) is almost straightforward for the types Number and String in context of JSON SQL. The only twist is the way Relational SQL and JSON SQL differ in denoting String literals as well as encode special characters.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.