SQL for JSON Rationalization Part 11: JSON Specific Predicates

JSON SQL cannot assume collections containing only homogeneous documents; instead, it must support schemaless documents. This requires JSON specific predicates that are discussed in this blog.

Example Collection

The documents of the following collection “predColl” are used in this blog as examples.

{"a":"b"}
{"a":{"c":1,"d":2},"e":[77,{"x":"eightyeight"}]}
{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

Definition: Path

A (complete) path in a JSON document is the list of property names and array indexes from the root all the way to a leaf separated by “.”. For example, a complete path is

e.[1].x

A partial path in a JSON document is the list of property names and array indexes from the root to an intermediary property (that is not a leaf property) separated by “.”. For example, a partial path is

e.[1]

It is possible that a path is a complete path in one document and a partial path in another document. For example, a path that is complete and partial at the same time is

a

Each path leads up to a property that is of a specific JSON type. For example,

  • the type of e.[1].x is JSON_STRING in one of the example documents and JSON_NUMBER in another one.
  • the type of e.[1] is JSON_OBJECT in both cases.

Document: Set of Paths

Based on the above definition, a JSON document can be represented as the set of its paths.

The document

{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

can be represented as

a
a.c
a.d
e.[0]
e.[1]
e.[1].x

The corresponding types are

JSON_OBJECT
JSON_NUMBER
JSON_NUMBER
JSON_STRING
JSON_OBJECT
JSON_NUMBER

Schemaless Documents / Schema-per-Document

Traditionally, databases enforced the definition of a schema and many continue to do so. This means that data can only be stored successfully in a database if the data complies with the schema at that point in time.

Some database systems do not enforce a schema and as a consequence the data is not constrained by a schema. In context of JSON documents, if a document is not constrained by a schema, then it can have any valid structure (in terms of the rules the JSON standard defines).

These documents are termed “schemaless” documents. In reality there is a schema, an implicit  (intentional) one, for each document, termed “schema-per-document”.

In concrete terms “schemaless” in context of a database system means:

  • Any pair of documents within a collection of a database can potentially have different sets of paths
  • The same (complete or partial) path in two different documents can lead to a value of a different JSON type

As a consequence, queries must be able to test for the existence of paths as well as for the existence of a specific JSON type as these cannot be assumed (in contrast to a database that enforces schemas).

JSON Specific Predicates: exists_path and is_of_type

Two predicates are needed in order to check for the existence of paths or JSON types of values.

  • exists_path <path>
  • <path> is_of_type <JSON type>

The query

select {*} from predColl where exists_path a.d

returns

{"a":{"c":1,"d":2},"e":[77,{"x":"eightyeight"}]}
{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

The query

select {*} from predColl where e.[1].x is_of_type JSON_number

returns

{"a":{"c":1,"d":2},"e":["seventyseven",{"x":88}]}

Negation

Negation of the predicates is useful to determine if there are document that do not have specific paths or paths do not refer to properties of specific JSON types.

select {*} from predColl where not exists_path e.[1].x

returns

{"a":"b"}

The query

select {*} from predColl where not a is_of_type JSON_object

returns

{"a":"b"}

Definition: (Partial) Homogeneity

Queries using the predicates exists_path or is_of_type can be used to determine if all documents have the required or expected paths or if there are documents that are missing specific paths or have paths leading to unexpected JSON types.

With these predicates it is possible to determine if the documents in a collection are homogeneous wrt. given set of paths.

Dynamic Schema Check

A developer can now implement first checking path existence or JSON type compliance before executing business logic related queries. If the required paths or JSON types are missing in specific documents, appropriate error handling can be implemented for these.

From a different viewpoint, a developer now has the tools to dynamically check or select all documents complying to a schema that is “imagined” at the type of querying.

Summary

While schemaless documents are convenient from a development perspective, this requires in general the ability to check for the existence or absence of paths as well as JSON type compliance.

The predicates exists_path and is_of_type provide the necessary querying tools in order to test for the variations in schemaless JSON documents.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Advertisement

SQL for JSON Rationalization Part 10: Complex Restriction

Until this blog only single restrictions were used; here is the discussion of complex restrictions in JSON SQL.

Example Data Set

In the following the collection “comrescoll” (for complex restriction collection) is used and it contains the following JSON documents:

{"a":{"c":"foo"},"b":[true,false,null]}
{"a":{"c":"foo"}}
{"b":[true,false,null]}

Single Restriction Semantics

As defined in a previous blog, a document only participates in a single restriction if the referenced properties are present.

For example,

select {*} 
from comrescoll 
where a.c = 'foo'

returns

{"a":{"c":"foo"},"b":[true,false,null]}
{"a":{"c":"foo"}}

as only those two object have the property “a.c” defined.

As the blog will show, a complex restriction applies to a document if all properties requiring evaluation are present.

Note that this previous sentence was really carefully worded. The following would not be correct: a complex restriction applies to a document if all properties that are referenced in the complex restriction are present.

The reason will be discussed later in detail.

AND

In an ANDed complex restriction, all properties that require evaluation have to be present in the document. A document will be included into the result set if the complex restriction evaluates to true for the document. The following query shows this case:

select {*} 
from comrescoll 
where a.c = 'foo' and b.[1] = false

returns

{"a":{"c":"foo"},"b":[true,false,null]}

… WHERE true, … WHERE false

A restriction always resulting in true can be implemented like this:

select {*} 
from comrescoll 
where 5 = 5

No complex restriction is required to accomplish this.

A restriction always resulting in false can be implemented like this:

select {*} 
from comrescoll 
where a.c = ‘foo’ and a.c = ‘bar’

However, an easier way is

select {*} 
from comrescoll 
where 5 = 6

This case does not require a complex restriction either.

OR

The OR operator behaves as expected:

select {*} 
from comrescoll 
where a.c = 'foo' or b.[1] = false

returns

{"a":{"c":"foo"},"b":[true,false,null]}
{"a":{"c":"foo"}}
{"b":[true,false,null]}

In case of a disjunction not all paths have to actually be present for a correct execution:

select {*} 
from comrescoll 
where d.[1] = false or a.c = 'foo'

returns

{"a":{"c":"foo"},"b":[true,false,null]}
{"a":{"c":"foo"}}

Even though the path “d.[1]” is not present, the evaluation takes place and is correct as the absence of a path does not necessarily fail the execution in a disjunction if this part of the complex restriction is unnecessary.

To force the presence of the path “d.[1]”, a predicate like “exists_path” would have to be added like this:

select {*} 
from comrescoll 
where d.[1] = false or a.c = 'foo' and exists_path d.[1]

This is not implemented yet, but it will be down the road at some point in time.

NOT

The operator NOT behaves as expected. The query

select {*} 
from comrescoll 
where not a.c = 'foo' and b.[1] = false

results in

{"b":[true,false,null]}

This might be surprising initially as the result contains a document that does not contain the path “a.c”. However, the absence of the path “a.c” means that “a.c” does not have the value ‘foo’.

Like above, the presence of “a.c” would be enforced by a conjunction of exists_path “a.c”.

Combination of AND, OR, NOT

The operators AND, OR and NOT can be combined as expected according to the SQL 92 standard. For example, the query

select {*} 
from comrescoll 
where not a.c = '' and not b.[0] = 0 or not b.[1] = 1

results in

{"a":{"c":"foo"},"b":[true,false,null]}
{"a":{"c":"foo"}}
{"b":[true,false,null]}

Caveat

Remember the “careful wording” from earlier? Consider the following query:

select {*} 
from comrescoll 
where a.c = 'foo' and b.[1] = false or d = 6 or d <> 6

On the surface of it, the paths “a.c”, “b.[1]” and “d” need to be present for the complex restriction to be evaluated. However, this query returns the following result:

{"a":{"c":"foo"},"b":[true,false,null]}

Upon closer inspection, the two restrictions involving the path “d” are insignificant. The path “d” is not needed for the evaluation even though the path “d” is referred to in the complex restriction.

Another example is

select {*} 
from comrescoll 
where a.c = 'foo' and b.[1] = false or (d = 6 and d <> 6)

resulting in

{"a":{"c":"foo"},"b":[true,false,null]}

What if the path “d” should be present? In this case a predicate like exists_path is necessary (to be discussed in one of the future blogs). With the additional predicate exists_path only those documents are included in the evaluation that actually contain that path.

Summary

Complex restrictions in JSON SQL are very similar to Relational SQL in their syntax and semantics except for the difference that paths that are not required for the evaluation of a complex restriction do not have to be present in the documents even though they are mentioned in the complex restriction.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

SQL for JSON Rationalization Part 9: Restriction – Arrays

This installment reviews restriction in JSON SQL based on JSON array literals (all other JSON types except JSON array have been discussed in previous blogs).

JSON Object Notation

JSON SQL follows the JSON object notation as defined in the JSON standard. An empty JSON array is denoted as [] and a non-empty JSON array has one or more comma separated values, including JSON array.

A JSON array literal is either an empty JSON array or a non-empty JSON array. A JSON array literal is not enclosed in quotes. The only JSON literal enclosed in quotes is JSON string. If a JSON array is enclosed in quotes then it is not a JSON array, but a JSON string.

Sample Document Set

The following document set is used in this blog and the documents are stored in a collection called “arrayColl”.

select {*} from arrayColl

results in

{"one":[{"a":1},{"b":2}]}
{"one":"[{\"a\": 1}, {\"b\": 2}]"}
{"three":[{"b":[{"c":null},{"d":true}]}]}
{"four":[{"x":8,"y":9}]}
{"five":[]}

Restriction based on JSON Object Literal

Starting with the empty JSON array literal, the following two queries product the same result.

select {*} from arrayColl where five = []

and

select {*} from arrayColl where [] = five

result in

{"five":[]}

In the following, queries show the JSON array literal on the right side of the operator, however, it can be on either side.

Operators = And <>

The operators = and <> are defined for a JSON array literal. JSON SQL regards two JSON arrays as equal if both have the same (equal) values in the same order; and not equal otherwise.

The query (restriction using JSON array literal)

select {*} from arrayColl where one = [{"a": 1}, {"b": 2}]

returns

{"one":[{"a":1},{"b":2}]}

The query (restriction using JSON string literal(!))

select {*} from arrayColl where one = '[{"a": 1}, {"b": 2}]'

returns

{"one":"[{\"a\": 1}, {\"b\": 2}]"}

A restriction can reach into the JSON array as well using the path notation. The query

select {*}
from arrayColl
where three.[0].b = [{"c": null}, {"d": true}]

returns

{"three":[{"b":[{"c":null},{"d":true}]}]}

Operators <, >, <= And >=

The operators <, >, <= and >= could be defined recursively for convenience with some restrictions. For example, a JSON array could be considered less than another JSON array if both have the same values and if the corresponding values are less than another.

However, JSON true, JSON false and JSON null would not be able to participate in the operator <, >, <= or >=, only JSON object, JSON array, JSON number and JSON string.

Those four operators are currently not directly implemented in JSON SQL since it is possible to achieve the same by writing a complex conjunctive restriction (details on this approach will be discussed in a subsequent blog as well as strategies of what to do if any of JSON true, JSON false or JSON null are present).

Canonical Interpretation

The order of the values inside a JSON array is significant, but not within a JSON object. The query

select {*} from arrayColl where four = [{"y": 9, "x": 8}]

therefore results in

{"four":[{"x":8,"y":9}]}

Summary

Restriction by JSON array is provided by JSON SQL without problem and the syntax extends the Relational SQL syntax naturally.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 8: Restriction – Objects

This installment reviews restriction in JSON SQL based on JSON object literals (all other JSON types except JSON array have been discussed in previous blogs).

JSON Object Notation

JSON SQL follows the JSON object notation as defined in the JSON standard. An empty JSON object is denoted as {} and a non-empty JSON object has one or more comma separated pairs (a pair is a tuple of string and JSON type separated by a colon – also referred to as property).

A JSON object literal is either an empty JSON object or a non-empty JSON object. A JSON object literal is not enclosed in quotes. The only JSON literal enclosed in quotes is JSON string. If a JSON object is enclosed in quotes then it is not a JSON object, but a JSON string.

Sample Document Set

The following document set is used in this blog and the documents are stored in a collection called “objectColl”.

select {*} from objectColl

results in

{"one": {"a": 1}}
{"one": "{\"a\": 1}"}
{"three": {"b": {"c": null}}}
{"four": {"x": 8, "y": 9}}
{"five": {}}

Restriction based on JSON Object Literal

Starting with the empty JSON object literal, the following two queries product the same result.

select {*} from objectColl where five = {}

and

select {*} from objectColl where {} = five

result in

{"five": {}}

In the following, queries show the JSON object literal on the right side of the operator, however, it can be on either side.

Operators = And <>

The operators = and <> are defined for a JSON object literal. JSON SQL regards two JSON objects as equal if both have the same pairs (recursively), in any order; and not equal otherwise.

The query (restriction using JSON object literal)

select {*} from objectColl where one = {"a": 1}

returns

{"one": {"a": 1}}

The query (restriction using JSON string literal(!))

select {*} from objectColl where one = '{"a": 1}'

returns

{"one": "{\"a\": 1}"}

A restriction can reach into the JSON object as well using the path notation. The query

select {*} from object Coll where three.b = {"c": null}

returns

{"three": {"b": {"c": null}}}

Operators <, >, <= And >=

The operators <, >, <= and >= could be defined recursively for convenience with some restrictions. For example, a JSON object could be considered less than another JSON object if both have the same pairs and if the values of the corresponding pairs are less than another.

However, JSON true, JSON false and JSON null would not be able to participate in the operator <, >, <= or >=, only JSON object, JSON array, JSON number and JSON string.

Those four operators are currently not directly implemented in JSON SQL since it is possible to achieve the same by writing a complex conjunctive restriction (details on this approach will be discussed in a subsequent blog as well as strategies of what to do if any of JSON true, JSON false or JSON null are present).

Canonical Interpretation

The order of the pairs inside a JSON object is not significant (according to the JSON standard). The query

select {*} from objectColl where four = {"y": 9, "x": 8}

therefore results in

{"four": {"x": 8, "y": 9}}

Summary

Restriction by JSON object is provided by JSON SQL without problem and the syntax extends the Relational SQL syntax naturally.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 7: Restriction – True, False and Null

The last blog (Part 6) introduced the general notion of restriction and focused on JSON String and JSON Number. This blog will extend the discussion covering JSON true, JSON false and JSON null.

Sample Data Set

The following set of documents stored in the collection “boolcoll” is used in this blog:

select {*} from boolcoll

returns

{"a":true}
{"a":false}
{"true":false}
{"true":"null"}

Textual Representation

Even though the JSON standard defines that JSON true, JSON false and JSON null are all lowercase, in context of JSON SQL all combinations of upper and lower case characters are permitted:

select {*} from boolcoll where a = TruE

returns

{"a":true}

Operators

Not all of the default operators are defined on JSON true, JSON false or JSON null. Defined are = and <>, undefined are <, >, <= and >=. If one of the undefined operators is used in conjunction with JSON true, JSON false or JSON null, a semantic query analysis error is returned before the query is executed.

Restriction Syntax

As in the last blog, the property name can be on either side of the operator. The following two queries return the same result:

select {*} from boolcoll where a <> false

and

select {*}  from boolcoll where false <> a

return

{"a":true}

Execution Semantics

If the specified property is present, the restriction is evaluated and the document is added to the result set if it fulfills the restriction. If the property is not present, no evaluation and consequently no inclusion into the result set takes place.

select {*} from boolcoll where null = null

returns all documents as null = null is always true.

Disambiguation

As shown in the example data set for this blog, it is possible that a property name is “true”, “false” or “null”. So far only the short form of property names was used with JSON SQL queries, i.e., property names without being enclosed in double quotes (contrary to the JSON standard definition).

However, as soon as property names can be the same as keywords like JSON true, JSON false or JSON null, disambiguation has to take place. This is accomplished by following the JSON standard: enclosing the property name in double quotes. For example,

select {*} from boolcoll where "true" = false

returns all documents with a property name of “true” that has the value JSON false.

{"true":false}

Since Relational SQL does not use double quotes, there cannot be any confusion:

Along the same lines,

select {*} from boolcoll where "true" = 'null'

returns all documents where the property “true” has the value ‘null’ (string).

{"true":"null"}

This is not ambiguous, either, as Relational SQL uses single quotes to denote String literals.

Needless to say that double quotes can be used outside a disambiguation. For example, one of the above queries could be specified as

select {*} from boolcoll where "a" = TruE

returning

{"a":true}

Double quotes can be used in the projection clause as well.

select "a" from boolcoll where "a" = tRUe

returns

|a                        |
+-------------------------+
|true                     |

Summary

JSON true, JSON false and JSON null can be used in JSON SQL queries without restriction and in a well-defined way. Disambiguation is not interfering with either the syntax as defined by the JSON standard, or the regular Relational SQL syntax. Great!

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 6: Restriction – General Discussion

After discussing projection, selection is up next in the blog series on SQL for JSON. This first blog on selection focuses on the scalar JSON types Number and String.

Demo Data

As usual, we start with demo data. The collection for this blog is called “selcoll” (for SelectionCollection) and contains the following documents:

select {*} from selcoll

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

Selection based on Literals

Selection is following the regular Relational SQL syntax and is straightforward. For this discussion only single predicates are shown, not (complex) Boolean expressions of predicates. Boolean expression of predicates follow the usual semantics and do not require a lot of discussion.

select {*} from selcoll where a.b = 25

This JSON SQL statement selects all documents from selcoll that have a property “a” and a property “b” within a sub-document of “a” with the value of 25.

The result is

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

The following selection has the same semantics and returns the same result:

select {*} from selcoll where 25 = a.b

A selection based on a String literal follows the same syntax:

select {*} from selcoll where c.[1] = 'foobar'

This returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}

And the following selection returns the same result:

select {*} from selcoll where 'foobar' = c.[1]

In this context a note is in order. JSON uses double quotes as string delimiter, not single quotes, as SQL does. In order to stay as near as possible to Relational SQL, single quotes are used and transformed into double quotes by the underlying implementation.

Selection based on Value Comparison

It is possible to relate two different values within a document as well (aka, not a self-join that would related values of different documents – this will be discussed in a later blog).

select d.e as de, c.[0] as c0 from selcoll where d.e = c.[0]

This query selects all documents that have the same value in d.e and c.[0]. As added benefit the query projects to those two values as well.

The result is

|de                       |c0                       |
+-------------------------+-------------------------+
|"foo"                    |"foo"                    |
|"foo2"                   |"foo2"                   |

Any path can be related to any other path without restriction.

While in this blog only numbers and strings are discussed, the above discussed types of restrictions will work for all JSON data types in general, including true, false, null, objects and arrays (discussed in subsequent blogs).

Operations

The usual operators are defined: <, >, <>, =, >=, and <=. The semantics of these is defined for Number and String (Relational SQL semantics is taken). For the other JSON types they will have to be defined as the other JSON types do not have a corresponding Relational SQL domain.

Beyond these operators more “interesting” operations are required. For example

select {*} from selcoll where c contains 'foobar'

whereby “c” refers to a JSON array (and possibly a JSON object). This predicate would be true if there is an element in “c” that is of type String and the value of that element is “foobar”. There is a whole set of interesting operations that will be discussed at some point later as well.

Semantics

As implicitly demonstrated above, a JSON document is only in the result set if (a) the path to the value as specified in the JSON SQL query is present and (b) the value is the value as indicated in the selection clause in JSON SQL.

If the path does not exist or the value does not have a matching value, no result is returned for that document (and not the empty document itself).

There is no implicit type transformation implemented. This means that a Number literal only matches number values, and a String literal only matches string values.

Syntax Twists

Syntax has always a twist, especially if different languages are combined. In this case one of the twists is the single quote. A single quote within a string is represented as two single quotes in Relational SQL. JSON SQL does not have that requirement since strings are delimited by double quotes in JSON and a single quote is treated as regular character. The reverse situation exists also: double quotes have to be escaped within a string in JSON, but not in Relational SQL.

The query (double quote, not escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba"r'

returns

{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

And the query (two single quotes, escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba''r'

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}

Summary

Restriction (or selection) is almost straightforward for the types Number and String in context of JSON SQL. The only twist is the way Relational SQL and JSON SQL differ in denoting String literals as well as encode special characters.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 5: Projection – Specific Functionality

The last blog introduced SQL JSON projection and this blog will discuss some of its finer points.

Demo Data

Here are the JSON documents from the collection “tinycoll” used throughout this blog:

select {*} from tinycoll

returns two JSON documents:

{"a": 5,
 "b": {"c": 10,"d": 11},
 "c": [101, 102, {"d": 103}, {"e": 104}]}

{"a": 5,
 "b2": [10, 11],
 "c": [101, 102, {"d": 103}, {"e": 104}]}

AS Clause

In Relational SQL it is possible to rename columns. The AS clause is the means to do this and it contains an alternative column name. Example:

select a as abc from tinycoll

The result contains a column called “abc” instead of “a” and this is standard Relational SQL semantics.

|abc                      |
+-------------------------+
|5                        |
|5                        |

What does an AS clause mean in context of JSON SQL? In context of JSON SQL an AS clause specifies a path. Example:

select {a as x.y} from tinycoll

The result contains documents with paths “x.y” that contain the value of the corresponding “a” in the original document (if “a” is present).

{"x":{"y":5}}
{"x":{"y":5}}

Fundamentally, it means that the value the original path “a” pointed to is now at a new path “x.y” and that can be seen as relocation that only takes place in the result document. Any valid path is possible in the AS clause.

So far the AS clause supports renaming as well as relocation. Relocation is orthogonal and does not affect the original document. For example, the following relocations are valid:

select {a as b, b as a} from tinycoll

Basically, the values are exchanged between the two paths “a” and “b” (which can be more complex paths, of course).

{"a":{"c":10,"d":11},"b":5}
{"b":5}

All AS clauses are applied independently of each other, not in sequence (and therefore “a” and “b” do not contain the same value because of this projection specification).

A final situation is overwriting, meaning, the path in the AS clause can be that of an existing path in a JSON document and that will overwrite the value in the result document. For example:

select {a as c.[0]} from tinycoll

The existing value of “c.[0]” is overwritten and contains the value of “a” in the result document if “a” exists in the original document.

{"c":[5]}
{"c":[5]}

There are a few language constraints that are checked for. These are

  • Path Subsumption. A path in an AS clause must not be a subpath in any other path; otherwise one AS clause might conflict with another one. An example for a violation is: “select {a as c.[2].d, b as c.[2]} from tinycoll”. This is analogous to Relational SQL not allowing the use of the same column name in two different AS clauses.
  • Asterisk Query. An asterisk query cannot have an AS clause; if any change is necessary by means of an AS clause, the paths have to be listed explicitly.
  • Relational Output Path. The path in an AS clause for relation output must be a single value (path of length one) in order to comply to the Relational SQL semantics/model.

Value Non-Existence

The AS clause might create a path in a result document that does not exist in the original document. For example:

select {a as x.[2]} from tinycoll

In this example, the original document does not have an array named “x”. However, the result document is going to have one if “a” is present. The path sets the value of “a” to the third array element, however, the first and second element do not have a value as those elements do not exist. The result of the query is

{"x":["<>","<>",5]}
{"x":["<>","<>",5]}

The JSON standard does not have a notation for an absent value, however, it is needed in order to describe accurately that values are undefined. Therefore, the symbol “<>” is introduced of type String in order to (a) denote that a value is undefined and to (b) represent it as a known data type so that JSON libraries can process it.

“<>” is randomly defined; it can be changed to another symbol as necessary. JSON null cannot be used as JSON null is a valid constant (aka, explicit JSON value) and in contrast to SQL null does not denote “unknown”. The use of JSON null might suggest that there is the value of JSON null, when in reality there is no value at all. Any trailing “<>” are removed and not present in the output JSON documents.

Array Element Replacement

It is possible to replace array elements selectively, for example:

select {a as c.[0], b as c.[1], c.[2]} from tinycoll

will result in

{"c":[5,{"c":10,"d":11},{"d":103}]}
{"c":[5,"<>",{"d":103}]}

A shortcut syntax like c.[2..9] that refers to the 3rd until 10th elements inclusive is not supported at this point, but could be for convenience. If implemented at some point in time, then this section will be changed.

Likewise for a shortcut syntax like c.[..9], c.[2..] or c.[..] indicating all elements including the 10th, all starting with the 3rd, or all element respectively.

Additional Items

The “select distinct” clause is not specifically discussed as it has the intended semantics based on the JSON document equality definition.

An interesting case on projection is the mixed case, aka, some projection is relational, some other asks for the JSON form. For example,

select a, b, {c, d} from tinycoll

returns relational output, but with certain columns containing JSON objects that can be freely composed from one or more paths. This might be convenient from a final output viewpoint for the client, but would not contribute in major ways to a JSON SQL language definition. Therefore, it is not implemented as of now (and in case this decision changes, this section will be updated in the future).

Summary

Projection in context of JSON SQL is not all that straightforward compared to the Relational SQL semantics. This blog highlighted the most important finer points like the AS clause and array processing and outlined some of the additional possible extensions to an implementation.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 4: Projection – General Functionality

After the demo in the last blog (Part 3) it is time to discuss some of the assumptions and the projection functionality in more detail – here and in the next blog.

Assumptions: Array Start Index, JSON Literals and JSON Value Equality

The JSON standard does not define the starting index of the first array element. The assumption made here is that the first index is 0 (zero).

The JSON standard requires the literals “null”, “true” and “false” to be lower case. However, the assumption made here is that all lower as well as upper case combinations work, e.g., “True”, for convenience.

Another aspect the JSON standard does not define is equality on JSON values. There are many ways to define when two JSON values are equal. Here equality is defined on the string representation of the JSON values that contain no white space and where the property names in JSON objects are sorted alphabetically.

Definitions: Full and Partial Path

A full path is a sequence of property names as well as array indexes from the root of a JSON document all the way to one of its leaves. The elements of a path are separated by “.”. For example, “c.[3].e” is a full path from the root “c” to the leaf “e” in one of the demo documents of the previous blog. A path must start with a property name and cannot start with an array index. A path cannot be empty and the minimum path consists of a single property name.

Using “.” as separator is a random choice, but made by many systems. Having array indexes enclosed in “[” and “]” is customary also. Denoting an array index as separate path element (aka, enclosed in “.”) is also a convenient choice.

Given a JSON object, a full path might exist within it or not. Given a JSON object and a path, using the path as an access structure identifies a value only if the full path exists in the JSON document. If the path does not exist within the JSON document then no value is identified; especially not the JSON literal “null”.

A partial path is a sequence of property names and array indexes starting at the root, but not necessarily ending at a leaf, but at an intermediary property or array index. This supports “reaching into” a JSON document and identifying a JSON value that is between the root and a leaf.

Like in case of full paths, given a JSON object, a partial path might or might not exist within it. A partial path only identifies a JSON value if the partial path exists within a JSON object. In this case it identifies a composite JSON value.

If a JSON document has only scalar properties, then the root properties are the leaf properties at the same time. Paths in this context are full paths and partial paths cannot exist.

Projection

Unlike in the relational model, in context of the JSON model the result of a query can be returned as a relational table, or as a set of JSON documents. The choice is made by the query author.

The projection in a select statement contains one or more (full or partial) paths. If the paths are enclosed by a “{“ and “}” then JSON documents are returned, otherwise a table  (the asterisk projection is discussed below).

For example, the query from the previous blog

select a, b.c, d.[3].e from tinycoll

returns a table with three columns.

Semantically, each path in the projection will be a separate column. Each document from the collection “tinycoll” is taken and a corresponding row is added to the table. For each path of the projection that is found in the document the value is added to the row. If a path does not exist, no value is added in the column corresponding to the path. Therefore, a row can have values in every column, in some columns, or in no column, depending if the paths exist in the document.

As in relational SQL, the order of the paths matters as the corresponding columns will be created in that order.

The column names are created from the paths by replacing “.” in the path representation with “_” as many relational systems do not support “.” as column names.

The query

select {a, b.c, d.[3].e} from tinycoll

returns a set of JSON documents.

Semantically, each document from the collection “tinycoll” is taken and an empty result document is created for it. Each of the paths from the projection are followed within the document from the collection. If a value is found, the path with the corresponding value is added to the result document. It is possible that the document from the collection contains all, some, or none of the paths from the projection. Therefore, the result document might contain all, some, or none of the paths (empty document).

The order of the paths in the projection does not matter as JSON documents are created and order of properties / paths is not defined for JSON objects.

As a note, according to the construction principle of the result JSON documents, the paths in the projection of the select statement and the paths in the result JSON documents are exactly the same (if they exist). No translation is necessary from the viewpoint of the client between the paths in the query and the paths in the result documents.

Asterisk Projection

The asterisk projection is supported. The query

select {*} from tinycoll

returns all documents stored in the collection “tinycoll” as they are without any modification.

The query

select * from tinycoll

Returns a table that has any many columns as there are full and partial paths into all documents of the collection “tinycoll”.

Semantically, each document from the collection “tinycoll” is taken and a row is created for it. For each full as well as partial path in the document the value is retrieved and put into the corresponding column of the row. There is a column for each possible path and the set of columns can be predetermined or dynamically added to the result table as needed. As before, the column names are the path with the “.” replaced by “_”.

Summary

This was a first closer look into the details of projection in context of JSON SQL and the next blog will continue the project discussion. The key take away is that JSON SQL can return JSON documents as well as tables based on a well-defined execution semantics centered around JSON paths.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

SQL for JSON Rationalization Part 3: Demo

The previous blog outlined an initial glimpse at a JSON SQL query language and how it works when applied to JSON documents. In the following, a demo shows a concrete implementation.

Command Line Interface

The command line interface provides a few basic commands as follows:

JQDR> help
JQDR - JSON Query Done Right
Commands:
    help
    exit
    executequery <JSON SQL query>
    load <table name> <file name>
    deleteload <table name> <file name>
    createtable <table name>
    droptable <table name>
    deletetable <table name>
    existstable <table name>
    listtables
    <JSON SQL query>
JQDR> 

Document Collection

The following very small collection “tinycoll” contains two documents:

{"a": 5,
 "b": {"c": 10, "d": 11},
 "c": [101, 102, {"d": 103}, {"e": 104}]
}
{"a": 5,
 "b2": [10, 11],
 "c": [101, 102, {"d": 103}, {"e": 104}]
}

Loading those into the database is accomplished by adding the documents into a file and then load the documents from the file into the database after the table “tinycoll” has been created:

JQDR> load tinycoll src/test/resources/blog/tinycoll.txt
JQDR> 

Currently the language does not support an insert statement, however, this is in the plans.

select {*}

A first JSON query is to select all the documents and output those as documents (requested by the { and } in the projection clause):

JQDR> select {*} from tinycoll
{"a":5,"b":{"c":10,"d":11},"c":[101,102,{"d":103},{"e":104}]}
{"a":5,"b2":[10,11],"c":[101,102,{"d":103},{"e":104}]}
JQDR>

select {a, b.c, c.[3].e}

A more interesting query is a projection that reaches into the documents:

JQDR> select {a, b.c, c.[3].e} from tinycoll
{"a":5,"b":{"c":10},"c":["<>","<>","<>",{"e":104}]}
{"a":5,"c":["<>","<>","<>",{"e":104}]}
JQDR> 

A few observations are:

  • Full paths are inserted into the result documents. This allows to access the result documents using the same paths that were used in the projection (aka, a, b.c and c.[3].e).
  • JSON does not have a “value does not exist” representation. Therefore, the JSON query processor inserts “<>” for array values that do not exist, but need to be present in order to provide correct array indexes.

For example, only the 4th array element was projected, so the first three must be part of the result, but represented as “values does not exist” as they were not requested in the projection. In a Relational SQL world SQL NULL would have been used in order to represent “values does not exist” (or is unknown).

select *

This query selects all documents, but the result is in relational table format, not JSON documents (as the { and } are omitted in the projection). Each path into any of the documents of the collection is represented as a separate column. The following shows the resulting 14 columns:

JQDR> select * from tinycoll
|a    |b_c  |b_d  |b               |b2_[0] |b2_[1] |b2      |c_[0] |c_[1] |c_[2]_d |c_[2]     |c_[3]_e |c_[3]     |c                             |
+-----+-----+-----+----------------+-------+-------+--------+------+------+--------+----------+--------+----------+------------------------------+
|5    |10   |11   |{"c":10,"d":11} |<>     |<>     |<>      |101   |102   |103     |{"d":103} |104     |{"e":104} |[101,102,{"d":103},{"e":104}] |
|5    |<>   |<>   |<>              |10     |11     |[10,11] |101   |102   |103     |{"d":103} |104     |{"e":104} |[101,102,{"d":103},{"e":104}] |
JQDR>

It is important to note that the column names are default names generated by the JSON SQL query processor and they actually represent the paths (however, instead of a “.”, a “_” is used for the representation).

select a, b.c, c.[3].e

A projection looks as follows

JQDR> select a, b.c, c.[3].e from tinycoll
|a    |b_c  |c_[3]_e |
+-----+-----+--------+
|5    |10   |104      |
|5    |<>   |104      |
JQDR> 

In this case only three columns are returned as the projection projects only three paths.

Summary

This demo has provided a first glimpse at a JSON SQL language that supports querying JSON documents and provides the option of returning results as JSON objects or in table form.

The next blog will focus more on projection and the various relevant details of the projection semantics.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

SQL for JSON Rationalization Part 2: Is JSON SQL Superset of Relational SQL?

This is not really a precise question as it does not distinguish data model from query language – a distinction that it probably intended to make; so let’s rationalize both.

Table – Row

A table is defined in its basic form by columns and their types (domains), with columns having unique names within a table definition. The contents of a table are zero or more rows, whereby each row has to have for each column either a value (matching the column’s domain) or SQL NULL (indicating “unknown”); no deviation from this structure is possible, neither from the type system.

Collection – Document

In the JSON database world, JSON structures are called documents. Documents are in general grouped together in distinct groups and the grouping mechanism is a named collection. A collection can contain zero or more documents. Each document belongs to exactly one collection.

A collection in general does not impose a constraint on the document structure of its documents, so the whole range from all documents having the same structure to each document having a different structure is possible. That documents in a collection can have different structures is called “schema-free”, or more appropriately “implicit schema-per-document” instead of “schema-per-collection”.

Note: the structure of a document is the set of all paths from the root of the document to the leaves of the document. This definition allows a structural comparison: two documents having the same set of paths are structurally equivalent (or, are of the same schema if there was a schema definition).

Relational SQL

The term “Relational SQL” refers to SQL executing against tables. For example, “select * from abc” selects all rows from the table “abc”, which each row having values for each column of the table.

The query “select a, b from abc” would select all rows from the table “abc”, however, only the columns “a” and “b” within each row (projection).

JSON SQL

The term “JSON SQL” refers to SQL executing against collections. For example, in analogy to the Relational SQL query above, “select {*} from def” selects documents (indicated by the “{}”), and within each document all properties (indicated by “*”) from the collection “def”.

The query “select {a, b} from def” would select all objects from the collection “def”, however, only the properties “a” and “b”, in case these are present (more precisely, all paths starting with either “a” or “b” – this is one possible definition of projection in JSON SQL). If none of them is present in a document, then the result is the empty document. If only one property is present, the document only contains the one property.

For example, assume the collection “def” consisting of two documents:

{"a": 1, "b": 2, "c": 3}
{"d": 4, "e": 5, "f": 6}

The query “select {a, b} from def” would return

{"a": 1, "b": 2}
{}

JSON SQL with Table Output

What would the query “select a, b from def” return? This looks like a Relational SQL query, however, querying a collection of documents. The interpretation is to select the value of the property “a” and “b” from each document, however, not return a document, but a structure similar to a row with values only.

For example, “select a, b from def” applied to the above collection it would return

| a | b |
+---+---+
| 1 | 2 |
|   |   |

The first object would contribute the values “1” and “2”, the second object would not contribute values at all, so an empty row is returned (aka, a row where each column is empty and not SQL NULL as SQL NULL is not a valid value in the JSON structure definition).

Of course, the above example has scalar values for “a” and “b”; non-scalar values are supported as well, without restriction. It is therefore possible to have any JSON type in a column as value.

This small example shows that it is possible to not only return documents as a result of a JSON SQL query, but also tables (set of rows).

Collection with Schema Constraint

It is possible, at least conceptually, to enforce constraints on the documents of a collection with the interpretation that each document in the collection must satisfy the constraint. One possible constraint is that all document must have the exact same structure (schema) and only the values can be different.

For example, a constraint in context of collection “xyz” could be that all documents can only have top level properties with names “p” and “q” (paths of length 1), and the type of “p” must be a string, and the type of “q” must be an integer. Basically this constraint enforces a specific document schema on every document in the collection.

The collection “xyz” at a given point in time could have two documents:

{"p": "today", "q": 68}
{"p": "yesterday", "q": 71} 

Basically, this constraint enforces “flat” documents with scalar types, and this can be interpreted as the equivalent to rows of a table with two columns “p” and “q” of type string and integer.

Q.E.D.

With the appropriate constraint in place (aka, forcing “flat” documents of scalar property type), a collection can be interpreted as table and the documents as rows. Therefore it is possible to provide the equivalent of tables and the equivalent of Relational SQL in context of collections of documents supported by JSON SQL.

However, this is possible because of a constraint. Lifting that constraint allows documents to have any structure, and to have a representation beyond the table model. Therefore, JSON SQL on document collections is a superset of Relational SQL on tables.

Summary

In this blog it was shown that a constraint document environment can be interpreted as tables and JSON SQL can be used to accomplish the same functionality as Relational SQL. By lifting the constraint it follows that JSON SQL based on document collections is a superset of Relational SQL based on tables. The answer to the initial question is therefore “YES”.

The intriguing question now is, what will JSON SQL have to provide beyond Relational SQL in order to fully support the JSON document model? Future blogs will make an attempt to address this question in a structured and rationalized approach.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.