The last blog introduced SQL JSON projection and this blog will discuss some of its finer points.
Demo Data
Here are the JSON documents from the collection “tinycoll” used throughout this blog:
select {*} from tinycoll
returns two JSON documents:
{"a": 5, "b": {"c": 10,"d": 11}, "c": [101, 102, {"d": 103}, {"e": 104}]} {"a": 5, "b2": [10, 11], "c": [101, 102, {"d": 103}, {"e": 104}]}
AS Clause
In Relational SQL it is possible to rename columns. The AS clause is the means to do this and it contains an alternative column name. Example:
select a as abc from tinycoll
The result contains a column called “abc” instead of “a” and this is standard Relational SQL semantics.
|abc | +-------------------------+ |5 | |5 |
What does an AS clause mean in context of JSON SQL? In context of JSON SQL an AS clause specifies a path. Example:
select {a as x.y} from tinycoll
The result contains documents with paths “x.y” that contain the value of the corresponding “a” in the original document (if “a” is present).
{"x":{"y":5}} {"x":{"y":5}}
Fundamentally, it means that the value the original path “a” pointed to is now at a new path “x.y” and that can be seen as relocation that only takes place in the result document. Any valid path is possible in the AS clause.
So far the AS clause supports renaming as well as relocation. Relocation is orthogonal and does not affect the original document. For example, the following relocations are valid:
select {a as b, b as a} from tinycoll
Basically, the values are exchanged between the two paths “a” and “b” (which can be more complex paths, of course).
{"a":{"c":10,"d":11},"b":5} {"b":5}
All AS clauses are applied independently of each other, not in sequence (and therefore “a” and “b” do not contain the same value because of this projection specification).
A final situation is overwriting, meaning, the path in the AS clause can be that of an existing path in a JSON document and that will overwrite the value in the result document. For example:
select {a as c.[0]} from tinycoll
The existing value of “c.[0]” is overwritten and contains the value of “a” in the result document if “a” exists in the original document.
{"c":[5]} {"c":[5]}
There are a few language constraints that are checked for. These are
- Path Subsumption. A path in an AS clause must not be a subpath in any other path; otherwise one AS clause might conflict with another one. An example for a violation is: “select {a as c.[2].d, b as c.[2]} from tinycoll”. This is analogous to Relational SQL not allowing the use of the same column name in two different AS clauses.
- Asterisk Query. An asterisk query cannot have an AS clause; if any change is necessary by means of an AS clause, the paths have to be listed explicitly.
- Relational Output Path. The path in an AS clause for relation output must be a single value (path of length one) in order to comply to the Relational SQL semantics/model.
Value Non-Existence
The AS clause might create a path in a result document that does not exist in the original document. For example:
select {a as x.[2]} from tinycoll
In this example, the original document does not have an array named “x”. However, the result document is going to have one if “a” is present. The path sets the value of “a” to the third array element, however, the first and second element do not have a value as those elements do not exist. The result of the query is
{"x":["<>","<>",5]} {"x":["<>","<>",5]}
The JSON standard does not have a notation for an absent value, however, it is needed in order to describe accurately that values are undefined. Therefore, the symbol “<>” is introduced of type String in order to (a) denote that a value is undefined and to (b) represent it as a known data type so that JSON libraries can process it.
“<>” is randomly defined; it can be changed to another symbol as necessary. JSON null cannot be used as JSON null is a valid constant (aka, explicit JSON value) and in contrast to SQL null does not denote “unknown”. The use of JSON null might suggest that there is the value of JSON null, when in reality there is no value at all. Any trailing “<>” are removed and not present in the output JSON documents.
Array Element Replacement
It is possible to replace array elements selectively, for example:
select {a as c.[0], b as c.[1], c.[2]} from tinycoll
will result in
{"c":[5,{"c":10,"d":11},{"d":103}]} {"c":[5,"<>",{"d":103}]}
A shortcut syntax like c.[2..9] that refers to the 3rd until 10th elements inclusive is not supported at this point, but could be for convenience. If implemented at some point in time, then this section will be changed.
Likewise for a shortcut syntax like c.[..9], c.[2..] or c.[..] indicating all elements including the 10th, all starting with the 3rd, or all element respectively.
Additional Items
The “select distinct” clause is not specifically discussed as it has the intended semantics based on the JSON document equality definition.
An interesting case on projection is the mixed case, aka, some projection is relational, some other asks for the JSON form. For example,
select a, b, {c, d} from tinycoll
returns relational output, but with certain columns containing JSON objects that can be freely composed from one or more paths. This might be convenient from a final output viewpoint for the client, but would not contribute in major ways to a JSON SQL language definition. Therefore, it is not implemented as of now (and in case this decision changes, this section will be updated in the future).
Summary
Projection in context of JSON SQL is not all that straightforward compared to the Relational SQL semantics. This blog highlighted the most important finer points like the AS clause and array processing and outlined some of the additional possible extensions to an implementation.
Go [ JSON | Relational ] SQL!
Disclaimer
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.