SQL for JSON Rationalization Part 18: Set Operators, Sorting, Grouping and Subqueries

Further operators like set operators, sorting, grouping and sub-queries apply to JSON SQL as well. This blog discusses some of the additional operators.

Union, Difference (Except), Intersection (Intersect)

The set operators union (“UNION”), difference (“EXCEPT”), and intersection (“INTERSECT”) are supported by Relational SQL and in order for those to apply the inputs to the set operators have to have the correct schema. In context of JSON SQL there are no schema requirements or restrictions and the set operators operate on sets of JSON documents and implement the usual semantics.

Set operators rely on JSON document equality and as discussed earlier equality is recursively defined on the properties of JSON documents. Two JSON documents are equal if they have the same set of paths with each pair of paths (one from each document) leading to the same scalar values.

An example query is the following, combining all parts available in the US as well as Europe.

select {*}
from us_parts
union
select {*}
from eur_parts

Sorting

Sorting of result sets can be supported by JSON SQL as well. Paths can be defined in the order by part of a JSON SQL query and sorting takes place on the values the paths are referring to (“sorting paths”). In context of JSON documents that do not have to comply to a fixed schema a special interpretation is necessary for a few cases:

  • A property that is absent (aka, the path specified in the sorting section of the query does not exist in a document) cannot be sorted on. One possible semantics is that the absence of a value is the largest or lowest value possible and the document is sorted accordingly.

    A more recent SQL standard introduced the clause “NULLS FIRST” and “NULLS LAST” in order to define where SQL NULL is placed in a sorted result. The same could be followed here with e.g. “ABSENT FIRST” or “ABSENT LAST”.
  • Another case is type heterogeneity, meaning, the same path in different documents refers to different JSON types. In this case a possible strategy is to sort within each type, and then order the types based on a predefined order, like, null, true, false, string, number, object, array (arbitrary, but fixed order).

    Following the same idea of “NULLS FIRST” and “NULLS LAST”, a clause could be added the defines the type order, like “TYPE ORDER JSON_NULL, JSON_TRUE, JSON_FALSE, JSON_STRING, JSON_NUMBER, JSON_OBJECT, JSON_ARRAY”.

Unless the sorting paths of all documents in a result set comply to the same schema a total order cannot solely established based on values, but required additional rules like those outlined above in the bullet list.

The following example sorts by shipper rating.

select {*}
from shipper sh
order by sh.rating desc 
         absent last 
         type order json_null, json_true, json_false, 
                    json_string, json_number, 
                    json_object, json_array

Grouping and Having

Grouping of result documents can be implemented in JSON SQL as in Relational SQL with the usual aggregation functions. The having construct can be applied as well to select from the groups. Grouping is defined by paths into the JSON documents and the same discussion wrt. missing values or type heterogeneity applies as in the sorting discussion.

The following lists all states and shipper rating averages where the shippers have an average rating about a certain threshold.

select {sh.state, avg(sh.rating)}
from shipper sh
group by sh.state
having avg(sh.rating) > 5

Subqueries

JSON SQL can support sub-queries like Relational SQL does. In principle, a JSON SQL query can return results as object as well as relations. In context of a sub-query results are only returned in form of JSON documents.

The following example lists shippers in states that has been shipped to in the past.

select {*}
from shipper
where shipper.state in
  (select s.state
   from states s
   where s.shipped_to = true)

Summary

This brief discussion selected a few additional Relational SQL operators and has shown how they can be interpreted in context of JSON SQL. A this point I am confident that the complete Relational SQL semantics can be extended for the JSON types without restriction and with possible semantic interpretation extension due to the possible absence of a fixed schema.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

SQL for JSON Rationalization Part 17: Cartesian Product with Restriction (Join) (Again!)

There is a lot more to be said about joins in context of JSON SQL beyond the introduction in the previous blog.

“Join Homogeneous” Schema

The previous blog’s sample data set was homogeneous in the sense that all paths used in join criteria had a value in all documents. There was never the case that a path did not have a value. This is analogous to Relational SQL where columns used in joins always have values by virtue of the existence of a schema.

Let’s explore “join heterogeneity” in this blog. As usual, the sample data set is introduced first.

Sample Data Set

select {*} from foo

results in

{"a":{"b":5},"n":null,"x":{"y":"foobar"}}
{"a":{"b":10},"n":false}

and

select {*} from bar

results in

{"a":{"b":5},"n":true,"x":{"y":"foobar"}}
{"a":{"b":11},"n":null,"x":"missing"}

Homogeneous Join

The following join is homogeneous as the paths involved in the join criteria all have a value.

select {*} 
from  foo as f, 
      bar as b 
where f.a = b.a

Results in

{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Null vs. Absent Value

In the JSON standard JSON null is a value. Compared to Relational SQL, JSON null does not express “unknown”. The equivalent to Relational SQL NULL is the absence of the value in JSON SQL. Therefore, a join where the paths involved in a join criteria have the value JSON null are homogeneous joins.

select {*} 
from  foo as f, 
      bar as b 
where f.n = b.n

results in

{"b":{"a":{"b":11},"n":null,"x":"missing"},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Heterogeneous Join

A heterogeneous join in context of JSON SQL has paths in the join criteria that do not exist in at least one document, aka, do not refer to values in this case.

For example, the path x.y does not refer to a value in all documents of the example data set.

The semantics is that if a document does not have a value at the path of the join criteria the document does not participate in the Cartesian product, and therefore does not provide a document to the result set.

select {*} 
from  foo as f, 
      bar as b 
where f.x.y = b.x.y

results in

{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Check for Missing Values

JSON SQL provides a predicate that supports checking the presence (or absence) of values. This predicate can be used to check if a join is going to be a homogeneous join or a heterogeneous join.

select {*} 
from  foo 
where not exists_path x.y

results in

{"a":{"b":10},"n":false}

and

select {*} 
from  bar 
where not exists_path x.y

results in

{"a":{"b":11},"n":null,"x":"missing"}

These queries show that the previous query is a heterogeneous join as not all documents contain the join paths.

In the absence of schema support for JSON this allows to check for homogeneity in context of joins, like a dynamic schema check for a very specific purpose. During software development it can be determined if it is important to have a homogeneous join or if a heterogeneous join is sufficient. Depending on the requirement and outcome of the query checking for path existing appropriate error handling can take place.

Summary

JSON SQL supports homogeneous as well as heterogeneous joins without any extra syntax or special execution semantics. Furthermore, with the predicate for checking existence the developer is given a tool to determine if a join is going to be homogeneous or heterogeneous.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 16: Cartesian Product with Restriction (Join)

Restrictions can be added to a Cartesian Product and this is briefly discussed in this blog. It demonstrates the power of joins in context of JSON documents.

Example Data Set

As always, the sample data sets that are being used for queries in this blog are introduced first.

select {*} from jer

results in

{"a":1,"b":20,"c":true,"d":{"x":"y"}}
{"a":2,"b":21,"c":true,"d":{"x":[null,5]}}

and

select {*} from tom

results in

{"a":3,"b":20,"c":false,"d":{"x":"y"}}
{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}

This data set is used in the following to introduce restrictions in context of a Cartesian Product.

Join and Join Criteria

The following demonstrates a join where the join criteria (restriction) is based on a scalar type.

select {*} 
from   jer as j, tom as t 
where  j.b = t.b

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

Remember, that the result documents from each of the collections are disambiguated by adding the root property “j” and “t” (aka, the correlation specifications).

A join can be empty if the join criteria do not derive to a result, as shown in the following.

select {*} 
from   jer as j, tom as t 
where  j.a = t.a

does not return a result.

Projection can be applied as well.

select {t.b} 
from   jer as j, tom as t 
where  j.b = t.b

results in

{"t":{"b":20}}
{"t":{"b":21}}

Using an AS clause in the projection allows to reshape the result.

select {t.b as tb} 
from   jer as j, tom as t 
where  j.b = t.b

results in

{"tb":20}
{"tb":21}

Join criteria can be defined not only on top level scalar properties, but on any JSON structure on any level. The following two queries illustrate this.

select {*} 
from   jer as j, tom as t 
where  j.d.x.[1] = t.d.x.q

results in

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

and

select {*} 
from   jer as j, tom as t 
where  j.d = t.d

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

Of course, equality is not the only possible operator for join criteria.

select {*} 
from   jer as j, tom as t 
where  j.a < t.a

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

and so does

select {*} 
from   jer as j, tom as t 
where  j.a <> t.a

Cartesian Product with Restriction

Cartesian products can be restricted with non-join criteria.

select {*} 
from   jer as j, tom as t 
where  j.c = true 
       or t.c = false

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

Join with Join and Non-Join Criteria

And a mix of join and non-join criteria is possible as well.

select {*} 
from   jer as j, tom as t 
where  j.d = t.d 
       and j.b = t.b 
       and (j.c = true or t.c = false)

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

Summary

Joins are a powerful feature of JSON SQL as demonstrated in this blog as it supports the combination of documents in different collections without having to foresee their combination when deciding on the document structures. Joins combine the power of JSON documents with the power of value-based correlation of documents.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 15: Cartesian Product and Projection

In part 14 of this blog series Cartesian Product queries were discussed that did have an Asterisk projection; this blog discusses specific paths as projection (non-Asterisk).

Example Data Set

As always, the sample data sets that are being used for queries in this blog are introduced first.

select {*} from ying

results in

{"a":3,"c":20}
{"a":4,"c":21}

and

select {*} from yang

results in

{"a":1,"b":10}
{"a":2,"b":11}

Projection

To recap, JSON SQL supports JSON projection as well as relational projection. JSON projection is specified by enclosing paths within a set of curly brackets: {}. This will cause the query result represented as JSON objects.

For example, the following query returns JSON objects.

select {a, b} from yang

results in

{"a":1,"b":10}
{"a":2,"b":11}

JSON SQL returns relational results when the set of curly brackets is omitted; the following query returns the result as table.

select a, b from yang

results in

|a                        |b                        |
+-------------------------+-------------------------+
|1                        |10                       |
|2                        |11                       |

Projection without AS in Joins

The following is a projection of a join resulting in JSON objects.

select {yi.a, ya.b} from ying as yi, yang as ya

results in

{"ya":{"b":10},"yi":{"a":3}}
{"ya":{"b":11},"yi":{"a":3}}
{"ya":{"b":10},"yi":{"a":4}}
{"ya":{"b":11},"yi":{"a":4}}

The same query with results represented as relation is specified as follows.

select yi.a, ya.b from ying as yi, yang as ya

results in

|yi_a                     |ya_b                     |
+-------------------------+-------------------------+
|3                        |10                       |
|3                        |11                       |
|4                        |10                       |
|4                        |11                       |

Observe that the results include the table correlation specifiers “yi” or “ya”. This is necessary since different collections might have documents with the same paths. The following query highlights this case.

select {yi.a, ya.a} from ying as yi, yang as ya

results in

{"ya":{"a":1},"yi":{"a":3}}
{"ya":{"a":2},"yi":{"a":3}}
{"ya":{"a":1},"yi":{"a":4}}
{"ya":{"a":2},"yi":{"a":4}}

This automatic result qualification using correlation specifications ensures that path duplicates are automatically resolved in the results.

Projection with AS in Joins

In many cases the automatic duplicate resolution is sufficient for clients. However, in some cases this is not desired. In those cases the AS clause allows the placement of result values into any place of JSON documents using the AS clause. In the relational result case the columns can be named as desired.

select {yi.a as b, ya.a as c} from ying as yi, yang as ya

results in

{"b":3,"c":1}
{"b":3,"c":2}
{"b":4,"c":1}
{"b":4,"c":2}

The above shows a simple renaming of the paths.

select {yi.a as x.b, ya.a as y.[0]} from ying as yi, yang as ya

results in

{"x":{"b":3},"y":[1]}
{"x":{"b":3},"y":[2]}
{"x":{"b":4},"y":[1]}
{"x":{"b":4},"y":[2]}

This query shows a more complex result object creation and goes beyond simple renaming of paths.

The following query shows how specific column names are specified.

select yi.a as x, ya.a as y from ying as yi, yang as ya

results in

|x                        |y                        |
+-------------------------+-------------------------+
|3                        |1                        |
|3                        |2                        |
|4                        |1                        |
|4                        |2                        |

Summary

In summary, defining projection in context of SQL JSON joins is straightforward and supports flexible renaming of columns in context of relational results as well as expressive result value positioning as paths in JSON object results.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 6: Restriction – General Discussion

After discussing projection, selection is up next in the blog series on SQL for JSON. This first blog on selection focuses on the scalar JSON types Number and String.

Demo Data

As usual, we start with demo data. The collection for this blog is called “selcoll” (for SelectionCollection) and contains the following documents:

select {*} from selcoll

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

Selection based on Literals

Selection is following the regular Relational SQL syntax and is straightforward. For this discussion only single predicates are shown, not (complex) Boolean expressions of predicates. Boolean expression of predicates follow the usual semantics and do not require a lot of discussion.

select {*} from selcoll where a.b = 25

This JSON SQL statement selects all documents from selcoll that have a property “a” and a property “b” within a sub-document of “a” with the value of 25.

The result is

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

The following selection has the same semantics and returns the same result:

select {*} from selcoll where 25 = a.b

A selection based on a String literal follows the same syntax:

select {*} from selcoll where c.[1] = 'foobar'

This returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}

And the following selection returns the same result:

select {*} from selcoll where 'foobar' = c.[1]

In this context a note is in order. JSON uses double quotes as string delimiter, not single quotes, as SQL does. In order to stay as near as possible to Relational SQL, single quotes are used and transformed into double quotes by the underlying implementation.

Selection based on Value Comparison

It is possible to relate two different values within a document as well (aka, not a self-join that would related values of different documents – this will be discussed in a later blog).

select d.e as de, c.[0] as c0 from selcoll where d.e = c.[0]

This query selects all documents that have the same value in d.e and c.[0]. As added benefit the query projects to those two values as well.

The result is

|de                       |c0                       |
+-------------------------+-------------------------+
|"foo"                    |"foo"                    |
|"foo2"                   |"foo2"                   |

Any path can be related to any other path without restriction.

While in this blog only numbers and strings are discussed, the above discussed types of restrictions will work for all JSON data types in general, including true, false, null, objects and arrays (discussed in subsequent blogs).

Operations

The usual operators are defined: <, >, <>, =, >=, and <=. The semantics of these is defined for Number and String (Relational SQL semantics is taken). For the other JSON types they will have to be defined as the other JSON types do not have a corresponding Relational SQL domain.

Beyond these operators more “interesting” operations are required. For example

select {*} from selcoll where c contains 'foobar'

whereby “c” refers to a JSON array (and possibly a JSON object). This predicate would be true if there is an element in “c” that is of type String and the value of that element is “foobar”. There is a whole set of interesting operations that will be discussed at some point later as well.

Semantics

As implicitly demonstrated above, a JSON document is only in the result set if (a) the path to the value as specified in the JSON SQL query is present and (b) the value is the value as indicated in the selection clause in JSON SQL.

If the path does not exist or the value does not have a matching value, no result is returned for that document (and not the empty document itself).

There is no implicit type transformation implemented. This means that a Number literal only matches number values, and a String literal only matches string values.

Syntax Twists

Syntax has always a twist, especially if different languages are combined. In this case one of the twists is the single quote. A single quote within a string is represented as two single quotes in Relational SQL. JSON SQL does not have that requirement since strings are delimited by double quotes in JSON and a single quote is treated as regular character. The reverse situation exists also: double quotes have to be escaped within a string in JSON, but not in Relational SQL.

The query (double quote, not escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba"r'

returns

{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

And the query (two single quotes, escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba''r'

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}

Summary

Restriction (or selection) is almost straightforward for the types Number and String in context of JSON SQL. The only twist is the way Relational SQL and JSON SQL differ in denoting String literals as well as encode special characters.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 5: Projection – Specific Functionality

The last blog introduced SQL JSON projection and this blog will discuss some of its finer points.

Demo Data

Here are the JSON documents from the collection “tinycoll” used throughout this blog:

select {*} from tinycoll

returns two JSON documents:

{"a": 5,
 "b": {"c": 10,"d": 11},
 "c": [101, 102, {"d": 103}, {"e": 104}]}

{"a": 5,
 "b2": [10, 11],
 "c": [101, 102, {"d": 103}, {"e": 104}]}

AS Clause

In Relational SQL it is possible to rename columns. The AS clause is the means to do this and it contains an alternative column name. Example:

select a as abc from tinycoll

The result contains a column called “abc” instead of “a” and this is standard Relational SQL semantics.

|abc                      |
+-------------------------+
|5                        |
|5                        |

What does an AS clause mean in context of JSON SQL? In context of JSON SQL an AS clause specifies a path. Example:

select {a as x.y} from tinycoll

The result contains documents with paths “x.y” that contain the value of the corresponding “a” in the original document (if “a” is present).

{"x":{"y":5}}
{"x":{"y":5}}

Fundamentally, it means that the value the original path “a” pointed to is now at a new path “x.y” and that can be seen as relocation that only takes place in the result document. Any valid path is possible in the AS clause.

So far the AS clause supports renaming as well as relocation. Relocation is orthogonal and does not affect the original document. For example, the following relocations are valid:

select {a as b, b as a} from tinycoll

Basically, the values are exchanged between the two paths “a” and “b” (which can be more complex paths, of course).

{"a":{"c":10,"d":11},"b":5}
{"b":5}

All AS clauses are applied independently of each other, not in sequence (and therefore “a” and “b” do not contain the same value because of this projection specification).

A final situation is overwriting, meaning, the path in the AS clause can be that of an existing path in a JSON document and that will overwrite the value in the result document. For example:

select {a as c.[0]} from tinycoll

The existing value of “c.[0]” is overwritten and contains the value of “a” in the result document if “a” exists in the original document.

{"c":[5]}
{"c":[5]}

There are a few language constraints that are checked for. These are

  • Path Subsumption. A path in an AS clause must not be a subpath in any other path; otherwise one AS clause might conflict with another one. An example for a violation is: “select {a as c.[2].d, b as c.[2]} from tinycoll”. This is analogous to Relational SQL not allowing the use of the same column name in two different AS clauses.
  • Asterisk Query. An asterisk query cannot have an AS clause; if any change is necessary by means of an AS clause, the paths have to be listed explicitly.
  • Relational Output Path. The path in an AS clause for relation output must be a single value (path of length one) in order to comply to the Relational SQL semantics/model.

Value Non-Existence

The AS clause might create a path in a result document that does not exist in the original document. For example:

select {a as x.[2]} from tinycoll

In this example, the original document does not have an array named “x”. However, the result document is going to have one if “a” is present. The path sets the value of “a” to the third array element, however, the first and second element do not have a value as those elements do not exist. The result of the query is

{"x":["<>","<>",5]}
{"x":["<>","<>",5]}

The JSON standard does not have a notation for an absent value, however, it is needed in order to describe accurately that values are undefined. Therefore, the symbol “<>” is introduced of type String in order to (a) denote that a value is undefined and to (b) represent it as a known data type so that JSON libraries can process it.

“<>” is randomly defined; it can be changed to another symbol as necessary. JSON null cannot be used as JSON null is a valid constant (aka, explicit JSON value) and in contrast to SQL null does not denote “unknown”. The use of JSON null might suggest that there is the value of JSON null, when in reality there is no value at all. Any trailing “<>” are removed and not present in the output JSON documents.

Array Element Replacement

It is possible to replace array elements selectively, for example:

select {a as c.[0], b as c.[1], c.[2]} from tinycoll

will result in

{"c":[5,{"c":10,"d":11},{"d":103}]}
{"c":[5,"<>",{"d":103}]}

A shortcut syntax like c.[2..9] that refers to the 3rd until 10th elements inclusive is not supported at this point, but could be for convenience. If implemented at some point in time, then this section will be changed.

Likewise for a shortcut syntax like c.[..9], c.[2..] or c.[..] indicating all elements including the 10th, all starting with the 3rd, or all element respectively.

Additional Items

The “select distinct” clause is not specifically discussed as it has the intended semantics based on the JSON document equality definition.

An interesting case on projection is the mixed case, aka, some projection is relational, some other asks for the JSON form. For example,

select a, b, {c, d} from tinycoll

returns relational output, but with certain columns containing JSON objects that can be freely composed from one or more paths. This might be convenient from a final output viewpoint for the client, but would not contribute in major ways to a JSON SQL language definition. Therefore, it is not implemented as of now (and in case this decision changes, this section will be updated in the future).

Summary

Projection in context of JSON SQL is not all that straightforward compared to the Relational SQL semantics. This blog highlighted the most important finer points like the AS clause and array processing and outlined some of the additional possible extensions to an implementation.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 4: Projection – General Functionality

After the demo in the last blog (Part 3) it is time to discuss some of the assumptions and the projection functionality in more detail – here and in the next blog.

Assumptions: Array Start Index, JSON Literals and JSON Value Equality

The JSON standard does not define the starting index of the first array element. The assumption made here is that the first index is 0 (zero).

The JSON standard requires the literals “null”, “true” and “false” to be lower case. However, the assumption made here is that all lower as well as upper case combinations work, e.g., “True”, for convenience.

Another aspect the JSON standard does not define is equality on JSON values. There are many ways to define when two JSON values are equal. Here equality is defined on the string representation of the JSON values that contain no white space and where the property names in JSON objects are sorted alphabetically.

Definitions: Full and Partial Path

A full path is a sequence of property names as well as array indexes from the root of a JSON document all the way to one of its leaves. The elements of a path are separated by “.”. For example, “c.[3].e” is a full path from the root “c” to the leaf “e” in one of the demo documents of the previous blog. A path must start with a property name and cannot start with an array index. A path cannot be empty and the minimum path consists of a single property name.

Using “.” as separator is a random choice, but made by many systems. Having array indexes enclosed in “[” and “]” is customary also. Denoting an array index as separate path element (aka, enclosed in “.”) is also a convenient choice.

Given a JSON object, a full path might exist within it or not. Given a JSON object and a path, using the path as an access structure identifies a value only if the full path exists in the JSON document. If the path does not exist within the JSON document then no value is identified; especially not the JSON literal “null”.

A partial path is a sequence of property names and array indexes starting at the root, but not necessarily ending at a leaf, but at an intermediary property or array index. This supports “reaching into” a JSON document and identifying a JSON value that is between the root and a leaf.

Like in case of full paths, given a JSON object, a partial path might or might not exist within it. A partial path only identifies a JSON value if the partial path exists within a JSON object. In this case it identifies a composite JSON value.

If a JSON document has only scalar properties, then the root properties are the leaf properties at the same time. Paths in this context are full paths and partial paths cannot exist.

Projection

Unlike in the relational model, in context of the JSON model the result of a query can be returned as a relational table, or as a set of JSON documents. The choice is made by the query author.

The projection in a select statement contains one or more (full or partial) paths. If the paths are enclosed by a “{“ and “}” then JSON documents are returned, otherwise a table  (the asterisk projection is discussed below).

For example, the query from the previous blog

select a, b.c, d.[3].e from tinycoll

returns a table with three columns.

Semantically, each path in the projection will be a separate column. Each document from the collection “tinycoll” is taken and a corresponding row is added to the table. For each path of the projection that is found in the document the value is added to the row. If a path does not exist, no value is added in the column corresponding to the path. Therefore, a row can have values in every column, in some columns, or in no column, depending if the paths exist in the document.

As in relational SQL, the order of the paths matters as the corresponding columns will be created in that order.

The column names are created from the paths by replacing “.” in the path representation with “_” as many relational systems do not support “.” as column names.

The query

select {a, b.c, d.[3].e} from tinycoll

returns a set of JSON documents.

Semantically, each document from the collection “tinycoll” is taken and an empty result document is created for it. Each of the paths from the projection are followed within the document from the collection. If a value is found, the path with the corresponding value is added to the result document. It is possible that the document from the collection contains all, some, or none of the paths from the projection. Therefore, the result document might contain all, some, or none of the paths (empty document).

The order of the paths in the projection does not matter as JSON documents are created and order of properties / paths is not defined for JSON objects.

As a note, according to the construction principle of the result JSON documents, the paths in the projection of the select statement and the paths in the result JSON documents are exactly the same (if they exist). No translation is necessary from the viewpoint of the client between the paths in the query and the paths in the result documents.

Asterisk Projection

The asterisk projection is supported. The query

select {*} from tinycoll

returns all documents stored in the collection “tinycoll” as they are without any modification.

The query

select * from tinycoll

Returns a table that has any many columns as there are full and partial paths into all documents of the collection “tinycoll”.

Semantically, each document from the collection “tinycoll” is taken and a row is created for it. For each full as well as partial path in the document the value is retrieved and put into the corresponding column of the row. There is a column for each possible path and the set of columns can be predetermined or dynamically added to the result table as needed. As before, the column names are the path with the “.” replaced by “_”.

Summary

This was a first closer look into the details of projection in context of JSON SQL and the next blog will continue the project discussion. The key take away is that JSON SQL can return JSON documents as well as tables based on a well-defined execution semantics centered around JSON paths.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.