SQL for JSON Rationalization Part 18: Set Operators, Sorting, Grouping and Subqueries

Set operators, sorting, grouping and subqueries apply to JSON SQL as well. This blog discusses how these additional operators carry over.

Union, Difference (Except), Intersection (Intersect)

Relational SQL supports the set operators union (“UNION”), difference (“EXCEPT”), and intersection (“INTERSECT”). For them to apply, the inputs to a set operator have to have the correct schema. In the context of JSON SQL there are no schema requirements or restrictions: the set operators operate on sets of JSON documents and implement the usual semantics.

Set operators rely on JSON document equality and, as discussed earlier, equality is defined recursively on the properties of JSON documents. Two JSON documents are equal if they have the same set of paths and each path leads to the same scalar value in both documents.
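
To make the equality definition concrete, here is a minimal Python sketch, assuming documents are already parsed into dicts and lists. The helper names (json_type, leaf_paths, documents_equal) are mine and not part of JSON SQL, and treating empty objects and arrays as leaves is an assumption the definition above does not spell out.

```python
def json_type(value):
    """Map a parsed JSON value to the name of its JSON type."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"not a JSON value: {value!r}")


def leaf_paths(value, prefix=()):
    """Yield (path, leaf) pairs for every leaf reachable in a JSON value."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (("property", key),))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (("index", index),))
    else:
        # Scalars and (by assumption) empty containers are leaves. The type
        # tag keeps JSON true and the number 1 apart.
        yield prefix, (json_type(value), value)


def documents_equal(left, right):
    """Equal if both documents have the same paths leading to the same values."""
    return dict(leaf_paths(left)) == dict(leaf_paths(right))
```

Property order does not matter in this sketch, while array order does, since the array index is part of the path.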

An example query is the following, combining all parts available in the US as well as Europe.

select {*}
from us_parts
union
select {*}
from eur_parts
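
As a rough illustration of the set semantics, the following Python sketch implements union, intersection and difference over lists of parsed JSON documents. It is only a sketch under assumptions: document equality is approximated by comparing a canonical serialization (sorted property names), which means 1 and 1.0 count as different values here, and the part documents are made up for the example.

```python
import json


def canonical(doc):
    """Serialize with sorted property names so equal documents get equal keys."""
    return json.dumps(doc, sort_keys=True)


def union(left, right):
    """All documents from both inputs, duplicates removed."""
    seen = {}
    for doc in left + right:
        seen.setdefault(canonical(doc), doc)
    return list(seen.values())


def intersect(left, right):
    """Documents of the left input that also occur in the right input."""
    right_keys = {canonical(doc) for doc in right}
    seen = {}
    for doc in left:
        key = canonical(doc)
        if key in right_keys:
            seen.setdefault(key, doc)
    return list(seen.values())


def difference(left, right):
    """Documents of the left input that do not occur in the right input."""
    right_keys = {canonical(doc) for doc in right}
    seen = {}
    for doc in left:
        key = canonical(doc)
        if key not in right_keys:
            seen.setdefault(key, doc)
    return list(seen.values())


# Hypothetical sample data; note that the documents need not share a schema.
us_parts = [{"id": 1, "name": "bolt"}, {"id": 2, "name": "nut"}]
eur_parts = [{"name": "nut", "id": 2}, {"id": 3, "name": "washer", "metric": True}]

all_parts = union(us_parts, eur_parts)
```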

Sorting

Sorting of result sets can be supported by JSON SQL as well. Paths can be specified in the order by clause of a JSON SQL query, and sorting takes place on the values the paths refer to (“sorting paths”). Since JSON documents do not have to comply with a fixed schema, a special interpretation is necessary for a few cases:

  • A property that is absent (i.e., the path specified in the order by clause does not exist in a document) cannot be sorted on. One possible semantics is to treat the absence of a value as the largest or smallest possible value and to sort the document accordingly.

    A more recent revision of the SQL standard introduced the clauses “NULLS FIRST” and “NULLS LAST” in order to define where SQL NULL is placed in a sorted result. The same approach could be followed here with, e.g., “ABSENT FIRST” or “ABSENT LAST”.
  • Another case is type heterogeneity, meaning that the same path refers to different JSON types in different documents. In this case a possible strategy is to sort within each type, and then order the types based on a predefined order, for example null, true, false, string, number, object, array (an arbitrary but fixed order).

    Following the same idea of “NULLS FIRST” and “NULLS LAST”, a clause could be added that defines the type order, like “TYPE ORDER JSON_NULL, JSON_TRUE, JSON_FALSE, JSON_STRING, JSON_NUMBER, JSON_OBJECT, JSON_ARRAY”.

Unless the sorting paths of all documents in a result set comply with the same schema, a total order cannot be established based on values alone; it requires additional rules like those outlined in the bullet list above.

The following example sorts by shipper rating.

select {*}
from shipper sh
order by sh.rating desc 
         absent last 
         type order json_null, json_true, json_false, 
                    json_string, json_number, 
                    json_object, json_array
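
One way to read this query is sketched in Python below. Several points are assumptions on my side, since the clauses are only proposals: the type order is fixed and “desc” applies only within a type, objects and arrays are ordered by their canonical serialization (arbitrary, but fixed), and the names lookup, json_type and order_by as well as the shipper documents are made up for the illustration.

```python
import json

ABSENT = object()  # marker for "the path does not exist in this document"

TYPE_ORDER = ["json_null", "json_true", "json_false", "json_string",
              "json_number", "json_object", "json_array"]


def lookup(doc, path):
    """Follow a dotted path through nested objects; return ABSENT if it breaks."""
    current = doc
    for step in path.split("."):
        if not isinstance(current, dict) or step not in current:
            return ABSENT
        current = current[step]
    return current


def json_type(value):
    """Name the JSON type, treating true and false as separate types."""
    if value is None:
        return "json_null"
    if value is True:
        return "json_true"
    if value is False:
        return "json_false"
    if isinstance(value, str):
        return "json_string"
    if isinstance(value, (int, float)):
        return "json_number"
    if isinstance(value, dict):
        return "json_object"
    return "json_array"


def order_by(docs, path, descending=False, absent_last=True, type_order=TYPE_ORDER):
    """Sort within each type, concatenate the types in type_order, place absent docs."""
    buckets = {name: [] for name in type_order}
    absent = []
    for doc in docs:
        value = lookup(doc, path)
        if value is ABSENT:
            absent.append(doc)
        else:
            buckets[json_type(value)].append((value, doc))
    ordered = []
    for name in type_order:
        bucket = buckets[name]
        if name in ("json_string", "json_number"):
            bucket.sort(key=lambda pair: pair[0], reverse=descending)
        elif name in ("json_object", "json_array"):
            # Assumption: canonical text as an arbitrary but fixed order.
            bucket.sort(key=lambda pair: json.dumps(pair[0], sort_keys=True),
                        reverse=descending)
        ordered.extend(doc for _, doc in bucket)
    return ordered + absent if absent_last else absent + ordered


# Hypothetical shipper documents with heterogeneous and absent ratings.
shippers = [
    {"name": "A", "rating": 3},
    {"name": "B"},
    {"name": "C", "rating": "high"},
    {"name": "D", "rating": 9},
    {"name": "E", "rating": None},
    {"name": "F", "rating": True},
]
by_rating = order_by(shippers, "rating", descending=True, absent_last=True)
```

With this reading, the document without a rating ends up last regardless of “desc”, and the numeric ratings 9 and 3 appear after the string rating because of the type order.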

Grouping and Having

Grouping of result documents can be implemented in JSON SQL as in Relational SQL, with the usual aggregation functions. The having clause can be applied as well to select among the groups. Grouping is defined by paths into the JSON documents, and the same considerations regarding missing values and type heterogeneity apply as in the sorting discussion.

The following query lists all states together with their average shipper rating, restricted to states where the average rating is above a certain threshold.

select {sh.state, avg(sh.rating)}
from shipper sh
group by sh.state
having avg(sh.rating) > 5
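
A possible reading of this query in Python is sketched below. It rests on assumptions that JSON SQL would have to define: documents without a state are left out of the grouping, non-numeric ratings are ignored by the average (similar to how SQL aggregates skip NULL), states are scalar values, and the result property names "state" and "avg" are placeholders of mine.

```python
from collections import defaultdict


def average_rating_by_state(shippers, threshold):
    """Group by state, average the numeric ratings, keep groups above threshold."""
    ratings = defaultdict(list)
    for sh in shippers:
        if "state" not in sh:
            continue  # assumption: an absent grouping path excludes the document
        rating = sh.get("rating")
        # Assumption: only JSON numbers contribute to the average.
        if isinstance(rating, (int, float)) and not isinstance(rating, bool):
            ratings[sh["state"]].append(rating)
    result = []
    for state, values in ratings.items():
        mean = sum(values) / len(values)
        if mean > threshold:  # the "having" condition
            result.append({"state": state, "avg": mean})
    return result


# Hypothetical shipper documents.
shippers = [
    {"state": "CA", "rating": 8},
    {"state": "CA", "rating": 6},
    {"state": "NV", "rating": 4},
    {"state": "NV", "rating": "n/a"},
    {"rating": 10},
]
good_states = average_rating_by_state(shippers, 5)
```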

Subqueries

JSON SQL can support subqueries like Relational SQL does. In principle, a JSON SQL query can return results as objects as well as relations. In the context of a subquery, results are returned only in the form of JSON documents.

The following example lists shippers in states that have been shipped to in the past.

select {*}
from shipper
where shipper.state in
  (select s.state
   from states s
   where s.shipped_to = true)
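
The evaluation of such a subquery could be sketched in Python as follows: the inner query is evaluated first and its result feeds the membership test of the outer query. The sample documents are invented, and skipping documents that lack the state or shipped_to property is an assumption.

```python
def shippers_in_shipped_states(shippers, states):
    """Outer query filtered by membership in the result of the inner query."""
    # Inner query: states that have been shipped to (shipped_to is JSON true).
    shipped = [s["state"] for s in states
               if "state" in s and s.get("shipped_to") is True]
    # Outer query: shippers whose state is among the inner query's results.
    return [sh for sh in shippers if "state" in sh and sh["state"] in shipped]


# Hypothetical sample documents.
shipper = [
    {"name": "A", "state": "CA"},
    {"name": "B", "state": "NV"},
    {"name": "C"},
]
states = [
    {"state": "CA", "shipped_to": True},
    {"state": "NV", "shipped_to": False},
    {"state": "OR"},
]
matching = shippers_in_shipped_states(shipper, states)
```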

Summary

This brief discussion selected a few additional Relational SQL operators and has shown how they can be interpreted in the context of JSON SQL. At this point I am confident that the complete Relational SQL semantics can be extended to the JSON types without restriction, with some additional semantic interpretation needed because a fixed schema may be absent.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 


SQL for JSON Rationalization Part 17: Cartesian Product with Restriction (Join) (Again!)

There is a lot more to be said about joins in the context of JSON SQL beyond the introduction in the previous blog.

“Join Homogeneous” Schema

The previous blog’s sample data set was homogeneous in the sense that all paths used in join criteria had a value in all documents; no path was ever without a value. This is analogous to Relational SQL, where columns used in joins always have values by virtue of the existence of a schema.

Let’s explore “join heterogeneity” in this blog. As usual, the sample data set is introduced first.

Sample Data Set

select {*} from foo

results in

{"a":{"b":5},"n":null,"x":{"y":"foobar"}}
{"a":{"b":10},"n":false}

and

select {*} from bar

results in

{"a":{"b":5},"n":true,"x":{"y":"foobar"}}
{"a":{"b":11},"n":null,"x":"missing"}

Homogeneous Join

The following join is homogeneous as the paths involved in the join criteria all have a value.

select {*} 
from  foo as f, 
      bar as b 
where f.a = b.a

results in

{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Null vs. Absent Value

In the JSON standard, JSON null is a value. In contrast to Relational SQL NULL, JSON null does not express “unknown”. The equivalent of Relational SQL NULL in JSON SQL is the absence of a value. Therefore, a join in which the paths involved in the join criteria have the value JSON null is a homogeneous join.

select {*} 
from  foo as f, 
      bar as b 
where f.n = b.n

results in

{"b":{"a":{"b":11},"n":null,"x":"missing"},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Heterogeneous Join

A heterogeneous join in the context of JSON SQL has paths in the join criteria that do not exist in at least one document, i.e., the paths do not refer to a value in that document.

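For example, not every document of the example data set has a value at the path x.y: one document in foo has no property x at all, and in one document in bar the property x is a string rather than an object.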

The semantics is that if a document does not have a value at a path of the join criteria, the document does not participate in the Cartesian product and therefore does not contribute to the result set.
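
This participation rule can be sketched in Python over the sample data sets foo and bar. The sketch is an illustration, not how a JSON SQL engine is implemented: the helpers lookup, same_value and join are my own names, and value equality is approximated by comparing canonical serializations. JSON null is represented by Python None and counts as a value, while a missing path is represented by the separate ABSENT marker.

```python
import json
from itertools import product

ABSENT = object()  # marker for "no value at this path", distinct from JSON null


def lookup(doc, path):
    """Follow a dotted path through nested objects; return ABSENT if it breaks."""
    current = doc
    for step in path.split("."):
        if not isinstance(current, dict) or step not in current:
            return ABSENT
        current = current[step]
    return current


def same_value(left, right):
    """Approximate JSON equality via canonical text (keeps true and 1 apart)."""
    return json.dumps(left, sort_keys=True) == json.dumps(right, sort_keys=True)


def join(left, right, left_alias, right_alias, path):
    """Cartesian product restricted to pairs with equal values at path."""
    result = []
    for l, r in product(left, right):
        lv, rv = lookup(l, path), lookup(r, path)
        if lv is ABSENT or rv is ABSENT:
            continue  # a document without a value at the path does not participate
        if same_value(lv, rv):
            result.append({left_alias: l, right_alias: r})
    return result


foo = [
    {"a": {"b": 5}, "n": None, "x": {"y": "foobar"}},
    {"a": {"b": 10}, "n": False},
]
bar = [
    {"a": {"b": 5}, "n": True, "x": {"y": "foobar"}},
    {"a": {"b": 11}, "n": None, "x": "missing"},
]

on_a = join(foo, bar, "f", "b", "a")      # homogeneous join
on_n = join(foo, bar, "f", "b", "n")      # JSON null is a value and matches
on_xy = join(foo, bar, "f", "b", "x.y")   # heterogeneous join
```

The join on n pairs the two documents whose n is JSON null, whereas the join on x.y silently drops the documents that have no value at that path.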

select {*} 
from  foo as f, 
      bar as b 
where f.x.y = b.x.y

results in

{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Check for Missing Values

JSON SQL provides a predicate that supports checking the presence (or absence) of values. This predicate can be used to check if a join is going to be a homogeneous join or a heterogeneous join.

select {*} 
from  foo 
where not exists_path x.y

results in

{"a":{"b":10},"n":false}

and

select {*} 
from  bar 
where not exists_path x.y

results in

{"a":{"b":11},"n":null,"x":"missing"}

These queries show that the previous query is a heterogeneous join, as not all documents contain the join paths.

In the absence of schema support for JSON, this predicate makes it possible to check for homogeneity in the context of joins, like a dynamic schema check for a very specific purpose. During software development it can be determined whether a homogeneous join is required or whether a heterogeneous join is sufficient. Depending on that requirement and on the outcome of the path-existence query, appropriate error handling can take place.
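
Such a dynamic check could look like the following Python sketch. The function names exists_path and is_homogeneous are mine, chosen to mirror the predicate used in the queries above; how an application reacts to a failed check is left open.

```python
def exists_path(doc, path):
    """True if the dotted path leads to a value (JSON null counts as a value)."""
    current = doc
    for step in path.split("."):
        if not isinstance(current, dict) or step not in current:
            return False
        current = current[step]
    return True


def is_homogeneous(docs, path):
    """True if every document has a value at the given join path."""
    return all(exists_path(doc, path) for doc in docs)


foo = [
    {"a": {"b": 5}, "n": None, "x": {"y": "foobar"}},
    {"a": {"b": 10}, "n": False},
]
bar = [
    {"a": {"b": 5}, "n": True, "x": {"y": "foobar"}},
    {"a": {"b": 11}, "n": None, "x": "missing"},
]

# Documents that would be dropped by a join on x.y.
missing_in_foo = [doc for doc in foo if not exists_path(doc, "x.y")]
missing_in_bar = [doc for doc in bar if not exists_path(doc, "x.y")]
```

An application that requires a homogeneous join could call is_homogeneous on both inputs first and raise an error, or log the offending documents, before running the join.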

Summary

JSON SQL supports homogeneous as well as heterogeneous joins without any extra syntax or special execution semantics. Furthermore, with the predicate for checking existence the developer is given a tool to determine if a join is going to be homogeneous or heterogeneous.

Go [ JSON | Relational ] SQL!
