There is a lot more to be said about joins in context of JSON SQL beyond the introduction in the previous blog.
“Join Homogeneous” Schema
The previous blog’s sample data set was homogeneous in the sense that all paths used in join criteria had a value in all documents. There was never the case that a path did not have a value. This is analogous to Relational SQL where columns used in joins always have values by virtue of the existence of a schema.
Let’s explore “join heterogeneity” in this blog. As usual, the sample data set is introduced first.
Sample Data Set
select {*} from foo
results in
{"a":{"b":5},"n":null,"x":{"y":"foobar"}} {"a":{"b":10},"n":false}
and
select {*} from bar
results in
{"a":{"b":5},"n":true,"x":{"y":"foobar"}} {"a":{"b":11},"n":null,"x":"missing"}
Homogeneous Join
The following join is homogeneous as the paths involved in the join criteria all have a value.
select {*}
from foo as f,
bar as b
where f.a = b.a
Results in
{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
"f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}
Null vs. Absent Value
In the JSON standard JSON null is a value. Compared to Relational SQL, JSON null does not express “unknown”. The equivalent to Relational SQL NULL is the absence of the value in JSON SQL. Therefore, a join where the paths involved in a join criteria have the value JSON null are homogeneous joins.
select {*}
from foo as f,
bar as b
where f.n = b.n
results in
{"b":{"a":{"b":11},"n":null,"x":"missing"},
"f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}
Heterogeneous Join
A heterogeneous join in context of JSON SQL has paths in the join criteria that do not exist in at least one document, aka, do not refer to values in this case.
For example, the path x.y does not refer to a value in all documents of the example data set.
The semantics is that if a document does not have a value at the path of the join criteria the document does not participate in the Cartesian product, and therefore does not provide a document to the result set.
select {*}
from foo as f,
bar as b
where f.x.y = b.x.y
results in
{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
"f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}
Check for Missing Values
JSON SQL provides a predicate that supports checking the presence (or absence) of values. This predicate can be used to check if a join is going to be a homogeneous join or a heterogeneous join.
select {*}
from foo
where not exists_path x.y
results in
{"a":{"b":10},"n":false}
and
select {*}
from bar
where not exists_path x.y
results in
{"a":{"b":11},"n":null,"x":"missing"}
These queries show that the previous query is a heterogeneous join as not all documents contain the join paths.
In the absence of schema support for JSON this allows to check for homogeneity in context of joins, like a dynamic schema check for a very specific purpose. During software development it can be determined if it is important to have a homogeneous join or if a heterogeneous join is sufficient. Depending on the requirement and outcome of the query checking for path existing appropriate error handling can take place.
Summary
JSON SQL supports homogeneous as well as heterogeneous joins without any extra syntax or special execution semantics. Furthermore, with the predicate for checking existence the developer is given a tool to determine if a join is going to be homogeneous or heterogeneous.
Go [ JSON | Relational ] SQL!
Disclaimer
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.