SQL for JSON Rationalization Part 18: Set Operators, Sorting, Grouping and Subqueries

Further operators, such as set operators, sorting, grouping and subqueries, apply to JSON SQL as well. This blog discusses some of these additional operators.

Union, Difference (Except), Intersection (Intersect)

The set operators union (“UNION”), difference (“EXCEPT”), and intersection (“INTERSECT”) are supported by Relational SQL; for them to apply, their inputs must have compatible schemas. In the context of JSON SQL there are no schema requirements or restrictions: the set operators operate on sets of JSON documents and implement the usual set semantics.

Set operators rely on JSON document equality, and, as discussed earlier, equality is defined recursively on the properties of JSON documents. Two JSON documents are equal if they have the same set of paths and each pair of corresponding paths (one from each document) leads to the same scalar value.
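A minimal Python sketch of this path-based equality follows. It assumes property names contain neither “.” nor a leading “[”, so that the dotted path notation is unambiguous. Scalars are compared by their JSON text so that, for example, true and 1 stay distinct. The function names are illustrative only.

```python
import json


def leaf_paths(value, prefix=""):
    """Map every full path (root to leaf) of a JSON value to the JSON text of its leaf."""
    if isinstance(value, dict) and value:
        children = list(value.items())
    elif isinstance(value, list) and value:
        children = [(f"[{i}]", v) for i, v in enumerate(value)]
    else:
        # Scalar, or an empty object/array treated as a leaf value.
        # json.dumps keeps JSON types apart: True -> "true", 1 -> "1".
        return {prefix: json.dumps(value)}
    paths = {}
    for step, child in children:
        paths.update(leaf_paths(child, f"{prefix}.{step}" if prefix else step))
    return paths


def json_docs_equal(doc1, doc2):
    """Equal if both documents have the same full paths leading to the same scalar values."""
    return leaf_paths(doc1) == leaf_paths(doc2)
```

Property order does not matter, because the comparison is on a mapping from paths to values. Array order does matter, because the index is part of each path.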

The following example query combines all parts available in the US as well as in Europe.

select {*}
from us_parts
union
select {*}
from eur_parts
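The following Python sketch shows the usual set semantics of UNION, EXCEPT and INTERSECT over lists of JSON documents. It assumes documents are compared through a canonical text form (no whitespace, property names sorted), and it returns duplicate-free lists in first-seen order. The us_parts and eur_parts contents are made-up sample data.

```python
import json


def canonical(doc):
    """Canonical JSON text: no whitespace, property names sorted at every level."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def distinct(docs):
    """Remove duplicate documents (keyed by canonical text), keeping the first occurrence."""
    unique = {}
    for doc in docs:
        unique.setdefault(canonical(doc), doc)
    return unique


def union(left, right):
    return list(distinct(list(left) + list(right)).values())


def except_(left, right):  # trailing underscore: "except" is a Python keyword
    right_keys = distinct(right)
    return [doc for key, doc in distinct(left).items() if key not in right_keys]


def intersect(left, right):
    right_keys = distinct(right)
    return [doc for key, doc in distinct(left).items() if key in right_keys]


us_parts = [{"id": 1, "name": "bolt"}, {"id": 2, "name": "nut"}]
eur_parts = [{"name": "bolt", "id": 1}, {"id": 3, "name": "gear"}]
print(union(us_parts, eur_parts))
# [{'id': 1, 'name': 'bolt'}, {'id': 2, 'name': 'nut'}, {'id': 3, 'name': 'gear'}]
```

No schema check is involved at any point. The only requirement is a well-defined document equality.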

Sorting

Sorting of result sets can be supported by JSON SQL as well. Paths can be specified in the order by part of a JSON SQL query, and sorting takes place on the values these paths refer to (“sorting paths”). For JSON documents, which do not have to comply with a fixed schema, a special interpretation is necessary in a few cases:

  • A property that is absent (that is, the path specified in the sorting section of the query does not exist in a document) cannot be sorted on. One possible semantics is to treat the absence of a value as the largest or smallest possible value and to sort the document accordingly.

    A more recent SQL standard introduced the clauses “NULLS FIRST” and “NULLS LAST” to define where SQL NULL is placed in a sorted result. The same approach could be followed here with, e.g., “ABSENT FIRST” or “ABSENT LAST”.
  • Another case is type heterogeneity, meaning that the same path refers to values of different JSON types in different documents. In this case a possible strategy is to sort within each type and then order the types by a predefined order, such as null, true, false, string, number, object, array (arbitrary, but fixed).

    Following the same idea as “NULLS FIRST” and “NULLS LAST”, a clause could be added that defines the type order, like “TYPE ORDER JSON_NULL, JSON_TRUE, JSON_FALSE, JSON_STRING, JSON_NUMBER, JSON_OBJECT, JSON_ARRAY”.

Unless the sorting paths of all documents in a result set comply with the same schema, a total order cannot be established based on values alone; it requires additional rules like those outlined in the bullet list above.

The following example sorts by shipper rating.

select {*}
from shipper sh
order by sh.rating desc 
         absent last 
         type order json_null, json_true, json_false, 
                    json_string, json_number, 
                    json_object, json_array
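The Python sketch below makes the two proposed clauses concrete. ABSENT LAST/FIRST and TYPE ORDER are the hypothetical clauses proposed above, not an existing syntax. The sketch assumes that DESC applies only within each type group, that absent documents keep their input order, and that objects and arrays are ordered by their canonical JSON text.

```python
import json

ABSENT = object()  # sentinel: the path identifies no value (distinct from JSON null, i.e. None)

DEFAULT_TYPE_ORDER = ["json_null", "json_true", "json_false", "json_string",
                      "json_number", "json_object", "json_array"]


def get_path(doc, path):
    """Follow a dotted path such as "rating" or "c.[2].d"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def json_type(value):
    if value is None:
        return "json_null"
    if value is True:
        return "json_true"
    if value is False:
        return "json_false"
    if isinstance(value, str):
        return "json_string"
    if isinstance(value, (int, float)):
        return "json_number"
    return "json_object" if isinstance(value, dict) else "json_array"


def within_type_key(value):
    if json_type(value) in ("json_string", "json_number"):
        return value
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True, separators=(",", ":"))
    return 0  # null, true and false groups each hold a single distinct value


def order_by(docs, path, descending=False, absent_last=True, type_order=DEFAULT_TYPE_ORDER):
    absent = [d for d in docs if get_path(d, path) is ABSENT]
    ordered = []
    for type_name in type_order:  # type groups follow the TYPE ORDER clause
        group = [d for d in docs
                 if get_path(d, path) is not ABSENT and json_type(get_path(d, path)) == type_name]
        # ASC/DESC is applied within each type group only (assumption of this sketch)
        group.sort(key=lambda d: within_type_key(get_path(d, path)), reverse=descending)
        ordered.extend(group)
    return ordered + absent if absent_last else absent + ordered
```

For example, with hypothetical shipper ratings 3, "good", null, 5 and one document without a rating, `order_by(docs, "rating", descending=True)` places null first, then "good", then 5 and 3, and finally the document without a rating.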

Grouping and Having

Grouping of result documents can be implemented in JSON SQL as in Relational SQL, with the usual aggregation functions. The having construct can be applied as well to select from the groups. Grouping is defined by paths into the JSON documents, and the same discussion with respect to missing values and type heterogeneity applies as in the sorting discussion.

The following query lists all states and their average shipper rating where the shippers' average rating is above a certain threshold (here, 5).

select {sh.state, avg(sh.rating)}
from shipper sh
group by sh.state
having avg(sh.rating) > 5
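A Python sketch of this query follows. It rests on two assumptions. First, documents without the grouping path are left out of all groups, mirroring how a document without a join value does not participate in a join (Part 17). Second, absent or non-numeric ratings are skipped by the average, as SQL AVG skips NULL. The sample documents are made up.

```python
import json

ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "state" or "c.[2].d"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def group_by_avg(docs, group_path, value_path, threshold):
    """GROUP BY group_path HAVING avg(value_path) > threshold, as (group value, average) pairs."""
    groups = {}  # canonical JSON text of the group value -> (group value, numbers)
    for doc in docs:
        group_value = get_path(doc, group_path)
        if group_value is ABSENT:
            continue  # assumption: documents without the grouping path join no group
        key = json.dumps(group_value, sort_keys=True)
        _, numbers = groups.setdefault(key, (group_value, []))
        rating = get_path(doc, value_path)
        # like SQL AVG skipping NULL: absent and non-numeric values are ignored
        if isinstance(rating, (int, float)) and not isinstance(rating, bool):
            numbers.append(rating)
    result = []
    for group_value, numbers in groups.values():
        if numbers and sum(numbers) / len(numbers) > threshold:  # the HAVING condition
            result.append((group_value, sum(numbers) / len(numbers)))
    return result


shipper = [{"state": "CA", "rating": 6}, {"state": "CA", "rating": 8},
           {"state": "NY", "rating": 4}, {"state": "NY"}, {"rating": 9}]
print(group_by_avg(shipper, "state", "rating", 5))  # [('CA', 7.0)]
```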

Subqueries

JSON SQL can support subqueries like Relational SQL does. In principle, a JSON SQL query can return results as JSON objects as well as relations. In the context of a subquery, results are only returned in the form of JSON documents.

The following example lists shippers in states that have been shipped to in the past.

select {*}
from shipper
where shipper.state in
  (select s.state
   from states s
   where s.shipped_to = true)
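The IN predicate over a subquery can be sketched in Python as follows. The sketch collects the subquery values first and then keeps outer documents whose path value equals one of them. It compares by canonical JSON text, so there is no implicit type conversion, and a document lacking the path never qualifies. All names and the sample documents are hypothetical.

```python
import json

ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "state" or "c.[2].d"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def canonical(value):
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def where_in(outer, outer_path, inner, inner_path, inner_where):
    """Outer documents whose outer_path value equals some inner_path value of the subquery."""
    sub_values = set()
    for doc in inner:  # evaluate the subquery once
        if inner_where(doc):
            value = get_path(doc, inner_path)
            if value is not ABSENT:
                sub_values.add(canonical(value))
    result = []
    for doc in outer:
        value = get_path(doc, outer_path)
        if value is not ABSENT and canonical(value) in sub_values:
            result.append(doc)
    return result


shipper = [{"name": "A", "state": "CA"}, {"name": "B", "state": "NY"}, {"name": "C"}]
states = [{"state": "CA", "shipped_to": True}, {"state": "NY", "shipped_to": False},
          {"state": "TX", "shipped_to": True}]
print(where_in(shipper, "state", states, "state", lambda s: get_path(s, "shipped_to") is True))
# [{'name': 'A', 'state': 'CA'}]
```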

Summary

This brief discussion selected a few additional Relational SQL operators and showed how they can be interpreted in the context of JSON SQL. At this point I am confident that the complete Relational SQL semantics can be extended to the JSON types without restriction, with additional semantic interpretation where needed due to the possible absence of a fixed schema.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

SQL for JSON Rationalization Part 17: Cartesian Product with Restriction (Join) (Again!)

There is a lot more to be said about joins in the context of JSON SQL beyond the introduction in the previous blog.

“Join Homogeneous” Schema

The previous blog’s sample data set was homogeneous in the sense that all paths used in join criteria had a value in all documents; there was never a case where a path did not have a value. This is analogous to Relational SQL, where the columns used in joins always exist by virtue of the schema.

Let’s explore “join heterogeneity” in this blog. As usual, the sample data set is introduced first.

Sample Data Set

select {*} from foo

results in

{"a":{"b":5},"n":null,"x":{"y":"foobar"}}
{"a":{"b":10},"n":false}

and

select {*} from bar

results in

{"a":{"b":5},"n":true,"x":{"y":"foobar"}}
{"a":{"b":11},"n":null,"x":"missing"}

Homogeneous Join

The following join is homogeneous as the paths involved in the join criteria all have a value.

select {*} 
from  foo as f, 
      bar as b 
where f.a = b.a

Results in

{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Null vs. Absent Value

In the JSON standard, null is a value. Unlike Relational SQL NULL, JSON null does not express “unknown”. The equivalent of Relational SQL NULL in JSON SQL is the absence of a value. Therefore, a join in which the join-criteria paths have the value JSON null is a homogeneous join.
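A small Python sketch of this distinction follows. It assumes that JSON null maps to Python None and that a separate sentinel, ABSENT, marks a path with no value (both are choices of this sketch). Under these assumptions, null compares equal to null, while an absent value never satisfies a join comparison.

```python
import json

ABSENT = object()  # no value at the path; JSON null is represented by Python None


def get_path(doc, path):
    """Follow a dotted path such as "n" or "x.y"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def canonical(value):
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def join_values_equal(left, right):
    """Join comparison: JSON null is an ordinary value, an absent value never matches."""
    if left is ABSENT or right is ABSENT:
        return False
    return canonical(left) == canonical(right)


foo = [{"a": {"b": 5}, "n": None, "x": {"y": "foobar"}}, {"a": {"b": 10}, "n": False}]
bar = [{"a": {"b": 5}, "n": True, "x": {"y": "foobar"}}, {"a": {"b": 11}, "n": None, "x": "missing"}]

print(join_values_equal(get_path(foo[0], "n"), get_path(bar[1], "n")))  # True: null = null
```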

select {*} 
from  foo as f, 
      bar as b 
where f.n = b.n

results in

{"b":{"a":{"b":11},"n":null,"x":"missing"},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Heterogeneous Join

A heterogeneous join in the context of JSON SQL has join-criteria paths that do not exist in at least one document, that is, paths that do not refer to a value in that document.

For example, the path x.y does not refer to a value in every document of the example data set.

The semantics is that if a document does not have a value at the path of the join criteria, the document does not participate in the Cartesian product and therefore does not contribute a document to the result set.
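A nested-loop sketch of this semantics in Python follows, using the foo and bar documents from above. The helper names (get_path, equi_join) are hypothetical. A document whose join path yields no value is skipped before any pair is formed, so it never reaches the result. Each result pair is nested under its correlation name.

```python
import json

ABSENT = object()  # sentinel for "no value at this path" (JSON null is None)


def get_path(doc, path):
    """Follow a dotted path such as "x.y"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def canonical(value):
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def equi_join(left, left_name, right, right_name, left_path, right_path):
    result = []
    for l_doc in left:
        l_val = get_path(l_doc, left_path)
        if l_val is ABSENT:
            continue  # no value at the join path: the document does not participate
        for r_doc in right:
            r_val = get_path(r_doc, right_path)
            if r_val is not ABSENT and canonical(l_val) == canonical(r_val):
                result.append({left_name: l_doc, right_name: r_doc})
    return result


foo = [{"a": {"b": 5}, "n": None, "x": {"y": "foobar"}}, {"a": {"b": 10}, "n": False}]
bar = [{"a": {"b": 5}, "n": True, "x": {"y": "foobar"}}, {"a": {"b": 11}, "n": None, "x": "missing"}]
```

The same function covers the homogeneous joins on a and n shown earlier, since documents with a value at the join path are simply never skipped.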

select {*} 
from  foo as f, 
      bar as b 
where f.x.y = b.x.y

results in

{"b":{"a":{"b":5},"n":true,"x":{"y":"foobar"}},
 "f":{"a":{"b":5},"n":null,"x":{"y":"foobar"}}}

Check for Missing Values

JSON SQL provides a predicate that supports checking the presence (or absence) of values. This predicate can be used to check if a join is going to be a homogeneous join or a heterogeneous join.

select {*} 
from  foo 
where not exists_path x.y

results in

{"a":{"b":10},"n":false}

and

select {*} 
from  bar 
where not exists_path x.y

results in

{"a":{"b":11},"n":null,"x":"missing"}

These queries show that the previous query is a heterogeneous join as not all documents contain the join paths.

In the absence of schema support for JSON, this makes it possible to check for homogeneity in the context of joins, like a dynamic schema check for a very specific purpose. During software development it can be determined whether a homogeneous join is important or whether a heterogeneous join is sufficient. Depending on the requirement and on the outcome of the path-existence check, appropriate error handling can take place.
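Such a check could be scripted on the client. The following Python sketch mirrors `where not exists_path x.y` and derives a homogeneity verdict for a join. The function names are hypothetical and not part of JSON SQL.

```python
ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "x.y"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def exists_path(doc, path):
    return get_path(doc, path) is not ABSENT


def documents_missing(docs, path):
    """Counterpart of: select {*} from <collection> where not exists_path <path>."""
    return [d for d in docs if not exists_path(d, path)]


def is_homogeneous_join(left, right, left_path, right_path):
    """True if every document on both sides has a value at its join path."""
    return not documents_missing(left, left_path) and not documents_missing(right, right_path)


foo = [{"a": {"b": 5}, "n": None, "x": {"y": "foobar"}}, {"a": {"b": 10}, "n": False}]
bar = [{"a": {"b": 5}, "n": True, "x": {"y": "foobar"}}, {"a": {"b": 11}, "n": None, "x": "missing"}]
print(is_homogeneous_join(foo, bar, "x.y", "x.y"))  # False
```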

Summary

JSON SQL supports homogeneous as well as heterogeneous joins without any extra syntax or special execution semantics. Furthermore, with the predicate for checking existence the developer is given a tool to determine if a join is going to be homogeneous or heterogeneous.

Go [ JSON | Relational ] SQL!

SQL for JSON Rationalization Part 16: Cartesian Product with Restriction (Join)

Restrictions can be added to a Cartesian product, and this blog briefly discusses them. It demonstrates the power of joins in the context of JSON documents.

Example Data Set

As always, the sample data sets that are being used for queries in this blog are introduced first.

select {*} from jer

results in

{"a":1,"b":20,"c":true,"d":{"x":"y"}}
{"a":2,"b":21,"c":true,"d":{"x":[null,5]}}

and

select {*} from tom

results in

{"a":3,"b":20,"c":false,"d":{"x":"y"}}
{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}

This data set is used in the following to introduce restrictions in context of a Cartesian Product.

Join and Join Criteria

The following demonstrates a join where the join criterion (restriction) is based on a scalar type.

select {*} 
from   jer as j, tom as t 
where  j.b = t.b

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

Remember that the result documents from each of the collections are disambiguated by adding the root properties “j” and “t” (the correlation specifications).

A join can be empty if no pair of documents satisfies the join criteria, as shown in the following.
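For the data above, a Cartesian product with restriction can be sketched in Python. Here itertools.product forms every combination, each combination is wrapped under its correlation name, and a predicate filters the combinations. The predicate is a plain Python function standing in for the where clause, and it assumes it only touches paths that exist in every document, as in this data set.

```python
from itertools import product


def select_star(collections, where=lambda row: True):
    """collections maps correlation names to document lists, e.g. {"j": jer, "t": tom}."""
    names = list(collections)
    result = []
    for combo in product(*(collections[name] for name in names)):
        row = dict(zip(names, combo))  # e.g. {"j": <jer document>, "t": <tom document>}
        if where(row):  # the restriction (join and/or non-join criteria)
            result.append(row)
    return result


jer = [{"a": 1, "b": 20, "c": True, "d": {"x": "y"}},
       {"a": 2, "b": 21, "c": True, "d": {"x": [None, 5]}}]
tom = [{"a": 3, "b": 20, "c": False, "d": {"x": "y"}},
       {"a": 4, "b": 21, "c": False, "d": {"x": {"p": None, "q": 5}}}]

# where j.b = t.b
joined = select_star({"j": jer, "t": tom}, lambda r: r["j"]["b"] == r["t"]["b"])
```

Join criteria, non-join criteria and mixes of both are all just predicates over the combined row, which matches the later examples in this blog.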

select {*} 
from   jer as j, tom as t 
where  j.a = t.a

does not return a result.

Projection can be applied as well.

select {t.b} 
from   jer as j, tom as t 
where  j.b = t.b

results in

{"t":{"b":20}}
{"t":{"b":21}}

Using an AS clause in the projection allows reshaping the result.

select {t.b as tb} 
from   jer as j, tom as t 
where  j.b = t.b

results in

{"tb":20}
{"tb":21}

Join criteria can be defined not only on top level scalar properties, but on any JSON structure on any level. The following two queries illustrate this.
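Criteria such as j.d.x.[1] = t.d.x.q or j.d = t.d need two ingredients: path resolution that may fail, and equality on composite values. A minimal Python sketch follows, assuming canonical JSON text (sorted property names, no whitespace) as the equality test. In jer's first document, d.x is the string "y", so the path d.x.[1] has no value and that document cannot match.

```python
import json

ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "d.x.[1]"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def canonical(value):
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def join_on_paths(left, right, left_path, right_path):
    """Pairs {"j": ..., "t": ...} whose two path values exist and are equal JSON values."""
    result = []
    for j in left:
        for t in right:
            lv, rv = get_path(j, left_path), get_path(t, right_path)
            if lv is not ABSENT and rv is not ABSENT and canonical(lv) == canonical(rv):
                result.append({"j": j, "t": t})
    return result


jer = [{"a": 1, "b": 20, "c": True, "d": {"x": "y"}},
       {"a": 2, "b": 21, "c": True, "d": {"x": [None, 5]}}]
tom = [{"a": 3, "b": 20, "c": False, "d": {"x": "y"}},
       {"a": 4, "b": 21, "c": False, "d": {"x": {"p": None, "q": 5}}}]
```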

select {*} 
from   jer as j, tom as t 
where  j.d.x.[1] = t.d.x.q

results in

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

and

select {*} 
from   jer as j, tom as t 
where  j.d = t.d

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

Of course, equality is not the only possible operator for join criteria.

select {*} 
from   jer as j, tom as t 
where  j.a < t.a

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

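The inequality operator works as well; the following query returns the same four documents, since no value of j.a equals any value of t.a.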

select {*} 
from   jer as j, tom as t 
where  j.a <> t.a

Cartesian Product with Restriction

Cartesian products can be restricted with non-join criteria.

select {*} 
from   jer as j, tom as t 
where  j.c = true 
       or t.c = false

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

{"j":{"a":2,"b":21,"c":true,"d":{"x":[null,5]}},
 "t":{"a":4,"b":21,"c":false,"d":{"x":{"p":null,"q":5}}}}

Join with Join and Non-Join Criteria

And a mix of join and non-join criteria is possible as well.

select {*} 
from   jer as j, tom as t 
where  j.d = t.d 
       and j.b = t.b 
       and (j.c = true or t.c = false)

results in

{"j":{"a":1,"b":20,"c":true,"d":{"x":"y"}},
 "t":{"a":3,"b":20,"c":false,"d":{"x":"y"}}}

Summary

As demonstrated in this blog, joins are a powerful feature of JSON SQL: they support combining documents from different collections without having to anticipate that combination when deciding on the document structures. Joins combine the power of JSON documents with the power of value-based correlation of documents.

Go [ JSON | Relational ] SQL!

SQL for JSON Rationalization Part 15: Cartesian Product and Projection

Part 14 of this blog series discussed Cartesian product queries with an asterisk projection; this blog discusses projections of specific paths (non-asterisk).

Example Data Set

As always, the sample data sets that are being used for queries in this blog are introduced first.

select {*} from ying

results in

{"a":3,"c":20}
{"a":4,"c":21}

and

select {*} from yang

results in

{"a":1,"b":10}
{"a":2,"b":11}

Projection

To recap, JSON SQL supports JSON projection as well as relational projection. JSON projection is specified by enclosing the paths in a set of curly brackets: {}. This causes the query result to be represented as JSON objects.

For example, the following query returns JSON objects.

select {a, b} from yang

results in

{"a":1,"b":10}
{"a":2,"b":11}

JSON SQL returns relational results when the curly brackets are omitted; the following query returns the result as a table.

select a, b from yang

results in

|a                        |b                        |
+-------------------------+-------------------------+
|1                        |10                       |
|2                        |11                       |

Projection without AS in Joins

The following is a projection of a join resulting in JSON objects.

select {yi.a, ya.b} from ying as yi, yang as ya

results in

{"ya":{"b":10},"yi":{"a":3}}
{"ya":{"b":11},"yi":{"a":3}}
{"ya":{"b":10},"yi":{"a":4}}
{"ya":{"b":11},"yi":{"a":4}}

The same query with results represented as relation is specified as follows.

select yi.a, ya.b from ying as yi, yang as ya

results in

|yi_a                     |ya_b                     |
+-------------------------+-------------------------+
|3                        |10                       |
|3                        |11                       |
|4                        |10                       |
|4                        |11                       |

Observe that the results include the table correlation specifiers “yi” or “ya”. This is necessary since different collections might have documents with the same paths. The following query highlights this case.

select {yi.a, ya.a} from ying as yi, yang as ya

results in

{"ya":{"a":1},"yi":{"a":3}}
{"ya":{"a":2},"yi":{"a":3}}
{"ya":{"a":1},"yi":{"a":4}}
{"ya":{"a":2},"yi":{"a":4}}

This automatic qualification of results with the correlation specifications ensures that duplicate paths are resolved in the results.
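One way to see why qualification falls out naturally is to treat each combined row as a document keyed by correlation name. A qualified path such as yi.a is then an ordinary path into that row. The Python sketch below illustrates this, assuming property-only paths (no array steps) and the hypothetical function names project_json and project_relational.

```python
from itertools import product

ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "yi.a"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if not isinstance(cur, dict) or step not in cur:
            return ABSENT
        cur = cur[step]
    return cur


def project_json(row, paths):
    """JSON projection: each qualified path keeps its shape, e.g. {"yi": {"a": 3}}."""
    result = {}
    for path in paths:
        value = get_path(row, path)
        if value is ABSENT:
            continue
        node = result
        *parents, last = path.split(".")  # property-only paths in this sketch
        for step in parents:
            node = node.setdefault(step, {})
        node[last] = value
    return result


def project_relational(row, paths):
    """Relational projection: the column name is the path with "." replaced by "_"."""
    return {path.replace(".", "_"): get_path(row, path)
            for path in paths if get_path(row, path) is not ABSENT}


ying = [{"a": 3, "c": 20}, {"a": 4, "c": 21}]
yang = [{"a": 1, "b": 10}, {"a": 2, "b": 11}]
rows = [{"yi": yi, "ya": ya} for yi, ya in product(ying, yang)]
print(project_json(rows[0], ["yi.a", "ya.a"]))  # {'yi': {'a': 3}, 'ya': {'a': 1}}
```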

Projection with AS in Joins

In many cases the automatic duplicate resolution is sufficient for clients. However, in some cases it is not desired. In those cases the AS clause allows result values to be placed anywhere in the result JSON documents. In the relational result case, the columns can be named as desired.

select {yi.a as b, ya.a as c} from ying as yi, yang as ya

results in

{"b":3,"c":1}
{"b":3,"c":2}
{"b":4,"c":1}
{"b":4,"c":2}

The above shows a simple renaming of the paths.

select {yi.a as x.b, ya.a as y.[0]} from ying as yi, yang as ya

results in

{"x":{"b":3},"y":[1]}
{"x":{"b":3},"y":[2]}
{"x":{"b":4},"y":[1]}
{"x":{"b":4},"y":[2]}

This query shows a more complex result object creation and goes beyond simple renaming of paths.
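The Python sketch below shows how such an AS projection could be evaluated. It reads the qualified source path from the combined row and writes the value at the AS path, creating objects and arrays on the way. Array positions before a written index are padded with “<>”, the series' marker for an element without a value (introduced in Part 5). set_path and project_as are hypothetical names.

```python
from itertools import product

ABSENT = object()   # sentinel for "no value at this path"
PLACEHOLDER = "<>"  # marker for array elements that have no value


def get_path(doc, path):
    """Follow a dotted path such as "yi.a" or "c.[2]"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def set_path(target, path, value):
    """Place value at path inside target, creating intermediate objects/arrays as needed."""
    steps = path.split(".")
    cur = target
    for i, step in enumerate(steps):
        if step.startswith("["):
            key = int(step[1:-1])
            cur.extend([PLACEHOLDER] * (key + 1 - len(cur)))  # pad skipped positions
            existing = cur[key]
        else:
            key = step
            existing = cur.get(key)
        if i == len(steps) - 1:
            cur[key] = value
        else:
            container = list if steps[i + 1].startswith("[") else dict
            if not isinstance(existing, container):
                cur[key] = container()  # create the intermediate object or array
            cur = cur[key]
    return target


def project_as(row, clauses):
    """clauses: (qualified source path, AS path) pairs, e.g. ("yi.a", "x.b")."""
    result = {}
    for source, target in clauses:
        value = get_path(row, source)
        if value is not ABSENT:
            set_path(result, target, value)
    return result


ying = [{"a": 3, "c": 20}, {"a": 4, "c": 21}]
yang = [{"a": 1, "b": 10}, {"a": 2, "b": 11}]
rows = [{"yi": yi, "ya": ya} for yi, ya in product(ying, yang)]
print(project_as(rows[0], [("yi.a", "x.b"), ("ya.a", "y.[0]")]))  # {'x': {'b': 3}, 'y': [1]}
```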

The following query shows how specific column names are specified.

select yi.a as x, ya.a as y from ying as yi, yang as ya

results in

|x                        |y                        |
+-------------------------+-------------------------+
|3                        |1                        |
|3                        |2                        |
|4                        |1                        |
|4                        |2                        |

Summary

In summary, defining projection in the context of JSON SQL joins is straightforward; it supports flexible renaming of columns for relational results as well as expressive positioning of result values as paths in JSON object results.

Go [ JSON | Relational ] SQL!

SQL for JSON Rationalization Part 6: Restriction – General Discussion

After discussing projection, selection is up next in the blog series on SQL for JSON. This first blog on selection focuses on the scalar JSON types Number and String.

Demo Data

As usual, we start with demo data. The collection for this blog is called “selcoll” (for SelectionCollection) and contains the following documents:

select {*} from selcoll

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

Selection based on Literals

Selection follows the regular Relational SQL syntax and is straightforward. For this discussion only single predicates are shown, not (complex) Boolean expressions of predicates. Boolean expressions of predicates follow the usual semantics and do not require much discussion.

select {*} from selcoll where a.b = 25

This JSON SQL statement selects all documents from selcoll that have a property “a” whose value is an object containing a property “b” with the value 25.

The result is

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

The following selection has the same semantics and returns the same result:

select {*} from selcoll where 25 = a.b

A selection based on a String literal follows the same syntax:

select {*} from selcoll where c.[1] = 'foobar'

This returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}
{"a":{"b":"25"},"c":["foo1","foobar","bar"],"d":{"e":"foo"}}

And the following selection returns the same result:

select {*} from selcoll where 'foobar' = c.[1]

In this context a note is in order. JSON uses double quotes as the string delimiter, whereas SQL uses single quotes. In order to stay as close as possible to Relational SQL, single quotes are used in JSON SQL and transformed into double quotes by the underlying implementation.

Selection based on Value Comparison

It is possible to relate two different values within a document as well (that is, not a self-join, which would relate values of different documents; this will be discussed in a later blog).

select d.e as de, c.[0] as c0 from selcoll where d.e = c.[0]

This query selects all documents that have the same value in d.e and c.[0]. As an added benefit, the query projects those two values as well.

The result is

|de                       |c0                       |
+-------------------------+-------------------------+
|"foo"                    |"foo"                    |
|"foo2"                   |"foo2"                   |

Any path can be related to any other path without restriction.

While in this blog only numbers and strings are discussed, the above discussed types of restrictions will work for all JSON data types in general, including true, false, null, objects and arrays (discussed in subsequent blogs).

Operations

The usual operators are defined: <, >, <>, =, >=, and <=. Their semantics is defined for Number and String (the Relational SQL semantics is adopted). For the other JSON types the semantics will have to be defined, as those types do not have a corresponding Relational SQL domain.

Beyond these operators more “interesting” operations are required. For example

select {*} from selcoll where c contains 'foobar'

whereby “c” refers to a JSON array (and possibly a JSON object). This predicate would be true if there is an element in “c” that is of type String and the value of that element is “foobar”. There is a whole set of interesting operations that will be discussed at some point later as well.
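A hedged Python sketch of the contains predicate as described follows. The array case matches an element of the same JSON type and value. For objects, the text only says “possibly”, so searching the property values is an assumption of this sketch. An absent path or a scalar value makes the predicate false.

```python
ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "c" or "c.[1]"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def contains(doc, path, literal):
    """True if the array at path has an element of the literal's type equal to the literal."""
    value = get_path(doc, path)
    if isinstance(value, list):
        elements = value
    elif isinstance(value, dict):
        elements = list(value.values())  # assumption: objects are searched by property value
    else:
        return False  # absent path or scalar value
    return any(type(e) is type(literal) and e == literal for e in elements)


selcoll = [{"a": {"b": 25}, "c": ["foo", "foobar", "ba'r"], "d": {"e": "foo"}},
           {"a": {"b": "25"}, "c": ["foo1", "foobar", "bar"], "d": {"e": "foo"}},
           {"a": {"b": 25}, "c": ["foo2", "foo2bar2", 'ba"r'], "d": {"e": "foo2"}}]
print([d["c"][0] for d in selcoll if contains(d, "c", "foobar")])  # ['foo', 'foo1']
```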

Semantics

As implicitly demonstrated above, a JSON document is only in the result set if (a) the path to the value as specified in the JSON SQL query is present and (b) the value is the value as indicated in the selection clause in JSON SQL.

If the path does not exist or the value does not match, no result is returned for that document (not even an empty document).

There is no implicit type transformation implemented. This means that a Number literal only matches number values, and a String literal only matches string values.
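The selection semantics above can be sketched in Python as a single comparison helper. A missing path never satisfies a predicate, and operands of different types never match, because there is no implicit conversion. Only Number and String are handled, since the other JSON types are left open in this blog. Either operand can be a literal or a path value, which covers both the literal examples and the value comparison d.e = c.[0]. The helper names are hypothetical.

```python
import operator

ABSENT = object()  # sentinel for "no value at this path"

OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def get_path(doc, path):
    """Follow a dotted path such as "a.b" or "c.[1]"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def kind(value):
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string" if isinstance(value, str) else "other"


def compare(left, op, right):
    if left is ABSENT or right is ABSENT:
        return False  # a missing path never satisfies a predicate
    if kind(left) != kind(right) or kind(left) not in ("number", "string"):
        return False  # no implicit type conversion; other JSON types are not covered here
    return OPS[op](left, right)


selcoll = [{"a": {"b": 25}, "c": ["foo", "foobar", "ba'r"], "d": {"e": "foo"}},
           {"a": {"b": "25"}, "c": ["foo1", "foobar", "bar"], "d": {"e": "foo"}},
           {"a": {"b": 25}, "c": ["foo2", "foo2bar2", 'ba"r'], "d": {"e": "foo2"}}]
# where a.b = 25: the second document holds the string "25" and is not selected
selected = [d for d in selcoll if compare(get_path(d, "a.b"), "=", 25)]
```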

Syntax Twists

Syntax always has a twist, especially when different languages are combined. In this case one of the twists is the single quote. In Relational SQL, a single quote within a string is represented as two single quotes. JSON does not have that requirement, since JSON strings are delimited by double quotes and a single quote is treated as a regular character. The reverse situation exists as well: double quotes have to be escaped within a string in JSON, but not in Relational SQL.
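A minimal Python sketch of the translation an implementation might perform on such literals follows. It is an assumption about how the mapping could work, not the actual implementation. The SQL escape of two single quotes is undone first, and json.dumps then produces a double-quoted JSON string with any embedded double quote escaped.

```python
import json


def sql_literal_to_json(literal):
    """Translate a single-quoted SQL string literal (e.g. 'ba''r') into JSON string text."""
    if len(literal) < 2 or not (literal.startswith("'") and literal.endswith("'")):
        raise ValueError("expected a single-quoted SQL string literal")
    value = literal[1:-1].replace("''", "'")  # SQL: two single quotes denote one
    return json.dumps(value)  # JSON: double-quote delimited; an embedded " becomes \"


print(sql_literal_to_json("'ba''r'"))  # "ba'r"
print(sql_literal_to_json("'ba\"r'"))  # "ba\"r"
```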

The query (double quote, not escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba"r'

returns

{"a":{"b":25},"c":["foo2","foo2bar2","ba\"r"],"d":{"e":"foo2"}}

And the query (two single quotes, escaped in JSON SQL)

select {*} from selcoll where c.[2] = 'ba''r'

returns

{"a":{"b":25},"c":["foo","foobar","ba'r"],"d":{"e":"foo"}}

Summary

Restriction (or selection) is almost straightforward for the types Number and String in the context of JSON SQL. The only twist is the way Relational SQL and JSON differ in delimiting string literals and in encoding special characters.

Go [ JSON | Relational ] SQL!

SQL for JSON Rationalization Part 5: Projection – Specific Functionality

The last blog introduced JSON SQL projection, and this blog discusses some of its finer points.

Demo Data

Here are the JSON documents from the collection “tinycoll” used throughout this blog:

select {*} from tinycoll

returns two JSON documents:

{"a": 5,
 "b": {"c": 10,"d": 11},
 "c": [101, 102, {"d": 103}, {"e": 104}]}

{"a": 5,
 "b2": [10, 11],
 "c": [101, 102, {"d": 103}, {"e": 104}]}

AS Clause

In Relational SQL it is possible to rename columns. The AS clause is the means to do this and it contains an alternative column name. Example:

select a as abc from tinycoll

The result contains a column called “abc” instead of “a” and this is standard Relational SQL semantics.

|abc                      |
+-------------------------+
|5                        |
|5                        |

What does an AS clause mean in context of JSON SQL? In context of JSON SQL an AS clause specifies a path. Example:

select {a as x.y} from tinycoll

The result contains documents with paths “x.y” that contain the value of the corresponding “a” in the original document (if “a” is present).

{"x":{"y":5}}
{"x":{"y":5}}

Fundamentally, it means that the value the original path “a” pointed to is now at a new path “x.y”; this can be seen as a relocation that only takes place in the result document. Any valid path is possible in the AS clause.

So far the AS clause supports renaming as well as relocation. Relocation is orthogonal and does not affect the original document. For example, the following relocations are valid:

select {a as b, b as a} from tinycoll

Basically, the values are exchanged between the two paths “a” and “b” (which can be more complex paths, of course).

{"a":{"c":10,"d":11},"b":5}
{"b":5}

All AS clauses are applied independently of each other, not in sequence (and therefore “a” and “b” do not contain the same value because of this projection specification).
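The independence of AS clauses can be sketched in Python by reading every source value from the original document before writing anything into a fresh result document. This is a sketch that assumes property-only AS paths (array targets follow the “<>” rule discussed further below). The original document is never modified, so swapping “a” and “b” works as shown.

```python
ABSENT = object()  # sentinel for "no value at this path"


def get_path(doc, path):
    """Follow a dotted path such as "a" or "c.[2]"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def project(doc, clauses):
    """clauses: (source path, AS path) pairs; all sources are read from the original doc."""
    values = [(target, get_path(doc, source)) for source, target in clauses]
    result = {}
    for target, value in values:
        if value is ABSENT:
            continue  # source path missing: nothing is written
        node = result
        *parents, last = target.split(".")  # property-only AS paths in this sketch
        for step in parents:
            node = node.setdefault(step, {})
        node[last] = value
    return result


tinycoll = [{"a": 5, "b": {"c": 10, "d": 11}, "c": [101, 102, {"d": 103}, {"e": 104}]},
            {"a": 5, "b2": [10, 11], "c": [101, 102, {"d": 103}, {"e": 104}]}]
print(project(tinycoll[0], [("a", "b"), ("b", "a")]))  # {'b': 5, 'a': {'c': 10, 'd': 11}}
```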

A final situation is overwriting: the path in an AS clause can be a path that already exists in the JSON document, and the projected value then overwrites the value at that path in the result document. For example:

select {a as c.[0]} from tinycoll

The existing value of “c.[0]” is overwritten and contains the value of “a” in the result document if “a” exists in the original document.

{"c":[5]}
{"c":[5]}

There are a few language constraints that are checked for. These are

  • Path Subsumption. A path in an AS clause must not be a subpath of any other path in an AS clause; otherwise one AS clause might conflict with another one. An example of a violation is: “select {a as c.[2].d, b as c.[2]} from tinycoll”. This is analogous to Relational SQL not allowing the use of the same column name in two different AS clauses.
  • Asterisk Query. An asterisk query cannot have an AS clause; if any change is necessary by means of an AS clause, the paths have to be listed explicitly.
  • Relational Output Path. The path in an AS clause for relational output must consist of a single property name (a path of length one) in order to comply with the Relational SQL semantics/model.
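These constraints could be checked before a query runs. The following hypothetical Python validator works over the AS target paths of one query and returns a list of violations. It additionally rejects the same AS path used twice, an assumption that follows the column-name analogy above.

```python
def check_as_clauses(targets, relational=False, asterisk=False):
    """Return the violated constraints for the AS target paths of one query."""
    errors = []
    if asterisk and targets:
        errors.append("an asterisk query cannot have an AS clause")
    if len(set(targets)) != len(targets):
        errors.append("the same AS path is used more than once")
    for path in targets:
        if relational and "." in path:
            errors.append(f"relational output path {path!r} must have length one")
    for shorter in targets:
        for longer in targets:
            # prefix test on step boundaries: "c.[2]" subsumes "c.[2].d", "a.b" not "a.bc"
            if longer.startswith(shorter + "."):
                errors.append(f"path {shorter!r} subsumes path {longer!r}")
    return errors


# the violation example from the list: select {a as c.[2].d, b as c.[2]} from tinycoll
print(check_as_clauses(["c.[2].d", "c.[2]"]))  # ["path 'c.[2]' subsumes path 'c.[2].d'"]
```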

Value Non-Existence

The AS clause might create a path in a result document that does not exist in the original document. For example:

select {a as x.[2]} from tinycoll

In this example, the original document does not have an array named “x”. However, the result document is going to have one if “a” is present. The path places the value of “a” into the third array element; the first and second elements have no value, as those elements do not exist. The result of the query is

{"x":["<>","<>",5]}
{"x":["<>","<>",5]}

The JSON standard does not have a notation for an absent value, however, it is needed in order to describe accurately that values are undefined. Therefore, the symbol “<>” is introduced of type String in order to (a) denote that a value is undefined and to (b) represent it as a known data type so that JSON libraries can process it.

“<>” is arbitrarily chosen; it can be changed to another symbol as necessary. JSON null cannot be used, as JSON null is a valid constant (an explicit JSON value) and, in contrast to SQL NULL, does not denote “unknown”. Using JSON null might suggest that there is a value of JSON null when in reality there is no value at all. Any trailing “<>” elements are removed and not present in the output JSON documents.

Array Element Replacement

It is possible to replace array elements selectively, for example:

select {a as c.[0], b as c.[1], c.[2]} from tinycoll

will result in

{"c":[5,{"c":10,"d":11},{"d":103}]}
{"c":[5,"<>",{"d":103}]}

A shortcut syntax like c.[2..9] that refers to the 3rd until 10th elements inclusive is not supported at this point, but could be for convenience. If implemented at some point in time, then this section will be changed.

Likewise for shortcut syntax like c.[..9], c.[2..] or c.[..], indicating all elements up to and including the 10th, all elements starting with the 3rd, or all elements, respectively.
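The array behavior of this section, padding with “<>” and selective element replacement, can be sketched in Python. set_path pads only the positions before the index being written, so it never creates trailing “<>” elements in the first place. A clause without AS keeps its source path as the target, as in “c.[2]”. The function names are hypothetical.

```python
ABSENT = object()   # sentinel for "no value at this path"
PLACEHOLDER = "<>"  # the series' marker for an array element without a value


def get_path(doc, path):
    """Follow a dotted path such as "a" or "c.[2]"; return ABSENT if a step is missing."""
    cur = doc
    for step in path.split("."):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def set_path(target, path, value):
    """Place value at path, creating objects/arrays and padding skipped array slots."""
    steps = path.split(".")
    cur = target
    for i, step in enumerate(steps):
        if step.startswith("["):
            key = int(step[1:-1])
            cur.extend([PLACEHOLDER] * (key + 1 - len(cur)))  # pad positions before key
            existing = cur[key]
        else:
            key = step
            existing = cur.get(key)
        if i == len(steps) - 1:
            cur[key] = value
        else:
            container = list if steps[i + 1].startswith("[") else dict
            if not isinstance(existing, container):
                cur[key] = container()
            cur = cur[key]
    return target


def project(doc, clauses):
    """clauses: (source path, AS path or None); None keeps the source path."""
    result = {}
    for source, target in clauses:
        value = get_path(doc, source)
        if value is not ABSENT:
            set_path(result, target or source, value)
    return result


tinycoll = [{"a": 5, "b": {"c": 10, "d": 11}, "c": [101, 102, {"d": 103}, {"e": 104}]},
            {"a": 5, "b2": [10, 11], "c": [101, 102, {"d": 103}, {"e": 104}]}]
# select {a as c.[0], b as c.[1], c.[2]} from tinycoll
replaced = [project(d, [("a", "c.[0]"), ("b", "c.[1]"), ("c.[2]", None)]) for d in tinycoll]
print(replaced[1])  # {'c': [5, '<>', {'d': 103}]}
```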

Additional Items

The “select distinct” clause is not specifically discussed as it has the intended semantics based on the JSON document equality definition.

An interesting case of projection is the mixed case, that is, some of the projection is relational while another part asks for the JSON form. For example,

select a, b, {c, d} from tinycoll

returns relational output, but with certain columns containing JSON objects that can be freely composed from one or more paths. This might be convenient from a final output viewpoint for the client, but would not contribute in major ways to a JSON SQL language definition. Therefore, it is not implemented as of now (and in case this decision changes, this section will be updated in the future).

Summary

Projection in context of JSON SQL is not all that straightforward compared to the Relational SQL semantics. This blog highlighted the most important finer points like the AS clause and array processing and outlined some of the additional possible extensions to an implementation.

Go [ JSON | Relational ] SQL!

SQL for JSON Rationalization Part 4: Projection – General Functionality

After the demo in the last blog (Part 3) it is time to discuss some of the assumptions and the projection functionality in more detail – here and in the next blog.

Assumptions: Array Start Index, JSON Literals and JSON Value Equality

The JSON standard does not define the starting index of the first array element. The assumption made here is that the first index is 0 (zero).

The JSON standard requires the literals “null”, “true” and “false” to be lower case. However, the assumption made here is that all lower as well as upper case combinations work, e.g., “True”, for convenience.

Another aspect the JSON standard does not define is equality on JSON values. There are many ways to define when two JSON values are equal. Here equality is defined on the string representation of the JSON values that contain no white space and where the property names in JSON objects are sorted alphabetically.
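This definition maps directly onto a serializer that omits white space and sorts property names. A minimal Python sketch follows, using json.dumps with sort_keys and compact separators. One consequence in this sketch is that 1 and 1.0 serialize differently and are therefore not equal.

```python
import json


def canonical(value):
    """String representation without white space and with property names sorted."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def json_values_equal(v1, v2):
    return canonical(v1) == canonical(v2)


print(canonical({"b": 1, "a": [1, {"d": 2, "c": 3}]}))  # {"a":[1,{"c":3,"d":2}],"b":1}
```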

Definitions: Full and Partial Path

A full path is a sequence of property names as well as array indexes from the root of a JSON document all the way to one of its leaves. The elements of a path are separated by “.”. For example, “c.[3].e” is a full path from the root “c” to the leaf “e” in one of the demo documents of the previous blog. A path must start with a property name and cannot start with an array index. A path cannot be empty and the minimum path consists of a single property name.

Using “.” as the separator is an arbitrary choice, but one made by many systems. Enclosing array indexes in “[” and “]” is customary as well. Denoting an array index as a separate path element (that is, delimited by “.”) is also a convenient choice.

Given a JSON object, a full path might exist within it or not. Given a JSON object and a path, using the path as an access structure identifies a value only if the full path exists in the JSON document. If the path does not exist within the JSON document then no value is identified; especially not the JSON literal “null”.

A partial path is a sequence of property names and array indexes starting at the root but not necessarily ending at a leaf; it can end at an intermediary property or array index. This supports “reaching into” a JSON document and identifying a JSON value that is between the root and a leaf.

Like in case of full paths, given a JSON object, a partial path might or might not exist within it. A partial path only identifies a JSON value if the partial path exists within a JSON object. In this case it identifies a composite JSON value.
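The path definitions can be sketched in Python as follows. parse_path enforces that a path is non-empty and starts with a property name. get_path identifies a value only if the whole path exists, and path_kind classifies an existing path as full (ends at a scalar) or partial (ends at a composite value). Treating an empty object or array as “partial” is a choice of this sketch.

```python
ABSENT = object()  # sentinel: the path identifies no value (in particular, not JSON null)


def parse_path(path):
    steps = path.split(".") if path else []
    if not steps or steps[0].startswith("["):
        raise ValueError("a path must start with a property name")
    return steps


def get_path(doc, path):
    cur = doc
    for step in parse_path(path):
        if step.startswith("[") and step.endswith("]"):
            key = int(step[1:-1])
            if not isinstance(cur, list) or not 0 <= key < len(cur):
                return ABSENT
        else:
            key = step
            if not isinstance(cur, dict) or key not in cur:
                return ABSENT
        cur = cur[key]
    return cur


def path_kind(doc, path):
    """'full' if the path ends at a leaf, 'partial' if at a composite value, None if absent."""
    value = get_path(doc, path)
    if value is ABSENT:
        return None
    return "partial" if isinstance(value, (dict, list)) else "full"


doc = {"a": 5, "b": {"c": 10, "d": 11}, "c": [101, 102, {"d": 103}, {"e": 104}]}
print(get_path(doc, "c.[3].e"), path_kind(doc, "c.[3].e"))  # 104 full
```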

If a JSON document has only scalar properties, then the root properties are the leaf properties at the same time. Paths in this context are full paths and partial paths cannot exist.

Projection

Unlike in the relational model, in context of the JSON model the result of a query can be returned as a relational table, or as a set of JSON documents. The choice is made by the query author.

The projection in a select statement contains one or more (full or partial) paths. If the paths are enclosed by “{” and “}”, then JSON documents are returned, otherwise a table (the asterisk projection is discussed below).

For example, the query from the previous blog

select a, b.c, d.[3].e from tinycoll

returns a table with three columns.

Semantically, each path in the projection will be a separate column. Each document from the collection “tinycoll” is taken and a corresponding row is added to the table. For each path of the projection that is found in the document, the value is added to the row. If a path does not exist, no value is added in the column corresponding to the path. Therefore, a row can have values in every column, in some columns, or in no column, depending on whether the paths exist in the document.

As in relational SQL, the order of the paths matters as the corresponding columns will be created in that order.

The column names are created from the paths by replacing “.” in the path representation with “_”, as many relational systems do not support “.” in column names.
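The table-returning projection can be sketched in Python as follows. This illustrates the semantics described above and is not an implementation of JSON SQL: it assumes documents are Python dicts and lists, parses the path syntax of the example query (d.[3].e) with a small regular expression, and represents each row as a dict that only holds the columns whose path exists. The helper names are hypothetical.

```python
import re

ABSENT = object()  # marker for "path does not exist" (distinct from JSON null)


def parse_path(text):
    """Split a path such as 'd.[3].e' into ['d', 3, 'e'] (assumed syntax)."""
    steps = []
    for part in text.split("."):
        match = re.fullmatch(r"\[(\d+)\]", part)
        steps.append(int(match.group(1)) if match else part)
    return steps


def resolve(doc, steps):
    """Follow the steps from the root; ABSENT as soon as one does not exist."""
    current = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(current, list) or step >= len(current):
                return ABSENT
        elif not isinstance(current, dict) or step not in current:
            return ABSENT
        current = current[step]
    return current


def project_table(collection, paths):
    """Return (column names, rows); a row only holds columns whose path exists."""
    columns = [p.replace(".", "_") for p in paths]  # e.g. 'b.c' -> 'b_c'
    rows = []
    for doc in collection:
        row = {}
        for path, column in zip(paths, columns):
            value = resolve(doc, parse_path(path))
            if value is not ABSENT:
                row[column] = value  # missing paths leave the column empty
        rows.append(row)
    return columns, rows
```

Following the text, only “.” is replaced, so the third column of select a, b.c, d.[3].e is named d_[3]_e here; a real system might need to rewrite the brackets as well.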

The query

select {a, b.c, d.[3].e} from tinycoll

returns a set of JSON documents.

Semantically, each document from the collection “tinycoll” is taken and an empty result document is created for it. Each path from the projection is followed within the document from the collection. If a value is found, the path with the corresponding value is added to the result document. The document from the collection may contain all, some, or none of the paths from the projection; therefore, the result document might contain all, some, or none of the paths (in the last case it is the empty document).

The order of the paths in the projection does not matter, as JSON documents are created and the order of properties / paths is not defined for JSON objects.

As a note, according to the construction principle of the result JSON documents, the paths in the projection of the select statement and the paths in the result JSON documents are exactly the same (if they exist). No translation is necessary from the viewpoint of the client between the paths in the query and the paths in the result documents.
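The document-returning projection, and the fact that the result can be read back with the same paths, can be sketched in Python as below. To keep it short, the sketch assumes property-only paths (no array indexes) and Python dicts as documents; project_documents and resolve are hypothetical names.

```python
import copy

ABSENT = object()  # marker for "path does not exist" (distinct from JSON null)


def resolve(doc, steps):
    """Follow property names from the root; ABSENT if any step is missing."""
    current = doc
    for step in steps:
        if not isinstance(current, dict) or step not in current:
            return ABSENT
        current = current[step]
    return current


def project_documents(collection, paths):
    """One result document per input document, built from the existing paths."""
    results = []
    for doc in collection:
        result = {}  # start from an empty result document
        for path in paths:
            steps = path.split(".")
            value = resolve(doc, steps)
            if value is ABSENT:
                continue  # a missing path adds nothing to the result
            target = result
            for step in steps[:-1]:
                target = target.setdefault(step, {})  # recreate the enclosing path
            target[steps[-1]] = copy.deepcopy(value)
        results.append(result)
    return results
```

Because each result document is rebuilt along the projected paths, resolve(result, "b.c".split(".")) finds the projected value again, and reordering the paths yields equal documents.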

Asterisk Projection

The asterisk projection is supported. The query

select {*} from tinycoll

returns all documents stored in the collection “tinycoll” as they are without any modification.

The query

select * from tinycoll

returns a table that has as many columns as there are distinct full and partial paths across all documents of the collection “tinycoll”.

Semantically, each document from the collection “tinycoll” is taken and a row is created for it. For each full as well as partial path in the document the value is retrieved and put into the corresponding column of the row. There is a column for each possible path and the set of columns can be predetermined or dynamically added to the result table as needed. As before, the column names are the path with the “.” replaced by “_”.
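Enumerating every full and partial path of a document is the core of the asterisk table projection. The following is a minimal Python sketch, assuming documents are Python dicts and lists and rendering array indexes as [n], in line with the earlier example path d.[3].e. The function names are hypothetical, and columns are added dynamically in the order they are first seen.

```python
def all_paths(value, prefix=()):
    """Yield (path, value) for every full and partial path below `value`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return  # scalars have no paths below them
    for key, child in items:
        path = prefix + (key,)
        yield path, child  # partial path if child is composite, else full path
        yield from all_paths(child, path)


def column_name(path):
    """Render a path as a column name: indexes as [n], steps joined by '_'."""
    return "_".join(f"[{step}]" if isinstance(step, int) else step for step in path)


def star_table(collection):
    """Rows for 'select * from coll': one column per path found in any document."""
    columns, rows = [], []
    for doc in collection:
        row = {}
        for path, value in all_paths(doc):
            name = column_name(path)
            if name not in columns:
                columns.append(name)  # the set of columns grows as needed
            row[name] = value
        rows.append(row)
    return columns, rows
```

The text also allows the set of columns to be predetermined; this sketch takes the dynamic route, which requires one pass over all documents before the full column list is known.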

Summary

This was a first closer look into the details of projection in the context of JSON SQL, and the next blog will continue the projection discussion. The key takeaway is that JSON SQL can return JSON documents as well as tables, based on well-defined execution semantics centered around JSON paths.

Go [ JSON | Relational ] SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

Oracle 12c – SQL for JSON (Part 3): Basic Joins

Having JSON support in a relational database means that the join operator is available not only for joining relational data or JSON data, but also for the mixed case: joining JSON and relational data. This opens up a whole new world of data modeling and query execution.

Running Example

This running example creates three tables, “demo”, “city” and “city_rel”, with a sample data set in each table. The tables “city” and “city_rel” contain essentially the same data set, once in JSON format and once in relational format.

DROP TABLE demo;
CREATE TABLE demo
(
  id NUMBER,
  person CLOB 
    CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO demo VALUES
( 1, '{ "name": "Bob", "city": "SF"}' );
INSERT INTO demo VALUES
( 2, '{ "name": "Jake", "city": "PA"}' );
INSERT INTO demo VALUES
( 3, '{ "name": "Alice", "city": "NYC"}' );
INSERT INTO demo VALUES
( 4, '{ "name": "Jenn",  "city": {"name": "Tokyo"}}' );
INSERT INTO demo VALUES
( 5, '{ "name": "Jenn",  "city": ["Tokyo"]}' );
INSERT INTO demo VALUES
( 6, '{ "name": "Jenn",  "city": 66}' );
DROP TABLE city;
CREATE TABLE city
(
  id NUMBER,
  city CLOB 
    CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO city VALUES
( 101, '{"city": "SF", "state": "CA", 
  "country": "US"}' );
INSERT INTO city VALUES
( 102, '{"city": "PA", "state": "CA", 
  "country": "US"}' );
INSERT INTO city VALUES
( 103, '{"city": "NYC", "state": "NY", 
  "country": "US"}' );
INSERT INTO city VALUES
( 104, '{"city": {"name": "Tokyo"}, "state": null, 
  "country": "Japan"}' );
INSERT INTO city VALUES
( 105, '{"city": ["Tokyo"], "state": null, 
  "country": "Japan"}' );
INSERT INTO city VALUES
( 106, '{"city": 66, "state": null, 
  "country": "World"}' );
DROP TABLE city_rel;
CREATE TABLE city_rel
(
  id      NUMBER,
  city    VARCHAR(255),
  state   VARCHAR(255),
  country VARCHAR(255));
INSERT INTO city_rel VALUES
( 1001, 'SF', 'CA', 'US' );
INSERT INTO city_rel VALUES
( 1002, 'PA', 'CA', 'US' );
INSERT INTO city_rel VALUES
( 1003, 'NYC', 'NY', 'US' );
INSERT INTO city_rel VALUES
( 1004, '{"name": "Tokyo"}', NULL, 'World' );
INSERT INTO city_rel VALUES
( 1005, '["Tokyo"]', NULL, 'World' );
INSERT INTO city_rel VALUES
( 1006, '66', NULL, 'World' );

JSON Join

The following SQL statement is a simple join between JSON structures on the property “city”:

SELECT *
FROM demo d, city c
WHERE d.person.city = c.city.city;
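Conceptually, this join pairs every row of demo with every row of city whose “city” values are equal. The following Python sketch shows that nested-loop semantics over the running example, assuming JSON values are compared structurally (Python ==), so that the object {"name": "Tokyo"} matches an equal object and the array ["Tokyo"] matches an equal array. It does not claim to reproduce how Oracle evaluates the dot-notation comparison; json_equi_join is a hypothetical name.

```python
ABSENT = object()  # a missing property never takes part in the join

# Running example, reduced to (id, JSON value) pairs; JSON null becomes None.
DEMO = [
    (1, {"name": "Bob", "city": "SF"}),
    (2, {"name": "Jake", "city": "PA"}),
    (3, {"name": "Alice", "city": "NYC"}),
    (4, {"name": "Jenn", "city": {"name": "Tokyo"}}),
    (5, {"name": "Jenn", "city": ["Tokyo"]}),
    (6, {"name": "Jenn", "city": 66}),
]
CITY = [
    (101, {"city": "SF", "state": "CA", "country": "US"}),
    (102, {"city": "PA", "state": "CA", "country": "US"}),
    (103, {"city": "NYC", "state": "NY", "country": "US"}),
    (104, {"city": {"name": "Tokyo"}, "state": None, "country": "Japan"}),
    (105, {"city": ["Tokyo"], "state": None, "country": "Japan"}),
    (106, {"city": 66, "state": None, "country": "World"}),
]


def json_equi_join(left, right, left_prop, right_prop):
    """Nested-loop join: pair ids whose top-level property values are equal."""
    pairs = []
    for left_id, left_doc in left:
        left_value = left_doc.get(left_prop, ABSENT)
        if left_value is ABSENT:
            continue
        for right_id, right_doc in right:
            # Python == compares strings, numbers, dicts and lists by value.
            if right_doc.get(right_prop, ABSENT) == left_value:
                pairs.append((left_id, right_id))
    return pairs
```

Under this assumed value equality, each demo row matches exactly one city row, from (1, 101) to (6, 106). One caveat of Python == is that it treats true and 1 as equal, which a strict JSON equality would not.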

In addition to joining the JSON structures, this SQL statement projects the JSON columns:

SELECT d.person, c.city
FROM demo d, city c
WHERE d.person.city = c.city.city;

The following SQL statement extends the projection:

SELECT 
  d.id,
  d.person.name,
  d.person.city,
  c.id,
  c.city.state,
  c.city.country
FROM demo d, city c
WHERE d.person.city = c.city.city;

JSON – Relational Join

This SQL statement shows the join between JSON and relational data, combined with a projection:

SELECT 
  d.id,
  d.person.name,
  d.person.city,
  c_r.id,
  c_r.city,
  c_r.state,
  c_r.country
FROM demo d, city_rel c_r
WHERE d.person.city = c_r.city;

Significance of Pure and Mixed JSON Joins

As shown, the join operator can easily be applied within JSON tables as well as across JSON and relational tables. With Oracle 12c there is no longer any restriction on using the join operator in conjunction with JSON documents.

Pure JSON joins are possible in Oracle 12c. This gives developers a choice: they can model all data in pure document form (trying to avoid the need for joins by creating sub-collections, which is almost impossible without denormalization), or they can consciously model documents so that the document nature is applied where applicable, without necessarily having to denormalize, as the join operator is available.

The mixed case between JSON and relational tables goes a lot further as now data can be modeled according to its nature (not all data is exclusively document-oriented or relational) and its access path requirements without compromising either way.

In addition, the mixed case supports the situation where data is already present in the database in relational form and new data is added in JSON form. This means that even if data is available in relational form, additional data does not have to be in relational form, and the most appropriate representation can be chosen (and no separate document-oriented database has to be deployed, btw).

Go SQL!


Oracle 12c – SQL for JSON (Part 2): Basic Queries

This blog provides a small tour of basic SQL queries that operate on JSON in the context of Oracle Database 12c Release 1 (12.1.0.2.0).

Sample Data Set

The following very basic data set is used as an example in this blog. It is kept simple in order to be able to focus on the queries without having to deal with complex JSON objects at the same time.

First, a simple table is created that contains a JSON column. Next, some rows containing JSON objects are inserted.

DROP TABLE demo;
CREATE TABLE demo
( id NUMBER,
  player CLOB 
    CONSTRAINT player_ensure_json 
      CHECK (player IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO demo 
VALUES (1, '{"person": "Bob", "score": 10}');
INSERT INTO demo 
VALUES (2, '{"person": "Bob", "score": 20}');
INSERT INTO demo 
VALUES (3, '{"person": "Jake", "score": 100}');
INSERT INTO demo 
VALUES (4, '{"person": "Jake", "score": 200}');
INSERT INTO demo 
VALUES (5, '{"person": "Alice", "score": 1000}');

With the sample data set in place, we can now construct a complex query in several steps.

Selection and Projection

The most basic query selecting the complete data set is

SELECT * FROM demo d;

A basic projection extracting only the person from the JSON objects is

SELECT d.player.person FROM demo d;

A basic selection restricting the JSON objects is

SELECT d.player.person
FROM demo d
WHERE d.player.person IN ('Jake', 'Bob');

The syntax for accessing properties in JSON objects is in principle

<table alias>.<JSON column>.<path to JSON object key>

with variations on JSON array index references if required (http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246).

A more complex selection with an additional restriction is

SELECT d.player.person
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob');

Ordering

Results can be ordered, for example, in the following way

SELECT d.player.person
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob')
ORDER BY d.player.person DESC;

Grouping

Results can also be grouped, as a preparation for aggregation:

SELECT d.player.person
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob')
GROUP BY d.player.person
ORDER BY d.player.person DESC;

Aggregation

Different aggregation functions can be used to do some basic analysis

SELECT d.player.person,
  SUM(d.player.score),
  AVG(d.player.score),
  MIN(d.player.score),
  COUNT(*)
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob')
GROUP BY d.player.person
ORDER BY d.player.person DESC;
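To make the expected outcome of the aggregation query concrete, the following Python sketch applies the same filter, grouping, aggregates and descending order to the five sample rows in memory. It mirrors the query's semantics only; it is not how Oracle evaluates the statement, and AVG is computed here as a Python float. The function name aggregate is hypothetical.

```python
from collections import defaultdict

# The five sample rows of the demo table: (id, player JSON object).
DEMO = [
    (1, {"person": "Bob", "score": 10}),
    (2, {"person": "Bob", "score": 20}),
    (3, {"person": "Jake", "score": 100}),
    (4, {"person": "Jake", "score": 200}),
    (5, {"person": "Alice", "score": 1000}),
]


def aggregate(rows):
    """Mirror WHERE, GROUP BY, SUM/AVG/MIN/COUNT(*) and ORDER BY ... DESC."""
    groups = defaultdict(list)
    for _id, player in rows:
        # WHERE d.player.score > 0 AND d.player.person IN ('Jake', 'Bob')
        if player["score"] > 0 and player["person"] in ("Jake", "Bob"):
            groups[player["person"]].append(player["score"])  # GROUP BY person
    result = [
        (person, sum(scores), sum(scores) / len(scores), min(scores), len(scores))
        for person, scores in groups.items()
    ]
    # ORDER BY d.player.person DESC
    return sorted(result, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    for row in aggregate(DEMO):
        print(row)
```

For the sample data this yields ('Jake', 300, 150.0, 100, 2) followed by ('Bob', 30, 15.0, 10, 2); Alice is removed by the IN restriction.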

Final Result

The final result is shown here in table representation (copied from SQL Developer):

[Image: final result table]

Inspiration

This example was, in fact, inspired by http://www.querymongo.com. There, the MySQL query

[Image: MySQL query]

is translated into a query for one of MongoDB’s query interfaces:

[Image: MongoDB query]

(web site accessed on 10/21/2014).

Summary

In summary, SQL functionality is available not only for the relational model in the Oracle Database 12c, but also for JSON-based data.

This makes the Oracle Database a powerful JSON processing environment, as JSON data can be queried through the declarative SQL language.


Document Projection (Part 2): Definition

What does projection mean in the context of JSON structures or documents? What should the definition of “projection” be? Several possibilities are discussed next.

Document Projection: Complete Branch

Projection in JSON is projecting a branch of the JSON data structure tree, not projecting a single object or value. To request a projection, a property (projection) path in dot notation could be used (and actually is in many systems). The result of a projection is a valid JSON document containing the specified branch(es).

An example document is

{"a": {"b": {"c": 3, "d": 4, "e": 5}}}

Let’s go through a series of projections in the following.

  • Projection path: “a.b.c”
  • Result: {“a”: {“b”: {“c”: 3}}}
  • Projection path: “a.b”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4, “e”: 5}}}
  • Projection path: “a.e”
  • Result: {}

The result contains the full path of the projection (or more, but not less). If the requested projection path does not exist, the result is the empty document as none of its properties matches the projection path. The empty projection path “” is not defined, meaning, a projection must name at least one property, and that will be a top-level property in a document.

Several branches can be projected concurrently.

  • Projection paths: “a.b.c”, “a.b.d”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4}}}

The resulting document contains the combination of all branches that result in a valid projection. Redundant projection paths are possible if one projection path is a sub-path of another one; the result document is the same whether or not such redundancy is present.

Document Projection: Partial Branch

It is possible that the whole projection path does not exist, but a prefix of it does. In this case, one option is to return the existing part of the path up to that point (MongoDB follows this approach). This results in partial paths whose last property has the empty document as its value.

For example, “a.b.f” would result in {“a”: {“b”: {}}}. “a” and “b” exist in the example document, “f”, however, does not.

In my opinion, while this might be useful in some cases, I would not make it the default or standard definition, as the returned result is incomplete; I could argue that it is in fact incorrect, since the value of “b” is not the empty document (I could envision a configuration setting that provides these partial branches if needed).
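Both definitions discussed so far can be sketched in one small Python function: by default it returns complete branches only, and a hypothetical partial=True flag switches on the partial-branch variant (the kind of configuration setting envisioned above). It assumes dotted property paths without array indexes and documents as Python dicts; it follows the text's description of the partial variant and does not claim to reproduce MongoDB exactly.

```python
import copy

_MISSING = object()  # marker for "path does not exist"


def _resolve(doc, steps):
    """Value at the full path, or _MISSING if any step does not exist."""
    current = doc
    for step in steps:
        if not isinstance(current, dict) or step not in current:
            return _MISSING
        current = current[step]
    return current


def _object_prefix_length(doc, steps):
    """How many leading steps exist in doc and lead to an object (dict)."""
    current, count = doc, 0
    for step in steps:
        if not isinstance(current, dict) or not isinstance(current.get(step), dict):
            break
        current = current[step]
        count += 1
    return count


def project(doc, paths, partial=False):
    """Merge the projected branches of all paths into one result document."""
    result = {}
    for path in paths:
        steps = path.split(".")
        value = _resolve(doc, steps)
        if value is not _MISSING:
            target = result
            for step in steps[:-1]:
                target = target.setdefault(step, {})
            target[steps[-1]] = copy.deepcopy(value)  # complete branch
        elif partial:
            # Partial-branch variant: keep the existing prefix, ending in {}.
            target = result
            for step in steps[:_object_prefix_length(doc, steps)]:
                target = target.setdefault(step, {})
    return result
```

With the example document {"a": {"b": {"c": 3, "d": 4, "e": 5}}}, project(doc, ["a.b.c", "a.b.d"]) returns {"a": {"b": {"c": 3, "d": 4}}}, project(doc, ["a.b.f"]) returns {}, and project(doc, ["a.b.f"], partial=True) returns {"a": {"b": {}}}. Redundant paths such as "a.b" together with "a.b.c" give the same result as "a.b" alone.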

Document Projection: Value

Wait a minute, why does the result document have to consist of full paths?

The reason is the implicit restriction on JSON documents that there can be only one property with a given name within the same object. “Implicit” because the JSON definition (http://json.org/) does not mandate this restriction, but many implementations enforce it: property names within the same object have to be unique.

For example:

{"x": {"b": {"c": 3, "d": 4}}, 
 "y": {"e": {"c": 3, "d": 4}}}

is a perfectly valid document, as the property names are unique within each object. So let’s get back to projection and assume for a moment that projection returns only the value at the end of the path, omitting the path leading to it. So,

  • Projection path: “x.b.c”
  • Result: {“c”: 3}

So far so good.

  • Projection paths: “x.b.c”, “y.e.c”
  • Result: ?

What should the result be? The basic assumption is that a projection on a document returns a document. However, “x.b.c” and “y.e.c” each return {“c”: 3} as a separate document, not as one document.

  • One possible result could be an array with two documents. But arrays are in general not considered valid top level documents (again, the JSON definition would allow that).
  • To mitigate that, the array could be the value of a property: {“result”: [{“c”: 3}, {“c”: 3}]}. But this would conflict with a document that happens to have a property “result” of type array containing two identical documents.
  • Or, the two documents would have to be embedded in a third one with special names, like {“1”: {“c”: 3}, “2”: {“c”: 3}}. But then, the original document does not have the properties “1” or “2”.

Based on this discussion, returning full paths as projection results is simpler and more direct.

Projection – Result Correspondence Principle

There is also another argument from the user’s viewpoint. If a user wants to project “x.b.c”, then the user might want to access the result document after the query returns with “x.b.c” as the access path. From this viewpoint, the path in the query and the path in the result document should match and not require access path transformation.

Array Projection: Complete Access Path

Documents may contain arrays as well as arrays of arrays, arrays of objects containing arrays, etc.; in principle, any of these composite types can appear on any level of a document. Projection therefore has to be defined on arrays as well, not just on objects.

The principle of projection paths is extended to include array index specifications. For example, consider this document:

{"a": [{"a1": 1}, {"a2": 2}], 
 "b": {"c": [{"c1": 3}, {"c2": 4}, {"c3": 5}]}, 
 "d": [6, 7]}

Let’s do a few projections (array indexes are 0-based):

  • Projection path: a[0]
  • Result: {“a”: [{“a1”: 1}]}
  • Projection path: b.c[1]
  • Result: {“b”: {“c”: [{“c2”: 4}]}}
  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [{“a2”: 2}], “b”: {“c”: [{“c3”: 5}]}}
  • Projection path: a[7]
  • Result: {}

Like in the case of documents, full paths are requested and full paths are returned, with several paths possible. A projection path referring to a non-existing property will not contribute to the result.

So far, so good, except that the results do not yet conform to the “Projection – Result Correspondence” principle from above: the projection “a[1]” resulted in a correct document, but that result document cannot be accessed with “a[1]” to obtain the value.

Array Projection: Padding

In order to support the “Projection – Result Correspondence” principle, array results can be padded with the value “null”. For example:

  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [null, {“a2”: 2}], “b”: {“c”: [null, null, {“c3”: 5}]}}

Now it is possible to access the result with “a[1]” or “b.c[2].c3” in order to obtain the proper results. From a user’s perspective this is great as again the paths used to specify the projection can be used to retrieve the values.

Array Projection: Scalar Values

Scalar values in arrays do not pose a specific problem:

  • Projection paths: a[1], d[1], d[2]
  • Result: {“a”: [null, {“a2”: 2}], “d”: [null, 7]}

And their access can be accomplished using the projection paths.
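The array projections above, both compact and padded with null, can be sketched in Python. Assumptions: documents are Python dicts and lists (null is None), paths use the a[1] / b.c[2].c3 notation of this section, and when several indexes of the same array are selected without padding, the elements are kept in index order, a case the text does not define. The names parse, resolve, project and _Sparse are hypothetical.

```python
import copy
import re

_MISSING = object()  # marker for "path does not exist" (distinct from None)


class _Sparse(dict):
    """Selected elements of one array, keyed by their original index."""


def parse(path):
    """Turn 'b.c[2].c3' into ['b', 'c', 2, 'c3'] (assumed path syntax)."""
    steps = []
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        steps.append(match.group(1))
        steps.extend(int(i) for i in re.findall(r"\[(\d+)\]", match.group(2)))
    return steps


def resolve(doc, steps):
    """Follow the steps from the root; _MISSING if any step does not exist."""
    current = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(current, list) or step >= len(current):
                return _MISSING
        elif not isinstance(current, dict) or step not in current:
            return _MISSING
        current = current[step]
    return current


def project(doc, paths, pad=False):
    """Project full paths with array indexes; pad=True fills gaps with null."""
    tree = {}
    for path in paths:
        steps = parse(path)
        value = resolve(doc, steps)
        if value is _MISSING:
            continue  # a non-existing path does not contribute to the result
        node = tree
        for step, next_step in zip(steps, steps[1:]):
            node = node.setdefault(step, _Sparse() if isinstance(next_step, int) else {})
            if isinstance(node, list):
                break  # an enclosing array was already projected in full
        else:
            node[steps[-1]] = copy.deepcopy(value)
    return _finish(tree, pad)


def _finish(node, pad):
    """Convert _Sparse arrays into real lists, compact or null-padded."""
    if isinstance(node, _Sparse):
        if pad:
            out = [None] * (max(node) + 1)  # keep original positions
            for index, child in node.items():
                out[index] = _finish(child, pad)
            return out
        return [_finish(node[i], pad) for i in sorted(node)]  # index order
    if isinstance(node, dict):
        return {key: _finish(child, pad) for key, child in node.items()}
    return node
```

With the example document, project(doc, ["a[1]", "b.c[2].c3"], pad=True) gives {"a": [None, {"a2": 2}], "b": {"c": [None, None, {"c3": 5}]}}, and resolving b.c[2].c3 in that result yields 5 again, which is the correspondence the padding is meant to restore. Without padding, the same call gives {"a": [{"a2": 2}], "b": {"c": [{"c3": 5}]}}.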

Summary

Initially I thought projection was a straightforward function, not worth a discussion in the context of document-oriented databases; but it turned out not to be that clear cut. Nevertheless, the above is a starting point for a strict rationalization of projection in document-oriented databases based on the JSON data model.