Oracle 12c – SQL for JSON (Part 6): Un-nesting

JSON supports nested structures and the particular shape of the nesting is decided by the application systems or by a data modeler for different reasons. How can nested JSON structures be queried with SQL?

Nested JSON Structures

Nested JSON structures are based on two types: JSON objects and JSON arrays. Objects can have several members; each member is either of scalar type, object type or array type. An array can contain scalar values, objects or arrays. This nesting of objects with objects and arrays provides the mechanism to build (theoretically unlimited deep) nested structures.

Nested structures are used in most cases to model a part-of relationship or an association relationship. As the nested structures are trees, there is a 1:n relationship between the higher and the lower nodes (without cycles as JSON types do not include reference types).

Folklore says that nesting is sufficient to ‘avoid’ joins in context of JSON databases, but professionals know a lot better.

In the following, an example data set is introduced and a few queries afterwards that specifically deal with nested structures in order to query those in context of SQL. A key JSON function in context of Oracle 12c is json_table() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] combined with un-nesting.

Example Data Set

DROP TABLE reseller;
CREATE TABLE reseller
  (
    reseller_id   NUMBER NOT NULL,
    reseller_name VARCHAR(255),
    suppliers CLOB CONSTRAINT suppliers_ensure_json 
    CHECK (suppliers IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO reseller VALUES
(1, 'abc corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 2},
      {"partNo": 2, "price": 200, "quantity": 4}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 6},
      {"partNo": 11, "price": 1100, "quantity": 8},
      {"partNo": 12, "price": 1200, "quantity": 10}]}]}');
INSERT INTO reseller VALUES
(2, 'xyz corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 12},
      {"partNo": 3, "price": 300, "quantity": 14}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 16},
      {"partNo": 11, "price": 1100, "quantity": 18},
      {"partNo": 12, "price": 1200, "quantity": 20},
      {"partNo": 13, "price": 1300, "quantity": 22},
      {"partNo": 14, "price": 1400, "quantity": 24}]}]}');

Simply Query: Listing all Suppliers

The first query is going to list the distinct set of supplier names. Supplier names are part of objects inside arrays. Therefore the query will reach into the tree and iterate over the suppliers (using the [*] notation that says to iterate over all array elements) in order to pick out the supplier name (accomplished by the PATH expression). A column is declared that captures all supplier names and a distinct projection ensures that each supplier name is included only once into the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
1
Changing [*] to [1] only retrieves the first supplier and since the first supplier is the same across the data set of JSON documents only one supplier name appears in the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[1]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
2

Deeper Reach: Listing all Parts and Prices

The next query lists all parts and their prices. It creates two columns, one for the part number and one for the price. However, since each supplier can supply more than one part, the query has to iterate over two arrays (using the [*] notation), once over suppliers, and once over all parts within each supplier. The result is ordered in order to provide a consistent layout. 

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*].parts[*]'
                COLUMNS (part_no NUMBER
                         PATH '$.partNo',
                         part_price NUMBER
                         PATH '$.price')) AS s
ORDER BY s.part_no;

Result:
3
It would be more interesting to see the supplier name also for each of the parts. This is where un-nesting is applied.

Un-nesting: Listing all Parts and Prices for each Supplier and for each Reseller

The next query lists the parts and their prices for each supplier, and for each reseller. This allows to see the reseller, the suppliers they use, as well as the parts that are supplied.

SELECT r.reseller_name,
       s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
4
Note the keyword NESTED that transforms a hierarchical relationship into a relational representation by repeating the column value on the higher level. In this example, for each part_no and part_price, the supplier_name is provided.

Computation on Un-nested JSON Data

The following query shows the inventory and inventory value for each reseller and supplier. The value is dynamically computed by the query.

SELECT r.reseller_name,
       s.*,
       s.part_price * s.part_quantity AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
5

Aggregation on Un-nested JSON Data

The following query shows the total inventory of each reseller by aggregation in the query.

SELECT r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY r.reseller_name
ORDER BY r.reseller_name;

Result:
6

Changing to Seller Perspective

The following query shows the quantity and sales of each type of part to resellers. The perspective changes from a reseller perspective so far to a seller perspective. In SQL this is straight-forward, however, in context of nested JSON structures this would mean processing of the 1:n relationship in the opposite direction.

SELECT s.supplier_name,
       r.reseller_name,
       s.part_no,
       s.part_price,
       s.part_quantity,
       s.part_price * s.part_quantity AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER 
                                  PATH '$.quantity'))) AS s
ORDER BY s.supplier_name,
         r.reseller_name,
         s.part_no;

Result:
7

Aggregation of Seller Data

And, finally, this query shows the total sales by each supplier to each reseller.

SELECT s.supplier_name,
       r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY s.supplier_name,
         r.reseller_name
ORDER BY s.supplier_name;

Result:
8

Summary

As shown, the JSON function json_table() supports transforming hierarchically structured data into relationally structured data so that it is possible to execute expressive SQL queries, not only retrieving data, but also aggregating data on different levels of the hierarchy.

Once the hierarchical structure is un-nested, the data can be queried from different perspectives, aka, the 1:n relationship in the nested structure can be processed in both directions.

This is another example of the quite powerful and convenient integration of JSON into the world of relational databases and SQL. From an application development perspective this means that JSON documents can be used and at the same time made accessible to the full power of SQL.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 5): Property Existence

JSON structures do not have to follow a global schema in Oracle 12c; each JSON structure in a JSON column can have its own schema, with varying properties. This blog discusses the function JSON_EXISTS() that checks the existence of a property in a JSON structure and its application in SQL queries.

Sample Data Set

A sample data set for this blog is as follows. Note that not all JSON structures follow the same schema.

DROP TABLE people;
CREATE TABLE people
  (
    id NUMBER,
    person CLOB CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO people VALUES
(1, '{ "name": "Bob", "city": "SF"}');
INSERT INTO people VALUES
( 2, '{ "name": "Jake", "city": "PA"}');
INSERT INTO people VALUES
( 3, '{ "name": "Alice", "city": "NYC"}');
INSERT INTO people VALUES
( 4, '{ "name": "Jenn", "city": "Tokyo"}');
INSERT INTO people VALUES
( 5, '{ "name": "Frank", "city": "Berlin"}');
INSERT INTO people VALUES
( 6, '{ "name": "Jup", "city": "Paris"}');
INSERT INTO people VALUES
( 7, '{ "name": "Jib"}');
DROP TABLE cities;
CREATE TABLE cities
  (
    id NUMBER,
    city CLOB CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO cities VALUES
(101, '{"city": {"name": "SF", "population": 837442, 
                 "area": 121.4}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(102, '{"city": {"name": "PA", "population": 64403, 
                 "area": 66.787}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(103, '{"city": {"name": "NYC", "population": 8405837, 
                 "area": 783.8}, 
        "state": "NY", "country": "US"}');
INSERT INTO cities VALUES
(104, '{"city": {"name": "Tokyo", "population": 9071577, 
                 "area": 622.99}, 
        "state": null, "country": "Japan"}');
INSERT INTO cities VALUES
(105, '{"city": {"name": "Paris", "population": 10869000, 
                 "area": 2845}, 
        "state": null, "country": "France"}');
INSERT INTO cities VALUES
(106, '{"city": {"name": "Berlin", "population": 3517424, 
                 "area": 891.75}, 
        "state": null, "country": "Germany"}');
INSERT INTO cities VALUES
(107, '{"city": {"name": "Asilomar", "population": 100, 
                 "area": 2}}');

Absent JSON Property

The default SQL query behavior for absent JSON properties is discussed briefly first in order to establish the base semantics. 

  • Projection
    • Absent JSON properties are returned as SQL NULL
    • SELECT DISTINCT c.city.country FROM cities c;
    • Absent properties can therefore be checked for by using SQL NULL in a predicate
    • SELECT DISTINCT c.city.city.name, c.city.country FROM cities c WHERE c.city.country IS NULL;
  • Selection
    • Absent JSON properties do not match in predicates
    • SELECT DISTINCT c.city.city.name FROM cities c WHERE c.city.country = ‘US’;
  • Cartesian Product (join)
    • Absent JSON properties do not contribute to the cartesian product
    • SELECT cl.city.city.name, cr.city.city.name FROM cities cl, cities cr WHERE cl.city.country = cr.city.country;

The base semantics of absent JSON properties is consistent with their representation as SQL NULL.

Reasons for Absent JSON Properties

Why would JSON properties be present in some JSON structures of a JSON column, and absent from others? There are many possible legitimate reasons, some are discussed next:

  • Life Cycle Implementations (Application Design)
    • The life cycle of a concept (e.g., person) is implemented by adding property values over time. Properties are stored corresponding to the life cycle. For example, initially a person is added by name, with city and other information added later as they become known.
  • Schema Evolution
    • A property that was not required in the past (from a data model perspective) is added due to data model changes. Adding a new property to the already existing JSON structures might or might not be done and if not, different JSON documents have a different structure.
  • Errors
    • An error in the application system omits writing a property that should have been written. A real bug.
  • Intentional Design Decision
    • An application system was designed to only write properties that have values and to omit properties that do not have a value (yet). A corresponding design for deletion is that if a value of a property is deleted, the whole property is removed instead of being set to e.g. JSON null.

There is a design philosophy that emphasizes that even through JSON supports varying structures, they ideally should follow the same schema. This would mean that all JSON structures in a JSON column have the exact same set of properties. However, another design philosophy says that the dynamic nature of JSON structures should be taken advantage of avoiding the classical schema migration problem: a JSON structure should only contain those properties that make sense at this state of the structure life cycle.

Checking for Absent JSON Properties: JSON_EXISTS()

Since each JSON structure in a JSON column can have different schema it might be important to explicitly check for properties in SQL queries for query reliability and also documentation reasons.

Oracle 12c implements the function JSON_EXISTS() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] that allows to check for existence of a property. For example:

SELECT DISTINCT c.city.city.name AS name 
FROM cities c 
WHERE json_exists(c.city, '$.country');

1

This query only selects those city names of cities that have their country specified.

More queries using JSON_EXISTS() follow:

CREATE OR REPLACE VIEW all_cities 
AS 
SELECT DISTINCT c.city.city.name AS name 
FROM cities c 
WHERE json_exists(c.city, '$.country');
SELECT p.person 
FROM people p, all_cities all_c 
WHERE p.person.city = all_c.name 
  AND json_exists(p.person, '$.city');

2

Details on JSON_EXISTS()

JSON_EXISTS() supports error handling clauses. For example, when a query accesses a JSON column with incorrect JSON structures, a choice is to ignore rows with incorrect JSON structures, include them into the result, or throw an error when processing the query. An example follows:

DROP TABLE incorrect_json;
CREATE TABLE incorrect_json
  ( id NUMBER, data CLOB);
INSERT INTO incorrect_json VALUES
(1, '{ "name": "item1", "value": "value1"}');
INSERT INTO incorrect_json VALUES
(2, '{ "name": "item2", "value": "value2_broken_json}');
SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' false ON error);

3

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' true ON error);

4

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' error ON error);

5

JSON_EXISTS() expects a column that can store json structures as strings; when applied to a non-string column, an error indicates this:

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.id, '$.missing_property');

6

When applied to a non-JSON column, the error message looks different:

DROP TABLE incorrect_json2;
CREATE TABLE incorrect_json2
  ( id VARCHAR(10), data CLOB);
INSERT INTO incorrect_json2 VALUES
('one', '{ "name": "item1", "value": "value1"}');
INSERT INTO incorrect_json2 VALUES
('two', '{ "name": "item2", "value": "value2_broken_json}');
SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' error ON error);

7

SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' false ON error);

8

SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' true ON error);

9

Summary

Varying schema in JSON structures can be explicitly explored using the JSON_EXISTS() function and therefore is a way to incorporate schema variation into SQL queries. This makes a powerful combination.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 4): Basic Views

With JSON support available in Oracle 12c, it is possible to define views on JSON data.

Background

Complex JSON documents do not have to be queried completely every time, and views can provide an easy way to define an important subset of the JSON documents’ content. The relational database management system Oracle 12c implements the view mechanism for SQL and that is available for JSON columns as well.

In the following a few JSON document view examples are discussed on and implemented. Views are defined as well as accessed, directly queried as well as part of other SQL statements.

Example Data Set

First, an example data set is introduced around people living in cities and data about these cities.

DROP TABLE people;
CREATE TABLE people
  (
    id NUMBER,
    person CLOB CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO people VALUES
( 1, '{ "name": "Bob", "city": "SF"}');
INSERT INTO people VALUES
( 2, '{ "name": "Jake", "city": "PA"}');
INSERT INTO people VALUES
( 3, '{ "name": "Alice", "city": "NYC"}');
INSERT INTO people VALUES
( 4, '{ "name": "Jenn", "city": "Tokyo"}');
INSERT INTO people VALUES
( 5, '{ "name": "Frank", "city": "Berlin"}');
INSERT INTO people VALUES
( 6, '{ "name": "Jup", "city": "Paris"}');
DROP TABLE cities;
CREATE TABLE cities
  (
    id NUMBER,
    city CLOB CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO cities VALUES
(101, '{"city": {"name": "SF", "population": 837442, 
                 "area": 121.4}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(102, '{"city": {"name": "PA", "population": 64403, 
                 "area": 66.787}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(103, '{"city": {"name": "NYC", "population": 8405837, 
                 "area": 783.8}, 
        "state": "NY", "country": "US"}');
INSERT INTO cities VALUES
(104, '{"city": {"name": "Tokyo", "population": 9071577, 
                 "area": 622.99}, 
        "state": null, "country": "Japan"}');
INSERT INTO cities VALUES
(105, '{"city": {"name": "Paris", "population": 10869000, 
                 "area": 2845}, 
        "state": null, "country": "France"}');
INSERT INTO cities VALUES
(106, '{"city": {"name": "Berlin", "population": 3517424, 
                 "area": 891.75}, 
        "state": null, "country": "Germany"}');

Views

A few example views and possible uses are discussed in the following. The results of the queries are displayed in graphical representation provided by SQLDeveloper.

Simple View

CREATE OR REPLACE VIEW all_cities
AS
SELECT DISTINCT c.city.city.name 
FROM cities c;
SELECT * FROM all_cities;

1

SELECT p.person
FROM people p, all_cities all_c
WHERE p.person.city = all_c.city;

 2

SELECT p.person.name, all_c.city
FROM people p, all_cities all_c
WHERE p.person.city = all_c.city;

3

View Definition with Predicate

CREATE OR REPLACE VIEW us_cities
AS
SELECT DISTINCT c.city.city.name 
FROM cities c 
WHERE c.city.country = 'US';
SELECT * FROM us_cities;

 4

SELECT p.person.name, us_c.city
FROM people p, us_cities us_c
WHERE p.person.city = us_c.city;

5

View Definition with Computation

CREATE OR REPLACE VIEW city_stats
AS
SELECT 
  c.city.city.name 
    AS name, 
  ROUND(c.city.city.population / c.city.city.area, 2) 
    AS population_density
FROM cities c;
SELECT * FROM city_stats;

 6

SELECT p.person.name, c_s.name, c_s.population_density
FROM people p, city_stats c_s
WHERE p.person.city = c_s.name
ORDER BY c_s.population_density DESC;

7

Summary

The benefit of having JSON support in a relational database management system is that mechanisms and operations beyond querying are available with views being one. It is clear that the ability to define views is important, from a data model and data semantics perspective as well as from a data security and developer convenience viewpoint.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.