Oracle 12c – SQL for JSON (Part 6): Un-nesting

JSON supports nested structures and the particular shape of the nesting is decided by the application systems or by a data modeler for different reasons. How can nested JSON structures be queried with SQL?

Nested JSON Structures

Nested JSON structures are based on two types: JSON objects and JSON arrays. Objects can have several members; each member is either of scalar type, object type or array type. An array can contain scalar values, objects or arrays. This nesting of objects with objects and arrays provides the mechanism to build (theoretically unlimited deep) nested structures.

Nested structures are used in most cases to model a part-of relationship or an association relationship. As the nested structures are trees, there is a 1:n relationship between the higher and the lower nodes (without cycles as JSON types do not include reference types).

Folklore says that nesting is sufficient to ‘avoid’ joins in context of JSON databases, but professionals know a lot better.

In the following, an example data set is introduced and a few queries afterwards that specifically deal with nested structures in order to query those in context of SQL. A key JSON function in context of Oracle 12c is json_table() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] combined with un-nesting.

Example Data Set

DROP TABLE reseller;
CREATE TABLE reseller
  (
    reseller_id   NUMBER NOT NULL,
    reseller_name VARCHAR(255),
    suppliers CLOB CONSTRAINT suppliers_ensure_json 
    CHECK (suppliers IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO reseller VALUES
(1, 'abc corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 2},
      {"partNo": 2, "price": 200, "quantity": 4}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 6},
      {"partNo": 11, "price": 1100, "quantity": 8},
      {"partNo": 12, "price": 1200, "quantity": 10}]}]}');
INSERT INTO reseller VALUES
(2, 'xyz corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 12},
      {"partNo": 3, "price": 300, "quantity": 14}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 16},
      {"partNo": 11, "price": 1100, "quantity": 18},
      {"partNo": 12, "price": 1200, "quantity": 20},
      {"partNo": 13, "price": 1300, "quantity": 22},
      {"partNo": 14, "price": 1400, "quantity": 24}]}]}');

Simply Query: Listing all Suppliers

The first query is going to list the distinct set of supplier names. Supplier names are part of objects inside arrays. Therefore the query will reach into the tree and iterate over the suppliers (using the [*] notation that says to iterate over all array elements) in order to pick out the supplier name (accomplished by the PATH expression). A column is declared that captures all supplier names and a distinct projection ensures that each supplier name is included only once into the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
1
Changing [*] to [1] only retrieves the first supplier and since the first supplier is the same across the data set of JSON documents only one supplier name appears in the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[1]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
2

Deeper Reach: Listing all Parts and Prices

The next query lists all parts and their prices. It creates two columns, one for the part number and one for the price. However, since each supplier can supply more than one part, the query has to iterate over two arrays (using the [*] notation), once over suppliers, and once over all parts within each supplier. The result is ordered in order to provide a consistent layout. 

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*].parts[*]'
                COLUMNS (part_no NUMBER
                         PATH '$.partNo',
                         part_price NUMBER
                         PATH '$.price')) AS s
ORDER BY s.part_no;

Result:
3
It would be more interesting to see the supplier name also for each of the parts. This is where un-nesting is applied.

Un-nesting: Listing all Parts and Prices for each Supplier and for each Reseller

The next query lists the parts and their prices for each supplier, and for each reseller. This allows to see the reseller, the suppliers they use, as well as the parts that are supplied.

SELECT r.reseller_name,
       s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
4
Note the keyword NESTED that transforms a hierarchical relationship into a relational representation by repeating the column value on the higher level. In this example, for each part_no and part_price, the supplier_name is provided.

Computation on Un-nested JSON Data

The following query shows the inventory and inventory value for each reseller and supplier. The value is dynamically computed by the query.

SELECT r.reseller_name,
       s.*,
       s.part_price * s.part_quantity AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
5

Aggregation on Un-nested JSON Data

The following query shows the total inventory of each reseller by aggregation in the query.

SELECT r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY r.reseller_name
ORDER BY r.reseller_name;

Result:
6

Changing to Seller Perspective

The following query shows the quantity and sales of each type of part to resellers. The perspective changes from a reseller perspective so far to a seller perspective. In SQL this is straight-forward, however, in context of nested JSON structures this would mean processing of the 1:n relationship in the opposite direction.

SELECT s.supplier_name,
       r.reseller_name,
       s.part_no,
       s.part_price,
       s.part_quantity,
       s.part_price * s.part_quantity AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER 
                                  PATH '$.quantity'))) AS s
ORDER BY s.supplier_name,
         r.reseller_name,
         s.part_no;

Result:
7

Aggregation of Seller Data

And, finally, this query shows the total sales by each supplier to each reseller.

SELECT s.supplier_name,
       r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY s.supplier_name,
         r.reseller_name
ORDER BY s.supplier_name;

Result:
8

Summary

As shown, the JSON function json_table() supports transforming hierarchically structured data into relationally structured data so that it is possible to execute expressive SQL queries, not only retrieving data, but also aggregating data on different levels of the hierarchy.

Once the hierarchical structure is un-nested, the data can be queried from different perspectives, aka, the 1:n relationship in the nested structure can be processed in both directions.

This is another example of the quite powerful and convenient integration of JSON into the world of relational databases and SQL. From an application development perspective this means that JSON documents can be used and at the same time made accessible to the full power of SQL.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.