Oracle 12c – SQL for JSON (Part 7): Creating JSON from Relational Data

Up to now, the examples in the previous blogs inserted JSON data into JSON-enabled columns of tables. This blog shows how JSON data can be created or derived automatically and, more importantly, how relational data can be transformed and represented automatically as hierarchical JSON data.

Virtual Columns

A virtual column of a table in Oracle is defined by an expression, for example a function invocation. The value of the column is the result of evaluating that expression. Column values of the same table can be passed to the function as parameters, so the computed value can depend on the data content of the row.

A simple example follows: a virtual column is created that contains a name-value pair based on a non-virtual column of the same table (the expression is a string concatenation).

DROP TABLE vc;
CREATE TABLE vc
  (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     '"name"' || ':' || '"' || name || '"' ));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;
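For reference, the generated column simply contains the concatenated name-value pair; a sketch of the expected rows, derived from the expression above:

-- ID  NAME  VALUE
-- 1   abc   "name":"abc"
-- 2   def   "name":"def"

Note that the value is not yet a complete JSON object; that is what the next example adds.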


Value Column with JSON Structure

The following example is similar; however, in this case the virtual column is a JSON column and its content is a JSON object containing the name-value pair. The check constraint in the table definition ensures that the column contains JSON and that it can be accessed with the JSON functions.

DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     '{' || '"name"' || ':' || '"' || name || '"' || '}') 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;
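Because the check constraint declares the column content to be JSON, the generated column can also be queried with the SQL/JSON functions. A minimal sketch (not part of the original example) using json_value() to extract the name member again:

SELECT v.id,
       json_value(v.value, '$.name') AS name_from_json
FROM vc v;
-- expected: 'abc' for id 1 and 'def' for id 2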


Separate Function Creating a JSON Structure

Functions can also be created separately as first-class objects in the database (for sound engineering practice as well as for reuse); the expression does not have to be written inline. The following example creates a standalone function that builds a JSON object containing a name-value pair.

CREATE OR REPLACE FUNCTION 
  to_JSON(in_name IN VARCHAR2,
          in_value IN VARCHAR2)
  RETURN VARCHAR2 deterministic
  IS
    JSON_value VARCHAR(255);
  BEGIN
    JSON_value := 
      '{' || '"' || in_name || '"' || ':' || 
             '"' || in_value || '"' || '}';
  RETURN JSON_value;
END;
/
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     to_JSON('name', name)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;


Function with Selection

Functions have the full range of SQL available and can execute queries in their implementation. The following example queries a second table and creates a JSON object from the result. Note that the generated JSON object depends on the input parameter, which becomes the member name.

CREATE OR REPLACE FUNCTION 
  from_dep(in_value IN VARCHAR2)
  RETURN VARCHAR2 deterministic
IS
  JSON_value VARCHAR(255);
  dep_value  VARCHAR(255);
BEGIN
  EXECUTE IMMEDIATE 'SELECT ' || ' adr ' || 
                    ' FROM ' || ' dep ' 
                    INTO dep_value;
  JSON_value := '{' || '"' || in_value || '"' || ':' || 
                       '"' || dep_value || '"' || '}';
  RETURN JSON_value;
END;
/
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS 
     (from_dep(name)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
DROP TABLE dep;
CREATE TABLE dep
 (
    adr VARCHAR(255));
INSERT INTO dep VALUES ('yahoo');
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;


Hierarchical JSON Column

The last example shows how a function can create complex hierarchical JSON structures. This allows a relational model to be denormalized into a JSON document representation as needed. This is the use case where a client requires a self-contained JSON object that contains all necessary data, so that the client does not have to perform repeated queries to obtain it.

CREATE OR REPLACE FUNCTION 
  from_dep(v_id IN NUMBER)
  RETURN VARCHAR2 deterministic
IS
  JSON_value VARCHAR(255);
  dep_value  VARCHAR(255);
  CURSOR dep_cur
    IS
      SELECT * FROM dep d WHERE d.vc = v_id;
  l_dep_adr dep%ROWTYPE;
  is_first NUMBER;
BEGIN
  is_first := 0;
  JSON_value := '[';
  OPEN dep_cur;
  LOOP
    FETCH dep_cur INTO l_dep_adr;
    EXIT
      WHEN dep_cur%NOTFOUND;
    IF (is_first = 0) 
    THEN
      JSON_value := JSON_value || 
        '{' || '"' || 'adr' || '"' || ':' || 
               '"' || l_dep_adr.adr || '"' || '}';
      is_first := 1;
    ELSE
      JSON_value := JSON_value || ',' || 
        '{' || '"' || 'adr' || '"' || ':' || 
               '"' || l_dep_adr.adr || '"' || '}';
    END IF;
  END LOOP;
  CLOSE dep_cur;
  JSON_value := JSON_value || ']';
  RETURN JSON_value;
END;
/
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS 
     (from_dep(id)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
DROP TABLE dep;
CREATE TABLE dep
 (
   vc NUMBER, adr VARCHAR(255));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
INSERT INTO dep VALUES (1, 'yahoo');
INSERT INTO dep VALUES (2, 'yahaa');
INSERT INTO dep VALUES (2, 'yahee');
SELECT * FROM vc;
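For reference, with the sample rows above each vc row carries one self-contained JSON array in its virtual column; a sketch of the expected content, derived by hand from the function and the dep data:

-- ID  NAME  VALUE
-- 1   abc   [{"adr":"yahoo"}]
-- 2   def   [{"adr":"yahaa"},{"adr":"yahee"}]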


Summary

Virtual columns are an interesting mechanism when it comes to denormalizing a relational data model into a hierarchical JSON document model. This blog gave some examples upon which more complex functionality can be built. It shows that Oracle 12c can not only query JSON, but also generate JSON structures from relational data, supporting a wide range of use cases.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 6): Un-nesting

JSON supports nested structures, and the particular shape of the nesting is decided by application systems or by a data modeler for various reasons. How can nested JSON structures be queried with SQL?

Nested JSON Structures

Nested JSON structures are based on two types: JSON objects and JSON arrays. Objects can have several members; each member is either of scalar type, object type or array type. An array can contain scalar values, objects or arrays. This nesting of objects with objects and arrays provides the mechanism to build (theoretically unlimited deep) nested structures.

Nested structures are used in most cases to model a part-of relationship or an association relationship. As the nested structures are trees, there is a 1:n relationship between the higher and the lower nodes (without cycles as JSON types do not include reference types).

Folklore says that nesting is sufficient to ‘avoid’ joins in the context of JSON databases, but professionals know better.

In the following, an example data set is introduced, followed by a few queries that specifically deal with nested structures in the context of SQL. A key JSON function in Oracle 12c is json_table() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] combined with un-nesting.

Example Data Set

DROP TABLE reseller;
CREATE TABLE reseller
  (
    reseller_id   NUMBER NOT NULL,
    reseller_name VARCHAR(255),
    suppliers CLOB CONSTRAINT suppliers_ensure_json 
    CHECK (suppliers IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO reseller VALUES
(1, 'abc corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 2},
      {"partNo": 2, "price": 200, "quantity": 4}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 6},
      {"partNo": 11, "price": 1100, "quantity": 8},
      {"partNo": 12, "price": 1200, "quantity": 10}]}]}');
INSERT INTO reseller VALUES
(2, 'xyz corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 12},
      {"partNo": 3, "price": 300, "quantity": 14}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 16},
      {"partNo": 11, "price": 1100, "quantity": 18},
      {"partNo": 12, "price": 1200, "quantity": 20},
      {"partNo": 13, "price": 1300, "quantity": 22},
      {"partNo": 14, "price": 1400, "quantity": 24}]}]}');

Simple Query: Listing all Suppliers

The first query lists the distinct set of supplier names. Supplier names are part of objects inside arrays. Therefore the query reaches into the tree and iterates over the suppliers (using the [*] notation, which iterates over all array elements) in order to pick out the supplier name (accomplished by the PATH expression). A column is declared that captures the supplier names, and a distinct projection ensures that each supplier name is included only once in the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
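With the sample documents above, the distinct supplier names returned are "Sign Corp" and "Stand Corp." (derived by hand from the INSERT statements).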
Changing [*] to [1] retrieves only a single supplier. Array positions in SQL/JSON path expressions are zero-based, so [1] refers to the second element of the suppliers array; since that supplier is the same across the JSON documents, only one supplier name appears in the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[1]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:

Deeper Reach: Listing all Parts and Prices

The next query lists all parts and their prices. It creates two columns, one for the part number and one for the price. However, since each supplier can supply more than one part, the query has to iterate over two arrays (using the [*] notation): once over the suppliers, and once over all parts within each supplier. The result is ordered to provide a consistent layout.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*].parts[*]'
                COLUMNS (part_no NUMBER
                         PATH '$.partNo',
                         part_price NUMBER
                         PATH '$.price')) AS s
ORDER BY s.part_no;

Result:
It would be more interesting to also see the supplier name for each of the parts. This is where un-nesting is applied.

Un-nesting: Listing all Parts and Prices for each Supplier and for each Reseller

The next query lists the parts and their prices for each supplier, and for each reseller. This makes it possible to see each reseller, the suppliers they use, as well as the parts that are supplied.

SELECT r.reseller_name,
       s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
Note the keyword NESTED, which transforms a hierarchical relationship into a relational representation by repeating the higher-level column value for each nested row. In this example, the supplier_name is provided for each part_no and part_price.

Computation on Un-nested JSON Data

The following query shows the inventory and inventory value for each reseller and supplier. The value is dynamically computed by the query.

SELECT r.reseller_name,
       s.*,
       s.part_price * s.part_quantity AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:

Aggregation on Un-nested JSON Data

The following query shows the total inventory of each reseller by aggregation in the query.

SELECT r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY r.reseller_name
ORDER BY r.reseller_name;

Result:
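With the sample data above, the totals work out as follows (hand-calculated from the inserted documents, assuming the query runs as shown): abc corp.: 2*100 + 4*200 + 6*1000 + 8*1100 + 10*1200 = 27,800; xyz corp.: 12*100 + 14*300 + 16*1000 + 18*1100 + 20*1200 + 22*1300 + 24*1400 = 127,400.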

Changing to Seller Perspective

The following query shows the quantity and sales of each type of part to resellers. The perspective changes from the reseller perspective used so far to a seller perspective. In SQL this is straightforward; however, in the context of nested JSON structures it means processing the 1:n relationship in the opposite direction.

SELECT s.supplier_name,
       r.reseller_name,
       s.part_no,
       s.part_price,
       s.part_quantity,
       s.part_price * s.part_quantity AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER 
                                  PATH '$.quantity'))) AS s
ORDER BY s.supplier_name,
         r.reseller_name,
         s.part_no;

Result:

Aggregation of Seller Data

And, finally, this query shows the total sales by each supplier to each reseller.

SELECT s.supplier_name,
       r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY s.supplier_name,
         r.reseller_name
ORDER BY s.supplier_name;

Result:

Summary

As shown, the JSON function json_table() supports transforming hierarchically structured data into relationally structured data, making it possible to execute expressive SQL queries that not only retrieve data but also aggregate it on different levels of the hierarchy.

Once the hierarchical structure is un-nested, the data can be queried from different perspectives, i.e., the 1:n relationship in the nested structure can be processed in both directions.

This is another example of the quite powerful and convenient integration of JSON into the world of relational databases and SQL. From an application development perspective this means that JSON documents can be used and at the same time made accessible to the full power of SQL.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 5): Property Existence

JSON structures do not have to follow a global schema in Oracle 12c; each JSON structure in a JSON column can have its own schema, with varying properties. This blog discusses the function JSON_EXISTS(), which checks for the existence of a property in a JSON structure, and its application in SQL queries.

Sample Data Set

A sample data set for this blog is as follows. Note that not all JSON structures follow the same schema.

DROP TABLE people;
CREATE TABLE people
  (
    id NUMBER,
    person CLOB CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO people VALUES
(1, '{ "name": "Bob", "city": "SF"}');
INSERT INTO people VALUES
( 2, '{ "name": "Jake", "city": "PA"}');
INSERT INTO people VALUES
( 3, '{ "name": "Alice", "city": "NYC"}');
INSERT INTO people VALUES
( 4, '{ "name": "Jenn", "city": "Tokyo"}');
INSERT INTO people VALUES
( 5, '{ "name": "Frank", "city": "Berlin"}');
INSERT INTO people VALUES
( 6, '{ "name": "Jup", "city": "Paris"}');
INSERT INTO people VALUES
( 7, '{ "name": "Jib"}');
DROP TABLE cities;
CREATE TABLE cities
  (
    id NUMBER,
    city CLOB CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO cities VALUES
(101, '{"city": {"name": "SF", "population": 837442, 
                 "area": 121.4}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(102, '{"city": {"name": "PA", "population": 64403, 
                 "area": 66.787}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(103, '{"city": {"name": "NYC", "population": 8405837, 
                 "area": 783.8}, 
        "state": "NY", "country": "US"}');
INSERT INTO cities VALUES
(104, '{"city": {"name": "Tokyo", "population": 9071577, 
                 "area": 622.99}, 
        "state": null, "country": "Japan"}');
INSERT INTO cities VALUES
(105, '{"city": {"name": "Paris", "population": 10869000, 
                 "area": 2845}, 
        "state": null, "country": "France"}');
INSERT INTO cities VALUES
(106, '{"city": {"name": "Berlin", "population": 3517424, 
                 "area": 891.75}, 
        "state": null, "country": "Germany"}');
INSERT INTO cities VALUES
(107, '{"city": {"name": "Asilomar", "population": 100, 
                 "area": 2}}');

Absent JSON Property

The default SQL query behavior for absent JSON properties is discussed briefly first in order to establish the base semantics. 

  • Projection
    • Absent JSON properties are returned as SQL NULL
    • SELECT DISTINCT c.city.country FROM cities c;
    • Absent properties can therefore be checked for by using SQL NULL in a predicate
    • SELECT DISTINCT c.city.city.name, c.city.country FROM cities c WHERE c.city.country IS NULL;
  • Selection
    • Absent JSON properties do not match in predicates
    • SELECT DISTINCT c.city.city.name FROM cities c WHERE c.city.country = 'US';
  • Cartesian Product (join)
    • Absent JSON properties do not contribute to the cartesian product
    • SELECT cl.city.city.name, cr.city.city.name FROM cities cl, cities cr WHERE cl.city.country = cr.city.country;

The base semantics of absent JSON properties is consistent with their representation as SQL NULL.

Reasons for Absent JSON Properties

Why would JSON properties be present in some JSON structures of a JSON column, and absent from others? There are many possible legitimate reasons; some are discussed next:

  • Life Cycle Implementations (Application Design)
    • The life cycle of a concept (e.g., person) is implemented by adding property values over time. Properties are stored corresponding to the life cycle. For example, initially a person is added by name, with city and other information added later as they become known.
  • Schema Evolution
    • A property that was not required in the past (from a data model perspective) is added due to data model changes. The already existing JSON structures might or might not be updated with the new property; if they are not, different JSON documents have different structures.
  • Errors
    • An error in the application system omits writing a property that should have been written. A real bug.
  • Intentional Design Decision
    • An application system was designed to only write properties that have values and to omit properties that do not have a value (yet). A corresponding design for deletion is that if a value of a property is deleted, the whole property is removed instead of being set to e.g. JSON null.

There is a design philosophy that emphasizes that even though JSON supports varying structures, they ideally should follow the same schema. This would mean that all JSON structures in a JSON column have the exact same set of properties. However, another design philosophy says that the dynamic nature of JSON structures should be taken advantage of in order to avoid the classical schema migration problem: a JSON structure should only contain those properties that make sense at its current state in the structure's life cycle.

Checking for Absent JSON Properties: JSON_EXISTS()

Since each JSON structure in a JSON column can have a different schema, it can be important to explicitly check for properties in SQL queries, both for query reliability and for documentation reasons.

Oracle 12c implements the function JSON_EXISTS() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246], which checks for the existence of a property. For example:

SELECT DISTINCT c.city.city.name AS name 
FROM cities c 
WHERE json_exists(c.city, '$.country');


This query selects only the names of those cities that have their country specified.

More queries using JSON_EXISTS() follow:

CREATE OR REPLACE VIEW all_cities 
AS 
SELECT DISTINCT c.city.city.name AS name 
FROM cities c 
WHERE json_exists(c.city, '$.country');
SELECT p.person 
FROM people p, all_cities all_c 
WHERE p.person.city = all_c.name 
  AND json_exists(p.person, '$.city');


Details on JSON_EXISTS()

JSON_EXISTS() supports error handling clauses. For example, when a query accesses a JSON column that contains incorrect JSON structures, the choices are to ignore rows with incorrect JSON structures (FALSE ON ERROR, the default), include them in the result (TRUE ON ERROR), or raise an error when processing the query (ERROR ON ERROR). An example follows:

DROP TABLE incorrect_json;
CREATE TABLE incorrect_json
  ( id NUMBER, data CLOB);
INSERT INTO incorrect_json VALUES
(1, '{ "name": "item1", "value": "value1"}');
INSERT INTO incorrect_json VALUES
(2, '{ "name": "item2", "value": "value2_broken_json}');
SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' false ON error);


SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' true ON error);


SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' error ON error);


JSON_EXISTS() expects a column that can store JSON structures as strings; when applied to a non-string column (here the NUMBER column id), an error indicates this:

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.id, '$.missing_property');


When applied to a string column that does not contain JSON, the error message looks different; again, FALSE ON ERROR excludes the affected rows while TRUE ON ERROR includes them:

DROP TABLE incorrect_json2;
CREATE TABLE incorrect_json2
  ( id VARCHAR(10), data CLOB);
INSERT INTO incorrect_json2 VALUES
('one', '{ "name": "item1", "value": "value1"}');
INSERT INTO incorrect_json2 VALUES
('two', '{ "name": "item2", "value": "value2_broken_json}');
SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' error ON error);


SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' false ON error);


SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' true ON error);


Summary

Varying schemas in JSON structures can be explicitly handled using the JSON_EXISTS() function, which is therefore a way to incorporate schema variation into SQL queries. This makes for a powerful combination.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 4): Basic Views

With JSON support available in Oracle 12c, it is possible to define views on JSON data.

Background

Complex JSON documents do not have to be queried completely every time, and views can provide an easy way to define an important subset of the JSON documents’ content. The relational database management system Oracle 12c implements the view mechanism for SQL, and it is available for JSON columns as well.

In the following, a few JSON document view examples are discussed and implemented. Views are both defined and accessed, queried directly as well as used as part of other SQL statements.

Example Data Set

First, an example data set is introduced around people living in cities and data about these cities.

DROP TABLE people;
CREATE TABLE people
  (
    id NUMBER,
    person CLOB CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO people VALUES
( 1, '{ "name": "Bob", "city": "SF"}');
INSERT INTO people VALUES
( 2, '{ "name": "Jake", "city": "PA"}');
INSERT INTO people VALUES
( 3, '{ "name": "Alice", "city": "NYC"}');
INSERT INTO people VALUES
( 4, '{ "name": "Jenn", "city": "Tokyo"}');
INSERT INTO people VALUES
( 5, '{ "name": "Frank", "city": "Berlin"}');
INSERT INTO people VALUES
( 6, '{ "name": "Jup", "city": "Paris"}');
DROP TABLE cities;
CREATE TABLE cities
  (
    id NUMBER,
    city CLOB CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO cities VALUES
(101, '{"city": {"name": "SF", "population": 837442, 
                 "area": 121.4}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(102, '{"city": {"name": "PA", "population": 64403, 
                 "area": 66.787}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(103, '{"city": {"name": "NYC", "population": 8405837, 
                 "area": 783.8}, 
        "state": "NY", "country": "US"}');
INSERT INTO cities VALUES
(104, '{"city": {"name": "Tokyo", "population": 9071577, 
                 "area": 622.99}, 
        "state": null, "country": "Japan"}');
INSERT INTO cities VALUES
(105, '{"city": {"name": "Paris", "population": 10869000, 
                 "area": 2845}, 
        "state": null, "country": "France"}');
INSERT INTO cities VALUES
(106, '{"city": {"name": "Berlin", "population": 3517424, 
                 "area": 891.75}, 
        "state": null, "country": "Germany"}');

Views

A few example views and possible uses are discussed in the following. The results of the queries are displayed in the graphical representation provided by SQL Developer.

Simple View

CREATE OR REPLACE VIEW all_cities
AS
SELECT DISTINCT c.city.city.name 
FROM cities c;
SELECT * FROM all_cities;


SELECT p.person
FROM people p, all_cities all_c
WHERE p.person.city = all_c.city;


SELECT p.person.name, all_c.city
FROM people p, all_cities all_c
WHERE p.person.city = all_c.city;


View Definition with Predicate

CREATE OR REPLACE VIEW us_cities
AS
SELECT DISTINCT c.city.city.name 
FROM cities c 
WHERE c.city.country = 'US';
SELECT * FROM us_cities;


SELECT p.person.name, us_c.city
FROM people p, us_cities us_c
WHERE p.person.city = us_c.city;


View Definition with Computation

CREATE OR REPLACE VIEW city_stats
AS
SELECT 
  c.city.city.name 
    AS name, 
  ROUND(c.city.city.population / c.city.city.area, 2) 
    AS population_density
FROM cities c;
SELECT * FROM city_stats;


SELECT p.person.name, c_s.name, c_s.population_density
FROM people p, city_stats c_s
WHERE p.person.city = c_s.name
ORDER BY c_s.population_density DESC;


Summary

The benefit of having JSON support in a relational database management system is that mechanisms and operations beyond querying are available, views being one of them. It is clear that the ability to define views is important, from a data model and data semantics perspective as well as from a data security and developer convenience viewpoint.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

NoSQL: No-Tation

This is too good to pass up, as there is a ton of truth in the simple presentation below. It is cited from http://java.dzone.com/articles/history-databases-”no-tation” without modification:

[Image: NoSQL “No-tation” presentation]

What Flavor of SQL?

The question now is: what flavor of SQL is the SQL in “No, SQL!”?

One alternative is to start from scratch and invent a SQL grammar and semantics that is not based on the existing SQL standard and SQL implementations. That would not be my choice at all, for the obvious reasons.

Another alternative is to treat JSON ‘just’ as a data structure and embed it into existing SQL. This would be roughly the equivalent of the object/relational model (~NF2) with the variation that there is no global schema and some non-standard SQL types.

The (in my opinion) better alternative is to look at JSON as more than just a data structure and combine JSON with the schema-per-document paradigm as the underlying philosophy.

  • This would be a real game changer as it would cater to the dynamic changing JSON data structures as needed in many projects.
  • As a result, the schema-per-document model would be integrated with the relational model so that a single database management system can support several data models and their paradigms concurrently.

Extending SQL Semantics

Supporting a schema-per-document philosophy requires extending the SQL semantics and its operators to be able to deal with the fact that different JSON structures might have a different schema, even if they are in the same collection or relational table.

Operators like typeOf() come to mind, which can test whether a path into a JSON structure refers to a specific JSON type or JSON literal name. Another operator is exists(), which tests whether a path into a JSON structure exists. Array operators could determine whether an array contains a specific JSON structure as an element. containsPath() is an operator that checks whether a JSON structure contains the given path.

Many more operators can be envisioned to make managing and querying JSON structures in context of SQL expressive and useful.
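As an illustration only, queries using such operators might look roughly as follows (hypothetical syntax; none of these operators exist as such today, and the people table is reused from the earlier posts):

-- Hypothetical: persons whose document has a "city" member of JSON type string.
SELECT p.person
FROM people p
WHERE exists(p.person, '$.city')
  AND typeOf(p.person, '$.city') = 'string';

-- Hypothetical: persons whose document contains the path $.city at all.
SELECT p.person
FROM people p
WHERE containsPath(p.person, '$.city');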

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

SQL: The Kingmaker of NoSQL Databases?

Here is an interesting question: what if all NoSQL and NewSQL databases would implement SQL?

Advanced SQL

“SQL” in this context does not refer to SQL-92 (http://en.wikipedia.org/wiki/SQL-92) or other versions of the standard focused purely on the relational model, but to a SQL that is properly extended in syntax and semantics to deal with non-relational data models. Let’s name it “Advanced SQL” for the purpose of this blog.

I furthermore assume that all data in a database would be accessible, not just parts of it. For example, in key/value stores the values’ internal structure would be accessible as long as it contains data accessible in a structured way (e.g., a JSON representation).

Ideally, of course, all NoSQL and NewSQL databases would implement the same new form of Advanced SQL, but for this blog this is not really necessary or a precondition (but it would be nice, though).

If such an Advanced SQL would be available on all NoSQL and NewSQL databases, then

  • the typical SQL operators like select, project, join, and so on are available on all data in a database across all data models like relational, document-oriented, key/value, columnar, graph-oriented.
  • the same data set loaded into each NoSQL or NewSQL database would be accessible through Advanced SQL and the same query would return the same result, no matter which database (best case assumption wrt. SQL execution semantics).

Data Models

In order to be able to load all data sets into all types of databases, the data models have to be equivalent in their structural expressiveness. This might not necessarily be the case today; however, the common trend appears to be that databases support either JSON or structured data containing array types and nested sub-structures alongside elementary types like string and integer.

For the sake of the blog, let’s assume the data models are equivalent in structural expressiveness.

Implication of Advanced SQL on NoSQL / NewSQL Databases

The biggest implication of Advanced SQL on NoSQL / NewSQL databases would be that all NoSQL / NewSQL databases would be equivalent from a functional perspective in terms of accessing data: a given data set could be stored in any of the databases and queried using SQL. A query would (ideally) return the same result in any of the databases storing the same data set.

The big question this bears is: what would distinguish the NoSQL/NewSQL databases in this case?

  • The non-functional aspects.

For example, to name a few:

  • Performance: the same query would take different execution time
  • Transactions: a set of queries would form a single transaction (or not)
  • Scale: the speed and extent a database can scale up/scale down with varying data set size
  • Backup/restore: how fast and how reliable can databases be re-created after disaster

This basically would mean that a user can select a database based on its non-functional characteristics. (Of course, history has shown that the particular SQL implementation of a database will also have proprietary extensions.)

Benefits of Advanced SQL

There are several benefits if Advanced SQL like described above would be available.

  • A user can select not only one, but several databases for different needs, while not having to deal with vastly different query interfaces. For example, one for production system access speed and a different one for long-term storage or analytics.
  • Data can be distributed across different systems depending on how the data is used, without having to deal with incompatible access interfaces.
  • NoSQL and NewSQL databases can be integrated alongside existing RDBMSs for mutual benefit.

In a nutshell, users are no longer forced to select one database for all their needs, but can select the appropriate one for the particular part of their system architecture.

Trending of Non-functional Features

There is an interesting trend ongoing. OldSQL and NewSQL databases start supporting JSON (e.g., Postgres, VoltDB); Informix has a compatible driver interface to the MongoDB wire format; Tokutek and GridGain implement non-functional replacements for MongoDB; Oracle NoSQL supports per-document Avro schemas for the values; Tokutek adds transaction support; and so on. This looks like the early stages of ‘sorting out’ on a data model level as well as non-functional level. Time will tell if this was the starting point of consolidation in the NoSQL/NewSQL space.

King Making

So why, if the differentiation is likely to happen on the non-functional level, is SQL a potential kingmaker? In my opinion, Advanced SQL will be the standard expectation and a database not supporting Advanced SQL will be relegated to being a niche system. So while the presence of Advanced SQL will be a distinguishing factor (possibly a small one), the absence of Advanced SQL will impact the success of a database significantly.