Oracle 12c – In-Memory Option (Part 1)

The name ‘In-Memory Option’ does not mean Oracle turned into a general main memory database or has a new option to do so; it means something else entirely.

Aggregation

Columnar data representation is geared towards aggregation of columns in relational databases. In order to make aggregation fast, the rows of a column should be ‘near’ to each other in terms of memory layout; the best would be if the rows of the column would be next to each other. Access for aggregation is very efficient with this layout. A database implementing this approach is called a columnar database (http://en.wikipedia.org/wiki/Column-oriented_DBMS).

Specialized databases exist that structure the data on disk as well as in memory in a columnar format. Once the data is in such a database, aggregation can be fast. However, the data has to be in the database and chances are that the same data is in a separate OLTP database also (a second copy). This then means that a data integration problem has to be solved in order to ensure that data that is managed in the OLTP database is also available (with sufficiently minor delay) in the columnar database (or columnar store). And, since it changes in the OLTP database, the integration effort is an ongoing process.

In-Memory Option

Oracle 12c’s In-Memory Option [http://www.oracle.com/us/corporate/features/database-in-memory-option/] combines the columnar storage with OLTP storage in the same database. The columnar storage layout is only implemented in main memory, leaving the disk layout as it is. This means that OLTP and columnar access is possible in the same database, concurrently, and the data is always 100% transactionally consistent, as no delayed data integration has to take place. The ‘artificial’ separation of database functionality into separate databases to satisfy aggregation processing requirements is not necessary anymore in such a database management system.

Example

Since Oracle 12c combines the relational and the columnar representation, using the columnar functionality is fully declarative without any operational management overhead.

In the following a small example shows how to make use of the Oracle 12c In-Memory Option concurrently to the regular OLTP load.

Assume a regular table that has been in place for a while and is accessed by a regular application (OLTP use case). This table manages data and is changed according to the application logic.

DROP TABLE im_players;
CREATE TABLE im_players
  (
   player_id NUMBER,
   player_name VARCHAR2(128),
   player_score NUMBER);

Here is a way to populate the table with random content for testing purposes:

DECLARE
  player_id NUMBER;
  player_score NUMBER;
  player_name VARCHAR(128);
BEGIN
  player_id := 0;
  LOOP
    player_id := player_id + 1;
    player_name := 'player_' || player_id;
    player_score := player_id * 10;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    player_score := player_id * 100;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    player_score := player_id * 1000;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
  EXIT
    WHEN player_id = 1000000;
  END LOOP;
  COMMIT;
  RETURN;
END;

A new requirement has to be fulfilled based on business needs. The requirement is to analyze the existing table regularly while it is being used by the OLTP application. In the following this requirement is implemented with the Oracle 12c In-Memory Option.

First, memory has to be set aside for the columnar representation of the data in main memory:

ALTER system SET inmemory_size = 300M scope=spfile;

Restarting the database enables this change and this query confirms the setting:

SELECT name, value FROM v$sga;

01

Now the database has a main memory area (In-Memory Area) set aside for the columnar representation of relational tables.

Next, the existing table definition is modified to additionally be represented in columnar representation:

ALTER TABLE im_players inmemory priority critical;

There are different options that indicate how critical it is to have the rows in columnar format, how to compress depending on the access model, and actually which columns should be in columnar format as not all columns are necessarily required.

The alter statement above indicates that it is critical for the data to be in columnar format. The database will diligently load the data into the columnar memory section. Compression is enabled for low query access. All columns are enabled, based on the use case.

The following query shows how much data is not yet populated. Loading data is a process that takes time and the query supports checking the load status. Once the loading is completed, zero bytes are not yet populated.

SELECT 
  v.segment_name name,
  v.populate_status populate_status,
  v.bytes_not_populated bytes_not_populated
FROM v$im_segments v
ORDER BY 1;

02

The following query performs an aggregation.

SELECT
  player_name,
  COUNT(*),
  MIN(player_score),
  MAX(player_score),
  AVG(player_score),
  SUM(player_score)
FROM im_players
GROUP BY player_name;

??03

03

The query plan of this query shows that the columnar representation is being accessed.

----------------------------------------------------------------------------------------
| Id | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |            | 3025K | 227M  | 216 (38)   | 00:00:01 |
|  1 | HASH GROUP BY              |            | 3025K | 227M  | 216 (38)   | 00:00:01 |
|  2 | TABLE ACCESS INMEMORY FULL | IM_PLAYERS | 3025K | 227M  | 141 (5)    | 00:00:01 |
----------------------------------------------------------------------------------------

And this query shows access metrics indicating that the query actually executed against the columnar representation. The values after executing the aggregation query are different from the values before the aggregation.

Note on SQL

An important aspect of the Oracle 12c In-Memory Option is that the SQL specifying the required aggregation is regular SQL and no new or different language constructs had to be introduced. The underlying reason is that the query optimizer has sufficient knowledge to make the appropriate access choices based on the SQL presented and does not require hints from users. This is a huge engineering benefit since no new or different language has to be learned to make use of the columnar representation.

Summary

This was a super-brief introduction into the Oracle 12c In-Memory Option. The documentation has of course a lot more background [http://docs.oracle.com/database/121/ADMIN/memory.htm#ADMIN14257]. In addition, a nice blog series exists that introduces the various aspects step-by-step [https://blogs.oracle.com/In-Memory/entry/getting_started_with_oracle_database].

From a data management aspect an optimized representation targeted towards aggregation is enabled declaratively in context of the same database that holds the OLTP data. This is a huge benefit as one database supports the mixed workload and any data integration setup between different databases becomes unnecessary. With significantly increasing main memory sizes this option allows freeing up engineering cycles as well as operational resources while increasing data consistency and processing performance.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 7): Creating JSON from Relational Data

Up to now the examples of previous blogs inserted JSON data into JSON enabled columns of tables. This blog shows how JSON data can be created or derived automatically, and, more importantly, how relational data can be transformed and represented automatically as hierarchical JSON data.

Virtual Columns

A virtual column of a table in Oracle is defined by referring to a function invocation. The value of the column is the function invocation result. Parameters (column values of the same table) can be passed to the function and so the functionality of the function can be dependent on the data content of the row.

A simple example is shown in the following. A virtual column is created that contains a name – value pair based on a non-virtual column of the same table (the function is string concatenation).

DROP TABLE vc;
CREATE TABLE vc
  (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     '"name"' || ':' || '"' || name || '"' ));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

1

Value Column with JSON Structure

The following example shows a similar example, however, in this case the virtual column is a JSON column and the contents is a JSON object containing the name – value pair. The table definition ensures that the column is a JSON column and that it can be accessed with the JSON functions.

DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     '{' || '"name"' || ':' || '"' || name || '"' || '}') 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

2

Separate Function creating JSON Structure

Functions can be created separately as first class citizens in the database (for the purpose of proper engineering practices as well as functionality reuse); they do not necessarily have to be inline invocations. The following example creates a separate function that creates a JSON object containing a name – value pair.

CREATE OR REPLACE FUNCTION 
  to_JSON(in_name IN VARCHAR2,
          in_value IN VARCHAR2)
  RETURN VARCHAR2 deterministic
  IS
    JSON_value VARCHAR(255);
  BEGIN
    JSON_value := 
      '{' || '"' || in_name || '"' || ':' || 
             '"' || in_value || '"' || '}';
  RETURN JSON_value;
END;
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     to_JSON('name', name)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

3

Function with Selection

Functions have the full range of SQL available and can execute queries in their implementation. The following example queries a second table and creates a JSON object from the result. Note that the query depends on the input parameter. 

CREATE OR REPLACE FUNCTION 
  from_dep(in_value IN VARCHAR2)
  RETURN VARCHAR2 deterministic
IS
  JSON_value VARCHAR(255);
  dep_value  VARCHAR(255);
BEGIN
  EXECUTE IMMEDIATE 'SELECT ' || ' adr ' || 
                    ' FROM ' || ' dep ' 
                    INTO dep_value;
  JSON_value := '{' || '"' || in_value || '"' || ':' || 
                       '"' || dep_value || '"' || '}';
  RETURN JSON_value;
END;
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS 
     (from_dep(name)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
DROP TABLE dep;
CREATE TABLE dep
 (
    adr VARCHAR(255));
INSERT INTO dep VALUES ('yahoo');
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

4

Hierarchical JSON Column

The last example shows how a function can create complex hierarchical JSON structures. This allows for denormalizing a relational model into a JSON document representation as needed. This is the use case where a client requires a self-contained JSON object that contains all necessary data so that the client does not have to perform repeated queries to obtain all data.

CREATE OR REPLACE FUNCTION 
  from_dep(v_id IN NUMBER)
  RETURN VARCHAR2 deterministic
IS
  JSON_value VARCHAR(255);
  dep_value  VARCHAR(255);
  CURSOR dep_cur
    IS
      SELECT * FROM dep d WHERE d.vc = v_id;
  l_dep_adr dep%ROWTYPE;
  is_first NUMBER;
BEGIN
  is_first := 0;
  JSON_value := '[';
  OPEN dep_cur;
  LOOP
    FETCH dep_cur INTO l_dep_adr;
    EXIT
      WHEN dep_cur%NOTFOUND;
    IF (is_first = 0) 
    THEN
      JSON_value := JSON_value || 
        '{' || '"' || 'adr' || '"' || ':' || 
               '"' || l_dep_adr.adr || '"' || '}';
      is_first := 1;
    ELSE
      JSON_value := JSON_value || ',' || 
        '{' || '"' || 'adr' || '"' || ':' || 
               '"' || l_dep_adr.adr || '"' || '}';
    END IF;
  END LOOP;
  CLOSE dep_cur;
  JSON_value := JSON_value || ']';
  RETURN JSON_value;
END;
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS 
     (from_dep(id)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
DROP TABLE dep;
CREATE TABLE dep
 (
   vc NUMBER, adr VARCHAR(255));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
INSERT INTO dep VALUES (1, 'yahoo');
INSERT INTO dep VALUES (2, 'yahaa');
INSERT INTO dep VALUES (2, 'yahee');
SELECT * FROM vc;

5

Summary

Virtual columns are an interesting mechanism when it comes to denormalizing a relational data model into a hierarchical JSON document model. This blog gave some examples upon which more complex functionality can be built. It shows that Oracle 12c can not only query JSON, but also generate JSON structures from relational data supporting a large array of use cases.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 6): Un-nesting

JSON supports nested structures and the particular shape of the nesting is decided by the application systems or by a data modeler for different reasons. How can nested JSON structures be queried with SQL?

Nested JSON Structures

Nested JSON structures are based on two types: JSON objects and JSON arrays. Objects can have several members; each member is either of scalar type, object type or array type. An array can contain scalar values, objects or arrays. This nesting of objects with objects and arrays provides the mechanism to build (theoretically unlimited deep) nested structures.

Nested structures are used in most cases to model a part-of relationship or an association relationship. As the nested structures are trees, there is a 1:n relationship between the higher and the lower nodes (without cycles as JSON types do not include reference types).

Folklore says that nesting is sufficient to ‘avoid’ joins in context of JSON databases, but professionals know a lot better.

In the following, an example data set is introduced and a few queries afterwards that specifically deal with nested structures in order to query those in context of SQL. A key JSON function in context of Oracle 12c is json_table() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] combined with un-nesting.

Example Data Set

DROP TABLE reseller;
CREATE TABLE reseller
  (
    reseller_id   NUMBER NOT NULL,
    reseller_name VARCHAR(255),
    suppliers CLOB CONSTRAINT suppliers_ensure_json 
    CHECK (suppliers IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO reseller VALUES
(1, 'abc corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 2},
      {"partNo": 2, "price": 200, "quantity": 4}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 6},
      {"partNo": 11, "price": 1100, "quantity": 8},
      {"partNo": 12, "price": 1200, "quantity": 10}]}]}');
INSERT INTO reseller VALUES
(2, 'xyz corp.',
  '{ "suppliers": [
    {"supplier_name": "Sign Corp", "parts": [
      {"partNo": 1, "price": 100, "quantity": 12},
      {"partNo": 3, "price": 300, "quantity": 14}]},
    {"supplier_name": "Stand Corp.", "parts": [
      {"partNo": 10, "price": 1000, "quantity": 16},
      {"partNo": 11, "price": 1100, "quantity": 18},
      {"partNo": 12, "price": 1200, "quantity": 20},
      {"partNo": 13, "price": 1300, "quantity": 22},
      {"partNo": 14, "price": 1400, "quantity": 24}]}]}');

Simply Query: Listing all Suppliers

The first query is going to list the distinct set of supplier names. Supplier names are part of objects inside arrays. Therefore the query will reach into the tree and iterate over the suppliers (using the [*] notation that says to iterate over all array elements) in order to pick out the supplier name (accomplished by the PATH expression). A column is declared that captures all supplier names and a distinct projection ensures that each supplier name is included only once into the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
1
Changing [*] to [1] only retrieves the first supplier and since the first supplier is the same across the data set of JSON documents only one supplier name appears in the result set.

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[1]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name')) AS s;

Result:
2

Deeper Reach: Listing all Parts and Prices

The next query lists all parts and their prices. It creates two columns, one for the part number and one for the price. However, since each supplier can supply more than one part, the query has to iterate over two arrays (using the [*] notation), once over suppliers, and once over all parts within each supplier. The result is ordered in order to provide a consistent layout. 

SELECT DISTINCT s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*].parts[*]'
                COLUMNS (part_no NUMBER
                         PATH '$.partNo',
                         part_price NUMBER
                         PATH '$.price')) AS s
ORDER BY s.part_no;

Result:
3
It would be more interesting to see the supplier name also for each of the parts. This is where un-nesting is applied.

Un-nesting: Listing all Parts and Prices for each Supplier and for each Reseller

The next query lists the parts and their prices for each supplier, and for each reseller. This allows to see the reseller, the suppliers they use, as well as the parts that are supplied.

SELECT r.reseller_name,
       s.*
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
4
Note the keyword NESTED that transforms a hierarchical relationship into a relational representation by repeating the column value on the higher level. In this example, for each part_no and part_price, the supplier_name is provided.

Computation on Un-nested JSON Data

The following query shows the inventory and inventory value for each reseller and supplier. The value is dynamically computed by the query.

SELECT r.reseller_name,
       s.*,
       s.part_price * s.part_quantity AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
ORDER BY r.reseller_name,
         s.supplier_name,
         s.part_no;

Result:
5

Aggregation on Un-nested JSON Data

The following query shows the total inventory of each reseller by aggregation in the query.

SELECT r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS inventory_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY r.reseller_name
ORDER BY r.reseller_name;

Result:
6

Changing to Seller Perspective

The following query shows the quantity and sales of each type of part to resellers. The perspective changes from a reseller perspective so far to a seller perspective. In SQL this is straight-forward, however, in context of nested JSON structures this would mean processing of the 1:n relationship in the opposite direction.

SELECT s.supplier_name,
       r.reseller_name,
       s.part_no,
       s.part_price,
       s.part_quantity,
       s.part_price * s.part_quantity AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER 
                                  PATH '$.quantity'))) AS s
ORDER BY s.supplier_name,
         r.reseller_name,
         s.part_no;

Result:
7

Aggregation of Seller Data

And, finally, this query shows the total sales by each supplier to each reseller.

SELECT s.supplier_name,
       r.reseller_name,
       SUM(s.part_price * s.part_quantity) AS sales_value
FROM reseller r,
     json_table(r.suppliers,
                '$.suppliers[*]'
                COLUMNS (supplier_name VARCHAR2(32 CHAR)
                         PATH '$.supplier_name',
                         NESTED path '$.parts[*]'
                         COLUMNS (part_no VARCHAR2(32 CHAR)
                                  PATH '$.partNo',
                                  part_price NUMBER
                                  PATH '$.price',
                                  part_quantity NUMBER
                                  PATH '$.quantity'))) AS s
GROUP BY s.supplier_name,
         r.reseller_name
ORDER BY s.supplier_name;

Result:
8

Summary

As shown, the JSON function json_table() supports transforming hierarchically structured data into relationally structured data so that it is possible to execute expressive SQL queries, not only retrieving data, but also aggregating data on different levels of the hierarchy.

Once the hierarchical structure is un-nested, the data can be queried from different perspectives, aka, the 1:n relationship in the nested structure can be processed in both directions.

This is another example of the quite powerful and convenient integration of JSON into the world of relational databases and SQL. From an application development perspective this means that JSON documents can be used and at the same time made accessible to the full power of SQL.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 5): Property Existence

JSON structures do not have to follow a global schema in Oracle 12c; each JSON structure in a JSON column can have its own schema, with varying properties. This blog discusses the function JSON_EXISTS() that checks the existence of a property in a JSON structure and its application in SQL queries.

Sample Data Set

A sample data set for this blog is as follows. Note that not all JSON structures follow the same schema.

DROP TABLE people;
CREATE TABLE people
  (
    id NUMBER,
    person CLOB CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO people VALUES
(1, '{ "name": "Bob", "city": "SF"}');
INSERT INTO people VALUES
( 2, '{ "name": "Jake", "city": "PA"}');
INSERT INTO people VALUES
( 3, '{ "name": "Alice", "city": "NYC"}');
INSERT INTO people VALUES
( 4, '{ "name": "Jenn", "city": "Tokyo"}');
INSERT INTO people VALUES
( 5, '{ "name": "Frank", "city": "Berlin"}');
INSERT INTO people VALUES
( 6, '{ "name": "Jup", "city": "Paris"}');
INSERT INTO people VALUES
( 7, '{ "name": "Jib"}');
DROP TABLE cities;
CREATE TABLE cities
  (
    id NUMBER,
    city CLOB CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO cities VALUES
(101, '{"city": {"name": "SF", "population": 837442, 
                 "area": 121.4}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(102, '{"city": {"name": "PA", "population": 64403, 
                 "area": 66.787}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(103, '{"city": {"name": "NYC", "population": 8405837, 
                 "area": 783.8}, 
        "state": "NY", "country": "US"}');
INSERT INTO cities VALUES
(104, '{"city": {"name": "Tokyo", "population": 9071577, 
                 "area": 622.99}, 
        "state": null, "country": "Japan"}');
INSERT INTO cities VALUES
(105, '{"city": {"name": "Paris", "population": 10869000, 
                 "area": 2845}, 
        "state": null, "country": "France"}');
INSERT INTO cities VALUES
(106, '{"city": {"name": "Berlin", "population": 3517424, 
                 "area": 891.75}, 
        "state": null, "country": "Germany"}');
INSERT INTO cities VALUES
(107, '{"city": {"name": "Asilomar", "population": 100, 
                 "area": 2}}');

Absent JSON Property

The default SQL query behavior for absent JSON properties is discussed briefly first in order to establish the base semantics. 

  • Projection
    • Absent JSON properties are returned as SQL NULL
    • SELECT DISTINCT c.city.country FROM cities c;
    • Absent properties can therefore be checked for by using SQL NULL in a predicate
    • SELECT DISTINCT c.city.city.name, c.city.country FROM cities c WHERE c.city.country IS NULL;
  • Selection
    • Absent JSON properties do not match in predicates
    • SELECT DISTINCT c.city.city.name FROM cities c WHERE c.city.country = ‘US';
  • Cartesian Product (join)
    • Absent JSON properties do not contribute to the cartesian product
    • SELECT cl.city.city.name, cr.city.city.name FROM cities cl, cities cr WHERE cl.city.country = cr.city.country;

The base semantics of absent JSON properties is consistent with their representation as SQL NULL.

Reasons for Absent JSON Properties

Why would JSON properties be present in some JSON structures of a JSON column, and absent from others? There are many possible legitimate reasons, some are discussed next:

  • Life Cycle Implementations (Application Design)
    • The life cycle of a concept (e.g., person) is implemented by adding property values over time. Properties are stored corresponding to the life cycle. For example, initially a person is added by name, with city and other information added later as they become known.
  • Schema Evolution
    • A property that was not required in the past (from a data model perspective) is added due to data model changes. Adding a new property to the already existing JSON structures might or might not be done and if not, different JSON documents have a different structure.
  • Errors
    • An error in the application system omits writing a property that should have been written. A real bug.
  • Intentional Design Decision
    • An application system was designed to only write properties that have values and to omit properties that do not have a value (yet). A corresponding design for deletion is that if a value of a property is deleted, the whole property is removed instead of being set to e.g. JSON null.

There is a design philosophy that emphasizes that even through JSON supports varying structures, they ideally should follow the same schema. This would mean that all JSON structures in a JSON column have the exact same set of properties. However, another design philosophy says that the dynamic nature of JSON structures should be taken advantage of avoiding the classical schema migration problem: a JSON structure should only contain those properties that make sense at this state of the structure life cycle.

Checking for Absent JSON Properties: JSON_EXISTS()

Since each JSON structure in a JSON column can have different schema it might be important to explicitly check for properties in SQL queries for query reliability and also documentation reasons.

Oracle 12c implements the function JSON_EXISTS() [https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] that allows to check for existence of a property. For example:

SELECT DISTINCT c.city.city.name AS name 
FROM cities c 
WHERE json_exists(c.city, '$.country');

1

This query only selects those city names of cities that have their country specified.

More queries using JSON_EXISTS() follow:

CREATE OR REPLACE VIEW all_cities 
AS 
SELECT DISTINCT c.city.city.name AS name 
FROM cities c 
WHERE json_exists(c.city, '$.country');
SELECT p.person 
FROM people p, all_cities all_c 
WHERE p.person.city = all_c.name 
  AND json_exists(p.person, '$.city');

2

Details on JSON_EXISTS()

JSON_EXISTS() supports error handling clauses. For example, when a query accesses a JSON column with incorrect JSON structures, a choice is to ignore rows with incorrect JSON structures, include them into the result, or throw an error when processing the query. An example follows:

DROP TABLE incorrect_json;
CREATE TABLE incorrect_json
  ( id NUMBER, data CLOB);
INSERT INTO incorrect_json VALUES
(1, '{ "name": "item1", "value": "value1"}');
INSERT INTO incorrect_json VALUES
(2, '{ "name": "item2", "value": "value2_broken_json}');
SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' false ON error);

3

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' true ON error);

4

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.data, '$.value' error ON error);

5

JSON_EXISTS() expects a column that can store json structures as strings; when applied to a non-string column, an error indicates this:

SELECT * 
FROM incorrect_json ij 
WHERE json_exists(ij.id, '$.missing_property');

6

When applied to a non-JSON column, the error message looks different:

DROP TABLE incorrect_json2;
CREATE TABLE incorrect_json2
  ( id VARCHAR(10), data CLOB);
INSERT INTO incorrect_json2 VALUES
('one', '{ "name": "item1", "value": "value1"}');
INSERT INTO incorrect_json2 VALUES
('two', '{ "name": "item2", "value": "value2_broken_json}');
SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' error ON error);

7

SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' false ON error);

8

SELECT * 
FROM incorrect_json2 ij 
WHERE json_exists(ij.id, '$' true ON error);

9

Summary

Varying schema in JSON structures can be explicitly explored using the JSON_EXISTS() function and therefore is a way to incorporate schema variation into SQL queries. This makes a powerful combination.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 4): Basic Views

With JSON support available in Oracle 12c, it is possible to define views on JSON data.

Background

Complex JSON documents do not have to be queried completely every time, and views can provide an easy way to define an important subset of the JSON documents’ content. The relational database management system Oracle 12c implements the view mechanism for SQL and that is available for JSON columns as well.

In the following a few JSON document view examples are discussed on and implemented. Views are defined as well as accessed, directly queried as well as part of other SQL statements.

Example Data Set

First, an example data set is introduced around people living in cities and data about these cities.

DROP TABLE people;
CREATE TABLE people
  (
    id NUMBER,
    person CLOB CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO people VALUES
( 1, '{ "name": "Bob", "city": "SF"}');
INSERT INTO people VALUES
( 2, '{ "name": "Jake", "city": "PA"}');
INSERT INTO people VALUES
( 3, '{ "name": "Alice", "city": "NYC"}');
INSERT INTO people VALUES
( 4, '{ "name": "Jenn", "city": "Tokyo"}');
INSERT INTO people VALUES
( 5, '{ "name": "Frank", "city": "Berlin"}');
INSERT INTO people VALUES
( 6, '{ "name": "Jup", "city": "Paris"}');
DROP TABLE cities;
CREATE TABLE cities
  (
    id NUMBER,
    city CLOB CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO cities VALUES
(101, '{"city": {"name": "SF", "population": 837442, 
                 "area": 121.4}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(102, '{"city": {"name": "PA", "population": 64403, 
                 "area": 66.787}, 
        "state": "CA", "country": "US"}');
INSERT INTO cities VALUES
(103, '{"city": {"name": "NYC", "population": 8405837, 
                 "area": 783.8}, 
        "state": "NY", "country": "US"}');
INSERT INTO cities VALUES
(104, '{"city": {"name": "Tokyo", "population": 9071577, 
                 "area": 622.99}, 
        "state": null, "country": "Japan"}');
INSERT INTO cities VALUES
(105, '{"city": {"name": "Paris", "population": 10869000, 
                 "area": 2845}, 
        "state": null, "country": "France"}');
INSERT INTO cities VALUES
(106, '{"city": {"name": "Berlin", "population": 3517424, 
                 "area": 891.75}, 
        "state": null, "country": "Germany"}');

Views

A few example views and possible uses are discussed in the following. The results of the queries are displayed in graphical representation provided by SQLDeveloper.

Simple View

CREATE OR REPLACE VIEW all_cities
AS
SELECT DISTINCT c.city.city.name 
FROM cities c;
SELECT * FROM all_cities;

1

SELECT p.person
FROM people p, all_cities all_c
WHERE p.person.city = all_c.city;

 2

SELECT p.person.name, all_c.city
FROM people p, all_cities all_c
WHERE p.person.city = all_c.city;

3

View Definition with Predicate

CREATE OR REPLACE VIEW us_cities
AS
SELECT DISTINCT c.city.city.name 
FROM cities c 
WHERE c.city.country = 'US';
SELECT * FROM us_cities;

 4

SELECT p.person.name, us_c.city
FROM people p, us_cities us_c
WHERE p.person.city = us_c.city;

5

View Definition with Computation

CREATE OR REPLACE VIEW city_stats
AS
SELECT 
  c.city.city.name 
    AS name, 
  ROUND(c.city.city.population / c.city.city.area, 2) 
    AS population_density
FROM cities c;
SELECT * FROM city_stats;

 6

SELECT p.person.name, c_s.name, c_s.population_density
FROM people p, city_stats c_s
WHERE p.person.city = c_s.name
ORDER BY c_s.population_density DESC;

7

Summary

The benefit of having JSON support in a relational database management system is that mechanisms and operations beyond querying are available with views being one. It is clear that the ability to define views is important, from a data model and data semantics perspective as well as from a data security and developer convenience viewpoint.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 3): Basic Joins

Having JSON support in a relational database means that the join operator is available not only for joining relational data or JSON data, but also for the mixed case: joining JSON and relational data. This opens up a whole new world of data modeling and query execution.

Running Example

This running example creates three tables, “demo”, “city” and “city_rel” and a sample data set in each table. The tables “city” and “city_rel” contain the same data set, once in JSON format, and once in relational format.

DROP TABLE demo;
CREATE TABLE demo
(
  id NUMBER,
  person CLOB 
    CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO demo VALUES
( 1, '{ "name": "Bob", "city": "SF"}' );
INSERT INTO demo VALUES
( 2, '{ "name": "Jake", "city": "PA"}' );
INSERT INTO demo VALUES
( 3, '{ "name": "Alice", "city": "NYC"}' );
INSERT INTO demo VALUES
( 4, '{ "name": "Jenn",  "city": {"name": "Tokyo"}}' );
INSERT INTO demo VALUES
( 5, '{ "name": "Jenn",  "city": ["Tokyo"]}' );
INSERT INTO demo VALUES
( 6, '{ "name": "Jenn",  "city": 66}' );
DROP TABLE city;
CREATE TABLE city
(
  id NUMBER,
  city CLOB 
    CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO city VALUES
( 101, '{"city": "SF", "state": "CA", 
  "country": "US"}' );
INSERT INTO city VALUES
( 102, '{"city": "PA", "state": "CA", 
  "country": "US"}' );
INSERT INTO city VALUES
( 103, '{"city": "NYC", "state": "NY", 
  "country": "US"}' );
INSERT INTO city VALUES
( 104, '{"city": {"name": "Tokyo"}, "state": null, 
  "country": "Japan"}' );
INSERT INTO city VALUES
( 105, '{"city": ["Tokyo"], "state": null, 
  "country": "Japan"}' );
INSERT INTO city VALUE
( 106, '{"city": 66, "state": null, 
  "country": "World"}' );
DROP TABLE city_rel;
CREATE TABLE city_rel
(
  id      NUMBER,
  city    VARCHAR(255),
  state   VARCHAR(255),
  country VARCHAR(255));
INSERT INTO city_rel VALUES
( 1001, 'SF', 'CA', 'US' );
INSERT INTO city_rel VALUES
( 1002, 'PA', 'CA', 'US' );
INSERT INTO city_rel VALUES
( 1003, 'NYC', 'NY', 'US' );
INSERT INTO city_rel VALUES
( 1004, '{"name": "Tokyo"}', NULL, 'World' );
INSERT INTO city_rel VALUES
( 1005, '["Tokyo"]', NULL, 'World' );
INSERT INTO city_rel VALUES
( 1006, '66', NULL, 'World' );

JSON Join

The following SQL statement is a simple join between JSON structures on the property “city”:

SELECT *
FROM demo d, city c
WHERE d.person.city = c.city.city;

This SQL statement projects in addition to joining JSON structures:

SELECT d.person, c.city
FROM demo d, city c
WHERE d.person.city = c.city.city;

The following SQL statement extends the projection:

SELECT 
  d.id,
  d.person.name,
  d.person.city,
  c.id,
  c.city.state,
  c.city.country
FROM demo d, city c
WHERE d.person.city = c.city.city;

JSON – Relational Join

 This SQL statement shows the join between JSON and relational data, combined with a projection:

SELECT 
  d.id,
  d.person.name,
  d.person.city,
  c_r.id,
  c_r.city,
  c_r.state,
  c_r.country
FROM demo d, city_rel c_r
WHERE d.person.city = c_r.city;

Significance of Pure and Mixed JSON Joins

As shown, the join operator is applied easily within JSON tables and across JSON and relational tables. When using Oracle 12c there is no restriction anymore when it comes to the join operator in conjunction of JSON documents.

Pure JSON joins are possible in context of Oracle 12c. This means that developers have a choice to model all data in a pure document form (trying to avoid the need for joins by creating sub-collections – which is almost impossible without denormalization), or to consciously model documents in such a way that the document nature is applied where applicable without having to necessarily de-normalize as the join operator is available.

The mixed case between JSON and relational tables goes a lot further as now data can be modeled according to its nature (not all data is exclusively document-oriented or relational) and its access path requirements without compromising either way.

In addition, the mixed case supports the situation where data is already present in the database in relational form and new data is added in JSON form. This means that even if data is available in relational form, additional data does not have to be in relational form, and the most appropriate representation can be chosen (and no separate document-oriented database has to be deployed, btw).

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

NoSQL: No-Tation

This is too good to pass as there is a ton of truth in this simple presentation below. It is cited from http://java.dzone.com/articles/history-databases-”no-tation” without modification:

NoSQL_No_Tation

What Flavor of SQL?

The question now is: what flavor of SQL is the SQL in “No, SQL!”?

One alternative is to start from scratch and invent a SQL grammar and semantics that is not based on the existing SQL standard and SQL implementations. That would not be my choice at all, for the obvious reasons.

Another alternative is to treat JSON ‘just’ as a data structure and embed it into existing SQL. This would be roughly the equivalent of the object/relational model (~NF2) with the variation that there is no global schema and some non-standard SQL types.

The (in my opinion) better alternative is to look at JSON as more than just a data structure and combine JSON with the schema-per-document paradigm as the underlying philosophy.

  • This would be a real game changer as it would cater to the dynamic changing JSON data structures as needed in many projects.
  • As a result, the schema-per-document model would be integrated with the relational model so that a single database management system can support several data models and their paradigms concurrently.

Extending SQL Semantics

Supporting a schema-per-document philosophy requires extending the SQL semantics and its operators to be able to deal with the fact that different JSON structures might have a different schema, even if they are in the same collection or relational table.

Operators like typeOf() come to mind that can test if a path into a JSON structure refers to a specific JSON type or JSON literal name. Another operator is exists() to test if a path into a JSON structure exists. Or array operators that can determine if an array contains a specific JSON structure as an element. containsPath() is an operator that checks if a JSON structure has the path given.

Many more operators can be envisioned to make managing and querying JSON structures in context of SQL expressive and useful.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.