Oracle 12c – In-Memory Option (Part 3): JSON Support

It is quite natural to view Oracle’s In-Memory Option in context of the relational model; however, the In-Memory Option supports the JSON model at the same time as well.

Oracle JSON Support

As shown earlier in this blog, the Oracle 12c database supports JSON natively [http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] and incorporates JSON access into SQL so that JSON structures can be accessed through SQL directly. In addition, in a single SQL statement, JSON structures as well as relational tables can be accessed at the same time. Please see the documentation and the other blog entries for more details.

Oracle In-Memory Option

The In-Memory Option [http://www.oracle.com/us/corporate/features/database-in-memory-option/] was introduced in the previous two blog entries and those discussed a relational example as well as queries that allow to introspect the meta data and status of the In-Memory Option processing. It gave an overview as well as the rationale why using the In-Memory Option for analytic queries is advantageous over regular relational processing (columnar representation of data in main memory).

Combination: JSON processing using In-Memory Option

Analytics in context of JSON structures would benefit from In-Memory Option support as well. Many applications based on JSON structures usually have to support some form of analytics. The In-Memory Option supports aggregation functionality on JSON structures as well as on relational tables.

In the following, an example is introduced that creates a table containing a JSON structure, enables it for In-Memory Option support and shows a few simple analytics queries.

Example Table with JSON Structure

The following declares a sample table with a column containing a JSON structure.

DROP TABLE js_players;
CREATE TABLE js_players
 (
   player_id NUMBER NOT NULL,
   player_name VARCHAR(255),
   games VARCHAR(4000) 
    CONSTRAINT games_ensure_json
    CHECK (games IS JSON (STRICT WITH UNIQUE KEYS)));

Table Population

In order to have a large data set the following block creates rows containing JSON structures and inserting them into the above declared table.

DECLARE
  counter NUMBER;
  player_name VARCHAR(128);
  games_value VARCHAR (256);
BEGIN
  counter := 0;
  LOOP
    counter := counter + 1;
    player_name := 'Divvon' || '_' || counter;
    games_value := '{ "games":
      [{"name":"tic-tac-toe", 
        "points":' || counter *10 || '},
       {"name":"one-two-three", 
        "points":' || counter * 100 || '}
      ]}';
    INSERT INTO js_players VALUES
      (counter, player_name, games_value);
  EXIT
    WHEN counter = 1000000;
  END LOOP;
  COMMIT;
  RETURN;
END;

An example JSON document looks like this:

{"games":
  [{"name":"tic-tac-toe", "points":750},
   {"name":"one-two-three", "points":7500}
  ]}

Enabling In-Memory Option

This statement enables the In-Memory Option for the table containing JSON structures:

ALTER TABLE js_players inmemory priority critical;

Aggregation Queries

The following queries show aggregation over elements of the JSON structures. Each query extracts all ‘points’ values from the JSON structure (since every document might have several ‘points’ fields in the array of ‘games’) and makes is accessible as ‘pts.points’. Then it aggregates over this structure. The first query aggregates over all rows, whereas the second query is selecting only a few rows based on ‘player_name’.

SELECT DISTINCT p.player_name,
       SUM(pts.points) sum_points,
       MIN(pts.points) min_points,
       MAX(pts.points) max_points,
       AVG(pts.points) avg_points,
       COUNT(*)
FROM js_players p,
     json_table(p.games, 
                '$.games[*]' COLUMNS (points VARCHAR2(32 CHAR) PATH '$.points')) 
         AS pts
GROUP BY p.player_name
ORDER BY sum_points DESC;

01_blog

SELECT DISTINCT p.player_name,
       SUM(pts.points) sum_points,
       MIN(pts.points) min_points,
       MAX(pts.points) max_points,
       AVG(pts.points) avg_points,
       COUNT(*)
FROM js_players p,
     json_table(p.games, 
                '$.games[*]' COLUMNS (points VARCHAR2(32 CHAR) PATH '$.points')) 
       AS pts
WHERE p.player_name = 'Divvon_204'
      OR p.player_name = 'Divvon_206'
GROUP BY p.player_name
ORDER BY sum_points DESC;

02_blog

Summary

Oracle’s In-Memory Option, because it not only supports the relational model, but also the JSON model, is an interesting alternative to analytics in context of JSON data sets. This blog has shown an example of how to combine JSON structures and the In-Memory Option in order to be able to run analytics queries.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – In-Memory Option (Part 2)

Part 1 was introducing an Oracle In-Memory Option example; in Part 2 I’ll introduce a few queries that show meta data and statistics in context of the Oracle In-Memory Option.

Table Status

One interesting set of metadata is the table’s in-memory status as well as the compression and loading priority. A table with enabled in-memory option is called “in-memory table” in the following for short. The following query retrieves these metadata for the in-memory table “IM_PLAYERS”:

SELECT table_name,
       inmemory,
       inmemory_compression,
       inmemory_priority
FROM dba_tables
WHERE table_name = 'IM_PLAYERS';

01

Loading of in-memory tables takes some time on initial load and their load status is important. Only when loaded the full capability of the in-memory functionality is available. The following query shows the load status by retrieving the number of bytes not populated into the in-memory area yet:

SELECT v.segment_name name,
       v.populate_status populate_status,
       v.bytes_not_populated bytes_not_populated
FROM v$im_segments v
ORDER BY 1;

02

The size of an in-memory table is important in order to determine if the memory allocation for the in-memory option is sufficient. The following query fetches various sizes:

SELECT v.owner,
       v.segment_name,
       v.populate_status,
       ROUND(v.bytes / 1000 / 1000, 2) 
           AS size_in_mb,
       ROUND(v.inmemory_size / 1000 / 1000, 2) 
           AS inmemory_size_in_mb,
       ROUND(v.inmemory_size / v.bytes, 2) * 100 
           AS im_size_percent
FROM v$im_segments v;

03

Basic Usage

When running queries against in-memory tables the in-memory option might or might not be triggered and used depending on the query optimizer’s decisions. The following two statements executed together show if the in-memory option was used:

EXEC dbms_feature_usage_internal.
         exec_db_usage_sampling(SYSDATE);
SELECT ul.name,
       ul.detected_usages
FROM dba_feature_usage_statistics ul
WHERE ul.version =
          (SELECT MAX(u2.version)
           FROM dba_feature_usage_statistics u2
           WHERE u2.name = ul.name
                 AND ul.name LIKE 'In-%');

04

A query to an in-memory table advances the value of detected_usages if the in-memory representation of that table was in fact accessed.

Statistics

If a query triggers the use of the columnar representation managed by the in-memory option it is important to understand the specific statistics to see how much optimization took place. The following query retrieves some of all available statistics:

SELECT t.display_name,
       s.value,
       s.sid,
       t.statistic#
FROM v$statname t, v$mystat s
WHERE t.display_name IN ('IM scan CUs columns accessed', 
                         'IM scan segments minmax eligible', 
                         'IM scan CUs pruned')
      AND s.statistic# = t.statistic#
ORDER BY s.sid DESC;

05

Being able to retrieve the statistics allows now to see the statistics before and after query execution. In the next section a selective as well as aggregate query is run and the statistics examined before and afterwards.

Example

The first example is a selective query that retrieves all columns of two players (rows).

SELECT * 
FROM cb.im_players 
WHERE player_id = 10 OR player_id = 20;

06

First, the statistics is shown before the query is executed and then the statistics after the query is executed.

05

07

The second example is a non-selective query that aggregates over all rows.

SELECT player_name,
       COUNT(*),
       MIN(player_score),
       MAX(player_score),
       AVG(player_score),
       SUM(player_score)
FROM cb.im_players
GROUP BY player_name;

08

Again, first the statistics before and after the query execution is shown.

07

09

Summary

This blog introduced a series of queries that allow to retrieve metadata as well as statistics on query execution in context of the Oracle In-Memory Option. The nice part is that these are regular SQL queries that can be run manually, but also as part of development or operations tools. It is therefore possible to examine the use of the in-memory tables and their behavior in a running application.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – In-Memory Option (Part 1)

The name ‘In-Memory Option’ does not mean Oracle turned into a general main memory database or has a new option to do so; it means something else entirely.

Aggregation

Columnar data representation is geared towards aggregation of columns in relational databases. In order to make aggregation fast, the rows of a column should be ‘near’ to each other in terms of memory layout; the best would be if the rows of the column would be next to each other. Access for aggregation is very efficient with this layout. A database implementing this approach is called a columnar database (http://en.wikipedia.org/wiki/Column-oriented_DBMS).

Specialized databases exist that structure the data on disk as well as in memory in a columnar format. Once the data is in such a database, aggregation can be fast. However, the data has to be in the database and chances are that the same data is in a separate OLTP database also (a second copy). This then means that a data integration problem has to be solved in order to ensure that data that is managed in the OLTP database is also available (with sufficiently minor delay) in the columnar database (or columnar store). And, since it changes in the OLTP database, the integration effort is an ongoing process.

In-Memory Option

Oracle 12c’s In-Memory Option [http://www.oracle.com/us/corporate/features/database-in-memory-option/] combines the columnar storage with OLTP storage in the same database. The columnar storage layout is only implemented in main memory, leaving the disk layout as it is. This means that OLTP and columnar access is possible in the same database, concurrently, and the data is always 100% transactionally consistent, as no delayed data integration has to take place. The ‘artificial’ separation of database functionality into separate databases to satisfy aggregation processing requirements is not necessary anymore in such a database management system.

Example

Since Oracle 12c combines the relational and the columnar representation, using the columnar functionality is fully declarative without any operational management overhead.

In the following a small example shows how to make use of the Oracle 12c In-Memory Option concurrently to the regular OLTP load.

Assume a regular table that has been in place for a while and is accessed by a regular application (OLTP use case). This table manages data and is changed according to the application logic.

DROP TABLE im_players;
CREATE TABLE im_players
  (
   player_id NUMBER,
   player_name VARCHAR2(128),
   player_score NUMBER);

Here is a way to populate the table with random content for testing purposes:

DECLARE
  player_id NUMBER;
  player_score NUMBER;
  player_name VARCHAR(128);
BEGIN
  player_id := 0;
  LOOP
    player_id := player_id + 1;
    player_name := 'player_' || player_id;
    player_score := player_id * 10;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    player_score := player_id * 100;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    player_score := player_id * 1000;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
  EXIT
    WHEN player_id = 1000000;
  END LOOP;
  COMMIT;
  RETURN;
END;

A new requirement has to be fulfilled based on business needs. The requirement is to analyze the existing table regularly while it is being used by the OLTP application. In the following this requirement is implemented with the Oracle 12c In-Memory Option.

First, memory has to be set aside for the columnar representation of the data in main memory:

ALTER system SET inmemory_size = 300M scope=spfile;

Restarting the database enables this change and this query confirms the setting:

SELECT name, value FROM v$sga;

01

Now the database has a main memory area (In-Memory Area) set aside for the columnar representation of relational tables.

Next, the existing table definition is modified to additionally be represented in columnar representation:

ALTER TABLE im_players inmemory priority critical;

There are different options that indicate how critical it is to have the rows in columnar format, how to compress depending on the access model, and actually which columns should be in columnar format as not all columns are necessarily required.

The alter statement above indicates that it is critical for the data to be in columnar format. The database will diligently load the data into the columnar memory section. Compression is enabled for low query access. All columns are enabled, based on the use case.

The following query shows how much data is not yet populated. Loading data is a process that takes time and the query supports checking the load status. Once the loading is completed, zero bytes are not yet populated.

SELECT 
  v.segment_name name,
  v.populate_status populate_status,
  v.bytes_not_populated bytes_not_populated
FROM v$im_segments v
ORDER BY 1;

02

The following query performs an aggregation.

SELECT
  player_name,
  COUNT(*),
  MIN(player_score),
  MAX(player_score),
  AVG(player_score),
  SUM(player_score)
FROM im_players
GROUP BY player_name;

??03

03

The query plan of this query shows that the columnar representation is being accessed.

----------------------------------------------------------------------------------------
| Id | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |            | 3025K | 227M  | 216 (38)   | 00:00:01 |
|  1 | HASH GROUP BY              |            | 3025K | 227M  | 216 (38)   | 00:00:01 |
|  2 | TABLE ACCESS INMEMORY FULL | IM_PLAYERS | 3025K | 227M  | 141 (5)    | 00:00:01 |
----------------------------------------------------------------------------------------

And this query shows access metrics indicating that the query actually executed against the columnar representation. The values after executing the aggregation query are different from the values before the aggregation.

Note on SQL

An important aspect of the Oracle 12c In-Memory Option is that the SQL specifying the required aggregation is regular SQL and no new or different language constructs had to be introduced. The underlying reason is that the query optimizer has sufficient knowledge to make the appropriate access choices based on the SQL presented and does not require hints from users. This is a huge engineering benefit since no new or different language has to be learned to make use of the columnar representation.

Summary

This was a super-brief introduction into the Oracle 12c In-Memory Option. The documentation has of course a lot more background [http://docs.oracle.com/database/121/ADMIN/memory.htm#ADMIN14257]. In addition, a nice blog series exists that introduces the various aspects step-by-step [https://blogs.oracle.com/In-Memory/entry/getting_started_with_oracle_database].

From a data management aspect an optimized representation targeted towards aggregation is enabled declaratively in context of the same database that holds the OLTP data. This is a huge benefit as one database supports the mixed workload and any data integration setup between different databases becomes unnecessary. With significantly increasing main memory sizes this option allows freeing up engineering cycles as well as operational resources while increasing data consistency and processing performance.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.