Oracle 12c Release 2: A Complete JSON Database

Today’s blog focuses on Oracle 12c Release 2. This release of the database introduces additional interesting functionality: sharding and analytics support for JSON. Both are very relevant in the context of JSON processing – future blogs will continue the JSON SQL discussion.

Oracle as a JSON Database

With the functional additions in Oracle 12c Release 2, Oracle 12c covers all major functional areas of JSON processing. Non-functional areas (backup, restore, replication, HA/DR support, etc.) are supported as well and need no separate discussion here.

The major areas of functionality in this context are:

  • JSON OLTP Processing. Oracle 12c supports general OLTP (online transaction processing) functionality: JSON documents are stored in tables and accessed through SQL with full DML support (a minimal sketch follows after this list).
  • JSON Analytics. Oracle 12c supports JSON analytics processing in the Oracle In-Memory option providing columnar representation and columnar processing.
  • JSON Sharding. Oracle 12c supports managed sharding of data in tables of independently running databases (shards); up to 1000 shards currently.
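
To make the OLTP point concrete, here is a minimal sketch of full DML on JSON documents through plain SQL; the table and column names are made up for this illustration:

CREATE TABLE orders_json (
  order_id  NUMBER PRIMARY KEY,
  order_doc VARCHAR2(4000) CONSTRAINT order_doc_is_json CHECK (order_doc IS JSON));

INSERT INTO orders_json VALUES (1, '{"customer":"C42","status":"open","total":100}');

UPDATE orders_json
SET order_doc = '{"customer":"C42","status":"shipped","total":100}'
WHERE order_id = 1;

DELETE FROM orders_json
WHERE JSON_VALUE(order_doc, '$.status') = 'shipped';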

Details of JSON OLTP, Analytics and Sharding

The following presentation contains a first level of details for the three areas above. I gave it during an Oracle Code event (https://go.oracle.com/oraclecode) in New York (https://developer.oracle.com/code/newyork) earlier this month. References to more detailed Oracle 12c documentation are provided in the presentation itself.

The presentation can be downloaded from here: Oracle Code NYC Presentation.

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Software Design in Context of Database Failover (Part 5): Architecture Using Non-Transactional Persistent Resource Managers

Non-transactional persistent resource managers are tricky in the context of database fail-over. A short discussion shines a light on some of the problems.

Non-Transactional Persistent Resource Managers

File systems and queuing systems are, in general, non-transactional persistent resource managers: they are typically used outside transaction control, in many cases because the chosen technology does not support transactions in the first place (some implement transactional functionality, many do not; the focus here is on those that do not). Files can be created, deleted, or have their contents manipulated; queue elements can be enqueued into or dequeued from queues (in FIFO order, priority order, or any other supported queuing policy).

Because operations like inserting, updating or removing data items in non-transactional persistent resource managers are executed outside transaction control, they (a) cannot be coordinated in the same transactions as operations on transactional resource managers, and (b) in case of a failure, the state of the data items is unclear since the non-transactional resource manager does not provide any transactional guarantees in failure situations (see the appendix below for the taxonomy on data consistency).

For example, when a data item is enqueued into a queue and the queuing system fails at that moment (by itself or because the underlying hardware failed), then there is no defined outcome for that enqueue operation. It could be that the data item was properly enqueued and all is fine. It is possible that the data item was not enqueued at all. Worse, it might be that the data item was enqueued, but not all supporting data structures were fully updated inside the queuing system, leaving the queuing system itself in an inconsistent state that still requires recovery actions.

Best Effort Consistency

From a software architecture viewpoint, non-transactional persistent resource managers are best-effort systems, and this has to be kept in mind when they are used and deployed. With today’s rather reliable server hardware this limitation can easily be forgotten or pushed aside, as the mean time between failures is rather long.

However, when data consistency is critical and non-transactional resource managers have to be used, effort can be invested to mitigate at least some of the potential data consistency problems.

Data Consistency through Data Duplication

A fairly straightforward approach to integrating non-transactional resource managers is to store the data items twice: once in the non-transactional resource manager, and once in a transactional resource manager. Any update or delete must take place in both systems (which, however, as discussed, cannot be done inside a single transaction spanning both).

For example, every item enqueued in a non-transactional queue could also be inserted into a relational table. If an item is dequeued, it is also removed from the transactional table. The logic could first enqueue the data item into the queue and, when successful, insert it into the table. Upon dequeue the same applies: the item is first dequeued from the queue and, if successful, removed from the table. The invariant is that a queued data item is only considered consistent if it is enqueued in the queue and also present in the table at the same time.
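
As a minimal sketch only (enqueue_nontx is a hypothetical wrapper around the external, non-transactional queuing system’s API, and queued_items is the mirroring relational table), the enqueue path could look like this:

BEGIN
  -- Step 1: enqueue outside transaction control (non-transactional resource manager).
  enqueue_nontx(p_item_id => 4711, p_payload => 'some payload');
  -- Step 2: mirror the data item in a relational table (transactional resource manager).
  INSERT INTO queued_items (item_id, payload) VALUES (4711, 'some payload');
  COMMIT;
END;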

Studying only the enqueue case, there are several failure cases:

  • Enqueue fails.
  • Enqueue succeeds, but system fails before the insert into the table took place.
  • Enqueue succeeds, but the insert into the table fails.

In the first case, the enqueue fails and the insert into the table will never take place. So the invariant is fulfilled. In the second case the item is in the queue, but not in the table. The invariant is violated. The system can now decide how to deal with the situation after the failure: it can remove the queue item, or it can insert it into the table. The same situation occurs in the third case.

Underlying this approach is the requirement that it is possible to check for the presence of the data item in the non-transactional as well as in the transactional resource manager, in order to determine whether the data item is present after a failure. For example, in the second case above, after the system comes back up the following check has to occur: (a) for every item in the queue an equivalent item must be in the table, and (b) for each item in the table there must be an item in the queue. If there is a data item in one but not the other resource manager, the invariant is violated and the inconsistency has to be removed by either removing or adding the data item so that none or both of the resource managers have the data item (and therefore the invariant holds).
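
Assuming the queue’s content can be listed into a staging table after a failure (queue_snapshot, hypothetical), the invariant check is a symmetric difference in SQL, using the queued_items mirror table from the sketch above:

-- Data items present in the queue but missing from the table:
SELECT item_id FROM queue_snapshot
MINUS
SELECT item_id FROM queued_items;

-- Data items present in the table but missing from the queue:
SELECT item_id FROM queued_items
MINUS
SELECT item_id FROM queue_snapshot;

Both result sets must be empty for the invariant to hold; any returned row identifies a data item that has to be added to, or removed from, one of the two resource managers.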

Database Fail-over Data Loss Recovery

When a database fails over, as discussed in earlier blogs, it is possible that no data loss occurs. In this case the transactional resource manager remains consistent with the non-transactional resource manager.

However, a data loss is possible. This means that one or more data items in the transactional resource manager that were present before the fail-over are not present anymore after the fail-over. In this case either the items need to be added to the transactional resource manager again (based on the content in the non-transactional resource manager), or they have to be removed from the non-transactional resource manager also.

The particular strategy has to be determined by the specific software architecture; with data duplication, however, this choice can be made as needed and as required by the application system at hand.

Effort vs. Consistency

The approach outlined above (which can be applied to any non-transactional resource manager that can check for the existence of data items) fundamentally makes the non-transactional resource manager pseudo-transactional by pairing it with a transactional resource manager. This is implementation effort made in order to provide data consistency.

Less effort might mean less consistency. For example, it would be sufficient to store the data item in a table only for the duration until the data item is safely enqueued in the queue. Once that is the case the data item could be removed from the table. While this would ensure the consistency during the enqueue operation, it does not ensure consistency during the dequeue operation since not every type of failure during a dequeue operation would be recoverable.

In a data loss situation caused by a database fail-over there would be no way to reconcile the two resource managers unless all data is available in both at the same time. In a particular situation that might be acceptable, but in general it is not.

Effort vs. Performance/Throughput

Additional effort takes additional resources, both in storage space and in processing time. The price for making the system more consistent is possibly a slower system, with less throughput and higher storage space requirements. Again, this is trading off non-functional against functional properties.

Summary

The outlined issues are present in all systems that deploy non-transactional persistent resource managers. The outlined solution is a generic one and in specific situations alternative, more efficient approaches might be possible in order to guarantee data consistency.

As discussed, the choice of non-transactional persistent resource managers paired with the need for data consistency can be expensive from an engineering and system resource perspective. While the choice is not always a real choice, it is definitely worth evaluating alternatives that provide the required functionality in a transactional resource manager for comparison.

This blog is the last in the series around transactional and non-transactional resource manager use in context of database fail-over. Clearly, the software architecture has to be aware of the fact that a database fail-over is possible and that a data loss might occur along the way due to network errors, increased network latency and system failures. Some of the architectural problems have been discussed, and some solutions presented as starting points for your specific application system development efforts.

Go SQL!

Appendix: Taxonomy

The software architecture taxonomy relevant for database fail-over can be built based on the combinations of resource manager types used. In the following the various combinations are discussed at a high level (“x” means that the software architecture uses one or more of the indicated resource manager types).

Software Architecture | Transactional Persistent | Non-transactional Persistent | Non-transactional and Non-persistent and Rebuildable | Non-transactional and Non-persistent and Non-rebuildable
Consistent | x |  |  | 
Consistent | x |  | x | 
Possibly consistent | x | x |  | 
Possibly consistent | x | x | x | 
Possibly consistent | x | x | x | x

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Software Design in Context of Database Failover (Part 4): Architecture on Transactional Persistent Resource Managers in Conjunction with Caches

If transactional resource managers are available, then all is good (see the last blog). But what if this is not the case? This blog starts discussing non-transactional resource managers.

Non-Transactional Resource Managers

There are different types of non-transactional resource managers (see the taxonomy in the appendix of this blog):

  • Non-transactional and persistent. Data is stored on persistent storage, but outside transaction protection.
  • Non-transactional, non-persistent and non-rebuildable. Data is managed in volatile memory, outside transaction protection, and if lost it cannot be rebuilt (it will be truly lost).
  • Non-transactional, non-persistent and rebuildable. Data is managed in volatile memory, outside transaction protection, but can be rebuilt if lost or intentionally deleted.

A non-persistent resource manager is usually called a cache. An example of a cache is a list of values in a Java singleton caching a subset of the data of a database table column. Another example is a user interface that caches a list of entries, like sales opportunities, and displays them to a user.

Properly implemented caches can be invalidated (i.e., emptied out) and rebuilt based on data stored in one or several database tables, as they are non-functional components put in place for performance reasons. Caches that can be invalidated and rebuilt are called “rebuildable caches”.

Caches that cannot be invalidated without losing data are called “non-rebuildable caches”. Those should not be part of the software architecture and have to be avoided, as lost data cannot be recovered. The impact of this is clear: you don’t want your checking account information to be managed in such a cache.

In the following, caches are discussed; the remaining type, non-transactional and persistent resource managers, is going to be discussed in the next blog.

Rebuildable Cache and Database Fail-over

A rebuildable cache contains copies of data from the database. If a cache is accessed and it contains the requested data (“cache hit”), then the data is returned to the caller. If the cache does not contain the requested data (“cache miss”) the cache fetches the data first from the database and stores it before passing it on to the caller. In summary, the cache either has the data or knows how to get it from the database.

In the context of a database fail-over it is possible that the primary and secondary database have the exact same data set at the time of the fail-over [see blog https://realprogrammer.wordpress.com/2015/08/31/software-design-in-context-of-database-failover-part-2-software-architecture-taxonomy/]. In this case the cache content remains consistent with the database state after a fail-over, as all data in the cache has a corresponding data item in the database.

However, it is possible that a data loss occurs if the secondary database is lagging the primary database at the time of the fail-over. In this case some data might not be available anymore after the fail-over as this data was not replicated before the fail-over occurred.

A cache that holds data that was available before the fail-over but lost during the fail-over is therefore inconsistent with the database after the fail-over. A cache hit might occur even though the cached data is not in the database. This would be incorrect behavior, since the premise of a cache is that it caches data consistent with the database state.

The solution for the fail-over case is to invalidate all cache entries (in all caches) right after the database fail-over took place and before clients continue processing. Since each cache then starts empty, many accesses will initially be cache misses until the caches have built up the working set again. However, the caches will be consistent with the database state and the application system accesses consistent data.
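
A rebuildable cache with an external invalidation entry point can be sketched in PL/SQL as a package-level collection (non-transactional, non-persistent session state). The backing table im_players is borrowed from the In-Memory posts on this blog, and the package is purely illustrative:

CREATE OR REPLACE PACKAGE player_cache AS
  FUNCTION get_name(p_id NUMBER) RETURN VARCHAR2;   -- cache hit/miss logic
  PROCEDURE invalidate;                             -- externally triggered invalidation
END player_cache;
/

CREATE OR REPLACE PACKAGE BODY player_cache AS
  TYPE name_map IS TABLE OF VARCHAR2(128) INDEX BY PLS_INTEGER;
  cache name_map;                                   -- non-persistent, rebuildable state

  FUNCTION get_name(p_id NUMBER) RETURN VARCHAR2 IS
    v_name VARCHAR2(128);
  BEGIN
    IF cache.EXISTS(p_id) THEN                      -- cache hit
      RETURN cache(p_id);
    END IF;
    SELECT MIN(player_name) INTO v_name             -- cache miss: fetch from the database
    FROM im_players
    WHERE player_id = p_id;
    cache(p_id) := v_name;
    RETURN v_name;
  END get_name;

  PROCEDURE invalidate IS
  BEGIN
    cache.DELETE;                                   -- empty the cache; it rebuilds on demand
  END invalidate;
END player_cache;
/

After a database fail-over, calling player_cache.invalidate (or simply discarding the affected sessions) brings the cached state back in line with the database; subsequent cache misses rebuild the working set.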

From a software architecture viewpoint it is very important to know all implemented caches and to ensure that each one is a rebuildable cache.

Non-rebuildable Cache and Database Failover

Even if not desirable, non-rebuildable caches might exist, and then the cache invalidation functionality discussed above is not available. If the cache cannot be invalidated, it is impossible to remove its contents after a database fail-over. Therefore the cache might return data to the client that is no longer present in the database or that is not consistent with the data in the database. The client would have to be aware of this possibility and be able to deal with such an inconsistency.

A special case needs to be discussed. It is possible that a non-rebuildable cache stores data from the database but is only missing an externally available cache invalidation function. In principle it could be rebuilt, but the rebuild cannot be triggered externally due to the missing programming interface.

A possible approach in this case is a workaround based on the assumption that the cache has limited capacity and that there is a lot more data in the database than the cache can hold. A brute-force attempt to make the cache consistent again is to implement a client that, after the fail-over, requests every data item that is in the database from the cache. At some point the cache is full, without any capacity left, and as a consequence it has to evict entries. As this happens, inconsistent entries are removed to make room for the entries the client requested that caused a cache miss. Once all data has been requested, the cache is consistent again.

Despite workarounds in special cases, there is no general approach that can address non-rebuildable caches. They have to be avoided if transactional consistency is important as consistency with these caches cannot be provided.

Summary

If caches are used in the software architecture, the best form is rebuildable caches that can recreate their state from persistent storage (i.e., the database), so that a consistent data state can be recreated after a database fail-over with data loss. Such caches must have external interfaces so that an invalidation can be triggered right after the database fail-over completes.

Non-rebuildable caches might be handled in special cases with a workaround after a fail-over; however, as no general mechanism exists, they should be avoided altogether. Avoiding non-rebuildable caches is a safe architectural approach.

The next blog will discuss the third category of resource managers: non-transactional and persistent resource managers.

Go SQL!

Appendix: Taxonomy

The software architecture taxonomy relevant for database fail-over can be built based on the combinations of resource manager types used. In the following the various combinations are discussed at a high level (“x” means that the software architecture uses one or more of the indicated resource manager types).

Software Architecture | Transactional Persistent | Non-transactional Persistent | Non-transactional and Non-persistent and Rebuildable | Non-transactional and Non-persistent and Non-rebuildable
Consistent | x |  |  | 
Consistent | x |  | x | 
Possibly consistent | x | x |  | 
Possibly consistent | x | x | x | 
Possibly consistent | x | x | x | x

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Software Design in Context of Database Failover (Part 3): Architecture on Transactional Persistent Resource Managers

The “Happy Path” architecture used individual consistent database transactions. What about individual (database) transactions that are only consistent as a set (set-consistent individual transactions)?

Overview

This blog is about the first row of the software architecture taxonomy (see a reference to the taxonomy in the appendix at the end of this blog as well as the previous blog [https://realprogrammer.wordpress.com/2015/08/31/software-design-in-context-of-database-failover-part-2-software-architecture-taxonomy/]). There are two relevant aspects in context of transactions:

  • Transactional persistent resource managers
  • Set-consistent individual transactions

Transactional Persistent Resource Managers

In a general software architecture, not only are relational tables in a relational database management system accessed, but also queuing systems or file systems, among other storage options. In order to arrive at a consistent system state, all operations must take place within transactions so that they either succeed and a new consistent state is reached, or fail and the system returns to the previous consistent state. As a consequence, a transactional queuing system or a transactional file system is necessary for this type of software architecture if queues or files are used to manage data.

There are many choices for transactional resource managers. In the context of the Oracle database, a queuing system (Advanced Queuing, AQ) [https://docs.oracle.com/database/121/ADQUE/aq_intro.htm#ADQUE0100] is available within the RDBMS, and its queue operations can be part of local database transactions. The software architecture can therefore make use of queues as well as relational tables, and operations on both can be within the same or different local database transactions. This means that any mix of relational operations and queue operations is consistent when guarded by transactions.

For illustration, a small code snippet follows that shows a table insert and an AQ enqueue operation. The changes of both operations become visible when the transaction commits successfully, or both roll back when it aborts. Even though a table and a queue operation are performed, both are governed by a single local database transaction (no two-phase commit protocol required).

DECLARE
    enqueue_options dbms_aq.enqueue_options_t;
    message_properties dbms_aq.message_properties_t;
    message_handle RAW(16);
    MESSAGE aqadm.message_type;
BEGIN
    enqueue_options.VISIBILITY         := DBMS_AQ.ON_COMMIT;
    enqueue_options.SEQUENCE_DEVIATION := NULL;
    -- -----
    message_properties.PRIORITY    := -5;
    message_properties.DELAY       := DBMS_AQ.NO_DELAY;
    message_properties.EXPIRATION  := DBMS_AQ.NEVER;
    message_properties.CORRELATION := 'TEST MESSAGE';
    -- -----
    MESSAGE := aqadm.message_type('blog', 
      'failover architecture');
    -- -----
    DBMS_AQ.ENQUEUE(queue_name => 'aqadm.messageQueue', 
      enqueue_options => enqueue_options, 
      message_properties => message_properties, 
      payload => MESSAGE , msgid => message_handle);
    -- ----
    INSERT INTO standardtext VALUES('data guard');
    COMMIT;
END;

As a note, distributed transactions (for example, transactions implementing the XA protocol) are left out of the discussion in this blog, as accomplishing a high-performance system with them is rather difficult and requires that all transactional resource managers are capable of participating in the two-phase commit protocol.

Set-Consistent Individual Transactions

Not all individual transactions lead to a consistent application system state. Transactions always result in a consistent database state, but that does not mean that the application system state is consistent from the viewpoint of the application.

An example is the purchasing process on the web site of a company. The customer selects products and adds them to a cart, and at some point the checkout process is started by the user. Adding a product to the cart is an individual transaction, and each one leads to a consistent state. The checkout process itself could be a single transaction; however, adding address and credit card information takes time, and that duration is too long to keep one database transaction open. Therefore, the checkout process is a set of individual transactions (like adding a shipping address, entering a credit card number, selecting shipping methods), and only after all transactions are committed is the checkout process itself complete.

The company’s goal is that checkout processes run to completion, as only then can it ship goods and charge for them. Any system failure or fail-over must not prevent the checkout process from completing. Therefore the application system is considered to be in a consistent state only when the checkout process has completed in its entirety. In terms of transactions, the whole set of transactions constituting the checkout process must have completed.

What happens if the system fails during the checkout process (meaning, some transactions are completed and others have yet to complete) and must fail over to the standby database? This will be discussed in a moment.

Database Replication For High Availability

High availability in the context of databases is accomplished by database replication from a primary to a secondary database. If the primary database becomes unavailable the applications switch over to the secondary database. Ideally, the replication is “perfect” in the sense that every successful transaction is transmitted from the primary database to the secondary database instantaneously. If that were the case, no transaction would be lost after a fail-over from the primary to the secondary database.

Oracle provides the Active Data Guard [https://docs.oracle.com/database/121/SBYDB/concepts.htm#SBYDB00010] functionality for replication. It replicates data from the primary (“active”) database to the secondary (“standby”) database by means of different replication policies:

  • Maximum Availability
  • Maximum Performance
  • Maximum Protection

Maximum Protection achieves perfect replication. However, it relies on synchronous communication between the primary and the secondary database. As soon as this communication encounters problems, like a significant delay or unavailability, the policy can change dynamically to Maximum Availability. In that case the primary database communicates asynchronously with the secondary database, and if a failure occurs in this situation, data loss can happen if not all data was replicated from the primary to the secondary database at the time of the failure.
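
For illustration (a sketch only, assuming redo transport to the standby is already configured), the protection mode is set on the primary database and can be verified like this:

ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;

SELECT protection_mode, protection_level FROM v$database;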

Software Architecture Execution in Context of Database Fail-over

From an architecture viewpoint two data consistency cases have to be considered in case of fail-over:

  • no data loss: committed transactions on the secondary (new primary) are the same as in the primary (new secondary). The application system sees the exact same database state and can continue processing (after the application resets its database connections to the new primary).
  • data loss: some committed transactions are not present in the new primary. The application sees transactions as of some time ago; this is like a time shift into the past. However, this time shift does not create an inconsistency from a data perspective.

From the viewpoint of transactions, also two cases have to be considered:

  • individual consistent transactions: the new primary has a consistent database and application system state. The application system can continue processing in both cases: data loss as well as no data loss.
  • set-consistent transactions: this is the more complex case and requires a more differentiated discussion next.

Before a fail-over, a group of set-consistent transactions can be completely finished and a consistent application system state is reached. It is also possible that only some of the transactions in a group of set-consistent transactions have been executed (an in-flight group of set-consistent transactions).

When a fail-over occurs, then there are different consequences depending on the data loss situation:

  • no data loss: if no data is lost during the fail-over, then a group of set-consistent transactions sees the same state as before and processing can continue without interruption on the new primary database.
  • data loss: if data loss occurred, then a group of set-consistent transactions that was completed before the fail-over might be set back to an in-flight state. In this case the application system will have to reprocess the lost transactions after the fail-over. A previously in-flight group of set-consistent transactions might be set back as well, and some of its transactions have to be reprocessed, too.
  • pre-group data loss: in the worst case the data loss is complete from the viewpoint of a group of set-consistent transactions, in the sense that all transactions of the group are lost. In this case there is no indication that the group ever existed.

From an architecture viewpoint, the case of an in-flight group of set-consistent transactions is the relevant one to focus on. In this case it is important that the application realizes that there is such a group and continues processing after a fail-over. Aside from the committed transactions, there needs to be a persistent flag that indicates the progress made, i.e., which transactions have completed and which have not yet. This persistent flag allows the application system to realize that it needs to continue processing those transactions of a group of set-consistent transactions that have not been executed yet.

Since transactions are executed one after another sequentially, the persistent flag only needs to record the last successful transaction, and the flag is written as part of that transaction. So each transaction writes a persistent flag that indicates the last committed transaction. After a fail-over, an application system can select all persistent flags and determine the transactions that need to be executed next (in this sense the application system is re-entrant).
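
A minimal sketch of such a persistent flag follows; the tables checkout_progress and shipping_addresses as well as the step names are made up for this illustration. Each transaction of the group records the step it completed in the same local transaction as the business data change, and after a fail-over the application queries for groups whose last recorded step is not the final one:

CREATE TABLE checkout_progress (
  checkout_id    NUMBER PRIMARY KEY,
  last_step_done VARCHAR2(30),     -- e.g. 'SHIPPING_ADDRESS', 'CREDIT_CARD', 'CHECKOUT_COMPLETE'
  updated_at     TIMESTAMP DEFAULT SYSTIMESTAMP);

-- One transaction of the group: the business change and the progress flag commit together.
INSERT INTO shipping_addresses (checkout_id, address) VALUES (42, '1 Main Street');
MERGE INTO checkout_progress p
USING (SELECT 42 AS checkout_id FROM dual) s
ON (p.checkout_id = s.checkout_id)
WHEN MATCHED THEN
  UPDATE SET p.last_step_done = 'SHIPPING_ADDRESS', p.updated_at = SYSTIMESTAMP
WHEN NOT MATCHED THEN
  INSERT (checkout_id, last_step_done) VALUES (42, 'SHIPPING_ADDRESS');
COMMIT;

-- After a fail-over: find in-flight groups that still have transactions to execute.
SELECT checkout_id, last_step_done
FROM checkout_progress
WHERE last_step_done <> 'CHECKOUT_COMPLETE';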

If the data loss is pre-group, then there is no record of the group having ever existed. In this case the group will not be (re-)started on the new primary, as there is no record of it. The application system either relies on user input (as in the example above: the user notices that the checkout did not complete and starts it again) or, alternatively, a search can be performed looking for places where a group could possibly have started, which would then allow prompting the user to confirm whether a group of set-consistent transactions needs to be restarted (or not). For example, the user could be asked whether the checkout process should be started again.

Summary

If individual or set-consistent transactions rely solely on database transactions and on resource managers that are implemented in a database system (like Oracle) then it is possible to implement a system architecture to support fail-over in such a way that application systems can continue processing and deal with possibly lost transactions after fail-over. The reason for this is that application systems always encounter a consistent application system state and take this as the basis for continued processing after a fail-over.

Go SQL!

Appendix: Taxonomy

The software architecture taxonomy relevant for database fail-over can be built based on the combinations of resource manager types used. In the following the various combinations are discussed at a high level (“x” means that the software architecture uses one or more of the indicated resource manager types).

Software Architecture | Transactional Persistent | Non-transactional Persistent | Non-transactional and Non-persistent and Rebuildable | Non-transactional and Non-persistent and Non-rebuildable
Consistent | x |  |  | 
Consistent | x |  | x | 
Possibly consistent | x | x |  | 
Possibly consistent | x | x | x | 
Possibly consistent | x | x | x | x

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

Software Design in Context of Database Failover (Part 2): Software Architecture Taxonomy

In order to discuss software design in context of database fail-over I will present a taxonomy of software architectures that manage data. This is the basis for discussing all possible database fail-over cases and their specific issues. Let’s go.

Motivation

In the previous blog the ideal database software architecture (“Happy Path”; [https://realprogrammer.wordpress.com/2015/07/30/software-design-in-context-of-database-failover-part-1-the-happy-path/]) was discussed; why is this not sufficient to cover all types of software architectures? The reason lies in two main aspects:

  • Asynchronous replication mode. Ideally, replication from a primary database to a secondary database is synchronous and complete, meaning the primary and secondary database are exact replicas of each other at every point in time (i.e., at every transaction commit). However, this is not always possible due to network latency, network failures, server outages, etc. More often than not the secondary database lags behind the primary database (if only by a few transactions) and is missing transactions when a failure occurs. In case of a fail-over in such a situation, applications will not see committed data in the secondary database that they have already seen committed in the primary database (a data loss occurred – or better: a transaction loss).
  • Non-transactional resource management. Many applications manage data outside the scope of transactions. In case of failures, this non-transactional data can be out of sync with the transactional data (e.g., in the primary database). And in case of data loss during a fail-over, the non-transactional data might be inconsistent with the state of the secondary database as well. Any replication mechanism between databases will not include non-transactional data that is managed outside the database.

A background discussion on resource managers and transactions follows now as preparation for the software architecture taxonomy discussion.

Background: Resource Manager

A resource manager in context of database software architectures is a component that manages state (data). In general, this is data creation, update, delete and read. For example, a database is a resource manager, and so is a file system. Resource managers are

  • Transactional or non-transactional. Transactional resource managers are able to participate in a transaction protocol (e.g., local transactions, distributed transactions, autonomous transactions or chained transactions, to name a few).
  • Persistent or non-persistent. For example, a database is (usually) persistent, a software cache is (usually) non-persistent.
  • Non-persistent rebuildable or non-persistent non-rebuildable. A non-persistent resource manager might be able to rebuild its data from persistently stored data.

For example, the file system on Linux is non-transactional, while Oracle’s DBFS is transactional [http://docs.oracle.com/cd/E11882_01/appdev.112/e18294/adlob_fs.htm#ADLOB45943].

Resource Managers Are Everywhere

Resource managers are everywhere in a software architecture in general. Not only databases or file systems are resource managers, but also queuing systems, software caches, user interfaces, Java JVM caches, or Java class variables. Everything that manages state is to be considered a resource manager in context of the database fail-over discussion.

Most resource managers are non-transactional (they cannot participate in a transaction protocol), and many are non-persistent. An argument can be made that all resource managers could/should participate in transactions to avoid inconsistencies anywhere in the software stack (transactional memory would be an interesting tool in this regard). That would be ideal as any data state change would be consistent across all resource managers.

Resource Managers and Transactions

However, in reality this is not the case: most resource managers are non-transactional and consequently the data they manage is not under transaction control. Access to resource managers happens within and outside transactions, and not all resource managers can participate in a transaction protocol.

Worse, the same data item might be transactionally committed, and additionally stored in a non-transactional resource manager as a duplicate. In non-failure cases this works out fine, but sometimes, when failures occur, only one or the other update happens, leaving an inconsistent system state behind, as the data item is only in one location and not duplicated in others as expected. If an assumption is made in the software architecture that the data in all resource managers is always consistent, then this situation might cause significant system degradation or errors when failures occur. In the worst case inconsistent application state is left behind.

The same situation occurs if the application writes two data items that are supposed to be consistent with each other, one through a transactional resource manager and one through a non-transactional resource manager. If a failure occurs after the first update, the second will not happen and the state is inconsistent.

System Semantics: Consistency

For the purposes of this blog the system semantics with respect to state consistency can be reduced to two cases:

  • Consistent system: The overall system is consistent when all data in resource managers are consistent with each other (or can be made consistent after a failure before resuming processing)
  • Inconsistent system: Otherwise.

The following system consistency guarantees can be given.

  • Consistent
    • Only transactional and persistent resource managers are used and all changes happen in context of a transaction protocol. This is the Happy Path case from the previous blog as the complete state is consistent at all times.
    • If additionally non-persistent and rebuildable resource managers are used, a consistent state can be re-created by removing all data from the non-persistent and rebuildable resource managers.
  • Possibly consistent
    • Transactional and non-transactional persistent resource managers are used. This might result in consistent state changes, or might not, depending on the timing of the failure. Because of the persistent nature, it might be non-trivial to clean up inconsistent persistent state, and possibly complex compensation functions have to be developed (if that is possible at all).
    • Non-persistent and non-rebuildable resource managers are used in conjunction with persistent resource managers. As before, the clean-up might be non-trivial and an inconsistent state might be left behind.

If several transactional resource managers are used to change state consistently, but cannot be invoked in context of a single transaction in one transaction protocol, then any failure might leave an inconsistent state behind. This must be called out explicitly here in order to avoid the assumption that transactional means consistent in itself.

Taxonomy

The software architecture taxonomy relevant for database fail-over can be built based on the combinations of resource manager types used. In the following the various combinations are discussed at a high level (“x” means that the software architecture uses one or more of the indicated resource manager types).

Software Architecture | Transactional Persistent | Non-transactional Persistent | Non-transactional and Non-persistent and Rebuildable | Non-transactional and Non-persistent and Non-rebuildable
Consistent | x |  |  | 
Consistent | x |  | x | 
Possibly consistent |  | x |  | 
Possibly consistent | x | x |  | 
Possibly consistent | x |  |  | x
Possibly consistent | x | x | x | 
Possibly consistent | x | x | x | x

Each of these will be discussed separately in an upcoming blog.

Fail-over

In terms of database fail-over, only those software architectures that can maintain a consistent state across all data, or can rebuild a consistent state after a failure, can fail over without any state inconsistency. The reason is that only consistent state is replicated and any state outside the database is consistent as well.

Summary

The rationalization in this blog shows that software applications will have to deal with inconsistent data states once non-transactional resource managers are used to manage data. In context of fail-over this becomes an even bigger problem and the next blogs will discuss strategies to deal with data state inconsistencies in software architectures.

This was a high-level discussion only and future blogs will discuss each software architecture in more detail to shine a light on the important and interesting details in case of database fail-over.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

 

Software Design in Context of Database Failover (Part 1): The Happy Path

Oracle Active Data Guard provides a replication and fail-over mechanism between two or more Oracle databases for high availability and disaster recovery. How does (application) software have to be designed in this context?

Oracle Active Data Guard

In a nutshell, Oracle Active Data Guard [http://www.oracle.com/us/products/database/options/active-data-guard/overview/index.html] provides synchronization between a primary database and one or more secondary databases. It continuously replicates the primary database to all secondary databases based on in-memory redo logs and their transmission. There are several protection modes, which are variations of synchronous or asynchronous replication between the primary database and the secondary databases. Secondary databases can be in the same data center (high availability) or different data centers (disaster recovery) and the specific deployment topology depends on the degree of availability and protection the organization implements.

Because the primary database is replicated to secondary databases, clients accessing the primary database can fail over to one of the secondary databases if the primary database becomes unavailable. Clients then switch from the primary database to a secondary database for processing. The fail-over process can be triggered manually or automatically, depending on the chosen configuration. In case of synchronous replication the secondary databases are in lockstep with the primary database and contain the exact same state. In case of asynchronous replication a minor delay is possible between the primary database and its secondary databases, and the secondary databases might lag a few transactions behind; their state is therefore not the same, but consistent as of an earlier point in time.

Synchronous Replication

Active Data Guard’s synchronous replication ensures that the secondary databases are exact replicas of the primary database. In terms of transaction execution this means that a transaction is committed successfully on the primary database only if it is committed on all secondary databases as well. The benefit of exact replication (i.e., no data loss) comes at the cost of transaction execution time, as a transaction on the primary database only succeeds after the secondary databases have executed the same transaction. The primary database is therefore dependent on the execution behavior of the secondary databases.

Fail-over Scenario

In the following a partial site failure is assumed. This means that the primary database becomes unavailable while the applications that are accessing the primary database continue to be available. A (complete) site failure occurs if all systems, the primary database and all applications, are unavailable.

The following shows the various components (a primary database, a secondary database, and one application). The solid arcs represent the call relationships before a fail-over to the secondary database, the dashed arcs represent the call relationships after a fail-over to the secondary database.

[Figure: call relationships before (solid arcs) and after (dashed arcs) the fail-over to the secondary database]

Software Design for “Happy Path”

In this blog the “Happy Path” is assumed. This means that all database processing that the application performs is done in individual, complete OLTP transactions that always commit consistent changes. After each transaction the database is in a consistent state.

In the case of complete OLTP transactions it is assumed that no data is cached in application layer caches or stored in other non-transactional resource managers, like file systems or queuing systems. This style of OLTP transaction processing application serves as a design baseline for this blog and the future blogs in this series.

For complete OLTP transaction applications no specific software design rules apply for the fail-over scenario outlined above (except not to use any other, non-transactional resource managers like caches, queues or file systems). Since all transactions commit consistent data, and since the replication mode between the primary and the secondary database is synchronous, the application will find a consistent and complete database state in the secondary database after a fail-over.

Furthermore, Oracle Active Data Guard provides driver-level functionality that allows an application to fail over without having to implement application-level fail-over code; the drivers are able to fail over without the application having to be aware of it.
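
For illustration, a client-side connect descriptor of the following form (host and service names are placeholders) lists both sites so that the driver can reach the standby after a fail-over; the exact configuration depends on the client driver and the fail-over features chosen:

SALES_HA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary-host)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = standby-host)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = sales_service))
  )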

Summary

For complete OLTP processing no specific software design approaches or software design rules have to be taken into consideration when using the synchronous replication mode in context of Oracle Active Data Guard.

However, the “Happy Path” is not the only possible application type and more complex application software systems in conjunction with other Oracle Active Data Guard replication modes will be discussed here in forthcoming blogs, especially when non-transactional resource managers are involved.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – In-Memory Option (Part 3): JSON Support

It is quite natural to view Oracle’s In-Memory Option in the context of the relational model; however, the In-Memory Option supports the JSON model as well.

Oracle JSON Support

As shown earlier in this blog, the Oracle 12c database supports JSON natively [http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] and incorporates JSON access into SQL so that JSON structures can be accessed through SQL directly. In addition, in a single SQL statement, JSON structures as well as relational tables can be accessed at the same time. Please see the documentation and the other blog entries for more details.
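
As a small illustration (using the js_players table defined later in this post), JSON fields can be addressed directly in SQL, side by side with relational columns:

SELECT p.player_id,
       p.player_name,
       JSON_VALUE(p.games, '$.games[0].name')   AS first_game,
       JSON_VALUE(p.games, '$.games[0].points') AS first_points
FROM js_players p
WHERE JSON_EXISTS(p.games, '$.games[0]');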

Oracle In-Memory Option

The In-Memory Option [http://www.oracle.com/us/corporate/features/database-in-memory-option/] was introduced in the previous two blog entries, which discussed a relational example as well as queries that allow introspecting the metadata and status of In-Memory Option processing. They gave an overview as well as the rationale for why using the In-Memory Option for analytic queries is advantageous over regular relational processing (columnar representation of data in main memory).

Combination: JSON processing using In-Memory Option

Analytics in context of JSON structures would benefit from In-Memory Option support as well. Many applications based on JSON structures usually have to support some form of analytics. The In-Memory Option supports aggregation functionality on JSON structures as well as on relational tables.

In the following, an example is introduced that creates a table containing a JSON structure, enables it for In-Memory Option support and shows a few simple analytics queries.

Example Table with JSON Structure

The following declares a sample table with a column containing a JSON structure.

DROP TABLE js_players;
CREATE TABLE js_players
 (
   player_id NUMBER NOT NULL,
   player_name VARCHAR(255),
   games VARCHAR(4000) 
    CONSTRAINT games_ensure_json
    CHECK (games IS JSON (STRICT WITH UNIQUE KEYS)));

Table Population

In order to have a large data set, the following block creates rows containing JSON structures and inserts them into the table declared above.

DECLARE
  counter NUMBER;
  player_name VARCHAR(128);
  games_value VARCHAR (256);
BEGIN
  counter := 0;
  LOOP
    counter := counter + 1;
    player_name := 'Divvon' || '_' || counter;
    games_value := '{ "games":
      [{"name":"tic-tac-toe", 
        "points":' || counter *10 || '},
       {"name":"one-two-three", 
        "points":' || counter * 100 || '}
      ]}';
    INSERT INTO js_players VALUES
      (counter, player_name, games_value);
  EXIT
    WHEN counter = 1000000;
  END LOOP;
  COMMIT;
  RETURN;
END;

An example JSON document looks like this:

{"games":
  [{"name":"tic-tac-toe", "points":750},
   {"name":"one-two-three", "points":7500}
  ]}

Enabling In-Memory Option

This statement enables the In-Memory Option for the table containing JSON structures:

ALTER TABLE js_players inmemory priority critical;

Aggregation Queries

The following queries show aggregation over elements of the JSON structures. Each query extracts all ‘points’ values from the JSON structure (since every document might have several ‘points’ fields in the array of ‘games’) and makes them accessible as ‘pts.points’. Then it aggregates over this structure. The first query aggregates over all rows, whereas the second query selects only a few rows based on ‘player_name’.

SELECT DISTINCT p.player_name,
       SUM(pts.points) sum_points,
       MIN(pts.points) min_points,
       MAX(pts.points) max_points,
       AVG(pts.points) avg_points,
       COUNT(*)
FROM js_players p,
     json_table(p.games, 
                '$.games[*]' COLUMNS (points VARCHAR2(32 CHAR) PATH '$.points')) 
         AS pts
GROUP BY p.player_name
ORDER BY sum_points DESC;

[Screenshot: query result]

SELECT DISTINCT p.player_name,
       SUM(pts.points) sum_points,
       MIN(pts.points) min_points,
       MAX(pts.points) max_points,
       AVG(pts.points) avg_points,
       COUNT(*)
FROM js_players p,
     json_table(p.games, 
                '$.games[*]' COLUMNS (points VARCHAR2(32 CHAR) PATH '$.points')) 
       AS pts
WHERE p.player_name = 'Divvon_204'
      OR p.player_name = 'Divvon_206'
GROUP BY p.player_name
ORDER BY sum_points DESC;

[Screenshot: query result]

Summary

Oracle’s In-Memory Option, because it supports not only the relational model but also the JSON model, is an interesting option for analytics in the context of JSON data sets. This blog has shown an example of how to combine JSON structures and the In-Memory Option in order to run analytics queries.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – In-Memory Option (Part 2)

Part 1 introduced an Oracle In-Memory Option example; in Part 2 I’ll introduce a few queries that show metadata and statistics in the context of the Oracle In-Memory Option.

Table Status

One interesting set of metadata is a table’s in-memory status as well as its compression and loading priority. A table with the In-Memory Option enabled is called an “in-memory table” in the following, for short. The following query retrieves this metadata for the in-memory table “IM_PLAYERS”:

SELECT table_name,
       inmemory,
       inmemory_compression,
       inmemory_priority
FROM dba_tables
WHERE table_name = 'IM_PLAYERS';

[Screenshot: query result]

Loading an in-memory table takes some time on initial load, and its load status is therefore important: only when the table is fully loaded is the full capability of the in-memory functionality available. The following query shows the load status by retrieving the number of bytes not yet populated into the in-memory area:

SELECT v.segment_name name,
       v.populate_status populate_status,
       v.bytes_not_populated bytes_not_populated
FROM v$im_segments v
ORDER BY 1;

[Screenshot: query result]

The size of an in-memory table is important in order to determine if the memory allocation for the in-memory option is sufficient. The following query fetches various sizes:

SELECT v.owner,
       v.segment_name,
       v.populate_status,
       ROUND(v.bytes / 1000 / 1000, 2) 
           AS size_in_mb,
       ROUND(v.inmemory_size / 1000 / 1000, 2) 
           AS inmemory_size_in_mb,
       ROUND(v.inmemory_size / v.bytes, 2) * 100 
           AS im_size_percent
FROM v$im_segments v;

[Screenshot: query result]

Basic Usage

When running queries against in-memory tables the in-memory option might or might not be triggered and used depending on the query optimizer’s decisions. The following two statements executed together show if the in-memory option was used:

EXEC dbms_feature_usage_internal.exec_db_usage_sampling(SYSDATE);
SELECT ul.name,
       ul.detected_usages
FROM dba_feature_usage_statistics ul
WHERE ul.version =
          (SELECT MAX(u2.version)
           FROM dba_feature_usage_statistics u2
           WHERE u2.name = ul.name
                 AND ul.name LIKE 'In-%');

[Screenshot: query result]

A query to an in-memory table advances the value of detected_usages if the in-memory representation of that table was in fact accessed.

Statistics

If a query triggers the use of the columnar representation managed by the In-Memory Option, it is important to understand the specific statistics to see how much optimization took place. The following query retrieves some of the available statistics:

SELECT t.display_name,
       s.value,
       s.sid,
       t.statistic#
FROM v$statname t, v$mystat s
WHERE t.display_name IN ('IM scan CUs columns accessed', 
                         'IM scan segments minmax eligible', 
                         'IM scan CUs pruned')
      AND s.statistic# = t.statistic#
ORDER BY s.sid DESC;

[Screenshot: query result]

Being able to retrieve the statistics makes it possible to compare them before and after query execution. In the next section a selective as well as an aggregating query is run, and the statistics are examined before and afterwards.

Example

The first example is a selective query that retrieves all columns of two players (rows).

SELECT * 
FROM cb.im_players 
WHERE player_id = 10 OR player_id = 20;

[Screenshot: query result]

First, the statistics are shown before the query is executed, and then after the query is executed.

[Screenshot: statistics before executing the query]

[Screenshot: statistics after executing the query]

The second example is a non-selective query that aggregates over all rows.

SELECT player_name,
       COUNT(*),
       MIN(player_score),
       MAX(player_score),
       AVG(player_score),
       SUM(player_score)
FROM cb.im_players
GROUP BY player_name;

[Screenshot: query result]

Again, the statistics before and after the query execution are shown.

[Screenshot: statistics before executing the query]

[Screenshot: statistics after executing the query]

Summary

This blog introduced a series of queries that allow retrieving metadata as well as statistics on query execution in the context of the Oracle In-Memory Option. The nice part is that these are regular SQL queries that can be run manually, but also as part of development or operations tools. It is therefore possible to examine the use of in-memory tables and their behavior in a running application.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – In-Memory Option (Part 1)

The name ‘In-Memory Option’ does not mean Oracle turned into a general main memory database or has a new option to do so; it means something else entirely.

Aggregation

Columnar data representation is geared towards aggregation of columns in relational databases. In order to make aggregation fast, the values of a column should be ‘near’ each other in terms of memory layout; ideally, the values of a column are stored next to each other. Access for aggregation is very efficient with this layout. A database implementing this approach is called a columnar database (http://en.wikipedia.org/wiki/Column-oriented_DBMS).

Specialized databases exist that structure the data on disk as well as in memory in a columnar format. Once the data is in such a database, aggregation can be fast. However, the data has to be in that database, and chances are that the same data is also in a separate OLTP database (a second copy). This means that a data integration problem has to be solved in order to ensure that data managed in the OLTP database is also available (with sufficiently small delay) in the columnar database (or columnar store). And since the data keeps changing in the OLTP database, the integration effort is an ongoing process.

In-Memory Option

Oracle 12c’s In-Memory Option [http://www.oracle.com/us/corporate/features/database-in-memory-option/] combines the columnar storage with OLTP storage in the same database. The columnar storage layout is only implemented in main memory, leaving the disk layout as it is. This means that OLTP and columnar access is possible in the same database, concurrently, and the data is always 100% transactionally consistent, as no delayed data integration has to take place. The ‘artificial’ separation of database functionality into separate databases to satisfy aggregation processing requirements is not necessary anymore in such a database management system.

Example

Since Oracle 12c combines the relational and the columnar representation, using the columnar functionality is fully declarative without any operational management overhead.

In the following, a small example shows how to make use of the Oracle 12c In-Memory Option concurrently with the regular OLTP load.

Assume a regular table that has been in place for a while and is accessed by a regular application (OLTP use case). This table manages data and is changed according to the application logic.

DROP TABLE im_players;
CREATE TABLE im_players
  (
   player_id NUMBER,
   player_name VARCHAR2(128),
   player_score NUMBER);

Here is a way to populate the table with generated content for testing purposes:

DECLARE
  player_id NUMBER;
  player_score NUMBER;
  player_name VARCHAR(128);
BEGIN
  player_id := 0;
  LOOP
    player_id := player_id + 1;
    player_name := 'player_' || player_id;
    player_score := player_id * 10;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    player_score := player_id * 100;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    player_score := player_id * 1000;
    INSERT INTO im_players VALUES
      (player_id, player_name, player_score);
    EXIT WHEN player_id = 1000000;
  END LOOP;
  COMMIT;
END;
/
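
A quick count confirms the load (my own addition, not part of the original script); since three rows are inserted per player, 3,000,000 rows are expected:

SELECT COUNT(*) FROM im_players;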

A new requirement has to be fulfilled based on business needs: the existing table must be analyzed regularly while it is being used by the OLTP application. In the following, this requirement is implemented with the Oracle 12c In-Memory Option.

First, main memory has to be set aside for the columnar representation of the data:

ALTER SYSTEM SET inmemory_size = 300M SCOPE=spfile;

Restarting the database enables this change, and the following query confirms the setting:

SELECT name, value FROM v$sga;

(Screenshot: output of the v$sga query showing the In-Memory Area entry.)

Now the database has a main memory area (In-Memory Area) set aside for the columnar representation of relational tables.
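
As an additional check (my own addition; it assumes the V$INMEMORY_AREA view of 12.1.0.2), the In-Memory Area can be broken down into its internal pools:

SELECT pool, alloc_bytes, used_bytes, populate_status
FROM v$inmemory_area;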

Next, the existing table definition is modified so that the table is additionally kept in the columnar representation:

ALTER TABLE im_players INMEMORY PRIORITY CRITICAL;

Different options control how important it is to have the rows available in columnar format (the priority), how the data is compressed depending on the access pattern, and which columns are kept in columnar format, since not all columns are necessarily required.

The ALTER statement above declares it critical for the data to be available in columnar format, so the database populates the in-memory columnar section eagerly. No compression clause is specified, so the default compression level for query access applies, and all columns are included, which matches this use case.
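
For illustration only, a hypothetical variation of the statement could pick a higher compression level, a lower priority, and exclude a column from the columnar format (a sketch, not what this example uses):

ALTER TABLE im_players
  INMEMORY MEMCOMPRESS FOR QUERY HIGH PRIORITY HIGH
  NO INMEMORY (player_name);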

The following query shows how much data has not yet been populated. Loading the data takes some time, and the query can be used to check the load status; once loading is complete, the number of bytes not yet populated drops to zero.

SELECT 
  v.segment_name name,
  v.populate_status populate_status,
  v.bytes_not_populated bytes_not_populated
FROM v$im_segments v
ORDER BY 1;

(Screenshot: v$im_segments output for IM_PLAYERS showing the populate status and the bytes not populated.)

The following query performs an aggregation.

SELECT
  player_name,
  COUNT(*),
  MIN(player_score),
  MAX(player_score),
  AVG(player_score),
  SUM(player_score)
FROM im_players
GROUP BY player_name;

(Screenshot: result of the aggregation query grouped by player_name.)

The plan for this query shows that the columnar representation is being accessed (operation TABLE ACCESS INMEMORY FULL).

----------------------------------------------------------------------------------------
| Id | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |            | 3025K | 227M  | 216 (38)   | 00:00:01 |
|  1 | HASH GROUP BY              |            | 3025K | 227M  | 216 (38)   | 00:00:01 |
|  2 | TABLE ACCESS INMEMORY FULL | IM_PLAYERS | 3025K | 227M  | 141 (5)    | 00:00:01 |
----------------------------------------------------------------------------------------

The in-memory session statistics confirm that the query actually executed against the columnar representation: their values after running the aggregation differ from the values before.
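
A query along the following lines can be used for that before/after comparison (a sketch; it assumes the session statistics whose display names start with ‘IM scan’ are the relevant ones):

SELECT n.display_name, m.value
FROM v$mystat m
JOIN v$statname n ON n.statistic# = m.statistic#
WHERE n.display_name LIKE 'IM scan%'
  AND m.value > 0
ORDER BY n.display_name;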

Note on SQL

An important aspect of the Oracle 12c In-Memory Option is that the SQL specifying the required aggregation is regular SQL; no new or different language constructs had to be introduced. The underlying reason is that the query optimizer has sufficient knowledge to make the appropriate access choices based on the SQL presented and does not require hints from users. This is a huge engineering benefit: nothing new has to be learned to make use of the columnar representation.
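
One way to convince yourself of this (my own addition, using the INMEMORY_QUERY session parameter) is to temporarily disable in-memory access and run the very same SQL again, comparing plan and elapsed time:

ALTER SESSION SET inmemory_query = DISABLE;
-- re-run the aggregation query and compare plan and timing
ALTER SESSION SET inmemory_query = ENABLE;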

Summary

This was a very brief introduction to the Oracle 12c In-Memory Option. The documentation of course provides a lot more background [http://docs.oracle.com/database/121/ADMIN/memory.htm#ADMIN14257]. In addition, a nice blog series introduces the various aspects step by step [https://blogs.oracle.com/In-Memory/entry/getting_started_with_oracle_database].

From a data management perspective, an optimized representation targeted at aggregation is enabled declaratively within the same database that holds the OLTP data. This is a huge benefit: one database supports the mixed workload, and any data integration setup between different databases becomes unnecessary. With main memory sizes increasing significantly, this option frees up engineering cycles as well as operational resources while improving data consistency and processing performance.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 7): Creating JSON from Relational Data

Up to now the examples in previous blogs inserted JSON data into JSON-enabled columns of tables. This blog shows how JSON data can be created or derived automatically and, more importantly, how relational data can be transformed and represented automatically as hierarchical JSON data.

Virtual Columns

A virtual column of a table in Oracle is defined by an expression, for example a function invocation; the value of the column is the result of evaluating that expression. Column values of the same row can be passed to the function as parameters, so the result can depend on the data content of the row.

A simple example follows: a virtual column is created that contains a name – value pair based on a non-virtual column of the same table (the expression is plain string concatenation).

DROP TABLE vc;
CREATE TABLE vc
  (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     '"name"' || ':' || '"' || name || '"' ));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

(Screenshot: contents of vc; the value column holds the generated name – value pair.)

Value Column with JSON Structure

The following is a similar example; however, in this case the virtual column is a JSON column and its content is a JSON object containing the name – value pair. The check constraint in the table definition ensures that the column holds well-formed JSON and that it can be accessed with the JSON functions.

DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     '{' || '"name"' || ':' || '"' || name || '"' || '}') 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

(Screenshot: contents of vc; the value column holds a JSON object with the name – value pair.)
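
Since the constraint guarantees well-formed JSON, the virtual column can be consumed with the SQL/JSON functions; a minimal sketch (my own addition) extracts the name again with JSON_VALUE:

SELECT v.id, JSON_VALUE(v.value, '$.name') AS name_from_json
FROM vc v;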

Separate Function Creating JSON Structure

Functions can be created separately as first-class citizens in the database (for proper engineering practice as well as reuse of functionality); the virtual column expression does not have to be an inline invocation. The following example creates a separate function that builds a JSON object containing a name – value pair.

CREATE OR REPLACE FUNCTION 
  to_JSON(in_name IN VARCHAR2,
          in_value IN VARCHAR2)
  RETURN VARCHAR2 deterministic
  IS
    JSON_value VARCHAR(255);
  BEGIN
    JSON_value := 
      '{' || '"' || in_name || '"' || ':' || 
             '"' || in_value || '"' || '}';
  RETURN JSON_value;
END;
/
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS (
     to_JSON('name', name)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

(Screenshot: contents of vc; the value column is generated by the to_JSON function.)
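
Because to_JSON is a regular stored function, it can also be reused outside the table definition; a hypothetical standalone invocation, which returns {"city":"San Francisco"}:

SELECT to_JSON('city', 'San Francisco') AS json_pair FROM dual;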

Function with Selection

Functions have the full range of SQL available and can execute queries in their implementation. The following example queries a second table and creates a JSON object from the result; note that the input parameter is used as the name of the resulting name – value pair, while the value comes from the queried table.

CREATE OR REPLACE FUNCTION 
  from_dep(in_value IN VARCHAR2)
  RETURN VARCHAR2 deterministic
IS
  JSON_value VARCHAR(255);
  dep_value  VARCHAR(255);
BEGIN
  EXECUTE IMMEDIATE 'SELECT ' || ' adr ' || 
                    ' FROM ' || ' dep ' 
                    INTO dep_value;
  JSON_value := '{' || '"' || in_value || '"' || ':' || 
                       '"' || dep_value || '"' || '}';
  RETURN JSON_value;
END;
/
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS 
     (from_dep(name)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
DROP TABLE dep;
CREATE TABLE dep
 (
    adr VARCHAR(255));
INSERT INTO dep VALUES ('yahoo');
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
SELECT * FROM vc;

(Screenshot: contents of vc; the value column is generated by the from_dep function.)

Hierarchical JSON Column

The last example shows how a function can create complex hierarchical JSON structures, which allows denormalizing a relational model into a JSON document representation as needed. This is the use case where a client requires a self-contained JSON object holding all necessary data, so that it does not have to perform repeated queries to obtain it.

CREATE OR REPLACE FUNCTION 
  from_dep(v_id IN NUMBER)
  RETURN VARCHAR2 deterministic
IS
  JSON_value VARCHAR(255);
  dep_value  VARCHAR(255);
  CURSOR dep_cur
    IS
      SELECT * FROM dep d WHERE d.vc = v_id;
  l_dep_adr dep%ROWTYPE;
  is_first NUMBER;
BEGIN
  is_first := 0;
  JSON_value := '[';
  OPEN dep_cur;
  LOOP
    FETCH dep_cur INTO l_dep_adr;
    EXIT
      WHEN dep_cur%NOTFOUND;
    IF (is_first = 0) 
    THEN
      JSON_value := JSON_value || 
        '{' || '"' || 'adr' || '"' || ':' || 
               '"' || l_dep_adr.adr || '"' || '}';
      is_first := 1;
    ELSE
      JSON_value := JSON_value || ',' || 
        '{' || '"' || 'adr' || '"' || ':' || 
               '"' || l_dep_adr.adr || '"' || '}';
    END IF;
  END LOOP;
  CLOSE dep_cur;
  JSON_value := JSON_value || ']';
  RETURN JSON_value;
END;
/
DROP TABLE vc;
CREATE TABLE vc
 (
   id NUMBER NOT NULL,
   name VARCHAR(255),
   value generated always AS 
     (from_dep(id)) 
     virtual CONSTRAINT value_ensure_json 
     CHECK (value IS JSON (STRICT WITH UNIQUE KEYS)) );
DROP TABLE dep;
CREATE TABLE dep
 (
   vc NUMBER, adr VARCHAR(255));
INSERT INTO vc (id, name) VALUES (1, 'abc');
INSERT INTO vc (id, name) VALUES (2, 'def');
INSERT INTO dep VALUES (1, 'yahoo');
INSERT INTO dep VALUES (2, 'yahaa');
INSERT INTO dep VALUES (2, 'yahee');
SELECT * FROM vc;

(Screenshot: contents of vc; the value column holds a JSON array with one object per matching dep row.)
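
To show that the generated value is regular JSON, it can be unnested again, for example with JSON_TABLE (a sketch, my own addition, returning one row per matching dep entry):

SELECT v.id, jt.adr
FROM vc v,
     JSON_TABLE(v.value, '$[*]'
       COLUMNS (adr VARCHAR2(255) PATH '$.adr')) jt;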

Summary

Virtual columns are an interesting mechanism when it comes to denormalizing a relational data model into a hierarchical JSON document model. This blog gave some examples upon which more complex functionality can be built. It shows that Oracle 12c can not only query JSON, but also generate JSON structures from relational data, supporting a wide range of use cases.

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.