Oracle 12c – SQL for JSON (Part 2): Basic Queries

This blog provides a small tour of basic SQL queries that operate on JSON in the context of the Oracle Database 12c Release 1 (12.1.0.2.0).

Sample Data Set

The following very basic data set is used as an example in this blog. It is kept simple in order to be able to focus on the queries without having to deal with complex JSON objects at the same time.

First, a simple table is created that contains a JSON column. Next, some rows containing JSON objects are inserted.

DROP TABLE demo;
CREATE TABLE demo
( id NUMBER,
  player CLOB 
    CONSTRAINT player_ensure_json 
      CHECK (player IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO demo 
VALUES (1, '{"person": "Bob", "score": 10}');
INSERT INTO demo 
VALUES (2, '{"person": "Bob", "score": 20}');
INSERT INTO demo 
VALUES (3, '{"person": "Jake", "score": 100}');
INSERT INTO demo 
VALUES (4, '{"person": "Jake", "score": 200}');
INSERT INTO demo 
VALUES (5, '{"person": "Alice", "score": 1000}');

With the sample data set in place, we can now construct a complex query in several steps.

Selection and Projection

The most basic query selecting the complete data set is

SELECT * FROM demo d;

A basic projection extracting only the person from the JSON objects is

SELECT d.player.person FROM demo d;

A basic selection restricting the JSON objects is

SELECT d.player.person
FROM demo d
WHERE d.player.person IN ('Jake', 'Bob');

The syntax for accessing properties in JSON objects is in principle

<table alias>.<JSON column>.<path to JSON object key>

with variations on JSON array index references if required (http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246).
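
For illustration, here is a hedged sketch of addressing an array element through a JSON path expression with JSON_VALUE; the demo documents above contain no arrays, so a hypothetical document such as {"person": "Bob", "scores": [10, 20]} is assumed:

-- hypothetical: returns the first element of a "scores" array, if present
SELECT JSON_VALUE(d.player, '$.scores[0]') FROM demo d;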

A more complex selection with an additional restriction is

SELECT d.player.person
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob');

Ordering

Results can be ordered, for example, in the following way

SELECT d.player.person
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob')
ORDER BY d.player.person DESC;

Grouping

Results can also be grouped, for example as a preparation for aggregation

SELECT d.player.person
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob')
GROUP BY d.player.person
ORDER BY d.player.person DESC;

Aggregation

Different aggregation functions can be used to do some basic analysis

SELECT d.player.person,
  SUM(d.player.score),
  AVG(d.player.score),
  MIN(d.player.score),
  COUNT(*)
FROM demo d
WHERE d.player.score > 0
  AND d.player.person IN ('Jake', 'Bob')
GROUP BY d.player.person
ORDER BY d.player.person DESC;

Final Result

The final result is shown here in table representation (copied from SQL Developer):

[Screenshot: result table]
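
Based on the sample data set, the result should be along the following lines (column headings and formatting are approximated here, not copied from an actual run):

PERSON   SUM(D.PLAYER.SCORE)   AVG(D.PLAYER.SCORE)   MIN(D.PLAYER.SCORE)   COUNT(*)
------   -------------------   -------------------   -------------------   --------
Jake                     300                   150                   100          2
Bob                       30                    15                    10          2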

Inspiration

This example was inspired, in fact, by http://www.querymongo.com. There, the MySQL query

[Screenshot: SQL query]

is translated into one of MongoDB’s query interfaces as

[Screenshot: MongoDB query]

(web site accessed on 10/21/2014).

Summary

In summary, SQL functionality is available not only for the relational model in the Oracle Database 12c, but also for JSON-based data.

This makes the Oracle database a quite powerful JSON processing environment as querying JSON data is possible through the declarative SQL language.

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 1): Introduction to Native Support

The Oracle Database 12c Release 1 (12.1.0.2.0) [http://www.oracle.com/technetwork/database/enterprise-edition/overview/index.html] introduces native JSON support and SQL access to JSON. This blog post gives a first introduction.

SQL access to JSON: Native Support

SQL access to JSON is a significant development in itself, but native support in the context of a relational database is actually quite huge and exciting. This blog post series will provide a discussion of the various aspects over several installments, including the mixed use of SQL on JSON with SQL on relational tables.

But first things first.

2-Second Overview

This is a 2-second overview showing how to create a table that can store JSON data, how to insert a row containing a JSON object and how to query it with a simple query.

CREATE TABLE supplier
( id NUMBER NOT NULL
    CONSTRAINT supplier_pk PRIMARY KEY,
  supplier_doc CLOB
    CONSTRAINT supplier_doc_ensure_json 
      CHECK (supplier_doc IS JSON));
INSERT INTO supplier
VALUES (125,
'{
  supplierId: 125,
  "supplierName": "FastSupplier"}');
SELECT * FROM supplier;
        ID SUPPLIER_DOC
---------- -------------------------------------------------
       125 {supplierId: 125, "supplierName": "FastSupplier"}

That was easy 🙂

Creating a Table storing JSON

JSON data are stored in columns of regular tables. A constraint placed on a JSON column enforces JSON compliance:

CREATE TABLE supplier
( id NUMBER NOT NULL
    CONSTRAINT supplier_pk PRIMARY KEY,
  supplier_doc CLOB
    CONSTRAINT supplier_doc_ensure_json 
      CHECK (supplier_doc IS JSON));

Any attempt to insert invalid JSON data fails because of this constraint. Other columns are regular relational columns and they can be defined and constrained as necessary.
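
For example, the following insert should fail because the document is not well-formed JSON (a minimal sketch; the id value is arbitrary):

INSERT INTO supplier
VALUES (130,
'{"supplierName": ');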

A JSON column stores JSON objects as well as JSON arrays, since both are valid top-level JSON structures. Trying to insert scalars will fail. The following two insert statements are valid:

INSERT INTO supplier
VALUES (125,
'{
  supplierId: 125,
  "supplierName": "FastSupplier"}');
INSERT INTO supplier
VALUES (128,
'["empty_list_of_supplier"]');

While the JSON [json.org] standard allows duplicate members (aka, keys), many implementations merely tolerate them or even reject them outright. To avoid storing JSON objects with duplicate keys, the constraint on a JSON column can be extended:

CREATE TABLE supplier
( id number NOT NULL
    CONSTRAINT supplier_pk PRIMARY KEY,
  supplier_doc CLOB
    CONSTRAINT supplier_doc_ensure_json 
      CHECK (supplier_doc IS JSON (WITH UNIQUE KEYS)));

The following insert will fail with the additional constraint “WITH UNIQUE KEYS” (because of a duplicate key), but would succeed otherwise.

INSERT INTO supplier
VALUES (126,
'{
  "supplierId": 126,
  "supplierName": "FastSupplier",
  "supplierName": "FS"}');

JSON has a defined syntax; however, many implementations relax it, e.g., by allowing member names to be stated without quotes. To enforce the strict syntax, the constraint on a JSON column can be extended:

CREATE TABLE supplier
( id number NOT NULL
    CONSTRAINT supplier_pk PRIMARY KEY,
  supplier_doc CLOB
    CONSTRAINT supplier_doc_ensure_json 
      CHECK (supplier_doc IS JSON (STRICT WITH UNIQUE KEYS)));

The following insert will fail (because one key is not enclosed in quotes), but would succeed without the “STRICT” constraint:

INSERT INTO supplier
VALUES (125,
'{
  supplierId: 125,
  "supplierName": "FastSupplier"}');

With the various constraints and their combinations it is possible to restrict the flavor of JSON that is stored in the database. From an architecture perspective this means that the database can be the central point of enforcing conformity.

Since JSON data is stored in columns, it is possible to create several columns in a table that contain JSON data. Here is an example of a table with two columns:

CREATE TABLE supplier
( id number NOT NULL
    CONSTRAINT supplier_pk PRIMARY KEY,
  supplier_doc CLOB
    CONSTRAINT supplier_doc_ensure_json 
      CHECK (supplier_doc IS JSON (STRICT WITH UNIQUE KEYS)),
  history_doc CLOB
    CONSTRAINT history_doc_ensure_json 
      CHECK (history_doc IS JSON (STRICT WITH UNIQUE KEYS)));

From a data modeling perspective this opens up a whole new dimension. A brief discussion follows later in this blog post.

Inserting JSON into a Table

Inserting JSON data into a table that has at least one JSON column is rather straightforward, as the previous examples have shown.

The insert statement must have a valid JSON object or JSON array in the position of the JSON column(s), or SQL NULL, as the value of a JSON column can be unknown.
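
For example, a row without a JSON document can be created by supplying SQL NULL for the JSON column (a minimal sketch; the id value is arbitrary):

INSERT INTO supplier
VALUES (131, NULL);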

Querying JSON with SQL

Oracle Database 12c supports a number of functions to query and to manipulate JSON data. In the following only a first impression is given and the full set will be discussed in additional separate blog posts.

The most basic query was already introduced:

SELECT * FROM supplier;

Selecting scalars from JSON:

SELECT s.supplier_doc.supplierName FROM supplier s;

The select clause is what one would expect: the table name followed by the column name followed by the key name using dot notation. The return value is a table with a single column containing strings.

This query selects scalar values across different columns:

SELECT s.id, s.supplier_doc.supplierName FROM supplier s;

In order to be able to construct more interesting queries, a more complex JSON object is used (see the end of this blog post for its details – it is not included here as it is quite large).

Querying a JSON object looks like this:

SELECT s.supplier_doc.businessAddress FROM supplier s;

This returns a table with one column containing a string representing a JSON object. For those objects that do not have a ‘businessAddress’, a SQL null is returned.

Querying a JSON array is done like this:

SELECT s.supplier_doc.shippers FROM supplier s;

As before, the dot-notation path leads to the JSON member that contains an array as its value.
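
If a single array element is needed rather than the whole array, a JSON path expression can be used. A hedged sketch, based on the large sample object at the end of this blog post:

-- returns the first shipper of the "shippers" array as a JSON object
SELECT JSON_QUERY(s.supplier_doc, '$.shippers[0]') FROM supplier s;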

The queries included so far give a first impression of how to query JSON data in Oracle 12c. Upcoming blog posts will provide much more details on the query capabilities, amongst other discussions.

Table Design

With the introduction of JSON it is now possible to have a data model design that combines relational and JSON data modeling instead of having just a relational data model or just a JSON data model. A single table supports the combination of relational and JSON data types.

Some important design questions are:

  • Should top level scalar data items be separate columns, or top-level keys in a JSON object, or both?
    • In the above examples, a primary key column ‘id’ was designed, as well as a key ‘supplierId’ in the JSON object. The ‘id’ column enforces a primary key and supports access to the supplier identifier without accessing the JSON document. The JSON document, however, contains the supplier identifier to be self-contained.
  • Should all data be in one JSON document or is it more appropriate to have separate JSON objects in different columns?
    • In one of the above table creation statements two JSON columns were defined, one containing the supplier data, and another one containing history about the supplier. This supports the separation of data that should not be combined, e.g., for security or privacy reasons. An internal supplier history is not part of the operational supplier data and should be kept separate.

More design principles and best practices will emerge over time in this context in addition to the ones mentioned here so far.

Documentation

The documentation can be found here: [http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246]

Installation Notes

The Oracle Database 12c Release 1  (12.1.0.2.0) is available on Windows as well as Linux and Solaris at the time of this blog post [http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index.html].

In case you want to run the Linux version on Windows, install VirtualBox (version 4.3.15 build 95923 (!) [https://forums.virtualbox.org/viewtopic.php?f=6&t=62615]) and then create a virtual machine with Oracle Linux 6. Once the VM is up and running, install the Oracle Database 12c Release 1 (12.1.0.2.0) in the VM and you are ready to go.

Example Large JSON Object

INSERT INTO supplier
VALUES (123,
'{
  "supplierId": 123",
  "supplierName": "FastSupplier",
  "rating": 5,
  "shippers": [{
    "shipperName": "TopSpeed",
    "address": {
      "street": "Sunrise",
      "streeNumber": 17,
      "city": "Sun City",
      "state": "CA",
      "zip": 12347
      }
    },
    {
    "shipperName": "PerfectPack",
    "address": {
      "street": "Solid Road",
      "streeNumber": 1771,
      "city": "Granite City",
      "state": "CA",
      "zip": 12348
      }
    },
    {
    "shipperName": "EconomicWay",
    "address": {
    "street": "Narrow Bridge",
    "streeNumber": 1999,
    "city": "Central City",
    "state": "CA",
    "zip": 12345
    }
  }],
  "businessAddress": {
    "street": "Main Street",
    "streeNumber": 25,
    "city": "Central City",
    "state": "CA",
    "zip": 12345
  },
  "warehouseAddress": {
    "street": "Industrial Street",
    "streeNumber": 2552, 
    "city": "Central City",
    "state": "CA",
    "zip": 12346
    }
  }');

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

NoSQL Databases: Data First, Schema Second? Or Vice Versa?

When using NoSQL databases, the notion of ‘schema’ enters the picture sooner or later. But when is the best time? And what to do about it?

Global vs. Local vs. Mixed Document Schema

One of the first distinctions is a global vs. local vs. mixed schema. A global schema is a schema that is defined for a given set of documents, e.g., a collection or table of documents. Every document in the collection must comply with the schema defined for the collection.

A local schema is a schema for a single document. Every document can have its own schema. It is possible that several documents follow the same schema. However, those are in general not grouped based on their schema.

A mixed schema is in part a global schema, and in part a local schema. This means that a document must contain certain properties as defined by the global schema, and the local schema allows additional schema elements on a per-document basis.

In terms of system examples:

  • Oracle NoSQL [http://www.oracle.com/us/products/database/nosql/] follows the local schema approach.
    • Documents in Oracle NoSQL are grouped by keys. Each document can have its own schema, aka, a local schema.
  • MongoDB [http://www.mongodb.org/] follows the mixed schema approach.
    • There is one property, ‘_id’, that must be present in all documents across all MongoDB collections, and it must be unique. In this sense, MongoDB does not follow a pure local schema approach, as one mandatory property is specified globally.
  • FoundationDB [https://foundationdb.com/] follows the global schema approach.
    • FoundationDB implements the concept of ‘table groups’ and supports serializing query results as JSON objects. However, from the viewpoint of the data model, it is relational, and the hierarchical structure (aka, sub-documents) comes into play through foreign keys and SQL extensions that were explored a long time ago in the context of NF2 relations.
  • Oracle 12c [http://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6246] follows the local schema approach.
    • A JSON document is stored in a column of a table. That column is completely schema-free so that JSON documents of any schema can be stored and hence the schema is local.

Explicit vs. Implicit Document Schema

A second important distinction is an explicit vs. an implicit schema (or extensional vs. intensional schema). An extensional schema is defined through a schema representation format (e.g., Avro [http://avro.apache.org/]) or SQL-style DDL statements. An intensional schema does not have a separate representation, but it can be derived from the structure of a document instance (not always unambiguously).

In terms of system examples:

  • Oracle NoSQL: supports both explicit and implicit schemas
  • MongoDB: implicit schema
  • FoundationDB: explicit schema
  • Oracle 12c: implicit schema

From a different perspective an implicit schema means that a document can be stored as it is without having to define a schema for it and without checking that it conforms to a schema. An explicit schema requires the schema to be defined and that documents are compliant before they can be stored successfully.

The ‘edge case’ is MongoDB, which enforces the property ‘_id’ in every document. If it is not present, it will be added automatically. While MongoDB mainly supports an implicit schema, ‘_id’ is the exception.

Mandatory Document Schema

A third important distinction is the requirement for a mandatory schema before documents can be stored. If the database requires a schema, documents cannot be inserted before the schema is specified. If a schema is not mandatory, documents can be stored without having a schema in place.

In terms of system examples:

  • Oracle NoSQL: no mandatory schema
  • MongoDB: no mandatory schema
  • FoundationDB: requires a mandatory schema
  • Oracle 12c: no mandatory schema

Roles of Database Users

Even though one could get a different impression, not everybody using a database in his or her role necessarily likes a document database where every document can have its own schema (a schema-less or, better, schema-varying database). Some users do, some users don’t. For the sake of discussion, let’s distinguish two roles in this blog:

  • Data Collector. A data collector is a role for collecting data initially. A data collector determines interesting data to store, and that data might or might not be used downstream for further processing. However, it is important to collect some data in case it becomes important down the road. And it is important to store data unmodified, as the rules of modification (e.g., cleansing, transformation) might not be known at the time of data collection.
  • Data User. A data user fundamentally applies all CRUD (create, read, update, delete) operations on a data set, in most cases through an application system that implements the business logic as well as the business rules. The data user is familiar with the business logic and the business rules in the context of the application domain, like a financial application or a forecasting tool.

A data user can also be a data collector, since the C (create) operation creates data.

Schema First or Schema Second?

From the viewpoint of the two roles, schema management plays an important role. In a black-and-white categorization, the two roles have the following desires:

  • Data Collector. A data collector’s goal is to collect relevant or potentially relevant data. Depending on the data sources, there might not be time or opportunity to define a schema first, and it might not be feasible to maintain a schema in the long run (including schema migration). A data collector therefore likes
    • local schema – implicit schema – not mandatory schema (‘schema second’)
  • Data User. A data user has to accomplish work and prefers clear-cut business rules and business functionality so that the application semantics is clear. A data user therefore likes
    • global schema – explicit schema – mandatory schema (‘schema first’)

The data collector stores data in a data collector database, and the data user’s database is called the data user database. Both databases can be the same, or they can be two different databases, as discussed later.

A data collector, after having collected data, might want to query it for statistical and analytical purposes (e.g., how much data was collected, how many documents have a reference to a product description, etc.). So a data collector would prefer to have a schema after the collection of data, in support of query formulation; so schema second.

A data user rarely operates on data directly; instead, software engineers have to implement the business logic and business rules. Software engineers, for sure, enjoy certainty when it comes to a schema, as variations cause significant code complexity in the general case; so schema first.

Co-existence of Schema First and Schema Second?

In the ideal case, both approaches, schema first and schema second, are supported at the same time. This would make both roles happy and support their particular use cases. Let’s explore a few options:

  • ETL (extract – transform – load) from data collector database to data user database
    • This approach suggests an explicit transformation step that extracts data from the data collector database and adds it to the data user database. Along the way data type transformations can take place as well as handling of null values, absent properties, and other data modeling specifics. The extraction can be partial so that only relevant data are extracted. However, data duplication (at least partial) is one downside, amongst others.
  • View on data collector database
    • This approach creates a view through which access is provided on a single database (data collector and data user combined). A view could deal with the various transformation tasks; however, updates and deletions might be difficult or even impossible. So this is a potential solution only for the case of read access (see the sketch after this list).
  • Automatic schema extraction
    • A schema can be derived from a document. It is therefore possible to make the set of all document schemas in the data collector database available to the data user. If the data user creates a super-set, then a global schema is available. Of course, for a given document the schema only partially applies, and the business functionality and business rules have to be aware of this. In this case also, the data collector and data user database are the same.
  • Intelligent ORM Layer
    • An ORM layer could provide the impression of a fixed document structure towards the data user, while being able to deal with the heterogeneous document schemas internally. If the ORM layer is flexible enough, it can provide updates as well as delete functionality, and if necessary, an extension mechanism to add custom code in order to make the delete or update functionality specific to the given document set. In this case also, the data collector and data user database are the same.
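
As a minimal sketch of the view option for the read-only case, assuming the data collector database is an Oracle 12c table with a JSON column as in the earlier blog posts (view and column names are made up for illustration):

-- exposes selected JSON properties of the collected documents as relational columns
CREATE VIEW supplier_user_view AS
SELECT s.id,
       s.supplier_doc.supplierName AS supplier_name,
       s.supplier_doc.rating AS rating
FROM supplier s;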

From an implementation perspective an ORM layer seems to be a practical approach, as it allows the transformation and update/delete logic to be separated from the application logic, while operating on a single database. However, every data access has to execute some transformation logic in general.

If space is of secondary concern or if the data set for the data user is a lot smaller than that of the data collector, the ETL approach might be preferable as the transformation logic is separate from the data access logic of the application systems.

Automatic schema extraction is certainly helpful in all cases as the schema has to be known in order to implement the ORM layer or the ETL component. If NoSQL databases start implementing a view mechanism then this might be preferable for read-only access situations.

Schema-Varying Languages?

It is, of course, tempting to ask whether there is an easy and elegant way to deal with local schemas in application systems. Is it possible to write an application system that does not require a schema in the first place?

While this is a huge topic on its own, server-side JavaScript might be a good place to start, as the language is not based on a class/instance paradigm but on a prototype approach. Its type system is almost equivalent to JSON. The language, therefore, is able to represent documents with a local schema easily and effortlessly. Since JavaScript can introspect objects and since it implements the prototype mechanism, it is possible to represent local functionality for documents with a local schema. One of the bigger questions is how to present this flexibility to the end user in user interfaces in an ergonomic way.

However, this is a discussion on its own and I’ll save it for a later blog.

Summary

When is a good time to deal with the notion of ‘schema’ in NoSQL database projects? As the discussion has shown, different databases provide different schema support, and different users look for schema support at different points in the data life cycle, if at all.

So unless schema is completely irrelevant in your project (and will stay irrelevant for sure), the discussion cannot start early enough because depending on its importance it might influence the database selection as well as the overall product architecture and implementation effort around schema maintenance and enforcement.

Trending: Multi-Interface and Multi-Data-Model Databases

An interesting development, especially in the NoSQL database space, is the development towards multi-interface and multi-data-model databases, and sometimes both at the same time. While it provides flexibility, it also brings challenges.

Multi-Data-Model Support

In the relational database space, supporting different data models concurrently is not a novelty. Relational databases started off with the relational data model implementation, and later on some of the systems extended the relational model, mainly with objects, XML, or JSON.

Some databases in the NoSQL space are starting to evolve, too, in this manner and are providing more than one data model concurrently. Some interesting examples are discussed next.

One example in the NoSQL space is Oracle NoSQL [http://www.oracle.com/technetwork/database/database-technologies/nosqldb/]. This system supports a key/value data model whereby the key is used to identify values that are not interpreted by the database itself. In addition, values can be of complex types that are actually interpreted by the database, e.g., in secondary indexes.

Aerospike is another example in the NoSQL space [http://www.aerospike.com/]. Aerospike provides a data model consisting of basic and complex types. In addition, it supports language-native formats as well as large data types that have specific operational characteristics and data type operations tuned for scale.

Just as some relational databases extended their data models over time to support specific use cases more directly, some NoSQL databases are also going down that path to more directly support specific application developer needs.

Multi-Interface Support

From an application development perspective, a single query API that provides the complete query expressiveness required is certainly preferable. However, especially in the new area of NoSQL databases, it is not yet clear what a good query API actually looks like. This can be observed by different systems providing different query API alternatives.

MongoDB [http://www.mongodb.org/] has a document query interface based on query patterns in the form of JSON documents (“Query Documents”). In addition, it provides a map/reduce interface and aggregation pipelines. There are three different APIs from which an application developer can choose, and, in addition, they overlap in their functionality. This means that, depending on the query, it can be expressed in all three of them.

Aerospike [http://www.aerospike.com/] provides different language drivers in addition to an Aerospike Query Language.

Cloudant [http://www.cloudant.com], in contrast, supports a REST-api as well as a query interface based on query documents (similar to MongoDB).

Not strictly an external interface, but very important for specific use cases, is the ability to add functionality dynamically to the database in order to move some processing from the application systems into the database itself: user defined functions. For example, MongoDB allows adding functionality through JavaScript functions, whereas Aerospike supports two different types of Lua functions: record user defined functions operate on single records, while stream user defined functions support distributed processing.

The Good

Unquestionably, the good part about multi-interface and multi-data-model databases is that an application developer can choose the best combination of data model and access interface for a particular development task. The impedance mismatch between the problem and the solution can be minimized with an appropriate choice.

This also means that developers need to understand the pros and cons of every combination, and that requires going through a learning curve. Going through that learning curve might pay off big time.

In addition, application development teams will have to manage a wider range of implementation variations, not only in terms of application design and engineering, but also in terms of bug fixing and application code maintenance.

The Tricky

The tricky part of multi-interface and multi-data-model databases is that all combinations can be used concurrently, in production as well as post-production (e.g. analytics). Unit and functional tests as well as performance and scale tests become a lot more complicated as they have to test the concurrent execution of various combinations.

Furthermore, many queries can be expressed in different interfaces as those tend to overlap in query expressiveness. So an application developer needs to clearly understand the pros and cons of the query execution that underlies a specific query interface.

Hopefully the query semantics are the same for all combinations (meaning, for example, that predicates are evaluated the same way), and the concurrent use of data models and interfaces does not negatively impact the various clients in terms of concurrency, scale and performance. Any bug introduced through a discrepancy might be very difficult to reproduce and fix.

Summary

While multi-interface and multi-data-model databases are a powerful technology, there is considerable impact to the application system engineering activities in terms of knowledge acquisition, development, test and maintenance.

While database vendors certainly strive to have all combinations work in harmony, there might be edge cases where one combination does not give the same result as a different one. From an application development perspective test coverage should ensure semantic equivalence of the used combinations so that misinterpretations or wrong results are avoided.

Document-oriented NoSQL Databases: Oracle NoSQL’s Shift in March 2014

I believe Oracle NoSQL extended and expanded from a key/value store to a document-oriented NoSQL database in March 2014. If you concur, then this was a major shift in the NoSQL arena of document-oriented databases.

Document-Oriented NoSQL Databases

In a narrow interpretation, document-oriented NoSQL databases use JSON or BSON as the query interface data representation. In a wider interpretation, any scalar data type and any composite data type (like maps or arrays) is available at the query interface without necessarily complying with the JSON encoding or interpretation.

Some of the known document-oriented databases are MongoDB, CouchDB and RethinkDB, amongst many others. The site http://db-engines.com/en/ranking/document+store has a categorization and ranking readily available.

In addition to storing documents (aka, complex data structures with scalar, array and map data types), another important criterion is the ability to query based on document content, e.g., by selecting documents where specific properties have specific values or comply with specific complex predicates (in the general case).

Key/Value Stores

Key/value stores (k/v stores) are different from document-oriented databases. Key/value stores persist a value for a given key. Retrieval is based on the key and values cannot be used in query predicates. Some k/v stores have structured keys or composite keys that support querying a range or set of values.

Oracle NoSQL, for example, supports structured and composite keys (http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/singleget.html, http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/multiget.html).

Aerospike is another example of a k/v store (https://docs.aerospike.com/display/V3/Key-Value+Store+Guide).

K/V Stores: Major Shift

Oracle NoSQL extended its data model from a pure k/v data model to a document-oriented data model in March 2014.

The available simple data types are (cited from http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuideTables/tablesapi.html)

  • Double
  • Float
  • Integer
  • Long
  • Java byte array
  • Java String

And the available complex data types are (cited from http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuideTables/tablesapi.html)

  • Array (An array of values, all of the same type)
  • Enum (An enumeration, represented as an array of strings)
  • Fixed Binary (A fixed-sized binary type used to handle binary data where each record is the same size. It uses less storage than an unrestricted binary field, which requires the length to be stored with the data)
  • Map (An unordered map type where all entries are constrained by a single type)
  • Records

In addition, indexes can be created on the ‘value’ part, aka, the documents: http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuideTables/createindex.html. And based on indexes, the database can be queried without having to specify a key: http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuideTables/indexread.html.

The latter functionality extends the characteristics of the Oracle NoSQL database from a k/v store into a document-oriented NoSQL database. Clearly, secondary indexes are not yet a very expressive query language; however, they are a major first step towards one.

What does it mean?

The interesting aspect is that Oracle, as a major database vendor, moved with a separate and dedicated NoSQL product first into the k/v store space, and now additionally into the document-oriented NoSQL space. There are many ways to read this move, of course, but from a technical viewpoint it emphasizes the importance of a structured data model in the NoSQL space. In addition, Oracle NoSQL allows a per-document Avro schema, making it a truly per-document schema database.

Aerospike, for example, also supports complex data types and secondary indexes, even though it is also widely seen as a k/v store (e.g., http://db-engines.com/en/ranking/key-value+store).

Summary

Clearly, the document-oriented segment of NoSQL databases is getting more and more crowded and is attracting the attention of major players in the database space.

What hasn’t been done yet is a comparison of the data model and query interface expressiveness of the major players in the document-oriented NoSQL database segment. Once that is done, and in case it reveals that the expressiveness is the same or almost the same, the systems are exchangeable from a functionality viewpoint and will distinguish themselves ‘only’ in terms of non-functional properties like latency, throughput, transactions, cost, support, and so on.