SQL: The Kingmaker of NoSQL Databases?

Here is an interesting question: what if all NoSQL and NewSQL databases would implement SQL?

Advanced SQL

“SQL” in this context does not refer to SQL 92 (http://en.wikipedia.org/wiki/SQL-92) or other versions of the purely relational model focused standard, but to a SQL that is extended properly in syntax and in semantics to deal with non-relational data models. Let’s name it “Advanced SQL” for the purpose of this blog.

I furthermore assume that all data in a database would be accessible, not just parts of it. For example, in key/value stores the values’ internal structure would be accessible as long as it contains data accessible in a structured way (e.g., a JSON representation).

Ideally, of course, all NoSQL and NewSQL databases would implement the same new form of Advanced SQL, but for this blog this is not really necessary or a precondition (but it would be nice, though).

If such an Advanced SQL would be available on all NoSQL and NewSQL databases, then

  • the typical SQL operators like select, project, join, and so on are available on all data in a database across all data models like relational, document-oriented, key/value, columnar, graph-oriented.
  • the same data set loaded into each NoSQL or NewSQL database would be accessible through Advanced SQL and the same query would return the same result, no matter which database (best case assumption wrt. SQL execution semantics).

Data Models

In order to be able to load all data sets into all types of databases the data models have to be equivalent in their structural expressiveness. This might not necessarily be the case today, however, the common trend appears to be that databases are either supporting JSON, or structured data containing array types and nested sub-structures alongside elementary types like string and integer.

For the sake of the blog, let’s assume the data models are equivalent in structural expressiveness.

Implication of Advanced SQL on NoSQL / NewSQL Databases

The biggest implication of Advanced SQL on NoSQL / NewSQL databases would be that all NoSQL / NewSQL databases would be equivalent from a functional perspective in terms of accessing data: a given data set could be stored in any of the databases and queried using SQL. A query would (ideally) return the same result in any of the databases storing the same data set.

The big question this bears is: what would distinguish the NoSQL/NewSQL databases in this case?

  • The non-functional aspects.

For example, to name a few:

  • Performance: the same query would take different execution time
  • Transactions: a set of queries would form a single transaction (or not)
  • Scale: the speed and extent a database can scale up/scale down with varying data set size
  • Backup/restore: how fast and how reliable can databases be re-created after disaster

This basically would mean that a user can select a database based on its non-functional characteristics. (Of course, history has shown that the particular SQL implementation of a database will have proprietary extensions also).

Benefits of Advanced SQL

There are several benefits if Advanced SQL like described above would be available.

  • A user can select not only one, but several databases for different needs, while not having to deal with vastly different query interfaces. For example, one for production system access speed and a different one for long-term storage or analytics.
  • Data can be distributed across different systems depending of how the data is used without having to deal with incompatible access interfaces.
  • NoSQL and NewSQL databases can be integrated alongside existing RDBMSs for mutual benefit.

In a nutshell, users are not forced anymore to select one databases for all their needs, but can select the appropriate one for the particular part of their system architecture.

Trending of Non-functional Features

There is an interesting trend ongoing. OldSQL and NewSQL databases start supporting JSON (e.g., Postgres, VoltDB); Informix has a compatible driver interface to the MongoDB wire format; Tokutek and GridGain implement non-functional replacements for MongoDB; Oracle NoSQL supports per-document Avro schemas for the values; Tokutek adds transaction support; and so on. This looks like the early stages of ‘sorting out’ on a data model level as well as non-functional level. Time will tell if this was the starting point of consolidation in the NoSQL/NewSQL space.

King Making

So why, if the differentiation is likely to happen on the non-functional level, is SQL a potential kingmaker? In my opinion, Advanced SQL will be the standard expectation and a database not supporting Advanced SQL will be relegated to being a niche system. So while the presence of Advanced SQL will be a distinguishing factor (possibly a small one), the absence of Advanced SQL will impact the success of a database significantly.

Advertisements