SQL: The Kingmaker of NoSQL Databases?

Here is an interesting question: what if all NoSQL and NewSQL databases would implement SQL?

Advanced SQL

“SQL” in this context does not refer to SQL 92 (http://en.wikipedia.org/wiki/SQL-92) or other versions of the purely relational model focused standard, but to a SQL that is extended properly in syntax and in semantics to deal with non-relational data models. Let’s name it “Advanced SQL” for the purpose of this blog.

I furthermore assume that all data in a database would be accessible, not just parts of it. For example, in key/value stores the values’ internal structure would be accessible as long as it contains data accessible in a structured way (e.g., a JSON representation).

Ideally, of course, all NoSQL and NewSQL databases would implement the same new form of Advanced SQL, but for this blog this is not really necessary or a precondition (but it would be nice, though).

If such an Advanced SQL would be available on all NoSQL and NewSQL databases, then

  • the typical SQL operators like select, project, join, and so on are available on all data in a database across all data models like relational, document-oriented, key/value, columnar, graph-oriented.
  • the same data set loaded into each NoSQL or NewSQL database would be accessible through Advanced SQL and the same query would return the same result, no matter which database (best case assumption wrt. SQL execution semantics).

Data Models

In order to be able to load all data sets into all types of databases the data models have to be equivalent in their structural expressiveness. This might not necessarily be the case today, however, the common trend appears to be that databases are either supporting JSON, or structured data containing array types and nested sub-structures alongside elementary types like string and integer.

For the sake of the blog, let’s assume the data models are equivalent in structural expressiveness.

Implication of Advanced SQL on NoSQL / NewSQL Databases

The biggest implication of Advanced SQL on NoSQL / NewSQL databases would be that all NoSQL / NewSQL databases would be equivalent from a functional perspective in terms of accessing data: a given data set could be stored in any of the databases and queried using SQL. A query would (ideally) return the same result in any of the databases storing the same data set.

The big question this bears is: what would distinguish the NoSQL/NewSQL databases in this case?

  • The non-functional aspects.

For example, to name a few:

  • Performance: the same query would take different execution time
  • Transactions: a set of queries would form a single transaction (or not)
  • Scale: the speed and extent a database can scale up/scale down with varying data set size
  • Backup/restore: how fast and how reliable can databases be re-created after disaster

This basically would mean that a user can select a database based on its non-functional characteristics. (Of course, history has shown that the particular SQL implementation of a database will have proprietary extensions also).

Benefits of Advanced SQL

There are several benefits if Advanced SQL like described above would be available.

  • A user can select not only one, but several databases for different needs, while not having to deal with vastly different query interfaces. For example, one for production system access speed and a different one for long-term storage or analytics.
  • Data can be distributed across different systems depending of how the data is used without having to deal with incompatible access interfaces.
  • NoSQL and NewSQL databases can be integrated alongside existing RDBMSs for mutual benefit.

In a nutshell, users are not forced anymore to select one databases for all their needs, but can select the appropriate one for the particular part of their system architecture.

Trending of Non-functional Features

There is an interesting trend ongoing. OldSQL and NewSQL databases start supporting JSON (e.g., Postgres, VoltDB); Informix has a compatible driver interface to the MongoDB wire format; Tokutek and GridGain implement non-functional replacements for MongoDB; Oracle NoSQL supports per-document Avro schemas for the values; Tokutek adds transaction support; and so on. This looks like the early stages of ‘sorting out’ on a data model level as well as non-functional level. Time will tell if this was the starting point of consolidation in the NoSQL/NewSQL space.

King Making

So why, if the differentiation is likely to happen on the non-functional level, is SQL a potential kingmaker? In my opinion, Advanced SQL will be the standard expectation and a database not supporting Advanced SQL will be relegated to being a niche system. So while the presence of Advanced SQL will be a distinguishing factor (possibly a small one), the absence of Advanced SQL will impact the success of a database significantly.

Document Projection (Part 2): Definition

What does projection in context of JSON structures or documents actually mean? What should the definition of “projection” be? There are several possibilities discussed next.

Document Projection: Complete Branch

Projection in JSON is projecting a branch of the JSON data structure tree, not projecting a single object or value. To request a projection, a property (projection) path in dot notation could be used (and actually is in many systems). The result of a projection is a valid JSON document containing the specified branch(es).

An example document is

{"a": {"b": {"c": 3, "d": 4, "e": 5}}}

Let’s go through a series of projections in the following.

  • Projection path: “a.b.c”
  • Result: {“a”: {“b”: {“c”: 3}}}
  • Projection path: “a.b”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4, “e”: 5}}}
  • Projection path: “a.e”
  • Result: {}

The result contains the full path of the projection (or more, but not less). If the requested projection path does not exist, the result is the empty document as none of its properties matches the projection path. The empty projection path “” is not defined, meaning, a projection must name at least one property, and that will be a top-level property in a document.

Several branches can be projected concurrently.

  • Projection paths: “a.b.c”, “a.b.d”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4}}}

The resulting document contains the combination of all of the branches that result in a valid projection. Redundant projection path specification is possible if one of the projection paths is a sub-path of another one. However, the result document is the same if redundancy is present or absent.

Document Projection: Partial Branch

It might be possible that the whole projection path does not exist, but a part of it. In this case it is a possibility to add the existing result up to that point (MongoDB follows this approach). This results in partial paths whereby the value of their last property is the empty document.

For example, “a.b.f” would result in {“a”: {“b”: {}}}. “a” and “b” exist in the example document, “f”, however, does not.

In my opinion, while possibly useful in some cases, I would not make this the default or standard definition as a result is returned that is incomplete and I could argue that it is in fact incorrect since the value of “b” is not the empty document (I could envision a configuration setting that provides these partial branches if needed).

Document Projection: Value

Wait a minute, why does the result document have to consist of full paths?

The reason is based on the implicit restriction on JSON documents that there can be only one property of a given name on the same level in a document. “Implicit” because the JSON definition (http://json.org/) does not mandate the restriction, but many implementations do: property names on the same level of embedding have to be unique.

For example:

{"x": {"b": {"c": 3, "d": 4}}, 
 "y": {"e": {"c": 3, "d": 4}}}

is a perfectly valid document where the property names are unique on every level. So let’s get back to projection and let’s for a moment assume that projection actually returns the value at the end of the path, omitting the path to the property value itself. So,

  • Projection path: “x.b.c”
  • Result: {“c”: 3}

So far so good.

  • Projection paths: “x.b.c”, “y.e.c”
  • Result: ?

What should the result be? The basic assumption is that a projection on a document returns a document. But “x.b.c” and “y.e.c” both return {“c”: 3} as separate documents, but not one document.

  • One possible result could be an array with two documents. But arrays are in general not considered valid top level documents (again, the JSON definition would allow that).
  • To mitigate that, the array could be the value of a property: {“result”: [{"c": 3}, {"c": 3}]}. But this would conflict with a document that happens to have a property “result” of type array with two same documents in it.
  • Or, the two documents would have to be embedded in a third one with special names, like {“1″: {“c”: 3}, “2″: {“c”: 3}}. But then, the original document does not have the properties “1″ or “2″.

Based on this discussion having projection results being full paths is simpler and direct.

Projection – Result Correspondence Principle

There is also another argument from the user’s viewpoint. If a user wants to project “x.b.c”, then the user might want to access the result document after the query returns with “x.b.c” as the access path. From this viewpoint, the path in the query and the path in the result document should match and not require access path transformation.

Array Projection: Complete Access Path

Documents may contain arrays as well as array of arrays, arrays of objects of arrays, etc., in principle any of these composite types can be on any level of the document. Projection therefore has to be defined on arrays also, not just documents.

The principle of project paths is extended to include array index specification. For example, let’s consider this document:

{"a": [{"a1": 1}, {"a2": 2}], 
 "b": {"c": [{"c1": 3}, {"c2": 4}, {"c3": 5}]}, 
 "d": [6, 7]}

Let’s do a few projections (arrays are 0-index based):

  • Projection path: a[0]
  • Result: {“a”: [{"a1": 1}]}
  • Projection path: b.c[1]
  • Result: {“b”: {“c”: ["c2": 4]}}
  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [{"a2": 2}], “b”: {“c”: [{"c3": 5}]}}
  • Projection path: a[7]
  • Result: {}

Like in the case of documents, full paths are requested and full paths are returned, with several paths possible. A projection path referring to a non-existing property will not contribute to the result.

So far, so good, except that the results do not yet conform to the “Projection – Result Correspondence” principle from above: the projection “a[1]” resulted in a correct document, but that result document cannot be accessed with “a[1]” to obtain the value.

Array Projection: Padding

In order to support the “Projection – Result Correspondence” principle array results can be padded with the value “null”. For example:

  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [null, {"a2": 2}], “b”: {“c”: [null, null, {"c3": 5}]}}

Now it is possible to access the result with “a[1]” or “b.c[2].c3″ in order to obtain the proper results. From a user’s perspective this is great as again the paths used to specify the projection can be used to retrieve the values.

Array Projection: Scalar Values

Scalar values in arrays do not pose a specific problem:

  • Projection paths: a[1], d[1], d[2]
  • Result: {“a”: [null, {"a2": 2}], “d”: [null, 7]}

And their access can be accomplished using the projection paths.

Summary

Initially I thought projection is a straight forward function and not worth a discussion in context of document-oriented databases; but then it turned out to be not that clear cut. Nevertheless, the above is a starting point for a strict rationalization of projection in document-oriented databases based on the JSON data model.

Document Projection (Part 1): MongoDB

This blog reviews some of the projection functionality that the MongoDB query interface provides. The emphasis is comparing projection of embedded object properties with projection of embedded array elements. Those are not symmetric, as the examples will show, and that is surprising and rather unexpected.

Test Data Set

The initial test data set contains three documents:

{}
{"a": 1, "b": 2}
{"c": {"d": 3, "e": 4, "f": {"g": 5, "h": 6}}}

The empty document is the control document, and two documents have properties whereby one of those has several levels of embedding. The test data set, contained in a file “td.txt”, is loaded into MongoDB as follows:

mongoimport -d projection -c proj td.txt

Projecting Document Properties

Let’s observe a few projection queries:

> db.proj.find({}, {_id: 0, a: 1})
{  }
{ "a" : 1 }
{  }

In general, to make the result less verbose, the property “_id” is suppressed. The query asked for property “a” and for each document the property “a” is returned (whereby two documents do not contain “a”, so the resulting documents are empty).

> db.proj.find({}, {_id: 0, "c.e": 1})
{  }
{  }
{ "c" : { "e" : 4 } }

Reaching into documents is done by using the dot notation. “c.e” asks for the property “e” in “c”. The result contains the whole document structure starting at “c” all the way to “e”. Alternatively MongoDB could have returned {“e”: 4} only, but then the result would not correspond to the dot notation in the query.

> db.proj.find({}, {_id: 0, "c.f": 1})
{  }
{  }
{ "c" : { "f" : { "g" : 5, "h" : 6 } } }

No surprise here as the result contains the property “c.f”.

> db.proj.find({}, {_id: 0, "c.f.h": 1})
{  }
{  }
{ "c" : { "f" : { "h" : 6 } } }

Reaching further into the document works as expected also.

> db.proj.find({}, {_id: 0, a: 1, c: 1})
{  }
{ "a" : 1 }
{ "c" : { "d" : 3, "e" : 4, "f" : { "g" : 5, "h" : 6 } } }

Asking for different properties returns all of those for each document.

> db.proj.find({}, {_id:0, "c.e":1, "c.f.h":1})
{  }
{  }
{ "c" : { "e" : 4, "f" : { "h" : 6 } } }

Asking for different properties in the same embedded document returns a combination of the properties, not separate properties. This makes sense as in each document a property with a given name can only appear once.

> db.proj.find({}, {_id: 0, "c.g": 1})
{  }
{  }
{ "c" : {  } }

This query asks for an embedded property that does not exist. However, the properties on the path to that missing property are actually included in the result. This is surprising to me as I would expect that if a property does not exist, no property is included in the result at all, especially not a partial property, so to say. In this case, the path in the query “c.g” does not match any path in the result.

Projecting Array Element Properties

First of all, another document is added to the test data set that contains arrays and nested document that in turn contain an array:

{"x": [{"y": 10}, {"w": [{"z": 11}, {"v": 12}]}]}

So a total of 4 documents are in the test data set now.

Projection of arrays is done using the “$slice” operator described here: http://docs.mongodb.org/manual/reference/operator/projection/slice/ (in addition to predicate-driven selection/projection using “$” or “$elemMatch”). Let’s try.

> db.proj.find({}, {_id: 0, "x": {$slice: 1}})
{  }
{ "a" : 1, "b" : 2 }
{ "c" : { "d" : 3, "e" : 4, "f" : { "g" : 5, "h" : 6 } } }
{ "x" : [ { "y" : 10 } ] }

This asks for one element from the array “x”, and indeed, the first element is returned. However, all properties of every document not containing “x” are returned also. This is surprising and in contrast to the behavior when projecting properties of embedded documents.

> db.proj.find({x: {$exists: true}}, {_id: 0, "x": {$slice: 1}})
{ "x" : [ { "y" : 10 } ] }

The “$exists” operator limits the documents to those that contain “x” only. This is still not the same semantics as in the document property projection, but closer.

> db.proj.find({x: {$exists: true}}, {_id: 0, "x": {$slice: [1, 1]}})
{ "x" : [ { "w" : [ { "z" : 11 }, { "v" : 12 } ] } ] }

This selects the second array element (the first “1″ indicates the number of skips, and the second “1″ indicates how many elements should be selected.

This is different from selecting the second property, in my opinion, as in case of projecting the second property it would be important to see in the result that the second property was projected. This definitely debatable, by in analogy to projecting embedded document properties, the result would have to reflect the query.

Let’s try to select the second element of “w”. This requires reaching into the array on the first level.

> db.proj.find({x: {$exists: true}}, {_id: 0, "x.1.w": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] } ] }

The approach using the dot notation fails. The query does not honor “x.1.w”, specifying: project from “w”, which is the second array element “x.1″. However, the interface is not giving an error, either, as it probably should?

> db.proj.find({x: {$exists: true}}, {_id: 0, "x.w": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "v" : 12 } ] } ] }

This works. MongoDB seem to automatically interpret this correctly. However, the first array element of “x” is returned also, again in contrast to the approach for document properties where properties that are not on they path will not be returned.

But what if “x” would contain an additional array element with property “w”? Let’s add this document:

{"x": [{"y": 10}, {"w": [{"z": 11}, {"v": 12}]}, {"w": [{"z": 13}, {"v": 14}]}]}

Now 5 documents are in the test data set.

> db.proj.find({x:{$exists: true}}, {_id: 0, "x.w": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "v" : 12 } ] } ] }
{ "x" : [ { "y" : 10 }, { "w" : [ { "v" : 12 } ] }, { "w" : [ { "v" : 14 } ] } ] }

Turns out, MongoDB selects all documents that contain a property “w”, and from those the second element. This looks reasonable.

But, how is the second element of the first “w” selected? I think multi-level projection in context of arrays is not possible at this point: https://jira.mongodb.org/browse/SERVER-831.

> db.proj.find({x:{$exists: true}}, {_id: 0, "x.q": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] } ] }
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] }, { "w" : [ { "z" : 13}, { "v" : 14 } ] } ] }
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] }, { "a" : 20 } ] }

Asking for an non-existing property returns the complete array, again, in contrast to the analogous query in embedded documents.

I’ll stop here at examining projecting array elements as clearly there is limited support for it as this point in time.

Summary

Surprisingly, at least for me, MongoDB does not follow the same design rules when projecting properties from embedded documents compared with projecting elements from embedded arrays. Not only is the behavior different

  1. Properties of documents not containing the requested array projection are returned
  2. Properties not on the path to the projected element are returned

but also multi-level projection does not work in a straight forward way: it is not possible to use dot notation to reach into nested arrays for projection.

Querying for non-existing properties results in partial or incorrect results, in my opinion. Of course, there are different viewpoints possible on this behavior, and for sure it warrants further discussion.

Schema-free Database (Part 2): Relational Database Management System (RDBMS)

As outlined in Part 1 of this series (http://realprogrammer.wordpress.com/2013/11/02/schema-free-database-part-1-an-oxymoron/), a ‘schema-free database’ is an oxymoron and in fact the notion of schema is changing from a more restrictive to a more flexible interpretation in context of NoSQL database technology.

So it is only consequential to ask the question the other way around (as a thought experiment): is it possible to build a relational database management system that does not enforce a schema, and if so, how would such a system look like on an abstract level?

Yes, it is possible to have a non-schema-enforcing RDBMS. Let’s discuss two variations next.

Definition of No-Schema-Enforcing Relational Database Management System

What functionality would be altered in order to provide a no-schema-enforcing RDBMS? If it were possible to create a table without specifying columns (aka, only a table name), and then to insert, update and delete rows, then a ‘schema-free’ RDBMS would be in place. This would mean in detail:

  • Rows do not have to comply to a schema when inserted into a table. Different rows in the same table could have different attributes (columns) and the same attributes of different rows could have different domains (flexible type system).
  • By defining a table without specifying columns (names and domains), a table would not define a primary key, either (neither a simple, nor a composite key). Applications inserting or updating rows can behave nicely and add properties with values that comply to the primary key semantics, but the RDBMS would not be aware of it and consequently would not enforce primary key compliance.
  • By the same token, foreign keys would not be enforced by the RDBMS for the same reasons.
  • Since no primary key enforcement is in place, duplicate rows will not be prevented by the RDBMS and any supervision is left to the application systems.
  • Indexes are independent of schema specification and assuming that indexes are maintained on tables, not all rows might be present in an index if the attributes defined by the index are not contained in a row.

Surprisingly (or not), defining a no-schema-enforcing RDBMS is pretty straight forward.

Variation on No-Schema-Enforcing RDBMS

An interesting variation of a no-schema-enforcing RDBMS could be that a schema, primary keys, foreign keys, etc., are specified as usual, however, without being actively enforced; instead, warnings are given by the RDBMS. For example, a row not complying to the schema can actually be inserted, but the result would not be a ‘OK’, but a warning indicating a schema violation.

This can be described as a ‘middle ground’ in widening the schema interpretation where the RDBMS is aware of a schema and warns of violations without rejecting the various DML operations.

Characterization of No-Schema-Enforcing RDBMS

Could a no-schema-enforcing RDBMS (any of the variations) be a useful database management system? Yes, as it would be the equivalent (on the relational model) to NoSQL databases (on JSON/BSON model or key value model).

For use cases where the flexible schema interpretation is key, such a no-schema-enforcing RDBMS could fit the bill (possibly better) than a NoSQL database system if the use case is fundamentally relational in nature (as opposed to e.g. hierarchical or key/value) and if SQL as the query language is important.

Further Exploration

There are additional areas in a RDBMS that will have to change their behavior in a no-schema-enforcing implementation. Only briefly (and not exhaustively), these are

  • Triggers. Triggers are specified on tables and state changes of rows. If particular attributes are referenced inside the trigger, then not every update, insert, read or delete will execute the trigger logic.
  • Stored procedures. Stored procedures often have parameters of specific types and assume a specific set of attributes when processing rows. In a no-schema-enforcing situation the stored procedure has to be able to deal with variations of rows.
  • Functions and function extensions. Functions have to be changed similarly to stored procedures. Not only from the viewpoint of parameters, but also the processing logic.
  • Aggregation. Aggregation will have to change in various ways as the various aggregation functions cannot assume that all attributes are of the same type. Neither can they assume that all attributes are actually present in all rows of a table.

In principle, every concept and every implementation aspect of a RDBMS needs to be re-examined wrt. a wider and more flexible interpretation of ‘schema’. NoSQL systems, by their definition and approach, started with a wider interpretation and consequently made all the conceptual and implementation decisions. They are one source of approach in this regard.

Contact Me

If you plan to explore or to build a no-schema-enforcing RDBMS, please contact me.

Schema-free Database (Part 1): An Oxymoron

The notion of a ‘schema-free database’ keeps coming up, most recently in a meetup I attended a few days ago. Some rationalization follows divided up into the categories of ‘document’ and ‘database’.

While a generalization is easily possible, the context here will be JSON and MongoDB as these are two practical implementations that are available and often used as examples of a ‘schema-free database’. Those provide a nice constraint technology set as an example, while the principles apply to a whole range of other technology, of course.

Document

A JSON document, in short: document, follows a set of construction principles outlined here: http://www.json.org/. This is a rather informal grammar that defines how a valid JSON document is constructed. There are no data type generators and so new data types cannot be introduced; therefore, every document is constructed from the fixed set of types enumerated on that web page.

Document Schema

An attempt has been made to create a more formal mechanism to define a schema for JSON documents: http://json-schema.org/. This approach provides a formal language to describe the schema of a JSON document explicitly.

The json-schema approach combined with the fixed set of types available to create a JSON document means that every JSON document can be described explicitly using json-schema without exceptions. This in turns means that every JSON document has at least an implicit schema, unless it is additionally made explicit with e.g. json-schema.

Therefore, JSON documents have a schema, an implicit one and optionally an explicit one. Depending on the particular schema definition language approach itself, a document might match more than one schema, but that is left for a separate discussion.

Set of Document Schemas

Given a set of JSON documents it is now possible to characterize their relationship to schemas. In the ‘best’ case, all documents follow the same schema; in the ‘worst’ case, each document follows its own schema. And there are cases in-between where a subset of the documents validates against a schema, and another subset against another schema. Depending on the design, one JSON document might validate against different schemas.

The relationship between documents and schemas is n:m in general.

Database

In context of a database, there are a few interesting questions in this context:

  • Does the database understand a document representation (e.g. JSON)?
  • Does the database enforce a document representation?

And:

  • Does the database enforce a schema?
  • Does the database understand a schema?

Let’s answer these questions for MongoDB specifically:

And:

  • MongoDB enforces a partial schema. Each document must have a property called “_id”; if the document being inserted does not have such a property, one is automatically added.
  • MongoDB does not understand an explicit schema as it does not provide for a mechanism to load a schema definition language.

MongoDB, however, understands implicit schemas as MongoDB does allow to e.g. create an index on any property of documents. So MongoDB recognizes properties.

Furthermore, MongoDB supports aggregation functions and supports e.g. the sum of properties across documents (http://realprogrammer.wordpress.com/2012/11/04/null-undefined-nan-and-missing-property-goto-considered-harmful-part-2/). So it is data type aware and implements operators (e.g. sum) on those.

Conclusion: The Notion of ‘Schema’ is Changing

This rather brief discussion clearly rationalizes that the label ‘schema-free database’ is not applicable to technologies such as those discussed in this blog (JSON/BSON, MongoDB).

Hence these technologies are not an example of ‘schema-free database’, to the contrary: they demonstrate that the notion of ‘schema’ can have a wider and more flexible interpretation then what relational databases bring forward.

MongoDB and node.js (Part 4): Query Results

Now that the data is stored in MongoDB (http://realprogrammer.wordpress.com/2013/03/27/mongodb-and-node-js-part-2-insertion-log-examination/), let’s get it out again through queries and examine the queries’ results.

Part 3 of this series (http://realprogrammer.wordpress.com/2013/05/29/mongodb-and-node-js-part-3-looking-at-query-results-its-complicated/) introduced the reason why the results are printed with and without using JSON.stringify() and outlined why in context of MongoDB and node.js using JSON.stringify() might be not such a good idea.

There are as many queries generated as there are BSON types and the the BSON types are queried in ascending order. The printout after “===>” is generated by “console.log(<query_result_document>)” and the printout after “—>” is generated by “console.log(JSON.stringify(<query_result_document>))”.

Some interpretation will follow in Part 5 of this series. Here are the results uninterpreted for now:

===> { x: 123.123,
  comment: 'new MongoDB.Double(123.123)',
  btype: 1,
  _id: 5269796fab3df2f90b000001 }
---> {"x":123.123,"comment":"new MongoDB.Double(123.123)","btype":1,"_id":"5269796fab3df2f90b000001"}

===> { x: 456.456,
  comment: '456.456',
  btype: 1,
  _id: 5269796fab3df2f90b000002 }
---> {"x":456.456,"comment":"456.456","btype":1,"_id":"5269796fab3df2f90b000002"}

===> { x: NaN,
  comment: 'Number.NaN',
  btype: 1,
  _id: 5269796fab3df2f90b00001c }
---> {"x":null,"comment":"Number.NaN","btype":1,"_id":"5269796fab3df2f90b00001c"}

===> { x: Infinity,
  comment: 'Infinity',
  btype: 1,
  _id: 5269796fab3df2f90b00001d }
---> {"x":null,"comment":"Infinity","btype":1,"_id":"5269796fab3df2f90b00001d"}

===> { x: Infinity,
  comment: 'Number.POSITIVE_INFINITY',
  btype: 1,
  _id: 5269796fab3df2f90b00001e }
---> {"x":null,"comment":"Number.POSITIVE_INFINITY","btype":1,"_id":"5269796fab3df2f90b00001e"}

===> { x: -Infinity,
  comment: 'Number.NEGATIVE_INFINITY',
  btype: 1,
  _id: 5269796fab3df2f90b00001f }
---> {"x":null,"comment":"Number.NEGATIVE_INFINITY","btype":1,"_id":"5269796fab3df2f90b00001f"}

===> { x: 5e-324,
  comment: 'MIN_VALUE',
  btype: 1,
  _id: 5269796fab3df2f90b000020 }
---> {"x":5e-324,"comment":"MIN_VALUE","btype":1,"_id":"5269796fab3df2f90b000020"}

===> { x: 1.7976931348623157e+308,
  comment: 'MAX_VALUE',
  btype: 1,
  _id: 5269796fab3df2f90b000021 }
---> {"x":1.7976931348623157e+308,"comment":"MAX_VALUE","btype":1,"_id":"5269796fab3df2f90b000021"}

===> { x: 'abc',
  comment: 'abc',
  btype: 2,
  _id: 5269796fab3df2f90b000003 }
---> {"x":"abc","comment":"abc","btype":2,"_id":"5269796fab3df2f90b000003"}

===> { x: { z: 5 },
  comment: '{"z": 5}',
  btype: 3,
  _id: 5269796fab3df2f90b000004 }
---> {"x":{"z":5},"comment":"{\"z\": 5}","btype":3,"_id":"5269796fab3df2f90b000004"}

===> { x: [ { y: 4 }, { z: 5 } ],
  comment: '[{"y": 4}, {"z": 5}]',
  btype: 3,
  _id: 5269796fab3df2f90b000006 }
---> {"x":[{"y":4},{"z":5}],"comment":"[{\"y\": 4}, {\"z\": 5}]","btype":3,"_id":"5269796fab3df2f90b000006"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: 'types' },
  comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
  btype: 3,
  _id: 5269796fab3df2f90b00000f }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":"types"},"comment":"new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")","btype":3,"_id":"5269796fab3df2f90b00000f"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: undefined },
  comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
  btype: 3,
  _id: 5269796fab3df2f90b000010 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":""},"comment":"new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")","btype":3,"_id":"5269796fab3df2f90b000010"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: 'types' },
  comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
  btype: 3,
  _id: 5269796fab3df2f90b000011 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":"types"},"comment":"{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}","btype":3,"_id":"5269796fab3df2f90b000011"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: undefined },
  comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
  btype: 3,
  _id: 5269796fab3df2f90b000012 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":""},"comment":"{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}","btype":3,"_id":"5269796fab3df2f90b000012"}

===> { x: 
   { _bsontype: 'Binary',
     sub_type: 0,
     position: 6,
     buffer: <Buffer 62 69 6e 61 72 79> },
  comment: 'new MongoDB.Binary("binary")',
  btype: 5,
  _id: 5269796fab3df2f90b000007 }
---> {"x":"YmluYXJ5","comment":"new MongoDB.Binary(\"binary\")","btype":5,"_id":"5269796fab3df2f90b000007"}

===> { x: 5040dc5d40b67c681d000001,
  comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
  btype: 7,
  _id: 5269796fab3df2f90b000008 }
---> {"x":"5040dc5d40b67c681d000001","comment":"new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")","btype":7,"_id":"5269796fab3df2f90b000008"}

===> { x: false,
  comment: 'false',
  btype: 8,
  _id: 5269796fab3df2f90b000009 }
---> {"x":false,"comment":"false","btype":8,"_id":"5269796fab3df2f90b000009"}

===> { x: true,
  comment: 'true',
  btype: 8,
  _id: 5269796fab3df2f90b00000a }
---> {"x":true,"comment":"true","btype":8,"_id":"5269796fab3df2f90b00000a"}

===> { x: Fri Aug 31 2012 05:13:14 GMT-0700 (PDT),
  comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
  btype: 9,
  _id: 5269796fab3df2f90b00000b }
---> {"x":"2012-08-31T12:13:14.156Z","comment":"new Date(\"2012-08-31 12:13:14:156 UTC\")","btype":9,"_id":"5269796fab3df2f90b00000b"}

===> { x: null,
  comment: 'null',
  btype: 10,
  _id: 5269796fab3df2f90b00000c }
---> {"x":null,"comment":"null","btype":10,"_id":"5269796fab3df2f90b00000c"}

===> { x: null,
  comment: 'undefined',
  btype: 10,
  _id: 5269796fab3df2f90b00001b }
---> {"x":null,"comment":"undefined","btype":10,"_id":"5269796fab3df2f90b00001b"}

===> { x: /abc/,
  comment: 'new RegExp("abc")',
  btype: 11,
  _id: 5269796fab3df2f90b00000d }
---> {"x":{},"comment":"new RegExp(\"abc\")","btype":11,"_id":"5269796fab3df2f90b00000d"}

===> { x: /abc/i,
  comment: 'new RegExp("abc", "i")',
  btype: 11,
  _id: 5269796fab3df2f90b00000e }
---> {"x":{},"comment":"new RegExp(\"abc\", \"i\")","btype":11,"_id":"5269796fab3df2f90b00000e"}

===> { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
  comment: 'new MongoDB.Code("function () {}")',
  btype: 13,
  _id: 5269796fab3df2f90b000013 }
---> {"x":{"scope":{},"code":"function () {}"},"comment":"new MongoDB.Code(\"function () {}\")","btype":13,"_id":"5269796fab3df2f90b000013"}

===> { x: def15,
  comment: 'new MongoDB.Symbol("def15")',
  btype: 14,
  _id: 5269796fab3df2f90b000014 }
---> {"x":"def15","comment":"new MongoDB.Symbol(\"def15\")","btype":14,"_id":"5269796fab3df2f90b000014"}

===> { x: { _bsontype: 'Code', code: 'function (a) {}', scope: { a: 4 } },
  comment: ' new MongoDB.Code("function (a) {}", {"a": 4})',
  btype: 15,
  _id: 5269796fab3df2f90b000015 }
---> {"x":{"scope":{"a":4},"code":"function (a) {}"},"comment":" new MongoDB.Code(\"function (a) {}\", {\"a\": 4})","btype":15,"_id":"5269796fab3df2f90b000015"}

===> { x: [ 9, 8, 7 ],
  comment: '[9, 8, 7]',
  btype: 16,
  _id: 5269796fab3df2f90b000005 }
---> {"x":[9,8,7],"comment":"[9, 8, 7]","btype":16,"_id":"5269796fab3df2f90b000005"}

===> { x: 123456,
  comment: '123456',
  btype: 16,
  _id: 5269796fab3df2f90b000016 }
---> {"x":123456,"comment":"123456","btype":16,"_id":"5269796fab3df2f90b000016"}

===> { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
  comment: 'new MongoDB.Timestamp(1, 2)',
  btype: 17,
  _id: 5269796fab3df2f90b000017 }
---> {"x":"8589934593","comment":"new MongoDB.Timestamp(1, 2)","btype":17,"_id":"5269796fab3df2f90b000017"}

===> { x: 987,
  comment: 'new MongoDB.Long("987")',
  btype: 18,
  _id: 5269796fab3df2f90b000018 }
---> {"x":987,"comment":"new MongoDB.Long(\"987\")","btype":18,"_id":"5269796fab3df2f90b000018"}

===> { x: { _bsontype: 'MaxKey' },
  comment: 'new MongoDB.MaxKey()',
  btype: 127,
  _id: 5269796fab3df2f90b00001a }
---> {"x":{"_bsontype":"MaxKey"},"comment":"new MongoDB.MaxKey()","btype":127,"_id":"5269796fab3df2f90b00001a"}

===> { x: { _bsontype: 'MaxKey' },
  comment: 'new MongoDB.MaxKey()',
  btype: 127,
  _id: 5269796fab3df2f90b00001a }
---> {"x":{"_bsontype":"MaxKey"},"comment":"new MongoDB.MaxKey()","btype":127,"_id":"5269796fab3df2f90b00001a"}

===> { x: { _bsontype: 'MinKey' },
  comment: 'new MongoDB.MinKey()',
  btype: 255,
  _id: 5269796fab3df2f90b000019 }
---> {"x":{"_bsontype":"MinKey"},"comment":"new MongoDB.MinKey()","btype":255,"_id":"5269796fab3df2f90b000019"}

===> { x: { _bsontype: 'MinKey' },
  comment: 'new MongoDB.MinKey()',
  btype: 255,
  _id: 5269796fab3df2f90b000019 }
---> {"x":{"_bsontype":"MinKey"},"comment":"new MongoDB.MinKey()","btype":255,"_id":"5269796fab3df2f90b000019"}

MongoDB and node.js (Part 3): Looking at Query Results: It’s Complicated

Now that the data is stored in MongoDB (http://realprogrammer.wordpress.com/2013/03/27/mongodb-and-node-js-part-2-insertion-log-examination/), let’s get it out again through queries and examine the queries’ results.

Before actually running the queries, a little detour is in order. This is about printing an object on the console directly and printing it after applying JSON.stringify(). Here is an example (running in a node.js shell):

> console.log("MongoDB")
MongoDB
undefined
> console.log(JSON.stringify("MongoDB"))
"MongoDB"
undefined

Both cases work out fine as they should. So no surprise there.

Let’s try something more interesting:

> console.log(NaN)
NaN
undefined
> console.log(JSON.stringify(NaN))
null
undefined

Now that’s a bummer. The reason this is a bummer is two-fold:

  • In Part 2 of this blog series we painstakingly made sure that all possible JavaScript types, including the constants, are stored into MongoDB. Retrieving those means that JSON.stringify() must not be in the code path accessing MongoDB in order to not introduce errors as just shown.
  • Furthermore, the underlying BSON implementation of the node.js driver of MongoDB implements a few toJSON() functions that are actually used by JSON.stringify() as it honors overwritten functions.

The first issue can be resolved by writing a separate function that encapsulates JSON.stringify() and implements the “replacer” function (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify).

The second issue means that the node.js driver in MongoDB decided to represent some of the BSON types differently when converting to a string. If not converted, the type is represented differently. So there is an inherent difference between the structure as retrieved from MongoDB and the structure as produced by JSON.stringify().

As a developer you have to decide to avoid JSON.stringify() or to embrace it (meaning, writing a wrapper function that works for all cases).

So, with all that said, the next installment of this blog series will retrieve all data that has been stored in Part 2 and print it out twice, with console.log() as well as console.log(JSON.stringify()).