Document Projection (Part 2): Definition

What does projection in context of JSON structures or documents actually mean? What should the definition of “projection” be? There are several possibilities discussed next.

Document Projection: Complete Branch

Projection in JSON is projecting a branch of the JSON data structure tree, not projecting a single object or value. To request a projection, a property (projection) path in dot notation could be used (and actually is in many systems). The result of a projection is a valid JSON document containing the specified branch(es).

An example document is

{"a": {"b": {"c": 3, "d": 4, "e": 5}}}

Let’s go through a series of projections in the following.

  • Projection path: “a.b.c”
  • Result: {“a”: {“b”: {“c”: 3}}}
  • Projection path: “a.b”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4, “e”: 5}}}
  • Projection path: “a.e”
  • Result: {}

The result contains the full path of the projection (or more, but not less). If the requested projection path does not exist, the result is the empty document as none of its properties matches the projection path. The empty projection path “” is not defined, meaning, a projection must name at least one property, and that will be a top-level property in a document.

Several branches can be projected concurrently.

  • Projection paths: “a.b.c”, “a.b.d”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4}}}

The resulting document contains the combination of all of the branches that result in a valid projection. Redundant projection path specification is possible if one of the projection paths is a sub-path of another one. However, the result document is the same if redundancy is present or absent.

Document Projection: Partial Branch

It might be possible that the whole projection path does not exist, but a part of it. In this case it is a possibility to add the existing result up to that point (MongoDB follows this approach). This results in partial paths whereby the value of their last property is the empty document.

For example, “a.b.f” would result in {“a”: {“b”: {}}}. “a” and “b” exist in the example document, “f”, however, does not.

In my opinion, while possibly useful in some cases, I would not make this the default or standard definition as a result is returned that is incomplete and I could argue that it is in fact incorrect since the value of “b” is not the empty document (I could envision a configuration setting that provides these partial branches if needed).

Document Projection: Value

Wait a minute, why does the result document have to consist of full paths?

The reason is based on the implicit restriction on JSON documents that there can be only one property of a given name on the same level in a document. “Implicit” because the JSON definition (http://json.org/) does not mandate the restriction, but many implementations do: property names on the same level of embedding have to be unique.

For example:

{"x": {"b": {"c": 3, "d": 4}}, 
 "y": {"e": {"c": 3, "d": 4}}}

is a perfectly valid document where the property names are unique on every level. So let’s get back to projection and let’s for a moment assume that projection actually returns the value at the end of the path, omitting the path to the property value itself. So,

  • Projection path: “x.b.c”
  • Result: {“c”: 3}

So far so good.

  • Projection paths: “x.b.c”, “y.e.c”
  • Result: ?

What should the result be? The basic assumption is that a projection on a document returns a document. But “x.b.c” and “y.e.c” both return {“c”: 3} as separate documents, but not one document.

  • One possible result could be an array with two documents. But arrays are in general not considered valid top level documents (again, the JSON definition would allow that).
  • To mitigate that, the array could be the value of a property: {“result”: [{“c”: 3}, {“c”: 3}]}. But this would conflict with a document that happens to have a property “result” of type array with two same documents in it.
  • Or, the two documents would have to be embedded in a third one with special names, like {“1”: {“c”: 3}, “2”: {“c”: 3}}. But then, the original document does not have the properties “1” or “2”.

Based on this discussion having projection results being full paths is simpler and direct.

Projection – Result Correspondence Principle

There is also another argument from the user’s viewpoint. If a user wants to project “x.b.c”, then the user might want to access the result document after the query returns with “x.b.c” as the access path. From this viewpoint, the path in the query and the path in the result document should match and not require access path transformation.

Array Projection: Complete Access Path

Documents may contain arrays as well as array of arrays, arrays of objects of arrays, etc., in principle any of these composite types can be on any level of the document. Projection therefore has to be defined on arrays also, not just documents.

The principle of project paths is extended to include array index specification. For example, let’s consider this document:

{"a": [{"a1": 1}, {"a2": 2}], 
 "b": {"c": [{"c1": 3}, {"c2": 4}, {"c3": 5}]}, 
 "d": [6, 7]}

Let’s do a few projections (arrays are 0-index based):

  • Projection path: a[0]
  • Result: {“a”: [{“a1”: 1}]}
  • Projection path: b.c[1]
  • Result: {“b”: {“c”: [“c2”: 4]}}
  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [{“a2”: 2}], “b”: {“c”: [{“c3”: 5}]}}
  • Projection path: a[7]
  • Result: {}

Like in the case of documents, full paths are requested and full paths are returned, with several paths possible. A projection path referring to a non-existing property will not contribute to the result.

So far, so good, except that the results do not yet conform to the “Projection – Result Correspondence” principle from above: the projection “a[1]” resulted in a correct document, but that result document cannot be accessed with “a[1]” to obtain the value.

Array Projection: Padding

In order to support the “Projection – Result Correspondence” principle array results can be padded with the value “null”. For example:

  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [null, {“a2”: 2}], “b”: {“c”: [null, null, {“c3”: 5}]}}

Now it is possible to access the result with “a[1]” or “b.c[2].c3” in order to obtain the proper results. From a user’s perspective this is great as again the paths used to specify the projection can be used to retrieve the values.

Array Projection: Scalar Values

Scalar values in arrays do not pose a specific problem:

  • Projection paths: a[1], d[1], d[2]
  • Result: {“a”: [null, {“a2”: 2}], “d”: [null, 7]}

And their access can be accomplished using the projection paths.

Summary

Initially I thought projection is a straight forward function and not worth a discussion in context of document-oriented databases; but then it turned out to be not that clear cut. Nevertheless, the above is a starting point for a strict rationalization of projection in document-oriented databases based on the JSON data model.

Advertisement

Document Projection (Part 1): MongoDB

This blog reviews some of the projection functionality that the MongoDB query interface provides. The emphasis is comparing projection of embedded object properties with projection of embedded array elements. Those are not symmetric, as the examples will show, and that is surprising and rather unexpected.

Test Data Set

The initial test data set contains three documents:

{}
{"a": 1, "b": 2}
{"c": {"d": 3, "e": 4, "f": {"g": 5, "h": 6}}}

The empty document is the control document, and two documents have properties whereby one of those has several levels of embedding. The test data set, contained in a file “td.txt”, is loaded into MongoDB as follows:

mongoimport -d projection -c proj td.txt

Projecting Document Properties

Let’s observe a few projection queries:

> db.proj.find({}, {_id: 0, a: 1})
{  }
{ "a" : 1 }
{  }

In general, to make the result less verbose, the property “_id” is suppressed. The query asked for property “a” and for each document the property “a” is returned (whereby two documents do not contain “a”, so the resulting documents are empty).

> db.proj.find({}, {_id: 0, "c.e": 1})
{  }
{  }
{ "c" : { "e" : 4 } }

Reaching into documents is done by using the dot notation. “c.e” asks for the property “e” in “c”. The result contains the whole document structure starting at “c” all the way to “e”. Alternatively MongoDB could have returned {“e”: 4} only, but then the result would not correspond to the dot notation in the query.

> db.proj.find({}, {_id: 0, "c.f": 1})
{  }
{  }
{ "c" : { "f" : { "g" : 5, "h" : 6 } } }

No surprise here as the result contains the property “c.f”.

> db.proj.find({}, {_id: 0, "c.f.h": 1})
{  }
{  }
{ "c" : { "f" : { "h" : 6 } } }

Reaching further into the document works as expected also.

> db.proj.find({}, {_id: 0, a: 1, c: 1})
{  }
{ "a" : 1 }
{ "c" : { "d" : 3, "e" : 4, "f" : { "g" : 5, "h" : 6 } } }

Asking for different properties returns all of those for each document.

> db.proj.find({}, {_id:0, "c.e":1, "c.f.h":1})
{  }
{  }
{ "c" : { "e" : 4, "f" : { "h" : 6 } } }

Asking for different properties in the same embedded document returns a combination of the properties, not separate properties. This makes sense as in each document a property with a given name can only appear once.

> db.proj.find({}, {_id: 0, "c.g": 1})
{  }
{  }
{ "c" : {  } }

This query asks for an embedded property that does not exist. However, the properties on the path to that missing property are actually included in the result. This is surprising to me as I would expect that if a property does not exist, no property is included in the result at all, especially not a partial property, so to say. In this case, the path in the query “c.g” does not match any path in the result.

Projecting Array Element Properties

First of all, another document is added to the test data set that contains arrays and nested document that in turn contain an array:

{"x": [{"y": 10}, {"w": [{"z": 11}, {"v": 12}]}]}

So a total of 4 documents are in the test data set now.

Projection of arrays is done using the “$slice” operator described here: http://docs.mongodb.org/manual/reference/operator/projection/slice/ (in addition to predicate-driven selection/projection using “$” or “$elemMatch”). Let’s try.

> db.proj.find({}, {_id: 0, "x": {$slice: 1}})
{  }
{ "a" : 1, "b" : 2 }
{ "c" : { "d" : 3, "e" : 4, "f" : { "g" : 5, "h" : 6 } } }
{ "x" : [ { "y" : 10 } ] }

This asks for one element from the array “x”, and indeed, the first element is returned. However, all properties of every document not containing “x” are returned also. This is surprising and in contrast to the behavior when projecting properties of embedded documents.

> db.proj.find({x: {$exists: true}}, {_id: 0, "x": {$slice: 1}})
{ "x" : [ { "y" : 10 } ] }

The “$exists” operator limits the documents to those that contain “x” only. This is still not the same semantics as in the document property projection, but closer.

> db.proj.find({x: {$exists: true}}, {_id: 0, "x": {$slice: [1, 1]}})
{ "x" : [ { "w" : [ { "z" : 11 }, { "v" : 12 } ] } ] }

This selects the second array element (the first “1” indicates the number of skips, and the second “1” indicates how many elements should be selected.

This is different from selecting the second property, in my opinion, as in case of projecting the second property it would be important to see in the result that the second property was projected. This definitely debatable, by in analogy to projecting embedded document properties, the result would have to reflect the query.

Let’s try to select the second element of “w”. This requires reaching into the array on the first level.

> db.proj.find({x: {$exists: true}}, {_id: 0, "x.1.w": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] } ] }

The approach using the dot notation fails. The query does not honor “x.1.w”, specifying: project from “w”, which is the second array element “x.1”. However, the interface is not giving an error, either, as it probably should?

> db.proj.find({x: {$exists: true}}, {_id: 0, "x.w": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "v" : 12 } ] } ] }

This works. MongoDB seem to automatically interpret this correctly. However, the first array element of “x” is returned also, again in contrast to the approach for document properties where properties that are not on they path will not be returned.

But what if “x” would contain an additional array element with property “w”? Let’s add this document:

{"x": [{"y": 10}, {"w": [{"z": 11}, {"v": 12}]}, {"w": [{"z": 13}, {"v": 14}]}]}

Now 5 documents are in the test data set.

> db.proj.find({x:{$exists: true}}, {_id: 0, "x.w": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "v" : 12 } ] } ] }
{ "x" : [ { "y" : 10 }, { "w" : [ { "v" : 12 } ] }, { "w" : [ { "v" : 14 } ] } ] }

Turns out, MongoDB selects all documents that contain a property “w”, and from those the second element. This looks reasonable.

But, how is the second element of the first “w” selected? I think multi-level projection in context of arrays is not possible at this point: https://jira.mongodb.org/browse/SERVER-831.

> db.proj.find({x:{$exists: true}}, {_id: 0, "x.q": {$slice: [1,1]}})
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] } ] }
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] }, { "w" : [ { "z" : 13}, { "v" : 14 } ] } ] }
{ "x" : [ { "y" : 10 }, { "w" : [ { "z" : 11 }, { "v" : 12 } ] }, { "a" : 20 } ] }

Asking for an non-existing property returns the complete array, again, in contrast to the analogous query in embedded documents.

I’ll stop here at examining projecting array elements as clearly there is limited support for it as this point in time.

Summary

Surprisingly, at least for me, MongoDB does not follow the same design rules when projecting properties from embedded documents compared with projecting elements from embedded arrays. Not only is the behavior different

  1. Properties of documents not containing the requested array projection are returned
  2. Properties not on the path to the projected element are returned

but also multi-level projection does not work in a straight forward way: it is not possible to use dot notation to reach into nested arrays for projection.

Querying for non-existing properties results in partial or incorrect results, in my opinion. Of course, there are different viewpoints possible on this behavior, and for sure it warrants further discussion.

MongoDB and node.js (Part 4): Query Results

Now that the data is stored in MongoDB (https://realprogrammer.wordpress.com/2013/03/27/mongodb-and-node-js-part-2-insertion-log-examination/), let’s get it out again through queries and examine the queries’ results.

Part 3 of this series (https://realprogrammer.wordpress.com/2013/05/29/mongodb-and-node-js-part-3-looking-at-query-results-its-complicated/) introduced the reason why the results are printed with and without using JSON.stringify() and outlined why in context of MongoDB and node.js using JSON.stringify() might be not such a good idea.

There are as many queries generated as there are BSON types and the the BSON types are queried in ascending order. The printout after “===>” is generated by “console.log(<query_result_document>)” and the printout after “—>” is generated by “console.log(JSON.stringify(<query_result_document>))”.

Some interpretation will follow in Part 5 of this series. Here are the results uninterpreted for now:

===> { x: 123.123,
  comment: 'new MongoDB.Double(123.123)',
  btype: 1,
  _id: 5269796fab3df2f90b000001 }
---> {"x":123.123,"comment":"new MongoDB.Double(123.123)","btype":1,"_id":"5269796fab3df2f90b000001"}

===> { x: 456.456,
  comment: '456.456',
  btype: 1,
  _id: 5269796fab3df2f90b000002 }
---> {"x":456.456,"comment":"456.456","btype":1,"_id":"5269796fab3df2f90b000002"}

===> { x: NaN,
  comment: 'Number.NaN',
  btype: 1,
  _id: 5269796fab3df2f90b00001c }
---> {"x":null,"comment":"Number.NaN","btype":1,"_id":"5269796fab3df2f90b00001c"}

===> { x: Infinity,
  comment: 'Infinity',
  btype: 1,
  _id: 5269796fab3df2f90b00001d }
---> {"x":null,"comment":"Infinity","btype":1,"_id":"5269796fab3df2f90b00001d"}

===> { x: Infinity,
  comment: 'Number.POSITIVE_INFINITY',
  btype: 1,
  _id: 5269796fab3df2f90b00001e }
---> {"x":null,"comment":"Number.POSITIVE_INFINITY","btype":1,"_id":"5269796fab3df2f90b00001e"}

===> { x: -Infinity,
  comment: 'Number.NEGATIVE_INFINITY',
  btype: 1,
  _id: 5269796fab3df2f90b00001f }
---> {"x":null,"comment":"Number.NEGATIVE_INFINITY","btype":1,"_id":"5269796fab3df2f90b00001f"}

===> { x: 5e-324,
  comment: 'MIN_VALUE',
  btype: 1,
  _id: 5269796fab3df2f90b000020 }
---> {"x":5e-324,"comment":"MIN_VALUE","btype":1,"_id":"5269796fab3df2f90b000020"}

===> { x: 1.7976931348623157e+308,
  comment: 'MAX_VALUE',
  btype: 1,
  _id: 5269796fab3df2f90b000021 }
---> {"x":1.7976931348623157e+308,"comment":"MAX_VALUE","btype":1,"_id":"5269796fab3df2f90b000021"}

===> { x: 'abc',
  comment: 'abc',
  btype: 2,
  _id: 5269796fab3df2f90b000003 }
---> {"x":"abc","comment":"abc","btype":2,"_id":"5269796fab3df2f90b000003"}

===> { x: { z: 5 },
  comment: '{"z": 5}',
  btype: 3,
  _id: 5269796fab3df2f90b000004 }
---> {"x":{"z":5},"comment":"{\"z\": 5}","btype":3,"_id":"5269796fab3df2f90b000004"}

===> { x: [ { y: 4 }, { z: 5 } ],
  comment: '[{"y": 4}, {"z": 5}]',
  btype: 3,
  _id: 5269796fab3df2f90b000006 }
---> {"x":[{"y":4},{"z":5}],"comment":"[{\"y\": 4}, {\"z\": 5}]","btype":3,"_id":"5269796fab3df2f90b000006"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: 'types' },
  comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
  btype: 3,
  _id: 5269796fab3df2f90b00000f }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":"types"},"comment":"new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")","btype":3,"_id":"5269796fab3df2f90b00000f"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: undefined },
  comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
  btype: 3,
  _id: 5269796fab3df2f90b000010 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":""},"comment":"new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")","btype":3,"_id":"5269796fab3df2f90b000010"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: 'types' },
  comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
  btype: 3,
  _id: 5269796fab3df2f90b000011 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":"types"},"comment":"{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}","btype":3,"_id":"5269796fab3df2f90b000011"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: undefined },
  comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
  btype: 3,
  _id: 5269796fab3df2f90b000012 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":""},"comment":"{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}","btype":3,"_id":"5269796fab3df2f90b000012"}

===> { x: 
   { _bsontype: 'Binary',
     sub_type: 0,
     position: 6,
     buffer: <Buffer 62 69 6e 61 72 79> },
  comment: 'new MongoDB.Binary("binary")',
  btype: 5,
  _id: 5269796fab3df2f90b000007 }
---> {"x":"YmluYXJ5","comment":"new MongoDB.Binary(\"binary\")","btype":5,"_id":"5269796fab3df2f90b000007"}

===> { x: 5040dc5d40b67c681d000001,
  comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
  btype: 7,
  _id: 5269796fab3df2f90b000008 }
---> {"x":"5040dc5d40b67c681d000001","comment":"new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")","btype":7,"_id":"5269796fab3df2f90b000008"}

===> { x: false,
  comment: 'false',
  btype: 8,
  _id: 5269796fab3df2f90b000009 }
---> {"x":false,"comment":"false","btype":8,"_id":"5269796fab3df2f90b000009"}

===> { x: true,
  comment: 'true',
  btype: 8,
  _id: 5269796fab3df2f90b00000a }
---> {"x":true,"comment":"true","btype":8,"_id":"5269796fab3df2f90b00000a"}

===> { x: Fri Aug 31 2012 05:13:14 GMT-0700 (PDT),
  comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
  btype: 9,
  _id: 5269796fab3df2f90b00000b }
---> {"x":"2012-08-31T12:13:14.156Z","comment":"new Date(\"2012-08-31 12:13:14:156 UTC\")","btype":9,"_id":"5269796fab3df2f90b00000b"}

===> { x: null,
  comment: 'null',
  btype: 10,
  _id: 5269796fab3df2f90b00000c }
---> {"x":null,"comment":"null","btype":10,"_id":"5269796fab3df2f90b00000c"}

===> { x: null,
  comment: 'undefined',
  btype: 10,
  _id: 5269796fab3df2f90b00001b }
---> {"x":null,"comment":"undefined","btype":10,"_id":"5269796fab3df2f90b00001b"}

===> { x: /abc/,
  comment: 'new RegExp("abc")',
  btype: 11,
  _id: 5269796fab3df2f90b00000d }
---> {"x":{},"comment":"new RegExp(\"abc\")","btype":11,"_id":"5269796fab3df2f90b00000d"}

===> { x: /abc/i,
  comment: 'new RegExp("abc", "i")',
  btype: 11,
  _id: 5269796fab3df2f90b00000e }
---> {"x":{},"comment":"new RegExp(\"abc\", \"i\")","btype":11,"_id":"5269796fab3df2f90b00000e"}

===> { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
  comment: 'new MongoDB.Code("function () {}")',
  btype: 13,
  _id: 5269796fab3df2f90b000013 }
---> {"x":{"scope":{},"code":"function () {}"},"comment":"new MongoDB.Code(\"function () {}\")","btype":13,"_id":"5269796fab3df2f90b000013"}

===> { x: def15,
  comment: 'new MongoDB.Symbol("def15")',
  btype: 14,
  _id: 5269796fab3df2f90b000014 }
---> {"x":"def15","comment":"new MongoDB.Symbol(\"def15\")","btype":14,"_id":"5269796fab3df2f90b000014"}

===> { x: { _bsontype: 'Code', code: 'function (a) {}', scope: { a: 4 } },
  comment: ' new MongoDB.Code("function (a) {}", {"a": 4})',
  btype: 15,
  _id: 5269796fab3df2f90b000015 }
---> {"x":{"scope":{"a":4},"code":"function (a) {}"},"comment":" new MongoDB.Code(\"function (a) {}\", {\"a\": 4})","btype":15,"_id":"5269796fab3df2f90b000015"}

===> { x: [ 9, 8, 7 ],
  comment: '[9, 8, 7]',
  btype: 16,
  _id: 5269796fab3df2f90b000005 }
---> {"x":[9,8,7],"comment":"[9, 8, 7]","btype":16,"_id":"5269796fab3df2f90b000005"}

===> { x: 123456,
  comment: '123456',
  btype: 16,
  _id: 5269796fab3df2f90b000016 }
---> {"x":123456,"comment":"123456","btype":16,"_id":"5269796fab3df2f90b000016"}

===> { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
  comment: 'new MongoDB.Timestamp(1, 2)',
  btype: 17,
  _id: 5269796fab3df2f90b000017 }
---> {"x":"8589934593","comment":"new MongoDB.Timestamp(1, 2)","btype":17,"_id":"5269796fab3df2f90b000017"}

===> { x: 987,
  comment: 'new MongoDB.Long("987")',
  btype: 18,
  _id: 5269796fab3df2f90b000018 }
---> {"x":987,"comment":"new MongoDB.Long(\"987\")","btype":18,"_id":"5269796fab3df2f90b000018"}

===> { x: { _bsontype: 'MaxKey' },
  comment: 'new MongoDB.MaxKey()',
  btype: 127,
  _id: 5269796fab3df2f90b00001a }
---> {"x":{"_bsontype":"MaxKey"},"comment":"new MongoDB.MaxKey()","btype":127,"_id":"5269796fab3df2f90b00001a"}

===> { x: { _bsontype: 'MaxKey' },
  comment: 'new MongoDB.MaxKey()',
  btype: 127,
  _id: 5269796fab3df2f90b00001a }
---> {"x":{"_bsontype":"MaxKey"},"comment":"new MongoDB.MaxKey()","btype":127,"_id":"5269796fab3df2f90b00001a"}

===> { x: { _bsontype: 'MinKey' },
  comment: 'new MongoDB.MinKey()',
  btype: 255,
  _id: 5269796fab3df2f90b000019 }
---> {"x":{"_bsontype":"MinKey"},"comment":"new MongoDB.MinKey()","btype":255,"_id":"5269796fab3df2f90b000019"}

===> { x: { _bsontype: 'MinKey' },
  comment: 'new MongoDB.MinKey()',
  btype: 255,
  _id: 5269796fab3df2f90b000019 }
---> {"x":{"_bsontype":"MinKey"},"comment":"new MongoDB.MinKey()","btype":255,"_id":"5269796fab3df2f90b000019"}

MongoDB and node.js (Part 3): Looking at Query Results: It’s Complicated

Now that the data is stored in MongoDB (https://realprogrammer.wordpress.com/2013/03/27/mongodb-and-node-js-part-2-insertion-log-examination/), let’s get it out again through queries and examine the queries’ results.

Before actually running the queries, a little detour is in order. This is about printing an object on the console directly and printing it after applying JSON.stringify(). Here is an example (running in a node.js shell):

> console.log("MongoDB")
MongoDB
undefined
> console.log(JSON.stringify("MongoDB"))
"MongoDB"
undefined

Both cases work out fine as they should. So no surprise there.

Let’s try something more interesting:

> console.log(NaN)
NaN
undefined
> console.log(JSON.stringify(NaN))
null
undefined

Now that’s a bummer. The reason this is a bummer is two-fold:

  • In Part 2 of this blog series we painstakingly made sure that all possible JavaScript types, including the constants, are stored into MongoDB. Retrieving those means that JSON.stringify() must not be in the code path accessing MongoDB in order to not introduce errors as just shown.
  • Furthermore, the underlying BSON implementation of the node.js driver of MongoDB implements a few toJSON() functions that are actually used by JSON.stringify() as it honors overwritten functions.

The first issue can be resolved by writing a separate function that encapsulates JSON.stringify() and implements the “replacer” function (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify).

The second issue means that the node.js driver in MongoDB decided to represent some of the BSON types differently when converting to a string. If not converted, the type is represented differently. So there is an inherent difference between the structure as retrieved from MongoDB and the structure as produced by JSON.stringify().

As a developer you have to decide to avoid JSON.stringify() or to embrace it (meaning, writing a wrapper function that works for all cases).

So, with all that said, the next installment of this blog series will retrieve all data that has been stored in Part 2 and print it out twice, with console.log() as well as console.log(JSON.stringify()).

MongoDB and node.js (Part 2): Insertion Log Examination

One important first step is to observe the documents as they get written into MongoDB when storing the test documents from the previous post (https://realprogrammer.wordpress.com/2013/02/10/mongodb-and-node-js-part-1-list-of-documents-for-all-types/). This establishes the base approach of transferring documents defined in JavaScript to documents required as BSON structures.

Type Systems Impedance Mismatch

There is a mismatch in type systems: JavaScript and BSON have some types in common, but not all. Because of this impedance mismatch there must be a mechanism that overcomes it. The following listing discusses those.

Insertion Log Examination

First of all, the JavaScript code of the previous blog drops the target collection and the logging states the outcome:

Drop Collection Result: true

Next, for each document in the previous blog, we show here what the document looks like when written to MongoDB and add a discussion as needed. This can be very boring to read as it is an account of an actual execution; however, please search for specific types if you are only interested in those.

BSON Double

Definition:

{"x": new MongoDB.Double(123.123),
 "comment": "new MongoDB.Double(123.123)",
 "btype": 1}

Written As:

[ { x: { _bsontype: 'Double', value: 123.123 },
    comment: 'new MongoDB.Double(123.123)',
    btype: 1,
    _id: 512675b5508942d427000001 } ]

Comment: There are a few items that require discussion. First, a property “_id” was added by MongoDB. This is the case for all subsequent documents also.

Second, the constructor call “new MongoDB.Double()” was translated into an object that has a property “_bsontype” and a “value”. The “_bsontype” contains the textual specification of the BSON type at hand, and the value contains the value as it was provided to the constructor. This shows that there is no direct translation from a BSON Double to a JavaScript type.

JavaScript Number

Definition:

{"x": 456.456,
 "comment": "456.456",
 "btype": 1}

Written As:

[ { x: 456.456,
    comment: '456.456',
    btype: 1,
    _id: 512675b5508942d427000002 } ]

Comment: This is a direct translation from JavaScript to BSON as no constructor was involved.

JavaScript String

Definition:

{"x": "abc",
 "comment": "abc",
 "btype": 2}

Written As:

[ { x: 'abc',
    comment: 'abc',
    btype: 2,
    _id: 512675b5508942d427000003 } ]

Comment: This is also a direct translation from JavaScript to BSON.

JavaScript Object

Definition:

{"x": {"z": 5},
 "comment": "{\"z\": 5}",
 "btype": 3}

Written As:

[ { x: { z: 5 },
    comment: '{"z": 5}',
    btype: 3,
    _id: 512675b5508942d427000004 } ]

Comment: Objects in JavaScript are directly translated into BSON.

JavaScript Array

Definition:

{"x": [9, 8, 7],
 "comment": "[9, 8, 7]",
 "btype": 16}

Written As:

[ { x: [ 9, 8, 7 ],
    comment: '[9, 8, 7]',
    btype: 16,
    _id: 512675b5508942d427000005 } ]

Comment: Arrays are translated directly into BSON also.

Definition:

{"x": [
        {"y": 4},
        {"z": 5}
      ], 
 "comment": "[{\"y\": 4}, {\"z\": 5}]",
 "btype": 3}

Written As:

[ { x: [ [Object], [Object] ],
    comment: '[{"y": 4}, {"z": 5}]',
    btype: 3,
    _id: 512675b5508942d427000006 } ]

Comment: Again, an array directly translated.

BSON Binary

Definition:

{"x": new MongoDB.Binary("binary"),
 "comment": "new MongoDB.Binary(\"binary\")",
 "btype": 5}

Written As:

[ { x:
    { _bsontype: 'Binary',
      sub_type: 0,
      position: 6,
      buffer: <Buffer 62 69 6e 61 72 79> },
    comment: 'new MongoDB.Binary("binary")',
    btype: 5,
    _id: 512675b5508942d427000007 } ]

Comment: Binary data types are created with a constructor before being passed on to MongoDB.

BSON ObjectId

Definition:

{"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
 "btype": 7}

Written As:

[ { x: 5040dc5d40b67c681d000001,
    comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
    btype: 7,
    _id: 512675b5508942d427000008 } ]

Comment: Even though a constructor is given, the object written is not marked with “_bsontype”.

JavaScript Boolean

Definition:

{"x": false,
 "comment": "false",
 "btype": 8}

Written As:

[ { x: false,
    comment: 'false',
    btype: 8,
    _id: 512675b5508942d427000009 } ]

Comment: The Boolean gets translated directly.

Definition:

{"x": true,
 "comment": "true",
 "btype": 8}

Written As:

[ { x: true,
    comment: 'true',
    btype: 8,
    _id: 512675b5508942d42700000a } ]

Comment: The Boolean gets translated directly.

JavaScript Date

Definition:

{"x": new Date("2012-08-31 12:13:14:156 UTC"),
 "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
 "btype": 9}

Written As:

[ { x: Fri Aug 31 2012 05:13:14 GMT-0700 (Pacific Daylight Time),
    comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
    btype: 9,
    _id: 512675b5508942d42700000b } ]

Comment: Written as Date-formatted text.

JavaScript Null

Definition:

{"x": null,
 "comment": "null",
 "btype": 10}

Written As:

[ { x: null,
    comment: 'null',
    btype: 10,
    _id: 512675b5508942d42700000c } ]

Comment: “null” is directly written.

JavaScript Regular Expression

Definition:

{"x": new RegExp("abc"),
 "comment": "new RegExp(\"abc\")",
 "btype": 11}

Written As:

[ { x: /abc/,
    comment: 'new RegExp("abc")',
    btype: 11,
    _id: 512675b5508942d42700000d } ]

Comment: The regular expression is written as regular expression text.

Definition:

{"x": new RegExp("abc", "i"),
 "comment": "new RegExp(\"abc\", \"i\")",
 "btype": 11}

Written As:

[ { x: /abc/i,
    comment: 'new RegExp("abc", "i")',
    btype: 11,
    _id: 512675b5508942d42700000e } ]

Comment: The regular expression is written as regular expression text.

BSON DBRef

There are various ways of defining a BSON DBRef. All possibilities are discussed next.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: 'types' },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
    btype: 3,
    _id: 512675b5508942d42700000f } ]

Comment: In this case the constructor creates a complete object, including “_bsontype”.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: undefined },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
    btype: 3,
    _id: 512675b5508942d427000010 } ]

Comment: This is like the case before, except, the database is not defined.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
 "btype": 3}

Written As:

[ { x: { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
    btype: 3,
    _id: 512675b5508942d427000012 } ]

Comment: This is an alternative way to create a DBRef object in MongoDB. Instead of using the constructor, a document can be directly defined, however, the property names have to match precisely what MongoDB is expecting.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
 "btype": 3}

Written As:

[ { x:
    { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001', '$db': 'types' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
    btype: 3,
    _id: 512675b5508942d427000011 } ]

Comment: Like before, this is a direct way to store a DBRef, this time with a database specification.

BSON Code

BSON code comes in two forms, with and without scope. Both are discussed next.

Definition:

{"x": new MongoDB.Code("function () {}"),
 "comment": "new MongoDB.Code(\"function () {}\")",
 "btype": 13}

Written As:

[ { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
    comment: 'new MongoDB.Code("function () {}")',
    btype: 13,
    _id: 512675b5508942d427000013 } ]

Comment: Code is created as an object with a constructor, hence the “_bsontype” property.

Definition:

{"x": new MongoDB.Code("function (a) {}", {"a": 4}),
 "comment": "new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
 "btype": 15}

Written As:

[ { x: { _bsontype: 'Code', code: 'function (a) {}', scope: [Object] },
    comment: 'new MongoDB.Code("function (a) {}", {"a": 4})',
    btype: 15,
    _id: 512675b5508942d427000015 } ]

Comment: This case is basically the same as the code without scope.

BSON Symbol

Definition:

{"x": new MongoDB.Symbol("def15"),
 "comment": "new MongoDB.Symbol(\"def15\")",
 "btype": 14}

Written As:

[ { x: def15,
    comment: 'new MongoDB.Symbol("def15")',
    btype: 14,
    _id: 512675b5508942d427000014 } ]

Comment: BSON Symbol is created as object by a constructor. However, it does not have a “_bsontype” as the value of a symbol is converted into a string.

BSON 32-Bit Int

Definition:

{"x": 123456,
 "comment": "123456",
 "btype": 16}

Written As:

[ { x: 123456,
    comment: '123456',
    btype: 16,
    _id: 512675b5508942d427000016 } ]

Comment: A BSON 32-bit int does not require a constructor and maps directly to the corresponding JSON type.

BSON Timestamp

Definition:

{"x": new MongoDB.Timestamp(1, 2),
 "comment": "new MongoDB.Timestamp(1, 2)",
 "btype": 17}

Written As:

[ { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
    comment: 'new MongoDB.Timestamp(1, 2)',
    btype: 17,
    _id: 512675b5508942d427000017 } ]

Comment: A BSON Timestamp is created by a constructor as an object. It hence has the “_bsontype” property.

BSON Long

Definition:

{"x": new MongoDB.Long("987"),
 "comment": "new MongoDB.Long(\"987\")",
 "btype": 18}

Written As:

[ { x: { _bsontype: 'Long', low_: 987, high_: 0 },
    comment: 'new MongoDB.Long("987")',
    btype: 18,
    _id: 512675b5508942d427000018 } ]

Comment: The BSON Long type is created as object by a constructor and has the “_bsontype” property.

BSON MinKey and MaxKey

Definition:

{"x": new MongoDB.MinKey(),
 "comment": "MongoDB.MinKey()",
 "btype": 255}
{"x": new MongoDB.MaxKey(),
 "comment": "MongoDB.MaxKey()",
 "btype": 127}

Written As:

[ { x: { _bsontype: 'MinKey' },
    comment: 'MongoDB.MinKey()',
    btype: 255,
    _id: 512675b5508942d427000019 } ]
[ { x: { _bsontype: 'MaxKey' },
    comment: 'MongoDB.MaxKey()',
    btype: 127,
    _id: 512675b5508942d42700001a } ]

Comment: Both, BSON MinKey and MaxKey are created by a constructor as object and have the property “_bsontype”.

JavaScript undefined

Definition:

 {"x": undefined,
  "comment": "undefined",
  "btype": 10}

Written As:

[ { x: undefined,
    comment: 'undefined',
    btype: 10,
    _id: 512675b5508942d42700001b } ]

Comment: The JavaScript ‘undefined’ is directly stored.

JavaScript Number.NaN

Definition:

{"x": Number.NaN,
 "comment": "Number.NaN",
 "btype": 1}

Written As:

[ { x: NaN,
    comment: 'Number.NaN',
    btype: 1,
    _id: 512675b5508942d42700001c } ]

Comment: JavaScript Number.NaN is written as NaN.

JavaScript Infinity, Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY

Definition:

{"x": Infinity,
 "comment": "Infinity",
 "btype": 1}
{"x": Number.POSITIVE_INFINITY,
 "comment": "Number.POSITIVE_INFINITY",
 "btype": 1}
{"x": Number.NEGATIVE_INFINITY,
 "comment": "Number.NEGATIVE_INFINITY",
 "btype": 1}

Written As:

[ { x: Infinity,
    comment: 'Infinity',
    btype: 1,
    _id: 512675b5508942d42700001d } ]
[ { x: Infinity,
    comment: 'Number.POSITIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001e } ]
[ { x: -Infinity,
    comment: 'Number.NEGATIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001f } ]

Comment: JavaScript Infinity is stored as such. Number.POSITIVE_INFINITY is stored as Infinity and Number.NEGATIVE_INFINITY is stored as -Infinity.

JavaScript Number.MIN_VALUE and Number.MAX_VALUE

Definition:

{"x": Number.MIN_VALUE,
 "comment": "Number.MIN_VALUE",
 "btype": 1}
{"x": Number.MAX_VALUE,
 "comment": "Number.MAX_VALUE",
 "btype": 1}

Written As:

[ { x: 5e-324,
    comment: 'Number.MIN_VALUE',
    btype: 1,
    _id: 512675b5508942d427000020 } ]
[ { x: 1.7976931348623157e+308,
    comment: 'Number.MAX_VALUE',
    btype: 1,
    _id: 512675b5508942d427000021 } ]

Comment: Number.MIN_VALUE and Number.MAX_VALUE are stored as MIN_VALUE and MAX_VALUE respectively.

Summary

This was a very ‘dry’ blog. Nevertheless, when dealing with a database interface, it is important to ensure the proper understanding of how datatypes in the involved type systems map to each other.

It will become extremely relevant when data stored is queried back. A few surprises are waiting for you.

MongoDB and node.js (Part 1): List of Documents for all Types

This blog contains the document set that will be used throughout the blog series.

Document Set Characteristics

The data set is a collection of documents, whereby each document contains three properties:

  • “x”: This property contains a different data value in each document
  • “comment”: this property describes “x” as originally specified. When the document is queried, it is clear what the original specification of “x” was. This is necessary as the node.js driver modifies documents on insert.
  • “btype”: this property contains the BSON type designator. When the document is queried, the type is clear without having to query it explicitly.

Since the blog series is evolving, the set of documents might change as needed in order to point out more specifics; the current set is a good starting point, however.

If there is more than one way to insert a specific type, several documents are included, one document for each possibility. This is to show the possible alternatives exhaustively.

JavaScript, BSON, JSON

Node.js is is implementing server-side JavaScript (based on Google’s V8 Engine). So the data types that are passed back and forth to the MongoDB Node.js driver are JavaScript types.

MongoDB internally operates on BSON. This means that on the way from the Node.js driver to MongoDB JavaScript types are transformed into BSON (and vice versa on the way back). For some BSON types MongoDB provides constructors.

While many equate JavaScript data types structures with JSON, actually JSON is not an equivalent serialization of JavaScript types (more on that issue during the blog series).

Based on this discussion, the test data document set tries to cover all JavaScript types and their variation, plus the JavaScript implementation of BSON types. This will also clarify more over the course of the blog series.

Document Set

The following code is complete in the sense that it runs and inserts the documents into a MongoDB database called “nodeTest”, using all the defaults.

/*global require*/

var MongoDB = require('mongodb');

/*
 Type codes
 ==========
 1 "\x01" e_name double             Floating point
 2 "\x02" e_name string             UTF-8 string
 3 "\x03" e_name document           Embedded document
 4 "\x04" e_name document           Array
 5 "\x05" e_name binary             Binary data

 7 "\x07" e_name (byte*12)          ObjectId
 8 "\x08" e_name "\x00"             Boolean "false"
 8 "\x08" e_name "\x01"             Boolean "true"
 9 "\x09" e_name int64              UTC datetime
 10 "\x0A" e_name Null value
 11 "\x0B" e_name cstring cstring   Regular expression

 13 "\x0D" e_name string            JavaScript code

 15 "\x0F" e_name code_w_s          JavaScript code w/ scope
 16 "\x10" e_name int32             32-bit Integer
 17 "\x11" e_name int64             Timestamp
 18 "\x12" e_name int64             64-bit integer
 255 "\xFF" e_name Min key
 127 "\x7F" e_name Max key

 Deprecated type codes
 =====================
 6 "\x06" e_name                    Undefined — Deprecated
 12 "\x0C" e_name string (byte*12)  DBPointer — Deprecated
 14 "\x0E" e_name string            Symbol — Deprecated

 */

var typeDocuments;

typeDocuments = [
    {"x": new MongoDB.Double(123.123),
        "comment": "new MongoDB.Double(123.123)",
        "btype": 1},
    {"x": 456.456,
        "comment": "456.456",
        "btype": 1},
    {"x": "abc",
        "comment": "abc",
        "btype": 2},
    {"x": {"z": 5},
        "comment": "{\"z\": 5}",
        "btype": 3},
    // this is not type:4
    {"x": [9, 8, 7],
        "comment": "[9, 8, 7]",
        "btype": 16},
    {"x": [
        {"y": 4},
        {"z": 5}
    ], "comment": "[{\"y\": 4}, {\"z\": 5}]",
        "btype": 3},
    {"x": new MongoDB.Binary("binary"),
        "comment": "new MongoDB.Binary(\"binary\")",
        "btype": 5},
    // t:6 deprecated (was 'undefined') - not implemented
    {"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
        "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
        "btype": 7},
    {"x": false,
        "comment": "false",
        "btype": 8},
    {"x": true,
        "comment": "true",
        "btype": 8},
    {"x": new Date("2012-08-31 12:13:14:156 UTC"),
        "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
        "btype": 9},
    {"x": null,
        "comment": "null",
        "btype": 10},
    {"x": new RegExp("abc"),
        "comment": "new RegExp(\"abc\")",
        "btype": 11},
    {"x": new RegExp("abc", "i"),
        "comment": "new RegExp(\"abc\", \"i\")",
        "btype": 11},
    // t:12 DBRef deprecated - still implemented
    // this is not type:12
    {"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
        "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
        "btype": 3},
    // this is not type:12
    {"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
        "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
        "btype": 3},
    // MongoDB defined JSON serialization (http://docs.mongodb.org/manual/reference/mongodb-extended-json/)
    // this is not type:12
    {"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
        "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
        "btype": 3},
    // this is not type:12
    {"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
        "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
        "btype": 3},
    {"x": new MongoDB.Code("function () {}"),
        "comment": "new MongoDB.Code(\"function () {}\")",
        "btype": 13},
    // t:14 Symbol deprecated - still implemented
    {"x": new MongoDB.Symbol("def15"),
        "comment": "new MongoDB.Symbol(\"def15\")",
        "btype": 14},
    {"x": new MongoDB.Code("function (a) {}", {"a": 4}),
        "comment": " new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
        "btype": 15},
    {"x": 123456,
        "comment": "123456",
        "btype": 16},
    {"x": new MongoDB.Timestamp(1, 2),
        "comment": "new MongoDB.Timestamp(1, 2)",
        "btype": 17},
    {"x": new MongoDB.Long("987"),
        "comment": "new MongoDB.Long(\"987\")",
        "btype": 18},
    {"x": new MongoDB.MinKey(),
        "comment": "new MongoDB.MinKey()",
        "btype": 255},
    {"x": new MongoDB.MaxKey(),
        "comment": "new MongoDB.MaxKey()",
        "btype": 127},
    // ADDITIONAL POSSIBLE VALUES
    // 'undefined' will be converted to 'null'; type will be 'null' (aka 10) also
    {"x": undefined,
        "comment": "undefined",
        "btype": 10},
    {"x": Number.NaN,
        "comment": "Number.NaN",
        "btype": 1},
    {"x": Infinity,
        "comment": "Infinity",
        "btype": 1},
    {"x": Number.POSITIVE_INFINITY,
        "comment": "Number.POSITIVE_INFINITY",
        "btype": 1},
    {"x": Number.NEGATIVE_INFINITY,
        "comment": "Number.NEGATIVE_INFINITY",
        "btype": 1},
    {"x": Number.MIN_VALUE,
        "comment": "MIN_VALUE",
        "btype": 1},
    {"x": Number.MAX_VALUE,
        "comment": "MAX_VALUE",
        "btype": 1}
];

var Db = MongoDB.Db,
    Server = MongoDB.Server;
var db = new Db('nodeTest', new Server("127.0.0.1", 27017,
    {auto_reconnect: false, poolSize: 4}), {native_parser: false, safe: false});

db.open(function (err, db) {
    "use strict";
    db.dropCollection("types_node", function (err, result) {
        var i,
            printLog = function (err, result) {
                if (err) {
                    console.log(err.toString());
                }
                console.log(result);
            };
        if (err) {
            console.log(err.toString());
        }
        console.log("Drop Collection Result: " + result);
        db.collection("types_node", function (err, collection) {
            for (i = 0; i < typeDocuments.length; i = i + 1) {
                collection.insert(typeDocuments[i], {safe: true}, printLog);
            }
        });
    });
});

MongoDB and node.js (Part 0): Rationalization of Data Types

This is the start of a blog series in context of MongoDB and node.js. It is a rationalization of data type handling and transfer between JavaScript (node.js) and MongoDB. The series is very specific to MongoDB and the 10gen supported MongoDB node.js driver.

How did the opportunity for this blog series arise? Basically, my plan was to do a round-trip from JavaScript to MongoDB and back for each data type and constant in JavaScript.

While an easy plan to come up with, executing it turned out to be a little journey. During that journey a few issues arose that I did not expect and the blog series will report on those. Along the way a few JIRAs and GitHub issues were filed and will be reported on in the appropriate places.

Philosophy

I had and will have a clear philosophy in mind throughout this journey:

  • Treating MongoDB as a database. What I mean by that is that if I can store data into MongoDB, I expect that I can query the data and retrieve it again. Especially, when storing a document, I expect to be able to query this document again.
  • Foreign Database ownership. This means that I do not assume to have control over the database and its contents, but assume that a database is given to me. However some user managed to put documents into the database, I should be able to retrieve it.
  • Conventions are Conventions. Often I hear: ‘don’t do that’; or, ‘this is an internal type, don’t use it’. While I as a human might follow this advice, the software (aka MongoDB) might or might now know about it. So if it does not enforce constraints at its interface, then I assume there is no technical need for these constraints in order for MongoDB to work properly.

This philosophy might sound obvious in a database context, but along the steps of the journey I’ll refer back to it at various places as it makes a difference.

Starting Point: Creating documents for all JavaScript Types

The blog series will be based on a specific collection and its set of documents. Each document represents a BSON type and contains a property with one type and additional ‘helper’ properties. This will serve as the basis for query execution. Each document contains three properties:

  • “x”: This property contains a different data value in each document
  • “comment”: this property describes “x” as originally specified. When the document is queried, it is clear what the original specification of “x” was. This is necessary as the node.js driver modifies documents on insert.
  • “btype”: this property contains the BSON type designator. When the document is queried, the type is clear without having to query it explicitly.

The next blog in this series will discuss this collection of documents in detail.

Where do we go from here?

The various parts of the blog series lead us around into may corners of the JavaScript data types and how they are mapped to MongoDB (and vice versa: how the BSON types are mapped (back) to JavaScript).

The goal is to discuss all aspects; and the blogs in the series are not static, meaning, if changes or more insights are coming up, the blogs will actually be updated.

So, let’s go.