MongoDB and node.js (Part 2): Insertion Log Examination

One important first step is to observe the documents as they get written into MongoDB when storing the test documents from the previous post (https://realprogrammer.wordpress.com/2013/02/10/mongodb-and-node-js-part-1-list-of-documents-for-all-types/). This establishes the base approach of transferring documents defined in JavaScript to documents required as BSON structures.

Type Systems Impedance Mismatch

There is a mismatch in type systems: JavaScript and BSON have some types in common, but not all. Because of this impedance mismatch there must be a mechanism that overcomes it. The following listing discusses those.

Insertion Log Examination

First of all, the JavaScript code of the previous blog drops the target collection and the logging states the outcome:

Drop Collection Result: true

Next, for each document in the previous blog, we show here what the document looks like when written to MongoDB and add a discussion as needed. This can be very boring to read as it is an account of an actual execution; however, please search for specific types if you are only interested in those.

BSON Double

Definition:

{"x": new MongoDB.Double(123.123),
 "comment": "new MongoDB.Double(123.123)",
 "btype": 1}

Written As:

[ { x: { _bsontype: 'Double', value: 123.123 },
    comment: 'new MongoDB.Double(123.123)',
    btype: 1,
    _id: 512675b5508942d427000001 } ]

Comment: There are a few items that require discussion. First, a property “_id” was added by MongoDB. This is the case for all subsequent documents also.

Second, the constructor call “new MongoDB.Double()” was translated into an object that has a property “_bsontype” and a “value”. The “_bsontype” contains the textual specification of the BSON type at hand, and the value contains the value as it was provided to the constructor. This shows that there is no direct translation from a BSON Double to a JavaScript type.

JavaScript Number

Definition:

{"x": 456.456,
 "comment": "456.456",
 "btype": 1}

Written As:

[ { x: 456.456,
    comment: '456.456',
    btype: 1,
    _id: 512675b5508942d427000002 } ]

Comment: This is a direct translation from JavaScript to BSON as no constructor was involved.

JavaScript String

Definition:

{"x": "abc",
 "comment": "abc",
 "btype": 2}

Written As:

[ { x: 'abc',
    comment: 'abc',
    btype: 2,
    _id: 512675b5508942d427000003 } ]

Comment: This is also a direct translation from JavaScript to BSON.

JavaScript Object

Definition:

{"x": {"z": 5},
 "comment": "{\"z\": 5}",
 "btype": 3}

Written As:

[ { x: { z: 5 },
    comment: '{"z": 5}',
    btype: 3,
    _id: 512675b5508942d427000004 } ]

Comment: Objects in JavaScript are directly translated into BSON.

JavaScript Array

Definition:

{"x": [9, 8, 7],
 "comment": "[9, 8, 7]",
 "btype": 16}

Written As:

[ { x: [ 9, 8, 7 ],
    comment: '[9, 8, 7]',
    btype: 16,
    _id: 512675b5508942d427000005 } ]

Comment: Arrays are translated directly into BSON also.

Definition:

{"x": [
        {"y": 4},
        {"z": 5}
      ], 
 "comment": "[{\"y\": 4}, {\"z\": 5}]",
 "btype": 3}

Written As:

[ { x: [ [Object], [Object] ],
    comment: '[{"y": 4}, {"z": 5}]',
    btype: 3,
    _id: 512675b5508942d427000006 } ]

Comment: Again, an array directly translated.

BSON Binary

Definition:

{"x": new MongoDB.Binary("binary"),
 "comment": "new MongoDB.Binary(\"binary\")",
 "btype": 5}

Written As:

[ { x:
    { _bsontype: 'Binary',
      sub_type: 0,
      position: 6,
      buffer: <Buffer 62 69 6e 61 72 79> },
    comment: 'new MongoDB.Binary("binary")',
    btype: 5,
    _id: 512675b5508942d427000007 } ]

Comment: Binary data types are created with a constructor before being passed on to MongoDB.

BSON ObjectId

Definition:

{"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
 "btype": 7}

Written As:

[ { x: 5040dc5d40b67c681d000001,
    comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
    btype: 7,
    _id: 512675b5508942d427000008 } ]

Comment: Even though a constructor is given, the object written is not marked with “_bsontype”.

JavaScript Boolean

Definition:

{"x": false,
 "comment": "false",
 "btype": 8}

Written As:

[ { x: false,
    comment: 'false',
    btype: 8,
    _id: 512675b5508942d427000009 } ]

Comment: The Boolean gets translated directly.

Definition:

{"x": true,
 "comment": "true",
 "btype": 8}

Written As:

[ { x: true,
    comment: 'true',
    btype: 8,
    _id: 512675b5508942d42700000a } ]

Comment: The Boolean gets translated directly.

JavaScript Date

Definition:

{"x": new Date("2012-08-31 12:13:14:156 UTC"),
 "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
 "btype": 9}

Written As:

[ { x: Fri Aug 31 2012 05:13:14 GMT-0700 (Pacific Daylight Time),
    comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
    btype: 9,
    _id: 512675b5508942d42700000b } ]

Comment: Written as Date-formatted text.

JavaScript Null

Definition:

{"x": null,
 "comment": "null",
 "btype": 10}

Written As:

[ { x: null,
    comment: 'null',
    btype: 10,
    _id: 512675b5508942d42700000c } ]

Comment: “null” is directly written.

JavaScript Regular Expression

Definition:

{"x": new RegExp("abc"),
 "comment": "new RegExp(\"abc\")",
 "btype": 11}

Written As:

[ { x: /abc/,
    comment: 'new RegExp("abc")',
    btype: 11,
    _id: 512675b5508942d42700000d } ]

Comment: The regular expression is written as regular expression text.

Definition:

{"x": new RegExp("abc", "i"),
 "comment": "new RegExp(\"abc\", \"i\")",
 "btype": 11}

Written As:

[ { x: /abc/i,
    comment: 'new RegExp("abc", "i")',
    btype: 11,
    _id: 512675b5508942d42700000e } ]

Comment: The regular expression is written as regular expression text.

BSON DBRef

There are various ways of defining a BSON DBRef. All possibilities are discussed next.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: 'types' },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
    btype: 3,
    _id: 512675b5508942d42700000f } ]

Comment: In this case the constructor creates a complete object, including “_bsontype”.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: undefined },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
    btype: 3,
    _id: 512675b5508942d427000010 } ]

Comment: This is like the case before, except, the database is not defined.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
 "btype": 3}

Written As:

[ { x: { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
    btype: 3,
    _id: 512675b5508942d427000012 } ]

Comment: This is an alternative way to create a DBRef object in MongoDB. Instead of using the constructor, a document can be directly defined, however, the property names have to match precisely what MongoDB is expecting.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
 "btype": 3}

Written As:

[ { x:
    { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001', '$db': 'types' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
    btype: 3,
    _id: 512675b5508942d427000011 } ]

Comment: Like before, this is a direct way to store a DBRef, this time with a database specification.

BSON Code

BSON code comes in two forms, with and without scope. Both are discussed next.

Definition:

{"x": new MongoDB.Code("function () {}"),
 "comment": "new MongoDB.Code(\"function () {}\")",
 "btype": 13}

Written As:

[ { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
    comment: 'new MongoDB.Code("function () {}")',
    btype: 13,
    _id: 512675b5508942d427000013 } ]

Comment: Code is created as an object with a constructor, hence the “_bsontype” property.

Definition:

{"x": new MongoDB.Code("function (a) {}", {"a": 4}),
 "comment": "new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
 "btype": 15}

Written As:

[ { x: { _bsontype: 'Code', code: 'function (a) {}', scope: [Object] },
    comment: 'new MongoDB.Code("function (a) {}", {"a": 4})',
    btype: 15,
    _id: 512675b5508942d427000015 } ]

Comment: This case is basically the same as the code without scope.

BSON Symbol

Definition:

{"x": new MongoDB.Symbol("def15"),
 "comment": "new MongoDB.Symbol(\"def15\")",
 "btype": 14}

Written As:

[ { x: def15,
    comment: 'new MongoDB.Symbol("def15")',
    btype: 14,
    _id: 512675b5508942d427000014 } ]

Comment: BSON Symbol is created as object by a constructor. However, it does not have a “_bsontype” as the value of a symbol is converted into a string.

BSON 32-Bit Int

Definition:

{"x": 123456,
 "comment": "123456",
 "btype": 16}

Written As:

[ { x: 123456,
    comment: '123456',
    btype: 16,
    _id: 512675b5508942d427000016 } ]

Comment: A BSON 32-bit int does not require a constructor and maps directly to the corresponding JSON type.

BSON Timestamp

Definition:

{"x": new MongoDB.Timestamp(1, 2),
 "comment": "new MongoDB.Timestamp(1, 2)",
 "btype": 17}

Written As:

[ { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
    comment: 'new MongoDB.Timestamp(1, 2)',
    btype: 17,
    _id: 512675b5508942d427000017 } ]

Comment: A BSON Timestamp is created by a constructor as an object. It hence has the “_bsontype” property.

BSON Long

Definition:

{"x": new MongoDB.Long("987"),
 "comment": "new MongoDB.Long(\"987\")",
 "btype": 18}

Written As:

[ { x: { _bsontype: 'Long', low_: 987, high_: 0 },
    comment: 'new MongoDB.Long("987")',
    btype: 18,
    _id: 512675b5508942d427000018 } ]

Comment: The BSON Long type is created as object by a constructor and has the “_bsontype” property.

BSON MinKey and MaxKey

Definition:

{"x": new MongoDB.MinKey(),
 "comment": "MongoDB.MinKey()",
 "btype": 255}
{"x": new MongoDB.MaxKey(),
 "comment": "MongoDB.MaxKey()",
 "btype": 127}

Written As:

[ { x: { _bsontype: 'MinKey' },
    comment: 'MongoDB.MinKey()',
    btype: 255,
    _id: 512675b5508942d427000019 } ]
[ { x: { _bsontype: 'MaxKey' },
    comment: 'MongoDB.MaxKey()',
    btype: 127,
    _id: 512675b5508942d42700001a } ]

Comment: Both, BSON MinKey and MaxKey are created by a constructor as object and have the property “_bsontype”.

JavaScript undefined

Definition:

 {"x": undefined,
  "comment": "undefined",
  "btype": 10}

Written As:

[ { x: undefined,
    comment: 'undefined',
    btype: 10,
    _id: 512675b5508942d42700001b } ]

Comment: The JavaScript ‘undefined’ is directly stored.

JavaScript Number.NaN

Definition:

{"x": Number.NaN,
 "comment": "Number.NaN",
 "btype": 1}

Written As:

[ { x: NaN,
    comment: 'Number.NaN',
    btype: 1,
    _id: 512675b5508942d42700001c } ]

Comment: JavaScript Number.NaN is written as NaN.

JavaScript Infinity, Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY

Definition:

{"x": Infinity,
 "comment": "Infinity",
 "btype": 1}
{"x": Number.POSITIVE_INFINITY,
 "comment": "Number.POSITIVE_INFINITY",
 "btype": 1}
{"x": Number.NEGATIVE_INFINITY,
 "comment": "Number.NEGATIVE_INFINITY",
 "btype": 1}

Written As:

[ { x: Infinity,
    comment: 'Infinity',
    btype: 1,
    _id: 512675b5508942d42700001d } ]
[ { x: Infinity,
    comment: 'Number.POSITIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001e } ]
[ { x: -Infinity,
    comment: 'Number.NEGATIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001f } ]

Comment: JavaScript Infinity is stored as such. Number.POSITIVE_INFINITY is stored as Infinity and Number.NEGATIVE_INFINITY is stored as -Infinity.

JavaScript Number.MIN_VALUE and Number.MAX_VALUE

Definition:

{"x": Number.MIN_VALUE,
 "comment": "Number.MIN_VALUE",
 "btype": 1}
{"x": Number.MAX_VALUE,
 "comment": "Number.MAX_VALUE",
 "btype": 1}

Written As:

[ { x: 5e-324,
    comment: 'Number.MIN_VALUE',
    btype: 1,
    _id: 512675b5508942d427000020 } ]
[ { x: 1.7976931348623157e+308,
    comment: 'Number.MAX_VALUE',
    btype: 1,
    _id: 512675b5508942d427000021 } ]

Comment: Number.MIN_VALUE and Number.MAX_VALUE are stored as MIN_VALUE and MAX_VALUE respectively.

Summary

This was a very ‘dry’ blog. Nevertheless, when dealing with a database interface, it is important to ensure the proper understanding of how datatypes in the involved type systems map to each other.

It will become extremely relevant when data stored is queried back. A few surprises are waiting for you.

Advertisement

MongoDB and node.js (Part 1): List of Documents for all Types

This blog contains the document set that will be used throughout the blog series.

Document Set Characteristics

The data set is a collection of documents, whereby each document contains three properties:

  • “x”: This property contains a different data value in each document
  • “comment”: this property describes “x” as originally specified. When the document is queried, it is clear what the original specification of “x” was. This is necessary as the node.js driver modifies documents on insert.
  • “btype”: this property contains the BSON type designator. When the document is queried, the type is clear without having to query it explicitly.

Since the blog series is evolving, the set of documents might change as needed in order to point out more specifics; the current set is a good starting point, however.

If there is more than one way to insert a specific type, several documents are included, one document for each possibility. This is to show the possible alternatives exhaustively.

JavaScript, BSON, JSON

Node.js is is implementing server-side JavaScript (based on Google’s V8 Engine). So the data types that are passed back and forth to the MongoDB Node.js driver are JavaScript types.

MongoDB internally operates on BSON. This means that on the way from the Node.js driver to MongoDB JavaScript types are transformed into BSON (and vice versa on the way back). For some BSON types MongoDB provides constructors.

While many equate JavaScript data types structures with JSON, actually JSON is not an equivalent serialization of JavaScript types (more on that issue during the blog series).

Based on this discussion, the test data document set tries to cover all JavaScript types and their variation, plus the JavaScript implementation of BSON types. This will also clarify more over the course of the blog series.

Document Set

The following code is complete in the sense that it runs and inserts the documents into a MongoDB database called “nodeTest”, using all the defaults.

/*global require*/

var MongoDB = require('mongodb');

/*
 Type codes
 ==========
 1 "\x01" e_name double             Floating point
 2 "\x02" e_name string             UTF-8 string
 3 "\x03" e_name document           Embedded document
 4 "\x04" e_name document           Array
 5 "\x05" e_name binary             Binary data

 7 "\x07" e_name (byte*12)          ObjectId
 8 "\x08" e_name "\x00"             Boolean "false"
 8 "\x08" e_name "\x01"             Boolean "true"
 9 "\x09" e_name int64              UTC datetime
 10 "\x0A" e_name Null value
 11 "\x0B" e_name cstring cstring   Regular expression

 13 "\x0D" e_name string            JavaScript code

 15 "\x0F" e_name code_w_s          JavaScript code w/ scope
 16 "\x10" e_name int32             32-bit Integer
 17 "\x11" e_name int64             Timestamp
 18 "\x12" e_name int64             64-bit integer
 255 "\xFF" e_name Min key
 127 "\x7F" e_name Max key

 Deprecated type codes
 =====================
 6 "\x06" e_name                    Undefined — Deprecated
 12 "\x0C" e_name string (byte*12)  DBPointer — Deprecated
 14 "\x0E" e_name string            Symbol — Deprecated

 */

var typeDocuments;

typeDocuments = [
    {"x": new MongoDB.Double(123.123),
        "comment": "new MongoDB.Double(123.123)",
        "btype": 1},
    {"x": 456.456,
        "comment": "456.456",
        "btype": 1},
    {"x": "abc",
        "comment": "abc",
        "btype": 2},
    {"x": {"z": 5},
        "comment": "{\"z\": 5}",
        "btype": 3},
    // this is not type:4
    {"x": [9, 8, 7],
        "comment": "[9, 8, 7]",
        "btype": 16},
    {"x": [
        {"y": 4},
        {"z": 5}
    ], "comment": "[{\"y\": 4}, {\"z\": 5}]",
        "btype": 3},
    {"x": new MongoDB.Binary("binary"),
        "comment": "new MongoDB.Binary(\"binary\")",
        "btype": 5},
    // t:6 deprecated (was 'undefined') - not implemented
    {"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
        "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
        "btype": 7},
    {"x": false,
        "comment": "false",
        "btype": 8},
    {"x": true,
        "comment": "true",
        "btype": 8},
    {"x": new Date("2012-08-31 12:13:14:156 UTC"),
        "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
        "btype": 9},
    {"x": null,
        "comment": "null",
        "btype": 10},
    {"x": new RegExp("abc"),
        "comment": "new RegExp(\"abc\")",
        "btype": 11},
    {"x": new RegExp("abc", "i"),
        "comment": "new RegExp(\"abc\", \"i\")",
        "btype": 11},
    // t:12 DBRef deprecated - still implemented
    // this is not type:12
    {"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
        "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
        "btype": 3},
    // this is not type:12
    {"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
        "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
        "btype": 3},
    // MongoDB defined JSON serialization (http://docs.mongodb.org/manual/reference/mongodb-extended-json/)
    // this is not type:12
    {"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
        "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
        "btype": 3},
    // this is not type:12
    {"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
        "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
        "btype": 3},
    {"x": new MongoDB.Code("function () {}"),
        "comment": "new MongoDB.Code(\"function () {}\")",
        "btype": 13},
    // t:14 Symbol deprecated - still implemented
    {"x": new MongoDB.Symbol("def15"),
        "comment": "new MongoDB.Symbol(\"def15\")",
        "btype": 14},
    {"x": new MongoDB.Code("function (a) {}", {"a": 4}),
        "comment": " new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
        "btype": 15},
    {"x": 123456,
        "comment": "123456",
        "btype": 16},
    {"x": new MongoDB.Timestamp(1, 2),
        "comment": "new MongoDB.Timestamp(1, 2)",
        "btype": 17},
    {"x": new MongoDB.Long("987"),
        "comment": "new MongoDB.Long(\"987\")",
        "btype": 18},
    {"x": new MongoDB.MinKey(),
        "comment": "new MongoDB.MinKey()",
        "btype": 255},
    {"x": new MongoDB.MaxKey(),
        "comment": "new MongoDB.MaxKey()",
        "btype": 127},
    // ADDITIONAL POSSIBLE VALUES
    // 'undefined' will be converted to 'null'; type will be 'null' (aka 10) also
    {"x": undefined,
        "comment": "undefined",
        "btype": 10},
    {"x": Number.NaN,
        "comment": "Number.NaN",
        "btype": 1},
    {"x": Infinity,
        "comment": "Infinity",
        "btype": 1},
    {"x": Number.POSITIVE_INFINITY,
        "comment": "Number.POSITIVE_INFINITY",
        "btype": 1},
    {"x": Number.NEGATIVE_INFINITY,
        "comment": "Number.NEGATIVE_INFINITY",
        "btype": 1},
    {"x": Number.MIN_VALUE,
        "comment": "MIN_VALUE",
        "btype": 1},
    {"x": Number.MAX_VALUE,
        "comment": "MAX_VALUE",
        "btype": 1}
];

var Db = MongoDB.Db,
    Server = MongoDB.Server;
var db = new Db('nodeTest', new Server("127.0.0.1", 27017,
    {auto_reconnect: false, poolSize: 4}), {native_parser: false, safe: false});

db.open(function (err, db) {
    "use strict";
    db.dropCollection("types_node", function (err, result) {
        var i,
            printLog = function (err, result) {
                if (err) {
                    console.log(err.toString());
                }
                console.log(result);
            };
        if (err) {
            console.log(err.toString());
        }
        console.log("Drop Collection Result: " + result);
        db.collection("types_node", function (err, collection) {
            for (i = 0; i < typeDocuments.length; i = i + 1) {
                collection.insert(typeDocuments[i], {safe: true}, printLog);
            }
        });
    });
});

MongoDB and node.js (Part 0): Rationalization of Data Types

This is the start of a blog series in context of MongoDB and node.js. It is a rationalization of data type handling and transfer between JavaScript (node.js) and MongoDB. The series is very specific to MongoDB and the 10gen supported MongoDB node.js driver.

How did the opportunity for this blog series arise? Basically, my plan was to do a round-trip from JavaScript to MongoDB and back for each data type and constant in JavaScript.

While an easy plan to come up with, executing it turned out to be a little journey. During that journey a few issues arose that I did not expect and the blog series will report on those. Along the way a few JIRAs and GitHub issues were filed and will be reported on in the appropriate places.

Philosophy

I had and will have a clear philosophy in mind throughout this journey:

  • Treating MongoDB as a database. What I mean by that is that if I can store data into MongoDB, I expect that I can query the data and retrieve it again. Especially, when storing a document, I expect to be able to query this document again.
  • Foreign Database ownership. This means that I do not assume to have control over the database and its contents, but assume that a database is given to me. However some user managed to put documents into the database, I should be able to retrieve it.
  • Conventions are Conventions. Often I hear: ‘don’t do that’; or, ‘this is an internal type, don’t use it’. While I as a human might follow this advice, the software (aka MongoDB) might or might now know about it. So if it does not enforce constraints at its interface, then I assume there is no technical need for these constraints in order for MongoDB to work properly.

This philosophy might sound obvious in a database context, but along the steps of the journey I’ll refer back to it at various places as it makes a difference.

Starting Point: Creating documents for all JavaScript Types

The blog series will be based on a specific collection and its set of documents. Each document represents a BSON type and contains a property with one type and additional ‘helper’ properties. This will serve as the basis for query execution. Each document contains three properties:

  • “x”: This property contains a different data value in each document
  • “comment”: this property describes “x” as originally specified. When the document is queried, it is clear what the original specification of “x” was. This is necessary as the node.js driver modifies documents on insert.
  • “btype”: this property contains the BSON type designator. When the document is queried, the type is clear without having to query it explicitly.

The next blog in this series will discuss this collection of documents in detail.

Where do we go from here?

The various parts of the blog series lead us around into may corners of the JavaScript data types and how they are mapped to MongoDB (and vice versa: how the BSON types are mapped (back) to JavaScript).

The goal is to discuss all aspects; and the blogs in the series are not static, meaning, if changes or more insights are coming up, the blogs will actually be updated.

So, let’s go.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 2)

Wait a minute: ‘undefined’ and ‘NaN’ is not legal JSON! Part 1 of this blog must have made a mistake?

MongoDB, JSON and Javascript

The MongoDB shell operates on Javascript data types, not JSON data types. Therefore, this is legal and working (MongoDB 2.2.0):

> use blog
switched to db blog
> db.blogColl.save({"amount":25})
> db.blogColl.save({"amount":null})
> db.blogColl.save({"amount":undefined})
> db.blogColl.save({"amount":NaN})
> db.blogColl.save({"balance":33})
>

So, using Javascript constants like ‘null’, ‘undefined’ and ‘NaN’ is possible and legal in context of the MongoDB shell.

MongoDB: Transformation of Input

As discussed elsewhere, MongoDB does transform data types and values at the point of insertion. When retrieving the above, the following result is displayed:

> db.blogColl.find()
{ "_id" : ObjectId("50955b2eeb749e2e4a36d779"), "amount" : 25 }
{ "_id" : ObjectId("50955b38eb749e2e4a36d77a"), "amount" : null }
{ "_id" : ObjectId("50955b3feb749e2e4a36d77b"), "amount" : null }
{ "_id" : ObjectId("50955b47eb749e2e4a36d77c"), "amount" : NaN }
{ "_id" : ObjectId("50955b4eeb749e2e4a36d77d"), "balance" : 33 }

MongoDB decided to honor the Javascript ‘null’ and ‘NaN’, but ‘undefined’ is changed to ‘null’. From a data modeling perspective this is significant as there is no difference in meaning between ‘undefined’ and ‘null’ in the context of MongoDB. An application having a distinct interpretation of those values will cause errors due to the distinction in interpretation it makes, but not MongoDB.

MongoDB: Aggregation Results

Coming back to Part 1 now, the question is, how would the default aggregation operators of MongoDB interpret the above data set in MongoDB? The following is an aggregation framework query:

db.blogColl.aggregate(
    { $group : {
        _id : 1,
        noDocs: { $sum : 1},
        totalamount: { $sum : "$amount" },
        averageamount: { $avg : "$amount" },
        minamount: { $min : "$amount"},
        maxamount: { $max : "$amount"},
        totalbalance: { $sum : "$balance"},
        averagebalance: { $avg : "$balance" },
        minbalance: { $min : "$balance"},
        maxbalance: { $max : "$balance"}
    }}
);

The ‘group’ pipeline step ensures that all documents are seen as one group for the purpose of aggregation. The aggregation operators are applied to all documents in that group. In this example, to all documents.

The documents have either ‘amount’ or ‘balance’ as properties and for both some aggregate values are computed. The outcome when executing the above is:

{
        "result" : [
                {
                        "_id" : 1,
                        "noDocs" : 5,
                        "totalamount" : NaN,
                        "averageamount" : NaN,
                        "maxamount" : 25,
                        "totalbalance" : 33,
                        "averagebalance" : 33,
                        "maxbalance" : 33
                }
        ],
        "ok" : 1
}

There are a few things to call out:

  • The minimum aggregation operator was not successful and the result was left out (the reason is unclear to me).
  • The maximum operator ignored ‘null’ and ‘NaN’ and produced the correct values.
  • The sum and average operators do not ignore ‘null’ and ‘NaN’; so if a non-number is present, the operators indicate that by stating that there was at least one entry that is not a number.

Implications to Modeling Considerations

Part 1 of this blog highlighted that fact that there needs to be a common understanding on dynamic schemas as well as the interpretation of property values, especially constants and (aggregation) operations on those.

Unless your interpretation precisely matches the interpretation of MongoDB, you will have to implement those operators and their interpretation outside MongoDB or in a different way with the capabilities that MongoDB provides.

The above example shows that this is an important design discussion that must take place before system construction in order to avoid bigger problems down the road.

Computing Technology Inflection Point: Happening Right Now

I believe that computing in industry is at a computing technology inflection point right now based on some supporting observations:

  • Databases. Through the NoSQL movement in the database world a shift is taking place:
    1. New database management systems are being created, made available and put into production
    2. The new database systems are by enlarge schema-less and/or support dynamic schema changes
    3. Many support JSON as data structures or data model that is inherently supported by Javascript
  • Server. The Javascript language is enabled as server-side programming language:
    1. Google’s V8-Engine together with Node.js and its Eco-system makes Javascript a real contender as a server-side language
    2. Javascript does not have a class/instance model and supports schema-less programming, including dynamic schema changes
    3. Integrated Development Environments (IDEs) provide full Javascript support
  • User Interface. The combination of HTML5/Javascript is gaining traction:
    1. HTML5 in conjunction with Javascript is a real powerful web development technology set
    2. Javascript libraries are being built that can span all types of user interfaces and user devices (including mobile)
    3. It is possible to have a streaming query from the database all the way to the user interface; its frictionless

Combining all these observations, two major characteristics of a new computing technology stack are crystallizing very clearly:

  • One cross-layer programming language (Multi-tier Programming)

    • It is possible to use the same language in the User Interface, Server and Database, i.e., across all architectural layers
  • Inherent schema-less computing and dynamic schema change support
    • All layers shift to schema-less technology and models

I believe that this is a real inflection point happening right now and will cause a major shift in system design, engineering and evolution. It leaves behind the notion that each layer in a typical technology stack must have its specialized language. It also leaves behind the notion that every concept across domains should be best represented in the Class-Instance paradigm. It opens up the ability to dynamically evolve within the systems’ schema-less or dynamic schema change capabilities and it supports the reuse of logic and types across all layers without loosing or changing semantics.

If this combination of technology continues to gain major traction, I would not be surprised to see that some of the first operating system prototypes written in Javascript will become mainstream at some point in time. And, of course, it would be interesting to see how far the processor manufacturers are in their thought processes or research projects with putting dedicated Javascript execution functionality on the processors themselves.

Being around at this computing technology inflection points is exciting and professionally lays before us interesting choices in business decisions as well as career decisions.