MongoDB and node.js (Part 2): Insertion Log Examination

One important first step is to observe the documents as they get written into MongoDB when storing the test documents from the previous post (https://realprogrammer.wordpress.com/2013/02/10/mongodb-and-node-js-part-1-list-of-documents-for-all-types/). This establishes the base approach of transferring documents defined in JavaScript to documents required as BSON structures.

Type Systems Impedance Mismatch

There is a mismatch in type systems: JavaScript and BSON have some types in common, but not all. Because of this impedance mismatch there must be a mechanism that overcomes it. The following listing discusses those.

Insertion Log Examination

First of all, the JavaScript code of the previous blog drops the target collection and the logging states the outcome:

Drop Collection Result: true

Next, for each document in the previous blog, we show here what the document looks like when written to MongoDB and add a discussion as needed. This can be very boring to read as it is an account of an actual execution; however, please search for specific types if you are only interested in those.

BSON Double

Definition:

{"x": new MongoDB.Double(123.123),
 "comment": "new MongoDB.Double(123.123)",
 "btype": 1}

Written As:

[ { x: { _bsontype: 'Double', value: 123.123 },
    comment: 'new MongoDB.Double(123.123)',
    btype: 1,
    _id: 512675b5508942d427000001 } ]

Comment: There are a few items that require discussion. First, a property “_id” was added by MongoDB. This is the case for all subsequent documents also.

Second, the constructor call “new MongoDB.Double()” was translated into an object that has a property “_bsontype” and a “value”. The “_bsontype” contains the textual specification of the BSON type at hand, and the value contains the value as it was provided to the constructor. This shows that there is no direct translation from a BSON Double to a JavaScript type.

JavaScript Number

Definition:

{"x": 456.456,
 "comment": "456.456",
 "btype": 1}

Written As:

[ { x: 456.456,
    comment: '456.456',
    btype: 1,
    _id: 512675b5508942d427000002 } ]

Comment: This is a direct translation from JavaScript to BSON as no constructor was involved.

JavaScript String

Definition:

{"x": "abc",
 "comment": "abc",
 "btype": 2}

Written As:

[ { x: 'abc',
    comment: 'abc',
    btype: 2,
    _id: 512675b5508942d427000003 } ]

Comment: This is also a direct translation from JavaScript to BSON.

JavaScript Object

Definition:

{"x": {"z": 5},
 "comment": "{\"z\": 5}",
 "btype": 3}

Written As:

[ { x: { z: 5 },
    comment: '{"z": 5}',
    btype: 3,
    _id: 512675b5508942d427000004 } ]

Comment: Objects in JavaScript are directly translated into BSON.

JavaScript Array

Definition:

{"x": [9, 8, 7],
 "comment": "[9, 8, 7]",
 "btype": 16}

Written As:

[ { x: [ 9, 8, 7 ],
    comment: '[9, 8, 7]',
    btype: 16,
    _id: 512675b5508942d427000005 } ]

Comment: Arrays are translated directly into BSON also.

Definition:

{"x": [
        {"y": 4},
        {"z": 5}
      ], 
 "comment": "[{\"y\": 4}, {\"z\": 5}]",
 "btype": 3}

Written As:

[ { x: [ [Object], [Object] ],
    comment: '[{"y": 4}, {"z": 5}]',
    btype: 3,
    _id: 512675b5508942d427000006 } ]

Comment: Again, an array directly translated.

BSON Binary

Definition:

{"x": new MongoDB.Binary("binary"),
 "comment": "new MongoDB.Binary(\"binary\")",
 "btype": 5}

Written As:

[ { x:
    { _bsontype: 'Binary',
      sub_type: 0,
      position: 6,
      buffer: <Buffer 62 69 6e 61 72 79> },
    comment: 'new MongoDB.Binary("binary")',
    btype: 5,
    _id: 512675b5508942d427000007 } ]

Comment: Binary data types are created with a constructor before being passed on to MongoDB.

BSON ObjectId

Definition:

{"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
 "btype": 7}

Written As:

[ { x: 5040dc5d40b67c681d000001,
    comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
    btype: 7,
    _id: 512675b5508942d427000008 } ]

Comment: Even though a constructor is given, the object written is not marked with “_bsontype”.

JavaScript Boolean

Definition:

{"x": false,
 "comment": "false",
 "btype": 8}

Written As:

[ { x: false,
    comment: 'false',
    btype: 8,
    _id: 512675b5508942d427000009 } ]

Comment: The Boolean gets translated directly.

Definition:

{"x": true,
 "comment": "true",
 "btype": 8}

Written As:

[ { x: true,
    comment: 'true',
    btype: 8,
    _id: 512675b5508942d42700000a } ]

Comment: The Boolean gets translated directly.

JavaScript Date

Definition:

{"x": new Date("2012-08-31 12:13:14:156 UTC"),
 "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
 "btype": 9}

Written As:

[ { x: Fri Aug 31 2012 05:13:14 GMT-0700 (Pacific Daylight Time),
    comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
    btype: 9,
    _id: 512675b5508942d42700000b } ]

Comment: Written as Date-formatted text.

JavaScript Null

Definition:

{"x": null,
 "comment": "null",
 "btype": 10}

Written As:

[ { x: null,
    comment: 'null',
    btype: 10,
    _id: 512675b5508942d42700000c } ]

Comment: “null” is directly written.

JavaScript Regular Expression

Definition:

{"x": new RegExp("abc"),
 "comment": "new RegExp(\"abc\")",
 "btype": 11}

Written As:

[ { x: /abc/,
    comment: 'new RegExp("abc")',
    btype: 11,
    _id: 512675b5508942d42700000d } ]

Comment: The regular expression is written as regular expression text.

Definition:

{"x": new RegExp("abc", "i"),
 "comment": "new RegExp(\"abc\", \"i\")",
 "btype": 11}

Written As:

[ { x: /abc/i,
    comment: 'new RegExp("abc", "i")',
    btype: 11,
    _id: 512675b5508942d42700000e } ]

Comment: The regular expression is written as regular expression text.

BSON DBRef

There are various ways of defining a BSON DBRef. All possibilities are discussed next.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: 'types' },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
    btype: 3,
    _id: 512675b5508942d42700000f } ]

Comment: In this case the constructor creates a complete object, including “_bsontype”.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: undefined },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
    btype: 3,
    _id: 512675b5508942d427000010 } ]

Comment: This is like the case before, except, the database is not defined.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
 "btype": 3}

Written As:

[ { x: { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
    btype: 3,
    _id: 512675b5508942d427000012 } ]

Comment: This is an alternative way to create a DBRef object in MongoDB. Instead of using the constructor, a document can be directly defined, however, the property names have to match precisely what MongoDB is expecting.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
 "btype": 3}

Written As:

[ { x:
    { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001', '$db': 'types' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
    btype: 3,
    _id: 512675b5508942d427000011 } ]

Comment: Like before, this is a direct way to store a DBRef, this time with a database specification.

BSON Code

BSON code comes in two forms, with and without scope. Both are discussed next.

Definition:

{"x": new MongoDB.Code("function () {}"),
 "comment": "new MongoDB.Code(\"function () {}\")",
 "btype": 13}

Written As:

[ { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
    comment: 'new MongoDB.Code("function () {}")',
    btype: 13,
    _id: 512675b5508942d427000013 } ]

Comment: Code is created as an object with a constructor, hence the “_bsontype” property.

Definition:

{"x": new MongoDB.Code("function (a) {}", {"a": 4}),
 "comment": "new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
 "btype": 15}

Written As:

[ { x: { _bsontype: 'Code', code: 'function (a) {}', scope: [Object] },
    comment: 'new MongoDB.Code("function (a) {}", {"a": 4})',
    btype: 15,
    _id: 512675b5508942d427000015 } ]

Comment: This case is basically the same as the code without scope.

BSON Symbol

Definition:

{"x": new MongoDB.Symbol("def15"),
 "comment": "new MongoDB.Symbol(\"def15\")",
 "btype": 14}

Written As:

[ { x: def15,
    comment: 'new MongoDB.Symbol("def15")',
    btype: 14,
    _id: 512675b5508942d427000014 } ]

Comment: BSON Symbol is created as object by a constructor. However, it does not have a “_bsontype” as the value of a symbol is converted into a string.

BSON 32-Bit Int

Definition:

{"x": 123456,
 "comment": "123456",
 "btype": 16}

Written As:

[ { x: 123456,
    comment: '123456',
    btype: 16,
    _id: 512675b5508942d427000016 } ]

Comment: A BSON 32-bit int does not require a constructor and maps directly to the corresponding JSON type.

BSON Timestamp

Definition:

{"x": new MongoDB.Timestamp(1, 2),
 "comment": "new MongoDB.Timestamp(1, 2)",
 "btype": 17}

Written As:

[ { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
    comment: 'new MongoDB.Timestamp(1, 2)',
    btype: 17,
    _id: 512675b5508942d427000017 } ]

Comment: A BSON Timestamp is created by a constructor as an object. It hence has the “_bsontype” property.

BSON Long

Definition:

{"x": new MongoDB.Long("987"),
 "comment": "new MongoDB.Long(\"987\")",
 "btype": 18}

Written As:

[ { x: { _bsontype: 'Long', low_: 987, high_: 0 },
    comment: 'new MongoDB.Long("987")',
    btype: 18,
    _id: 512675b5508942d427000018 } ]

Comment: The BSON Long type is created as object by a constructor and has the “_bsontype” property.

BSON MinKey and MaxKey

Definition:

{"x": new MongoDB.MinKey(),
 "comment": "MongoDB.MinKey()",
 "btype": 255}
{"x": new MongoDB.MaxKey(),
 "comment": "MongoDB.MaxKey()",
 "btype": 127}

Written As:

[ { x: { _bsontype: 'MinKey' },
    comment: 'MongoDB.MinKey()',
    btype: 255,
    _id: 512675b5508942d427000019 } ]
[ { x: { _bsontype: 'MaxKey' },
    comment: 'MongoDB.MaxKey()',
    btype: 127,
    _id: 512675b5508942d42700001a } ]

Comment: Both, BSON MinKey and MaxKey are created by a constructor as object and have the property “_bsontype”.

JavaScript undefined

Definition:

 {"x": undefined,
  "comment": "undefined",
  "btype": 10}

Written As:

[ { x: undefined,
    comment: 'undefined',
    btype: 10,
    _id: 512675b5508942d42700001b } ]

Comment: The JavaScript ‘undefined’ is directly stored.

JavaScript Number.NaN

Definition:

{"x": Number.NaN,
 "comment": "Number.NaN",
 "btype": 1}

Written As:

[ { x: NaN,
    comment: 'Number.NaN',
    btype: 1,
    _id: 512675b5508942d42700001c } ]

Comment: JavaScript Number.NaN is written as NaN.

JavaScript Infinity, Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY

Definition:

{"x": Infinity,
 "comment": "Infinity",
 "btype": 1}
{"x": Number.POSITIVE_INFINITY,
 "comment": "Number.POSITIVE_INFINITY",
 "btype": 1}
{"x": Number.NEGATIVE_INFINITY,
 "comment": "Number.NEGATIVE_INFINITY",
 "btype": 1}

Written As:

[ { x: Infinity,
    comment: 'Infinity',
    btype: 1,
    _id: 512675b5508942d42700001d } ]
[ { x: Infinity,
    comment: 'Number.POSITIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001e } ]
[ { x: -Infinity,
    comment: 'Number.NEGATIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001f } ]

Comment: JavaScript Infinity is stored as such. Number.POSITIVE_INFINITY is stored as Infinity and Number.NEGATIVE_INFINITY is stored as -Infinity.

JavaScript Number.MIN_VALUE and Number.MAX_VALUE

Definition:

{"x": Number.MIN_VALUE,
 "comment": "Number.MIN_VALUE",
 "btype": 1}
{"x": Number.MAX_VALUE,
 "comment": "Number.MAX_VALUE",
 "btype": 1}

Written As:

[ { x: 5e-324,
    comment: 'Number.MIN_VALUE',
    btype: 1,
    _id: 512675b5508942d427000020 } ]
[ { x: 1.7976931348623157e+308,
    comment: 'Number.MAX_VALUE',
    btype: 1,
    _id: 512675b5508942d427000021 } ]

Comment: Number.MIN_VALUE and Number.MAX_VALUE are stored as MIN_VALUE and MAX_VALUE respectively.

Summary

This was a very ‘dry’ blog. Nevertheless, when dealing with a database interface, it is important to ensure the proper understanding of how datatypes in the involved type systems map to each other.

It will become extremely relevant when data stored is queried back. A few surprises are waiting for you.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 3)

It turns out there is a lot more to discuss about MongoDB and its handling of basic and complex data types. That warrants a separate series of blogs altogether. Here is one more part in this three part series; a future new series will focus exclusively on MongoDB data types.

MongoDB and ‘undefined’

As discussed before, it is possible to store a document in MongoDB that has a property with value ‘undefined’. However, it turns out that MongoDB on inserting is changing ‘undefined’ to ‘null’. So a change in value is taking place implicitly. So while MongoDB accepts the value ‘undefined’ on its interface, storage-wise it is changed to ‘null’.

As a consequence, it is not possible to query for ‘undefined’. When querying for ‘null’ all the documents will be returned that were originally stored with the values ‘null’ as well as ‘undefined’ (as ‘undefined’ was changed to ‘null’).

Implication

The implicit change from ‘undefined’ to ‘null’ has a few important implications:

  • If an application makes a semantic distinction between ‘undefined’ and ‘null’, that distinction will not be kept when storing ‘undefined’ and ‘null’ as MongoDB will not keep those as separate values. Instead, the application must record the fact that a property value is ‘undefined’ in a different way (e.g., with an embedded document or a specific value that it interprets as ‘undefined’).
  • MongoDB applies implicit data value changes and that in specific cases the values being put in are not the values being returned when queried. ‘undefined’ is one of values that will be changed. Applications must be aware of this in order to decide if this is OK with them or not. If not, a different representation must be used by the application that does not fall into this category.

Conclusion

This 3-part mini-series started out looking at aggregation in the presence of various data types and brought up a few examples that show different interpretations when aggregating. These interpretations are essential when using MongoDB.

A side effect of the mini-series is that a larger discussion of data types and MongoDB opened up and is necessary to fully understand MongoDB’s behavior. A future blog series will address data types exclusively.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 2)

Wait a minute: ‘undefined’ and ‘NaN’ is not legal JSON! Part 1 of this blog must have made a mistake?

MongoDB, JSON and Javascript

The MongoDB shell operates on Javascript data types, not JSON data types. Therefore, this is legal and working (MongoDB 2.2.0):

> use blog
switched to db blog
> db.blogColl.save({"amount":25})
> db.blogColl.save({"amount":null})
> db.blogColl.save({"amount":undefined})
> db.blogColl.save({"amount":NaN})
> db.blogColl.save({"balance":33})
>

So, using Javascript constants like ‘null’, ‘undefined’ and ‘NaN’ is possible and legal in context of the MongoDB shell.

MongoDB: Transformation of Input

As discussed elsewhere, MongoDB does transform data types and values at the point of insertion. When retrieving the above, the following result is displayed:

> db.blogColl.find()
{ "_id" : ObjectId("50955b2eeb749e2e4a36d779"), "amount" : 25 }
{ "_id" : ObjectId("50955b38eb749e2e4a36d77a"), "amount" : null }
{ "_id" : ObjectId("50955b3feb749e2e4a36d77b"), "amount" : null }
{ "_id" : ObjectId("50955b47eb749e2e4a36d77c"), "amount" : NaN }
{ "_id" : ObjectId("50955b4eeb749e2e4a36d77d"), "balance" : 33 }

MongoDB decided to honor the Javascript ‘null’ and ‘NaN’, but ‘undefined’ is changed to ‘null’. From a data modeling perspective this is significant as there is no difference in meaning between ‘undefined’ and ‘null’ in the context of MongoDB. An application having a distinct interpretation of those values will cause errors due to the distinction in interpretation it makes, but not MongoDB.

MongoDB: Aggregation Results

Coming back to Part 1 now, the question is, how would the default aggregation operators of MongoDB interpret the above data set in MongoDB? The following is an aggregation framework query:

db.blogColl.aggregate(
    { $group : {
        _id : 1,
        noDocs: { $sum : 1},
        totalamount: { $sum : "$amount" },
        averageamount: { $avg : "$amount" },
        minamount: { $min : "$amount"},
        maxamount: { $max : "$amount"},
        totalbalance: { $sum : "$balance"},
        averagebalance: { $avg : "$balance" },
        minbalance: { $min : "$balance"},
        maxbalance: { $max : "$balance"}
    }}
);

The ‘group’ pipeline step ensures that all documents are seen as one group for the purpose of aggregation. The aggregation operators are applied to all documents in that group. In this example, to all documents.

The documents have either ‘amount’ or ‘balance’ as properties and for both some aggregate values are computed. The outcome when executing the above is:

{
        "result" : [
                {
                        "_id" : 1,
                        "noDocs" : 5,
                        "totalamount" : NaN,
                        "averageamount" : NaN,
                        "maxamount" : 25,
                        "totalbalance" : 33,
                        "averagebalance" : 33,
                        "maxbalance" : 33
                }
        ],
        "ok" : 1
}

There are a few things to call out:

  • The minimum aggregation operator was not successful and the result was left out (the reason is unclear to me).
  • The maximum operator ignored ‘null’ and ‘NaN’ and produced the correct values.
  • The sum and average operators do not ignore ‘null’ and ‘NaN’; so if a non-number is present, the operators indicate that by stating that there was at least one entry that is not a number.

Implications to Modeling Considerations

Part 1 of this blog highlighted that fact that there needs to be a common understanding on dynamic schemas as well as the interpretation of property values, especially constants and (aggregation) operations on those.

Unless your interpretation precisely matches the interpretation of MongoDB, you will have to implement those operators and their interpretation outside MongoDB or in a different way with the capabilities that MongoDB provides.

The above example shows that this is an important design discussion that must take place before system construction in order to avoid bigger problems down the road.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 1)

How to deal with property values ‘null’, ‘undefined’, ‘NaN’ and missing properties?

Example: Math

The following five documents are the running example for this blog.

{"amount" : 25}
{"amount" : null}
{"amount" : undefined}
{"amount" : NaN}
{"balance" : 33}

All documents have the property ‘amount’, except for the last document (it is missing the property ‘amount’).

  • What is the total sum across all five documents for the values of property ‘amount’?
    • 25?
  • What is the median of the values for property ‘amount’?
    • 25?
  • What is the average of values for property ‘amount’?
    • 25 or 5 or 6.25?

Modeling Considerations

One way to interpret a property value of ‘null’, ‘undefined’, ‘NaN’ is that the property is present, but its value is not set. Another alternative interpretation is that the property is ‘absent’, meaning, it should not be considered at all (as if it were not present).

For the sum across the documents, the interpretation does not make a difference; likewise, we can argue that for the median it does not make a difference, either.

However, for computing the average it makes a huge difference, as in this case the number of documents becomes part of the computation, not just the values. So in the example above, the interpretation as absent means that the number of documents to be considered is 1, so the average is 25. If the interpretation means that there is not value, then there are 4 documents, so the average is 6.25.

But what about the document without the property ‘amount’? Is it interpreted as ‘amount’ is absent or no value? For the latter the average would result to 5.

Local Schema and Dynamic Schema Changes

In the presence of a document local schema (i.e., each document can follow its own schema) and dynamic schema changes it is extremely important to agree on the interpretation of the constants ‘null’, ‘NaN’, and ‘undefined’.

Equally important is to agree on the meaning if a property is absent. This becomes super-important if documents actually change their schema over time, meaning, e.g., that up to a point in time a property was left out if there was no value and after that the property is present with a value of ‘null’.

On top, when the number of documents (and/or number of found properties) is part of a computation, the agreement also needs to include how to count documents or properties across documents in collections.

Laying Down The Rules

So how exactly does one establish an agreement? One possibility is to put down the rules in form of engineering rules that engineers have to enforce and make sure that they are implemented. This is based on convention and agreement to follow them.

Alternatively, helper functions can be implemented that implement the rules directly. A countProperty() function could return for a given set of documents how many contain a specific property. A valueOf() function could return the value of a property. It either returns a value, or throws an NoValueFound exception.

These functions can ensure in their implementation that the agreed upon interpretation is implemented. And, if interpretations for specific circumstances are necessary (e.g., array of scalars), then these functions can become polymorphic or variations can be implemented.