MongoDB and node.js (Part 2): Insertion Log Examination

One important first step is to observe the documents as they get written into MongoDB when storing the test documents from the previous post (https://realprogrammer.wordpress.com/2013/02/10/mongodb-and-node-js-part-1-list-of-documents-for-all-types/). This establishes the base approach of transferring documents defined in JavaScript to documents required as BSON structures.

Type Systems Impedance Mismatch

There is a mismatch in type systems: JavaScript and BSON have some types in common, but not all. Because of this impedance mismatch there must be a mechanism that overcomes it. The following listing discusses those.

Insertion Log Examination

First of all, the JavaScript code of the previous blog drops the target collection and the logging states the outcome:

Drop Collection Result: true

Next, for each document in the previous blog, we show here what the document looks like when written to MongoDB and add a discussion as needed. This can be very boring to read as it is an account of an actual execution; however, please search for specific types if you are only interested in those.

BSON Double

Definition:

{"x": new MongoDB.Double(123.123),
 "comment": "new MongoDB.Double(123.123)",
 "btype": 1}

Written As:

[ { x: { _bsontype: 'Double', value: 123.123 },
    comment: 'new MongoDB.Double(123.123)',
    btype: 1,
    _id: 512675b5508942d427000001 } ]

Comment: There are a few items that require discussion. First, a property “_id” was added by MongoDB. This is the case for all subsequent documents also.

Second, the constructor call “new MongoDB.Double()” was translated into an object that has a property “_bsontype” and a “value”. The “_bsontype” contains the textual specification of the BSON type at hand, and the value contains the value as it was provided to the constructor. This shows that there is no direct translation from a BSON Double to a JavaScript type.

JavaScript Number

Definition:

{"x": 456.456,
 "comment": "456.456",
 "btype": 1}

Written As:

[ { x: 456.456,
    comment: '456.456',
    btype: 1,
    _id: 512675b5508942d427000002 } ]

Comment: This is a direct translation from JavaScript to BSON as no constructor was involved.

JavaScript String

Definition:

{"x": "abc",
 "comment": "abc",
 "btype": 2}

Written As:

[ { x: 'abc',
    comment: 'abc',
    btype: 2,
    _id: 512675b5508942d427000003 } ]

Comment: This is also a direct translation from JavaScript to BSON.

JavaScript Object

Definition:

{"x": {"z": 5},
 "comment": "{\"z\": 5}",
 "btype": 3}

Written As:

[ { x: { z: 5 },
    comment: '{"z": 5}',
    btype: 3,
    _id: 512675b5508942d427000004 } ]

Comment: Objects in JavaScript are directly translated into BSON.

JavaScript Array

Definition:

{"x": [9, 8, 7],
 "comment": "[9, 8, 7]",
 "btype": 16}

Written As:

[ { x: [ 9, 8, 7 ],
    comment: '[9, 8, 7]',
    btype: 16,
    _id: 512675b5508942d427000005 } ]

Comment: Arrays are translated directly into BSON also.

Definition:

{"x": [
        {"y": 4},
        {"z": 5}
      ], 
 "comment": "[{\"y\": 4}, {\"z\": 5}]",
 "btype": 3}

Written As:

[ { x: [ [Object], [Object] ],
    comment: '[{"y": 4}, {"z": 5}]',
    btype: 3,
    _id: 512675b5508942d427000006 } ]

Comment: Again, an array directly translated.

BSON Binary

Definition:

{"x": new MongoDB.Binary("binary"),
 "comment": "new MongoDB.Binary(\"binary\")",
 "btype": 5}

Written As:

[ { x:
    { _bsontype: 'Binary',
      sub_type: 0,
      position: 6,
      buffer: <Buffer 62 69 6e 61 72 79> },
    comment: 'new MongoDB.Binary("binary")',
    btype: 5,
    _id: 512675b5508942d427000007 } ]

Comment: Binary data types are created with a constructor before being passed on to MongoDB.

BSON ObjectId

Definition:

{"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
 "btype": 7}

Written As:

[ { x: 5040dc5d40b67c681d000001,
    comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
    btype: 7,
    _id: 512675b5508942d427000008 } ]

Comment: Even though a constructor is given, the object written is not marked with “_bsontype”.

JavaScript Boolean

Definition:

{"x": false,
 "comment": "false",
 "btype": 8}

Written As:

[ { x: false,
    comment: 'false',
    btype: 8,
    _id: 512675b5508942d427000009 } ]

Comment: The Boolean gets translated directly.

Definition:

{"x": true,
 "comment": "true",
 "btype": 8}

Written As:

[ { x: true,
    comment: 'true',
    btype: 8,
    _id: 512675b5508942d42700000a } ]

Comment: The Boolean gets translated directly.

JavaScript Date

Definition:

{"x": new Date("2012-08-31 12:13:14:156 UTC"),
 "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
 "btype": 9}

Written As:

[ { x: Fri Aug 31 2012 05:13:14 GMT-0700 (Pacific Daylight Time),
    comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
    btype: 9,
    _id: 512675b5508942d42700000b } ]

Comment: Written as Date-formatted text.

JavaScript Null

Definition:

{"x": null,
 "comment": "null",
 "btype": 10}

Written As:

[ { x: null,
    comment: 'null',
    btype: 10,
    _id: 512675b5508942d42700000c } ]

Comment: “null” is directly written.

JavaScript Regular Expression

Definition:

{"x": new RegExp("abc"),
 "comment": "new RegExp(\"abc\")",
 "btype": 11}

Written As:

[ { x: /abc/,
    comment: 'new RegExp("abc")',
    btype: 11,
    _id: 512675b5508942d42700000d } ]

Comment: The regular expression is written as regular expression text.

Definition:

{"x": new RegExp("abc", "i"),
 "comment": "new RegExp(\"abc\", \"i\")",
 "btype": 11}

Written As:

[ { x: /abc/i,
    comment: 'new RegExp("abc", "i")',
    btype: 11,
    _id: 512675b5508942d42700000e } ]

Comment: The regular expression is written as regular expression text.

BSON DBRef

There are various ways of defining a BSON DBRef. All possibilities are discussed next.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: 'types' },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
    btype: 3,
    _id: 512675b5508942d42700000f } ]

Comment: In this case the constructor creates a complete object, including “_bsontype”.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: undefined },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
    btype: 3,
    _id: 512675b5508942d427000010 } ]

Comment: This is like the case before, except, the database is not defined.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
 "btype": 3}

Written As:

[ { x: { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
    btype: 3,
    _id: 512675b5508942d427000012 } ]

Comment: This is an alternative way to create a DBRef object in MongoDB. Instead of using the constructor, a document can be directly defined, however, the property names have to match precisely what MongoDB is expecting.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
 "btype": 3}

Written As:

[ { x:
    { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001', '$db': 'types' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
    btype: 3,
    _id: 512675b5508942d427000011 } ]

Comment: Like before, this is a direct way to store a DBRef, this time with a database specification.

BSON Code

BSON code comes in two forms, with and without scope. Both are discussed next.

Definition:

{"x": new MongoDB.Code("function () {}"),
 "comment": "new MongoDB.Code(\"function () {}\")",
 "btype": 13}

Written As:

[ { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
    comment: 'new MongoDB.Code("function () {}")',
    btype: 13,
    _id: 512675b5508942d427000013 } ]

Comment: Code is created as an object with a constructor, hence the “_bsontype” property.

Definition:

{"x": new MongoDB.Code("function (a) {}", {"a": 4}),
 "comment": "new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
 "btype": 15}

Written As:

[ { x: { _bsontype: 'Code', code: 'function (a) {}', scope: [Object] },
    comment: 'new MongoDB.Code("function (a) {}", {"a": 4})',
    btype: 15,
    _id: 512675b5508942d427000015 } ]

Comment: This case is basically the same as the code without scope.

BSON Symbol

Definition:

{"x": new MongoDB.Symbol("def15"),
 "comment": "new MongoDB.Symbol(\"def15\")",
 "btype": 14}

Written As:

[ { x: def15,
    comment: 'new MongoDB.Symbol("def15")',
    btype: 14,
    _id: 512675b5508942d427000014 } ]

Comment: BSON Symbol is created as object by a constructor. However, it does not have a “_bsontype” as the value of a symbol is converted into a string.

BSON 32-Bit Int

Definition:

{"x": 123456,
 "comment": "123456",
 "btype": 16}

Written As:

[ { x: 123456,
    comment: '123456',
    btype: 16,
    _id: 512675b5508942d427000016 } ]

Comment: A BSON 32-bit int does not require a constructor and maps directly to the corresponding JSON type.

BSON Timestamp

Definition:

{"x": new MongoDB.Timestamp(1, 2),
 "comment": "new MongoDB.Timestamp(1, 2)",
 "btype": 17}

Written As:

[ { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
    comment: 'new MongoDB.Timestamp(1, 2)',
    btype: 17,
    _id: 512675b5508942d427000017 } ]

Comment: A BSON Timestamp is created by a constructor as an object. It hence has the “_bsontype” property.

BSON Long

Definition:

{"x": new MongoDB.Long("987"),
 "comment": "new MongoDB.Long(\"987\")",
 "btype": 18}

Written As:

[ { x: { _bsontype: 'Long', low_: 987, high_: 0 },
    comment: 'new MongoDB.Long("987")',
    btype: 18,
    _id: 512675b5508942d427000018 } ]

Comment: The BSON Long type is created as object by a constructor and has the “_bsontype” property.

BSON MinKey and MaxKey

Definition:

{"x": new MongoDB.MinKey(),
 "comment": "MongoDB.MinKey()",
 "btype": 255}
{"x": new MongoDB.MaxKey(),
 "comment": "MongoDB.MaxKey()",
 "btype": 127}

Written As:

[ { x: { _bsontype: 'MinKey' },
    comment: 'MongoDB.MinKey()',
    btype: 255,
    _id: 512675b5508942d427000019 } ]
[ { x: { _bsontype: 'MaxKey' },
    comment: 'MongoDB.MaxKey()',
    btype: 127,
    _id: 512675b5508942d42700001a } ]

Comment: Both, BSON MinKey and MaxKey are created by a constructor as object and have the property “_bsontype”.

JavaScript undefined

Definition:

 {"x": undefined,
  "comment": "undefined",
  "btype": 10}

Written As:

[ { x: undefined,
    comment: 'undefined',
    btype: 10,
    _id: 512675b5508942d42700001b } ]

Comment: The JavaScript ‘undefined’ is directly stored.

JavaScript Number.NaN

Definition:

{"x": Number.NaN,
 "comment": "Number.NaN",
 "btype": 1}

Written As:

[ { x: NaN,
    comment: 'Number.NaN',
    btype: 1,
    _id: 512675b5508942d42700001c } ]

Comment: JavaScript Number.NaN is written as NaN.

JavaScript Infinity, Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY

Definition:

{"x": Infinity,
 "comment": "Infinity",
 "btype": 1}
{"x": Number.POSITIVE_INFINITY,
 "comment": "Number.POSITIVE_INFINITY",
 "btype": 1}
{"x": Number.NEGATIVE_INFINITY,
 "comment": "Number.NEGATIVE_INFINITY",
 "btype": 1}

Written As:

[ { x: Infinity,
    comment: 'Infinity',
    btype: 1,
    _id: 512675b5508942d42700001d } ]
[ { x: Infinity,
    comment: 'Number.POSITIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001e } ]
[ { x: -Infinity,
    comment: 'Number.NEGATIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001f } ]

Comment: JavaScript Infinity is stored as such. Number.POSITIVE_INFINITY is stored as Infinity and Number.NEGATIVE_INFINITY is stored as -Infinity.

JavaScript Number.MIN_VALUE and Number.MAX_VALUE

Definition:

{"x": Number.MIN_VALUE,
 "comment": "Number.MIN_VALUE",
 "btype": 1}
{"x": Number.MAX_VALUE,
 "comment": "Number.MAX_VALUE",
 "btype": 1}

Written As:

[ { x: 5e-324,
    comment: 'Number.MIN_VALUE',
    btype: 1,
    _id: 512675b5508942d427000020 } ]
[ { x: 1.7976931348623157e+308,
    comment: 'Number.MAX_VALUE',
    btype: 1,
    _id: 512675b5508942d427000021 } ]

Comment: Number.MIN_VALUE and Number.MAX_VALUE are stored as MIN_VALUE and MAX_VALUE respectively.

Summary

This was a very ‘dry’ blog. Nevertheless, when dealing with a database interface, it is important to ensure the proper understanding of how datatypes in the involved type systems map to each other.

It will become extremely relevant when data stored is queried back. A few surprises are waiting for you.

Negation and Complement in a Multi-Schema World

Negation and complement are not so straight forward in a multi-schema world where different documents in a collection follow different schemas. Let’s start with an example.

Example: Count()

> db.nc.save({"a":"value", "b":"v1"})
> db.nc.save({"a":"value", "b":null})
> db.nc.save({"a":"value"})

Now let’s count how many documents are there with ‘b’ having a value of either ‘null’ or not ‘null’.

> db.nc.find({"$or":[{"b":null}, {"b":{$ne:null}}]}).count()
3

Is 3 the correct result?

Basing the query on $type instead results into the same count:

> db.nc.find({"$or":[{"b":{$type:10}}, {"b":{$not:{$type:10}}}]}).count()
3

Is that a correct count?

Interpretation of ‘null’

The query system of MongoDB interprets a query for a property with value ‘null’ in a specific way: if the property is present, then it is checked if the property has the value ‘null’. If the property is not present, then it is interpreted as being present with the value of ‘null’.

In this interpretation, the above query results are correct.

Alternative Interpretation of ‘null’

If only those documents should be considered that actually have the property present, the queries have to be extended to check for that.

> db.nc.find({"$and":[{"b":{$exists:true}}, {"$or":[{"b":null}, {"b":{$ne:null}}]}]}).count()
2

and

> db.nc.find({"$and":[{"b":{$exists:true}}, {"$or":[{"b":{"$type":10}}, {"b":{$not:{"$type": 10}}}]}]}).count()
2

In order to change this interpretation an additional condition is added that checks for the existence of the property being queried. Luckily, the desired interpretation was possible through changing the query without requiring and code.

Looking Beyond ‘null’

Investigating the behavior around ‘null’ was a relatively small effort as it is quite compact. However, with additional operators like ‘$lt’, ‘$gt’, etc., more situations with specific interpretations can be found and have to be carefully planned through when implementing.

There are three basic rules

  • If every document has the same schema (meaning, the same properties with the same names and types), then queries can be written without caveats wrt. multi-schema collections.
  • If documents have different schemas in the sense the some contain a property and some do not contain that property, then queries should be written in such a way that the absence of a property does not cause a wrong result or a wrong interpretation.
  • The same is true for documents that have the same properties, but different types (like the property is sometimes a string and sometimes a number). In this case the queries should be written in such a way that the different types do not lead to a wrong interpretation of the result.

A combination of three cases is possible, meaning, that, for example, all documents have a property ‘p1’ of the same type, a property ‘p2’ that is sometimes not present and a property ‘p3’ that is sometimes not present and sometimes is of different types.

The only alternative to being really careful with the query construction and interpretation is to transform all documents so that they all follow the same schema (global schema for that collection). The particular context and situation determines what the best approach is.

So what’s the Big Deal?

The issues around queries in a multi-schema collection are probably less of a big deal if you are in complete control of your code. In this case you can make sure yourself that the correct interpretation is followed everywhere (by encapsulation, constraint checking, code review or mandate).

However, if you are using libraries as-is and you do not have the ability to change them, you need to make sure that their internal interpretation matches yours. If not, errors can sneak easily into your code base.

On top, sometimes the result determined by code are checked by running queries on the command line or shell. It is not always obvious if e.g. an ‘$exists’ was applied in the code or not. So the ‘checking’ of results using the command-line has its own challenges.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 3)

It turns out there is a lot more to discuss about MongoDB and its handling of basic and complex data types. That warrants a separate series of blogs altogether. Here is one more part in this three part series; a future new series will focus exclusively on MongoDB data types.

MongoDB and ‘undefined’

As discussed before, it is possible to store a document in MongoDB that has a property with value ‘undefined’. However, it turns out that MongoDB on inserting is changing ‘undefined’ to ‘null’. So a change in value is taking place implicitly. So while MongoDB accepts the value ‘undefined’ on its interface, storage-wise it is changed to ‘null’.

As a consequence, it is not possible to query for ‘undefined’. When querying for ‘null’ all the documents will be returned that were originally stored with the values ‘null’ as well as ‘undefined’ (as ‘undefined’ was changed to ‘null’).

Implication

The implicit change from ‘undefined’ to ‘null’ has a few important implications:

  • If an application makes a semantic distinction between ‘undefined’ and ‘null’, that distinction will not be kept when storing ‘undefined’ and ‘null’ as MongoDB will not keep those as separate values. Instead, the application must record the fact that a property value is ‘undefined’ in a different way (e.g., with an embedded document or a specific value that it interprets as ‘undefined’).
  • MongoDB applies implicit data value changes and that in specific cases the values being put in are not the values being returned when queried. ‘undefined’ is one of values that will be changed. Applications must be aware of this in order to decide if this is OK with them or not. If not, a different representation must be used by the application that does not fall into this category.

Conclusion

This 3-part mini-series started out looking at aggregation in the presence of various data types and brought up a few examples that show different interpretations when aggregating. These interpretations are essential when using MongoDB.

A side effect of the mini-series is that a larger discussion of data types and MongoDB opened up and is necessary to fully understand MongoDB’s behavior. A future blog series will address data types exclusively.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 2)

Wait a minute: ‘undefined’ and ‘NaN’ is not legal JSON! Part 1 of this blog must have made a mistake?

MongoDB, JSON and Javascript

The MongoDB shell operates on Javascript data types, not JSON data types. Therefore, this is legal and working (MongoDB 2.2.0):

> use blog
switched to db blog
> db.blogColl.save({"amount":25})
> db.blogColl.save({"amount":null})
> db.blogColl.save({"amount":undefined})
> db.blogColl.save({"amount":NaN})
> db.blogColl.save({"balance":33})
>

So, using Javascript constants like ‘null’, ‘undefined’ and ‘NaN’ is possible and legal in context of the MongoDB shell.

MongoDB: Transformation of Input

As discussed elsewhere, MongoDB does transform data types and values at the point of insertion. When retrieving the above, the following result is displayed:

> db.blogColl.find()
{ "_id" : ObjectId("50955b2eeb749e2e4a36d779"), "amount" : 25 }
{ "_id" : ObjectId("50955b38eb749e2e4a36d77a"), "amount" : null }
{ "_id" : ObjectId("50955b3feb749e2e4a36d77b"), "amount" : null }
{ "_id" : ObjectId("50955b47eb749e2e4a36d77c"), "amount" : NaN }
{ "_id" : ObjectId("50955b4eeb749e2e4a36d77d"), "balance" : 33 }

MongoDB decided to honor the Javascript ‘null’ and ‘NaN’, but ‘undefined’ is changed to ‘null’. From a data modeling perspective this is significant as there is no difference in meaning between ‘undefined’ and ‘null’ in the context of MongoDB. An application having a distinct interpretation of those values will cause errors due to the distinction in interpretation it makes, but not MongoDB.

MongoDB: Aggregation Results

Coming back to Part 1 now, the question is, how would the default aggregation operators of MongoDB interpret the above data set in MongoDB? The following is an aggregation framework query:

db.blogColl.aggregate(
    { $group : {
        _id : 1,
        noDocs: { $sum : 1},
        totalamount: { $sum : "$amount" },
        averageamount: { $avg : "$amount" },
        minamount: { $min : "$amount"},
        maxamount: { $max : "$amount"},
        totalbalance: { $sum : "$balance"},
        averagebalance: { $avg : "$balance" },
        minbalance: { $min : "$balance"},
        maxbalance: { $max : "$balance"}
    }}
);

The ‘group’ pipeline step ensures that all documents are seen as one group for the purpose of aggregation. The aggregation operators are applied to all documents in that group. In this example, to all documents.

The documents have either ‘amount’ or ‘balance’ as properties and for both some aggregate values are computed. The outcome when executing the above is:

{
        "result" : [
                {
                        "_id" : 1,
                        "noDocs" : 5,
                        "totalamount" : NaN,
                        "averageamount" : NaN,
                        "maxamount" : 25,
                        "totalbalance" : 33,
                        "averagebalance" : 33,
                        "maxbalance" : 33
                }
        ],
        "ok" : 1
}

There are a few things to call out:

  • The minimum aggregation operator was not successful and the result was left out (the reason is unclear to me).
  • The maximum operator ignored ‘null’ and ‘NaN’ and produced the correct values.
  • The sum and average operators do not ignore ‘null’ and ‘NaN’; so if a non-number is present, the operators indicate that by stating that there was at least one entry that is not a number.

Implications to Modeling Considerations

Part 1 of this blog highlighted that fact that there needs to be a common understanding on dynamic schemas as well as the interpretation of property values, especially constants and (aggregation) operations on those.

Unless your interpretation precisely matches the interpretation of MongoDB, you will have to implement those operators and their interpretation outside MongoDB or in a different way with the capabilities that MongoDB provides.

The above example shows that this is an important design discussion that must take place before system construction in order to avoid bigger problems down the road.