Schema-free Database (Part 1): An Oxymoron

The notion of a ‘schema-free database’ keeps coming up, most recently in a meetup I attended a few days ago. Some rationalization follows divided up into the categories of ‘document’ and ‘database’.

While a generalization is easily possible, the context here will be JSON and MongoDB as these are two practical implementations that are available and often used as examples of a ‘schema-free database’. Those provide a nice constraint technology set as an example, while the principles apply to a whole range of other technology, of course.

Document

A JSON document, in short: document, follows a set of construction principles outlined here: http://www.json.org/. This is a rather informal grammar that defines how a valid JSON document is constructed. There are no data type generators and so new data types cannot be introduced; therefore, every document is constructed from the fixed set of types enumerated on that web page.

Document Schema

An attempt has been made to create a more formal mechanism to define a schema for JSON documents: http://json-schema.org/. This approach provides a formal language to describe the schema of a JSON document explicitly.

The json-schema approach combined with the fixed set of types available to create a JSON document means that every JSON document can be described explicitly using json-schema without exceptions. This in turns means that every JSON document has at least an implicit schema, unless it is additionally made explicit with e.g. json-schema.

Therefore, JSON documents have a schema, an implicit one and optionally an explicit one. Depending on the particular schema definition language approach itself, a document might match more than one schema, but that is left for a separate discussion.

Set of Document Schemas

Given a set of JSON documents it is now possible to characterize their relationship to schemas. In the ‘best’ case, all documents follow the same schema; in the ‘worst’ case, each document follows its own schema. And there are cases in-between where a subset of the documents validates against a schema, and another subset against another schema. Depending on the design, one JSON document might validate against different schemas.

The relationship between documents and schemas is n:m in general.

Database

In context of a database, there are a few interesting questions in this context:

  • Does the database understand a document representation (e.g. JSON)?
  • Does the database enforce a document representation?

And:

  • Does the database enforce a schema?
  • Does the database understand a schema?

Let’s answer these questions for MongoDB specifically:

And:

  • MongoDB enforces a partial schema. Each document must have a property called “_id”; if the document being inserted does not have such a property, one is automatically added.
  • MongoDB does not understand an explicit schema as it does not provide for a mechanism to load a schema definition language.

MongoDB, however, understands implicit schemas as MongoDB does allow to e.g. create an index on any property of documents. So MongoDB recognizes properties.

Furthermore, MongoDB supports aggregation functions and supports e.g. the sum of properties across documents (https://realprogrammer.wordpress.com/2012/11/04/null-undefined-nan-and-missing-property-goto-considered-harmful-part-2/). So it is data type aware and implements operators (e.g. sum) on those.

Conclusion: The Notion of ‘Schema’ is Changing

This rather brief discussion clearly rationalizes that the label ‘schema-free database’ is not applicable to technologies such as those discussed in this blog (JSON/BSON, MongoDB).

Hence these technologies are not an example of ‘schema-free database’, to the contrary: they demonstrate that the notion of ‘schema’ can have a wider and more flexible interpretation then what relational databases bring forward.

Advertisements

MongoDB and node.js (Part 4): Query Results

Now that the data is stored in MongoDB (https://realprogrammer.wordpress.com/2013/03/27/mongodb-and-node-js-part-2-insertion-log-examination/), let’s get it out again through queries and examine the queries’ results.

Part 3 of this series (https://realprogrammer.wordpress.com/2013/05/29/mongodb-and-node-js-part-3-looking-at-query-results-its-complicated/) introduced the reason why the results are printed with and without using JSON.stringify() and outlined why in context of MongoDB and node.js using JSON.stringify() might be not such a good idea.

There are as many queries generated as there are BSON types and the the BSON types are queried in ascending order. The printout after “===>” is generated by “console.log(<query_result_document>)” and the printout after “—>” is generated by “console.log(JSON.stringify(<query_result_document>))”.

Some interpretation will follow in Part 5 of this series. Here are the results uninterpreted for now:

===> { x: 123.123,
  comment: 'new MongoDB.Double(123.123)',
  btype: 1,
  _id: 5269796fab3df2f90b000001 }
---> {"x":123.123,"comment":"new MongoDB.Double(123.123)","btype":1,"_id":"5269796fab3df2f90b000001"}

===> { x: 456.456,
  comment: '456.456',
  btype: 1,
  _id: 5269796fab3df2f90b000002 }
---> {"x":456.456,"comment":"456.456","btype":1,"_id":"5269796fab3df2f90b000002"}

===> { x: NaN,
  comment: 'Number.NaN',
  btype: 1,
  _id: 5269796fab3df2f90b00001c }
---> {"x":null,"comment":"Number.NaN","btype":1,"_id":"5269796fab3df2f90b00001c"}

===> { x: Infinity,
  comment: 'Infinity',
  btype: 1,
  _id: 5269796fab3df2f90b00001d }
---> {"x":null,"comment":"Infinity","btype":1,"_id":"5269796fab3df2f90b00001d"}

===> { x: Infinity,
  comment: 'Number.POSITIVE_INFINITY',
  btype: 1,
  _id: 5269796fab3df2f90b00001e }
---> {"x":null,"comment":"Number.POSITIVE_INFINITY","btype":1,"_id":"5269796fab3df2f90b00001e"}

===> { x: -Infinity,
  comment: 'Number.NEGATIVE_INFINITY',
  btype: 1,
  _id: 5269796fab3df2f90b00001f }
---> {"x":null,"comment":"Number.NEGATIVE_INFINITY","btype":1,"_id":"5269796fab3df2f90b00001f"}

===> { x: 5e-324,
  comment: 'MIN_VALUE',
  btype: 1,
  _id: 5269796fab3df2f90b000020 }
---> {"x":5e-324,"comment":"MIN_VALUE","btype":1,"_id":"5269796fab3df2f90b000020"}

===> { x: 1.7976931348623157e+308,
  comment: 'MAX_VALUE',
  btype: 1,
  _id: 5269796fab3df2f90b000021 }
---> {"x":1.7976931348623157e+308,"comment":"MAX_VALUE","btype":1,"_id":"5269796fab3df2f90b000021"}

===> { x: 'abc',
  comment: 'abc',
  btype: 2,
  _id: 5269796fab3df2f90b000003 }
---> {"x":"abc","comment":"abc","btype":2,"_id":"5269796fab3df2f90b000003"}

===> { x: { z: 5 },
  comment: '{"z": 5}',
  btype: 3,
  _id: 5269796fab3df2f90b000004 }
---> {"x":{"z":5},"comment":"{\"z\": 5}","btype":3,"_id":"5269796fab3df2f90b000004"}

===> { x: [ { y: 4 }, { z: 5 } ],
  comment: '[{"y": 4}, {"z": 5}]',
  btype: 3,
  _id: 5269796fab3df2f90b000006 }
---> {"x":[{"y":4},{"z":5}],"comment":"[{\"y\": 4}, {\"z\": 5}]","btype":3,"_id":"5269796fab3df2f90b000006"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: 'types' },
  comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
  btype: 3,
  _id: 5269796fab3df2f90b00000f }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":"types"},"comment":"new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")","btype":3,"_id":"5269796fab3df2f90b00000f"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: undefined },
  comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
  btype: 3,
  _id: 5269796fab3df2f90b000010 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":""},"comment":"new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")","btype":3,"_id":"5269796fab3df2f90b000010"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: 'types' },
  comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
  btype: 3,
  _id: 5269796fab3df2f90b000011 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":"types"},"comment":"{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}","btype":3,"_id":"5269796fab3df2f90b000011"}

===> { x: 
   { _bsontype: 'DBRef',
     namespace: 'types_node',
     oid: '5040dc5d40b67c681d000001',
     db: undefined },
  comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
  btype: 3,
  _id: 5269796fab3df2f90b000012 }
---> {"x":{"$ref":"types_node","$id":"5040dc5d40b67c681d000001","$db":""},"comment":"{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}","btype":3,"_id":"5269796fab3df2f90b000012"}

===> { x: 
   { _bsontype: 'Binary',
     sub_type: 0,
     position: 6,
     buffer: <Buffer 62 69 6e 61 72 79> },
  comment: 'new MongoDB.Binary("binary")',
  btype: 5,
  _id: 5269796fab3df2f90b000007 }
---> {"x":"YmluYXJ5","comment":"new MongoDB.Binary(\"binary\")","btype":5,"_id":"5269796fab3df2f90b000007"}

===> { x: 5040dc5d40b67c681d000001,
  comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
  btype: 7,
  _id: 5269796fab3df2f90b000008 }
---> {"x":"5040dc5d40b67c681d000001","comment":"new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")","btype":7,"_id":"5269796fab3df2f90b000008"}

===> { x: false,
  comment: 'false',
  btype: 8,
  _id: 5269796fab3df2f90b000009 }
---> {"x":false,"comment":"false","btype":8,"_id":"5269796fab3df2f90b000009"}

===> { x: true,
  comment: 'true',
  btype: 8,
  _id: 5269796fab3df2f90b00000a }
---> {"x":true,"comment":"true","btype":8,"_id":"5269796fab3df2f90b00000a"}

===> { x: Fri Aug 31 2012 05:13:14 GMT-0700 (PDT),
  comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
  btype: 9,
  _id: 5269796fab3df2f90b00000b }
---> {"x":"2012-08-31T12:13:14.156Z","comment":"new Date(\"2012-08-31 12:13:14:156 UTC\")","btype":9,"_id":"5269796fab3df2f90b00000b"}

===> { x: null,
  comment: 'null',
  btype: 10,
  _id: 5269796fab3df2f90b00000c }
---> {"x":null,"comment":"null","btype":10,"_id":"5269796fab3df2f90b00000c"}

===> { x: null,
  comment: 'undefined',
  btype: 10,
  _id: 5269796fab3df2f90b00001b }
---> {"x":null,"comment":"undefined","btype":10,"_id":"5269796fab3df2f90b00001b"}

===> { x: /abc/,
  comment: 'new RegExp("abc")',
  btype: 11,
  _id: 5269796fab3df2f90b00000d }
---> {"x":{},"comment":"new RegExp(\"abc\")","btype":11,"_id":"5269796fab3df2f90b00000d"}

===> { x: /abc/i,
  comment: 'new RegExp("abc", "i")',
  btype: 11,
  _id: 5269796fab3df2f90b00000e }
---> {"x":{},"comment":"new RegExp(\"abc\", \"i\")","btype":11,"_id":"5269796fab3df2f90b00000e"}

===> { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
  comment: 'new MongoDB.Code("function () {}")',
  btype: 13,
  _id: 5269796fab3df2f90b000013 }
---> {"x":{"scope":{},"code":"function () {}"},"comment":"new MongoDB.Code(\"function () {}\")","btype":13,"_id":"5269796fab3df2f90b000013"}

===> { x: def15,
  comment: 'new MongoDB.Symbol("def15")',
  btype: 14,
  _id: 5269796fab3df2f90b000014 }
---> {"x":"def15","comment":"new MongoDB.Symbol(\"def15\")","btype":14,"_id":"5269796fab3df2f90b000014"}

===> { x: { _bsontype: 'Code', code: 'function (a) {}', scope: { a: 4 } },
  comment: ' new MongoDB.Code("function (a) {}", {"a": 4})',
  btype: 15,
  _id: 5269796fab3df2f90b000015 }
---> {"x":{"scope":{"a":4},"code":"function (a) {}"},"comment":" new MongoDB.Code(\"function (a) {}\", {\"a\": 4})","btype":15,"_id":"5269796fab3df2f90b000015"}

===> { x: [ 9, 8, 7 ],
  comment: '[9, 8, 7]',
  btype: 16,
  _id: 5269796fab3df2f90b000005 }
---> {"x":[9,8,7],"comment":"[9, 8, 7]","btype":16,"_id":"5269796fab3df2f90b000005"}

===> { x: 123456,
  comment: '123456',
  btype: 16,
  _id: 5269796fab3df2f90b000016 }
---> {"x":123456,"comment":"123456","btype":16,"_id":"5269796fab3df2f90b000016"}

===> { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
  comment: 'new MongoDB.Timestamp(1, 2)',
  btype: 17,
  _id: 5269796fab3df2f90b000017 }
---> {"x":"8589934593","comment":"new MongoDB.Timestamp(1, 2)","btype":17,"_id":"5269796fab3df2f90b000017"}

===> { x: 987,
  comment: 'new MongoDB.Long("987")',
  btype: 18,
  _id: 5269796fab3df2f90b000018 }
---> {"x":987,"comment":"new MongoDB.Long(\"987\")","btype":18,"_id":"5269796fab3df2f90b000018"}

===> { x: { _bsontype: 'MaxKey' },
  comment: 'new MongoDB.MaxKey()',
  btype: 127,
  _id: 5269796fab3df2f90b00001a }
---> {"x":{"_bsontype":"MaxKey"},"comment":"new MongoDB.MaxKey()","btype":127,"_id":"5269796fab3df2f90b00001a"}

===> { x: { _bsontype: 'MaxKey' },
  comment: 'new MongoDB.MaxKey()',
  btype: 127,
  _id: 5269796fab3df2f90b00001a }
---> {"x":{"_bsontype":"MaxKey"},"comment":"new MongoDB.MaxKey()","btype":127,"_id":"5269796fab3df2f90b00001a"}

===> { x: { _bsontype: 'MinKey' },
  comment: 'new MongoDB.MinKey()',
  btype: 255,
  _id: 5269796fab3df2f90b000019 }
---> {"x":{"_bsontype":"MinKey"},"comment":"new MongoDB.MinKey()","btype":255,"_id":"5269796fab3df2f90b000019"}

===> { x: { _bsontype: 'MinKey' },
  comment: 'new MongoDB.MinKey()',
  btype: 255,
  _id: 5269796fab3df2f90b000019 }
---> {"x":{"_bsontype":"MinKey"},"comment":"new MongoDB.MinKey()","btype":255,"_id":"5269796fab3df2f90b000019"}

MongoDB and node.js (Part 2): Insertion Log Examination

One important first step is to observe the documents as they get written into MongoDB when storing the test documents from the previous post (https://realprogrammer.wordpress.com/2013/02/10/mongodb-and-node-js-part-1-list-of-documents-for-all-types/). This establishes the base approach of transferring documents defined in JavaScript to documents required as BSON structures.

Type Systems Impedance Mismatch

There is a mismatch in type systems: JavaScript and BSON have some types in common, but not all. Because of this impedance mismatch there must be a mechanism that overcomes it. The following listing discusses those.

Insertion Log Examination

First of all, the JavaScript code of the previous blog drops the target collection and the logging states the outcome:

Drop Collection Result: true

Next, for each document in the previous blog, we show here what the document looks like when written to MongoDB and add a discussion as needed. This can be very boring to read as it is an account of an actual execution; however, please search for specific types if you are only interested in those.

BSON Double

Definition:

{"x": new MongoDB.Double(123.123),
 "comment": "new MongoDB.Double(123.123)",
 "btype": 1}

Written As:

[ { x: { _bsontype: 'Double', value: 123.123 },
    comment: 'new MongoDB.Double(123.123)',
    btype: 1,
    _id: 512675b5508942d427000001 } ]

Comment: There are a few items that require discussion. First, a property “_id” was added by MongoDB. This is the case for all subsequent documents also.

Second, the constructor call “new MongoDB.Double()” was translated into an object that has a property “_bsontype” and a “value”. The “_bsontype” contains the textual specification of the BSON type at hand, and the value contains the value as it was provided to the constructor. This shows that there is no direct translation from a BSON Double to a JavaScript type.

JavaScript Number

Definition:

{"x": 456.456,
 "comment": "456.456",
 "btype": 1}

Written As:

[ { x: 456.456,
    comment: '456.456',
    btype: 1,
    _id: 512675b5508942d427000002 } ]

Comment: This is a direct translation from JavaScript to BSON as no constructor was involved.

JavaScript String

Definition:

{"x": "abc",
 "comment": "abc",
 "btype": 2}

Written As:

[ { x: 'abc',
    comment: 'abc',
    btype: 2,
    _id: 512675b5508942d427000003 } ]

Comment: This is also a direct translation from JavaScript to BSON.

JavaScript Object

Definition:

{"x": {"z": 5},
 "comment": "{\"z\": 5}",
 "btype": 3}

Written As:

[ { x: { z: 5 },
    comment: '{"z": 5}',
    btype: 3,
    _id: 512675b5508942d427000004 } ]

Comment: Objects in JavaScript are directly translated into BSON.

JavaScript Array

Definition:

{"x": [9, 8, 7],
 "comment": "[9, 8, 7]",
 "btype": 16}

Written As:

[ { x: [ 9, 8, 7 ],
    comment: '[9, 8, 7]',
    btype: 16,
    _id: 512675b5508942d427000005 } ]

Comment: Arrays are translated directly into BSON also.

Definition:

{"x": [
        {"y": 4},
        {"z": 5}
      ], 
 "comment": "[{\"y\": 4}, {\"z\": 5}]",
 "btype": 3}

Written As:

[ { x: [ [Object], [Object] ],
    comment: '[{"y": 4}, {"z": 5}]',
    btype: 3,
    _id: 512675b5508942d427000006 } ]

Comment: Again, an array directly translated.

BSON Binary

Definition:

{"x": new MongoDB.Binary("binary"),
 "comment": "new MongoDB.Binary(\"binary\")",
 "btype": 5}

Written As:

[ { x:
    { _bsontype: 'Binary',
      sub_type: 0,
      position: 6,
      buffer: <Buffer 62 69 6e 61 72 79> },
    comment: 'new MongoDB.Binary("binary")',
    btype: 5,
    _id: 512675b5508942d427000007 } ]

Comment: Binary data types are created with a constructor before being passed on to MongoDB.

BSON ObjectId

Definition:

{"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
 "btype": 7}

Written As:

[ { x: 5040dc5d40b67c681d000001,
    comment: 'new MongoDB.ObjectID("5040dc5d40b67c681d000001")',
    btype: 7,
    _id: 512675b5508942d427000008 } ]

Comment: Even though a constructor is given, the object written is not marked with “_bsontype”.

JavaScript Boolean

Definition:

{"x": false,
 "comment": "false",
 "btype": 8}

Written As:

[ { x: false,
    comment: 'false',
    btype: 8,
    _id: 512675b5508942d427000009 } ]

Comment: The Boolean gets translated directly.

Definition:

{"x": true,
 "comment": "true",
 "btype": 8}

Written As:

[ { x: true,
    comment: 'true',
    btype: 8,
    _id: 512675b5508942d42700000a } ]

Comment: The Boolean gets translated directly.

JavaScript Date

Definition:

{"x": new Date("2012-08-31 12:13:14:156 UTC"),
 "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
 "btype": 9}

Written As:

[ { x: Fri Aug 31 2012 05:13:14 GMT-0700 (Pacific Daylight Time),
    comment: 'new Date("2012-08-31 12:13:14:156 UTC")',
    btype: 9,
    _id: 512675b5508942d42700000b } ]

Comment: Written as Date-formatted text.

JavaScript Null

Definition:

{"x": null,
 "comment": "null",
 "btype": 10}

Written As:

[ { x: null,
    comment: 'null',
    btype: 10,
    _id: 512675b5508942d42700000c } ]

Comment: “null” is directly written.

JavaScript Regular Expression

Definition:

{"x": new RegExp("abc"),
 "comment": "new RegExp(\"abc\")",
 "btype": 11}

Written As:

[ { x: /abc/,
    comment: 'new RegExp("abc")',
    btype: 11,
    _id: 512675b5508942d42700000d } ]

Comment: The regular expression is written as regular expression text.

Definition:

{"x": new RegExp("abc", "i"),
 "comment": "new RegExp(\"abc\", \"i\")",
 "btype": 11}

Written As:

[ { x: /abc/i,
    comment: 'new RegExp("abc", "i")',
    btype: 11,
    _id: 512675b5508942d42700000e } ]

Comment: The regular expression is written as regular expression text.

BSON DBRef

There are various ways of defining a BSON DBRef. All possibilities are discussed next.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: 'types' },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types")',
    btype: 3,
    _id: 512675b5508942d42700000f } ]

Comment: In this case the constructor creates a complete object, including “_bsontype”.

Definition:

{"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
 "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
 "btype": 3}

Written As:

[ { x:
    { _bsontype: 'DBRef',
      namespace: 'types_node',
      oid: '5040dc5d40b67c681d000001',
      db: undefined },
    comment: 'new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001")',
    btype: 3,
    _id: 512675b5508942d427000010 } ]

Comment: This is like the case before, except, the database is not defined.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
 "btype": 3}

Written As:

[ { x: { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"}',
    btype: 3,
    _id: 512675b5508942d427000012 } ]

Comment: This is an alternative way to create a DBRef object in MongoDB. Instead of using the constructor, a document can be directly defined, however, the property names have to match precisely what MongoDB is expecting.

Definition:

{"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
 "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
 "btype": 3}

Written As:

[ { x:
    { '$ref': 'types_node', '$id': '5040dc5d40b67c681d000001', '$db': 'types' },
    comment: '{"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"}',
    btype: 3,
    _id: 512675b5508942d427000011 } ]

Comment: Like before, this is a direct way to store a DBRef, this time with a database specification.

BSON Code

BSON code comes in two forms, with and without scope. Both are discussed next.

Definition:

{"x": new MongoDB.Code("function () {}"),
 "comment": "new MongoDB.Code(\"function () {}\")",
 "btype": 13}

Written As:

[ { x: { _bsontype: 'Code', code: 'function () {}', scope: {} },
    comment: 'new MongoDB.Code("function () {}")',
    btype: 13,
    _id: 512675b5508942d427000013 } ]

Comment: Code is created as an object with a constructor, hence the “_bsontype” property.

Definition:

{"x": new MongoDB.Code("function (a) {}", {"a": 4}),
 "comment": "new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
 "btype": 15}

Written As:

[ { x: { _bsontype: 'Code', code: 'function (a) {}', scope: [Object] },
    comment: 'new MongoDB.Code("function (a) {}", {"a": 4})',
    btype: 15,
    _id: 512675b5508942d427000015 } ]

Comment: This case is basically the same as the code without scope.

BSON Symbol

Definition:

{"x": new MongoDB.Symbol("def15"),
 "comment": "new MongoDB.Symbol(\"def15\")",
 "btype": 14}

Written As:

[ { x: def15,
    comment: 'new MongoDB.Symbol("def15")',
    btype: 14,
    _id: 512675b5508942d427000014 } ]

Comment: BSON Symbol is created as object by a constructor. However, it does not have a “_bsontype” as the value of a symbol is converted into a string.

BSON 32-Bit Int

Definition:

{"x": 123456,
 "comment": "123456",
 "btype": 16}

Written As:

[ { x: 123456,
    comment: '123456',
    btype: 16,
    _id: 512675b5508942d427000016 } ]

Comment: A BSON 32-bit int does not require a constructor and maps directly to the corresponding JSON type.

BSON Timestamp

Definition:

{"x": new MongoDB.Timestamp(1, 2),
 "comment": "new MongoDB.Timestamp(1, 2)",
 "btype": 17}

Written As:

[ { x: { _bsontype: 'Timestamp', low_: 1, high_: 2 },
    comment: 'new MongoDB.Timestamp(1, 2)',
    btype: 17,
    _id: 512675b5508942d427000017 } ]

Comment: A BSON Timestamp is created by a constructor as an object. It hence has the “_bsontype” property.

BSON Long

Definition:

{"x": new MongoDB.Long("987"),
 "comment": "new MongoDB.Long(\"987\")",
 "btype": 18}

Written As:

[ { x: { _bsontype: 'Long', low_: 987, high_: 0 },
    comment: 'new MongoDB.Long("987")',
    btype: 18,
    _id: 512675b5508942d427000018 } ]

Comment: The BSON Long type is created as object by a constructor and has the “_bsontype” property.

BSON MinKey and MaxKey

Definition:

{"x": new MongoDB.MinKey(),
 "comment": "MongoDB.MinKey()",
 "btype": 255}
{"x": new MongoDB.MaxKey(),
 "comment": "MongoDB.MaxKey()",
 "btype": 127}

Written As:

[ { x: { _bsontype: 'MinKey' },
    comment: 'MongoDB.MinKey()',
    btype: 255,
    _id: 512675b5508942d427000019 } ]
[ { x: { _bsontype: 'MaxKey' },
    comment: 'MongoDB.MaxKey()',
    btype: 127,
    _id: 512675b5508942d42700001a } ]

Comment: Both, BSON MinKey and MaxKey are created by a constructor as object and have the property “_bsontype”.

JavaScript undefined

Definition:

 {"x": undefined,
  "comment": "undefined",
  "btype": 10}

Written As:

[ { x: undefined,
    comment: 'undefined',
    btype: 10,
    _id: 512675b5508942d42700001b } ]

Comment: The JavaScript ‘undefined’ is directly stored.

JavaScript Number.NaN

Definition:

{"x": Number.NaN,
 "comment": "Number.NaN",
 "btype": 1}

Written As:

[ { x: NaN,
    comment: 'Number.NaN',
    btype: 1,
    _id: 512675b5508942d42700001c } ]

Comment: JavaScript Number.NaN is written as NaN.

JavaScript Infinity, Number.POSITIVE_INFINITY and Number.NEGATIVE_INFINITY

Definition:

{"x": Infinity,
 "comment": "Infinity",
 "btype": 1}
{"x": Number.POSITIVE_INFINITY,
 "comment": "Number.POSITIVE_INFINITY",
 "btype": 1}
{"x": Number.NEGATIVE_INFINITY,
 "comment": "Number.NEGATIVE_INFINITY",
 "btype": 1}

Written As:

[ { x: Infinity,
    comment: 'Infinity',
    btype: 1,
    _id: 512675b5508942d42700001d } ]
[ { x: Infinity,
    comment: 'Number.POSITIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001e } ]
[ { x: -Infinity,
    comment: 'Number.NEGATIVE_INFINITY',
    btype: 1,
    _id: 512675b5508942d42700001f } ]

Comment: JavaScript Infinity is stored as such. Number.POSITIVE_INFINITY is stored as Infinity and Number.NEGATIVE_INFINITY is stored as -Infinity.

JavaScript Number.MIN_VALUE and Number.MAX_VALUE

Definition:

{"x": Number.MIN_VALUE,
 "comment": "Number.MIN_VALUE",
 "btype": 1}
{"x": Number.MAX_VALUE,
 "comment": "Number.MAX_VALUE",
 "btype": 1}

Written As:

[ { x: 5e-324,
    comment: 'Number.MIN_VALUE',
    btype: 1,
    _id: 512675b5508942d427000020 } ]
[ { x: 1.7976931348623157e+308,
    comment: 'Number.MAX_VALUE',
    btype: 1,
    _id: 512675b5508942d427000021 } ]

Comment: Number.MIN_VALUE and Number.MAX_VALUE are stored as MIN_VALUE and MAX_VALUE respectively.

Summary

This was a very ‘dry’ blog. Nevertheless, when dealing with a database interface, it is important to ensure the proper understanding of how datatypes in the involved type systems map to each other.

It will become extremely relevant when data stored is queried back. A few surprises are waiting for you.

MongoDB and node.js (Part 1): List of Documents for all Types

This blog contains the document set that will be used throughout the blog series.

Document Set Characteristics

The data set is a collection of documents, whereby each document contains three properties:

  • “x”: This property contains a different data value in each document
  • “comment”: this property describes “x” as originally specified. When the document is queried, it is clear what the original specification of “x” was. This is necessary as the node.js driver modifies documents on insert.
  • “btype”: this property contains the BSON type designator. When the document is queried, the type is clear without having to query it explicitly.

Since the blog series is evolving, the set of documents might change as needed in order to point out more specifics; the current set is a good starting point, however.

If there is more than one way to insert a specific type, several documents are included, one document for each possibility. This is to show the possible alternatives exhaustively.

JavaScript, BSON, JSON

Node.js is is implementing server-side JavaScript (based on Google’s V8 Engine). So the data types that are passed back and forth to the MongoDB Node.js driver are JavaScript types.

MongoDB internally operates on BSON. This means that on the way from the Node.js driver to MongoDB JavaScript types are transformed into BSON (and vice versa on the way back). For some BSON types MongoDB provides constructors.

While many equate JavaScript data types structures with JSON, actually JSON is not an equivalent serialization of JavaScript types (more on that issue during the blog series).

Based on this discussion, the test data document set tries to cover all JavaScript types and their variation, plus the JavaScript implementation of BSON types. This will also clarify more over the course of the blog series.

Document Set

The following code is complete in the sense that it runs and inserts the documents into a MongoDB database called “nodeTest”, using all the defaults.

/*global require*/

var MongoDB = require('mongodb');

/*
 Type codes
 ==========
 1 "\x01" e_name double             Floating point
 2 "\x02" e_name string             UTF-8 string
 3 "\x03" e_name document           Embedded document
 4 "\x04" e_name document           Array
 5 "\x05" e_name binary             Binary data

 7 "\x07" e_name (byte*12)          ObjectId
 8 "\x08" e_name "\x00"             Boolean "false"
 8 "\x08" e_name "\x01"             Boolean "true"
 9 "\x09" e_name int64              UTC datetime
 10 "\x0A" e_name Null value
 11 "\x0B" e_name cstring cstring   Regular expression

 13 "\x0D" e_name string            JavaScript code

 15 "\x0F" e_name code_w_s          JavaScript code w/ scope
 16 "\x10" e_name int32             32-bit Integer
 17 "\x11" e_name int64             Timestamp
 18 "\x12" e_name int64             64-bit integer
 255 "\xFF" e_name Min key
 127 "\x7F" e_name Max key

 Deprecated type codes
 =====================
 6 "\x06" e_name                    Undefined — Deprecated
 12 "\x0C" e_name string (byte*12)  DBPointer — Deprecated
 14 "\x0E" e_name string            Symbol — Deprecated

 */

var typeDocuments;

typeDocuments = [
    {"x": new MongoDB.Double(123.123),
        "comment": "new MongoDB.Double(123.123)",
        "btype": 1},
    {"x": 456.456,
        "comment": "456.456",
        "btype": 1},
    {"x": "abc",
        "comment": "abc",
        "btype": 2},
    {"x": {"z": 5},
        "comment": "{\"z\": 5}",
        "btype": 3},
    // this is not type:4
    {"x": [9, 8, 7],
        "comment": "[9, 8, 7]",
        "btype": 16},
    {"x": [
        {"y": 4},
        {"z": 5}
    ], "comment": "[{\"y\": 4}, {\"z\": 5}]",
        "btype": 3},
    {"x": new MongoDB.Binary("binary"),
        "comment": "new MongoDB.Binary(\"binary\")",
        "btype": 5},
    // t:6 deprecated (was 'undefined') - not implemented
    {"x": new MongoDB.ObjectID("5040dc5d40b67c681d000001"),
        "comment": "new MongoDB.ObjectID(\"5040dc5d40b67c681d000001\")",
        "btype": 7},
    {"x": false,
        "comment": "false",
        "btype": 8},
    {"x": true,
        "comment": "true",
        "btype": 8},
    {"x": new Date("2012-08-31 12:13:14:156 UTC"),
        "comment": "new Date(\"2012-08-31 12:13:14:156 UTC\")",
        "btype": 9},
    {"x": null,
        "comment": "null",
        "btype": 10},
    {"x": new RegExp("abc"),
        "comment": "new RegExp(\"abc\")",
        "btype": 11},
    {"x": new RegExp("abc", "i"),
        "comment": "new RegExp(\"abc\", \"i\")",
        "btype": 11},
    // t:12 DBRef deprecated - still implemented
    // this is not type:12
    {"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001", "types"),
        "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\", \"types\")",
        "btype": 3},
    // this is not type:12
    {"x": new MongoDB.DBRef("types_node", "5040dc5d40b67c681d000001"),
        "comment": "new MongoDB.DBRef(\"types_node\", \"5040dc5d40b67c681d000001\")",
        "btype": 3},
    // MongoDB defined JSON serialization (http://docs.mongodb.org/manual/reference/mongodb-extended-json/)
    // this is not type:12
    {"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001", "$db": "types"},
        "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\", \"$db\": \"types\"}",
        "btype": 3},
    // this is not type:12
    {"x": {"$ref": "types_node", "$id": "5040dc5d40b67c681d000001"},
        "comment": "{\"$ref\": \"types_node\", \"$id\": \"5040dc5d40b67c681d000001\"}",
        "btype": 3},
    {"x": new MongoDB.Code("function () {}"),
        "comment": "new MongoDB.Code(\"function () {}\")",
        "btype": 13},
    // t:14 Symbol deprecated - still implemented
    {"x": new MongoDB.Symbol("def15"),
        "comment": "new MongoDB.Symbol(\"def15\")",
        "btype": 14},
    {"x": new MongoDB.Code("function (a) {}", {"a": 4}),
        "comment": " new MongoDB.Code(\"function (a) {}\", {\"a\": 4})",
        "btype": 15},
    {"x": 123456,
        "comment": "123456",
        "btype": 16},
    {"x": new MongoDB.Timestamp(1, 2),
        "comment": "new MongoDB.Timestamp(1, 2)",
        "btype": 17},
    {"x": new MongoDB.Long("987"),
        "comment": "new MongoDB.Long(\"987\")",
        "btype": 18},
    {"x": new MongoDB.MinKey(),
        "comment": "new MongoDB.MinKey()",
        "btype": 255},
    {"x": new MongoDB.MaxKey(),
        "comment": "new MongoDB.MaxKey()",
        "btype": 127},
    // ADDITIONAL POSSIBLE VALUES
    // 'undefined' will be converted to 'null'; type will be 'null' (aka 10) also
    {"x": undefined,
        "comment": "undefined",
        "btype": 10},
    {"x": Number.NaN,
        "comment": "Number.NaN",
        "btype": 1},
    {"x": Infinity,
        "comment": "Infinity",
        "btype": 1},
    {"x": Number.POSITIVE_INFINITY,
        "comment": "Number.POSITIVE_INFINITY",
        "btype": 1},
    {"x": Number.NEGATIVE_INFINITY,
        "comment": "Number.NEGATIVE_INFINITY",
        "btype": 1},
    {"x": Number.MIN_VALUE,
        "comment": "MIN_VALUE",
        "btype": 1},
    {"x": Number.MAX_VALUE,
        "comment": "MAX_VALUE",
        "btype": 1}
];

var Db = MongoDB.Db,
    Server = MongoDB.Server;
var db = new Db('nodeTest', new Server("127.0.0.1", 27017,
    {auto_reconnect: false, poolSize: 4}), {native_parser: false, safe: false});

db.open(function (err, db) {
    "use strict";
    db.dropCollection("types_node", function (err, result) {
        var i,
            printLog = function (err, result) {
                if (err) {
                    console.log(err.toString());
                }
                console.log(result);
            };
        if (err) {
            console.log(err.toString());
        }
        console.log("Drop Collection Result: " + result);
        db.collection("types_node", function (err, collection) {
            for (i = 0; i < typeDocuments.length; i = i + 1) {
                collection.insert(typeDocuments[i], {safe: true}, printLog);
            }
        });
    });
});

JSON Type Extensions

JSON has only a minimal set of data types. What are possible ways to actually extend it and add data types?

Example: ‘Date’

The basic (scalar) types provided by JSON are: ‘null’, ‘true’, ‘false’, ‘string’ and ‘number’ (http://www.json.org). For example, JSON does not have a basic or scalar type for ‘date’.

For the discussion of data type extensions, as date example the date 7/29/2012 is used. One possibility is simply to represent “7/29/2012” as a string and use the ‘string’ type. If this is the case, consumers have to be aware that dates are actually encoded as strings. And, a consumer now has to inspect every string and try to see if it can be interpreted as a date.

Aside from the processing effort, what if the provider has intended to be a string and not a date? What if the consumer has a different interpretation in mind that would require it to be “29-07-2012”? What if a consumer is not aware that dates are encoded as strings?

JSON’s Data Type System is a Closed System

JSON has a fixed set of scalar data types and no extension mechanism to add additional scalar data types. There are two mechanisms to define complex structures: objects and arrays. However, this mechanism does not allow to define new data types. It only supports setting the value of a property to a complex structure (which does not have a type).

So effectively, JSON’s data type system is closed and cannot be extended by new data types.

The Problem: JSON Data Type Extensions

In the absence of an extensible type system, this bears the question: how is a type like ‘date’ encoded? The goal is that the sender and the receiver of a JSON object containing a date actually both interpret date as date and misinterpretations are avoided.

Or, the more general form of the question is: how are types represented that are not natively represented in JSON?

The blog https://realprogrammer.wordpress.com/2012/07/18/json-is-strictly-by-value/ discussed possible problems and solutions to the by-value and by-reference representation. Here, in contrast, the discussion is about data type representation for those types that do not have a direct equivalent in JSON itself.

Possible Approaches of Data Type Encoding in JSON

There are different approaches of implementing the data type ‘date’ in JSON. A few basic approaches are discussed in the following.

  • Naming Convention. In this approach the property name is augmented with additional characters to indicate that its value contains a date. For example {“BirthDay_D”:”7/29/2012″}. All systems that write/read this property must be aware of the ‘_D’ designator and use it properly. In addition, when property names are used for search or display, the ‘_D’ must be stripped off. For database queries, it must be present, of course.
  • Value Formatting Rules. In this approach the type is ‘hidden’ in the value based on formatting rules. For example, dates could be formatted as “<month>/<day>/<year>”. This rule has to be known by all systems that read or write dates. In addition, every time a system reads a string, it has to check if the string is formatted according to one of the potentially many formatting rules it knows.
  • Value Tagging. This is a variation of the value formatting rules. Instead of formatting the value in a specific way, the value is pre-pended with an encoding: {“BirthDay”:”#date#7/29/2012″}. This separates the designation ‘date’ from the particular formatting. There are problems, though, with this approach, too, as the systems accessing this value must parse it properly. In terms of queries, the proper designator has to be added to e.g. search terms coming from the user interface.
  • Objects With Type Designator. One of the cleanest ways is to separate the property name, the value and it’s type. Since these are two pieces of data, an object is required to properly represent it. For example {“BirthDay”:”7/29/2012″, “#type”:”date”}. In this case the type is denoted separately in a property called ‘#type’. The value of this property states the particular type, here ‘date’. This mechanism allows any number of types whereby the type names are property values of ‘#type’. The downside is that properties with non-JSON types are objects. However, this is well supported by JSON libraries and packages and does not require special processing like naming conventions or value formatting/tagging rules.

There are many more intricate variations possible and discussed. Over time a few might crystallize and picked up by so many systems that a convention establishes itself. At that point the problem would be solved for real.

There is also an approach that I would consider a ‘not-so-good idea’. This approach uses a property name as the type designator. For example:

{"date":"7/29/2012"}

This approach has several severe problems. The first problem is that with this approach, an object can only have at most one date. If two dates are necessary, further properties and objects have to be used to encode that case. Second, code accessing the properties of an object has to understand that there is possibly a large set of property names that are ‘reserved’ or have specific meaning, namely, being type names. If 25 types are introduced, there would be 25 property names representing types and any client or consumer of these JSON documents have to be aware of it. And finally, property names are usually carrying some semantics, like a ‘birth date’. If a property name is used as designator, then the fact that a date is a birth date has to be encoded separately, making the structure a lot more complex, in addition to forcing client implementations to be more complex.

Key: Common Understanding

All approaches rely on a common understanding of how to interpret the type extension that is not encoded in JSON itself. So a specification has to be agreed upon by the producer and consumer of the type extensions in order to ensure a common understanding.

The degree of common understanding, however, varies considerably. Formatting rules are a lot harder to specify and comply to compared to a single property name ‘#type’ and an enumeration of possible type names like ‘date’.

Existing Approaches

What do ‘real’ systems actually do? For example, MongoDB has the following approach: http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON. This system provides three different approaches, with only one catering to pure JSON. In this case, unfortunately, they follow the ‘no-so-good idea’ approach by using property names as type designator. The common understanding of the value representation is externally defined by BSON (http://bsonspec.org/).