MongoDB: … $where a < b

$where allows to expand the expressiveness of MongoDB queries giving programmers an important choice when querying MongoDB.

Use Case

The use case for this blog is to find all documents where the value of property “b” is larger than the value of property “a”. The data set is

> use where
switched to db where
> db.whereColl.save({"a":3, "b":5})
> db.whereColl.save({"a":4, "b":5})
> db.whereColl.save({"a":5, "b":5})
>

Let’s try a simple query first (real values replaced by “…” used to save space):

> db.whereColl.find({"b": {$gt : 4}})
{ "_id" : ObjectId("..."), "a" : 3, "b" : 5 }
{ "_id" : ObjectId("..."), "a" : 4, "b" : 5 }
{ "_id" : ObjectId("..."), "a" : 5, "b" : 5 }
>

This query works quite nicely when asking for all document where “b” is greater than 4. Let’s try to now run the query we set out from:

> db.whereColl.find({"b": {$gt : "a"}})
>

This query does not return a result. Most likely because property “b” contains numbers and “a” is a string constant. Let’s try a different approach:

> db.whereColl.find({"b": {$gt : a}})
Tue Dec 14 17:59:33 ReferenceError: a is not defined (shell):1
>

This query does not return the desired result, either. The reason is that “a” as a variable is not defined.

$where

Enter $where. $where gives the ability to ask the database to execute a Javascript function (or expression) during query processing. Let’s try this query:

> db.whereColl.find({$where: "this.b > this.a"})
{ "_id" : ObjectId("..."), "a" : 3, "b" : 5 }
{ "_id" : ObjectId("..."), "a" : 4, "b" : 5 }
>

This query works. “this” points to the current document at evaluation time and supports the access to properties in the document. This example shows that $where allows expressions to be evaluated by the database that cannot be formulated using the regular query approach of MongoDB.

Discussion

Leaving expressiveness aside, the MongoDB documentation gives variations on the $where as well as performance hints: http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server-sideCodeExecution-{{%24where}}ClausesandFunctionsinQueries.

With the strong warning for the use of $where, when does it make sense to use? First, performance measurement with and without the use of $where is a good strategy because if the database does not execute the condition in $where, the application code has to do it. There might be cases where the database is faster.

From an application design perspective, it is always possible to use $where in order to get the queries correct. And later, optimize by removing the $where and moving the expression into the application programs if the $where turns out to be slower. So there is an advantage from a design perspective for sure.

Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 2)

Wait a minute: ‘undefined’ and ‘NaN’ is not legal JSON! Part 1 of this blog must have made a mistake?

MongoDB, JSON and Javascript

The MongoDB shell operates on Javascript data types, not JSON data types. Therefore, this is legal and working (MongoDB 2.2.0):

> use blog
switched to db blog
> db.blogColl.save({"amount":25})
> db.blogColl.save({"amount":null})
> db.blogColl.save({"amount":undefined})
> db.blogColl.save({"amount":NaN})
> db.blogColl.save({"balance":33})
>

So, using Javascript constants like ‘null’, ‘undefined’ and ‘NaN’ is possible and legal in context of the MongoDB shell.

MongoDB: Transformation of Input

As discussed elsewhere, MongoDB does transform data types and values at the point of insertion. When retrieving the above, the following result is displayed:

> db.blogColl.find()
{ "_id" : ObjectId("50955b2eeb749e2e4a36d779"), "amount" : 25 }
{ "_id" : ObjectId("50955b38eb749e2e4a36d77a"), "amount" : null }
{ "_id" : ObjectId("50955b3feb749e2e4a36d77b"), "amount" : null }
{ "_id" : ObjectId("50955b47eb749e2e4a36d77c"), "amount" : NaN }
{ "_id" : ObjectId("50955b4eeb749e2e4a36d77d"), "balance" : 33 }

MongoDB decided to honor the Javascript ‘null’ and ‘NaN’, but ‘undefined’ is changed to ‘null’. From a data modeling perspective this is significant as there is no difference in meaning between ‘undefined’ and ‘null’ in the context of MongoDB. An application having a distinct interpretation of those values will cause errors due to the distinction in interpretation it makes, but not MongoDB.

MongoDB: Aggregation Results

Coming back to Part 1 now, the question is, how would the default aggregation operators of MongoDB interpret the above data set in MongoDB? The following is an aggregation framework query:

db.blogColl.aggregate(
    { $group : {
        _id : 1,
        noDocs: { $sum : 1},
        totalamount: { $sum : "$amount" },
        averageamount: { $avg : "$amount" },
        minamount: { $min : "$amount"},
        maxamount: { $max : "$amount"},
        totalbalance: { $sum : "$balance"},
        averagebalance: { $avg : "$balance" },
        minbalance: { $min : "$balance"},
        maxbalance: { $max : "$balance"}
    }}
);

The ‘group’ pipeline step ensures that all documents are seen as one group for the purpose of aggregation. The aggregation operators are applied to all documents in that group. In this example, to all documents.

The documents have either ‘amount’ or ‘balance’ as properties and for both some aggregate values are computed. The outcome when executing the above is:

{
        "result" : [
                {
                        "_id" : 1,
                        "noDocs" : 5,
                        "totalamount" : NaN,
                        "averageamount" : NaN,
                        "maxamount" : 25,
                        "totalbalance" : 33,
                        "averagebalance" : 33,
                        "maxbalance" : 33
                }
        ],
        "ok" : 1
}

There are a few things to call out:

  • The minimum aggregation operator was not successful and the result was left out (the reason is unclear to me).
  • The maximum operator ignored ‘null’ and ‘NaN’ and produced the correct values.
  • The sum and average operators do not ignore ‘null’ and ‘NaN’; so if a non-number is present, the operators indicate that by stating that there was at least one entry that is not a number.

Implications to Modeling Considerations

Part 1 of this blog highlighted that fact that there needs to be a common understanding on dynamic schemas as well as the interpretation of property values, especially constants and (aggregation) operations on those.

Unless your interpretation precisely matches the interpretation of MongoDB, you will have to implement those operators and their interpretation outside MongoDB or in a different way with the capabilities that MongoDB provides.

The above example shows that this is an important design discussion that must take place before system construction in order to avoid bigger problems down the road.