Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 1)

How to deal with property values ‘null’, ‘undefined’, ‘NaN’ and missing properties?

Example: Math

The following five documents are the running example for this blog.

{"amount" : 25}
{"amount" : null}
{"amount" : undefined}
{"amount" : NaN}
{"balance" : 33}

All documents have the property ‘amount’, except for the last document (it is missing the property ‘amount’).

  • What is the total sum across all five documents for the values of property ‘amount’?
    • 25?
  • What is the median of the values for property ‘amount’?
    • 25?
  • What is the average of values for property ‘amount’?
    • 25 or 5 or 6.25?

Modeling Considerations

One way to interpret a property value of ‘null’, ‘undefined’, ‘NaN’ is that the property is present, but its value is not set. Another alternative interpretation is that the property is ‘absent’, meaning, it should not be considered at all (as if it were not present).

For the sum across the documents, the interpretation does not make a difference; likewise, we can argue that for the median it does not make a difference, either.

However, for computing the average it makes a huge difference, as in this case the number of documents becomes part of the computation, not just the values. So in the example above, the interpretation as absent means that the number of documents to be considered is 1, so the average is 25. If the interpretation means that there is not value, then there are 4 documents, so the average is 6.25.

But what about the document without the property ‘amount’? Is it interpreted as ‘amount’ is absent or no value? For the latter the average would result to 5.

Local Schema and Dynamic Schema Changes

In the presence of a document local schema (i.e., each document can follow its own schema) and dynamic schema changes it is extremely important to agree on the interpretation of the constants ‘null’, ‘NaN’, and ‘undefined’.

Equally important is to agree on the meaning if a property is absent. This becomes super-important if documents actually change their schema over time, meaning, e.g., that up to a point in time a property was left out if there was no value and after that the property is present with a value of ‘null’.

On top, when the number of documents (and/or number of found properties) is part of a computation, the agreement also needs to include how to count documents or properties across documents in collections.

Laying Down The Rules

So how exactly does one establish an agreement? One possibility is to put down the rules in form of engineering rules that engineers have to enforce and make sure that they are implemented. This is based on convention and agreement to follow them.

Alternatively, helper functions can be implemented that implement the rules directly. A countProperty() function could return for a given set of documents how many contain a specific property. A valueOf() function could return the value of a property. It either returns a value, or throws an NoValueFound exception.

These functions can ensure in their implementation that the agreed upon interpretation is implemented. And, if interpretations for specific circumstances are necessary (e.g., array of scalars), then these functions can become polymorphic or variations can be implemented.

Advertisements

One thought on “Null, Undefined, NaN and Missing Property: Goto Considered Harmful (Part 1)

  1. Pingback: Schema-free Database (Part 1): An Oxymoron | r e a l - p r o g r a m m e r

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s