JSON Type Extensions

JSON has only a minimal set of data types. What are possible ways to actually extend it and add data types?

Example: ‘Date’

The basic (scalar) types provided by JSON are: ‘null’, ‘true’, ‘false’, ‘string’ and ‘number’ (http://www.json.org). For example, JSON does not have a basic or scalar type for ‘date’.

For the discussion of data type extensions, as date example the date 7/29/2012 is used. One possibility is simply to represent “7/29/2012” as a string and use the ‘string’ type. If this is the case, consumers have to be aware that dates are actually encoded as strings. And, a consumer now has to inspect every string and try to see if it can be interpreted as a date.

Aside from the processing effort, what if the provider has intended to be a string and not a date? What if the consumer has a different interpretation in mind that would require it to be “29-07-2012”? What if a consumer is not aware that dates are encoded as strings?

JSON’s Data Type System is a Closed System

JSON has a fixed set of scalar data types and no extension mechanism to add additional scalar data types. There are two mechanisms to define complex structures: objects and arrays. However, this mechanism does not allow to define new data types. It only supports setting the value of a property to a complex structure (which does not have a type).

So effectively, JSON’s data type system is closed and cannot be extended by new data types.

The Problem: JSON Data Type Extensions

In the absence of an extensible type system, this bears the question: how is a type like ‘date’ encoded? The goal is that the sender and the receiver of a JSON object containing a date actually both interpret date as date and misinterpretations are avoided.

Or, the more general form of the question is: how are types represented that are not natively represented in JSON?

The blog https://realprogrammer.wordpress.com/2012/07/18/json-is-strictly-by-value/ discussed possible problems and solutions to the by-value and by-reference representation. Here, in contrast, the discussion is about data type representation for those types that do not have a direct equivalent in JSON itself.

Possible Approaches of Data Type Encoding in JSON

There are different approaches of implementing the data type ‘date’ in JSON. A few basic approaches are discussed in the following.

  • Naming Convention. In this approach the property name is augmented with additional characters to indicate that its value contains a date. For example {“BirthDay_D”:”7/29/2012″}. All systems that write/read this property must be aware of the ‘_D’ designator and use it properly. In addition, when property names are used for search or display, the ‘_D’ must be stripped off. For database queries, it must be present, of course.
  • Value Formatting Rules. In this approach the type is ‘hidden’ in the value based on formatting rules. For example, dates could be formatted as “<month>/<day>/<year>”. This rule has to be known by all systems that read or write dates. In addition, every time a system reads a string, it has to check if the string is formatted according to one of the potentially many formatting rules it knows.
  • Value Tagging. This is a variation of the value formatting rules. Instead of formatting the value in a specific way, the value is pre-pended with an encoding: {“BirthDay”:”#date#7/29/2012″}. This separates the designation ‘date’ from the particular formatting. There are problems, though, with this approach, too, as the systems accessing this value must parse it properly. In terms of queries, the proper designator has to be added to e.g. search terms coming from the user interface.
  • Objects With Type Designator. One of the cleanest ways is to separate the property name, the value and it’s type. Since these are two pieces of data, an object is required to properly represent it. For example {“BirthDay”:”7/29/2012″, “#type”:”date”}. In this case the type is denoted separately in a property called ‘#type’. The value of this property states the particular type, here ‘date’. This mechanism allows any number of types whereby the type names are property values of ‘#type’. The downside is that properties with non-JSON types are objects. However, this is well supported by JSON libraries and packages and does not require special processing like naming conventions or value formatting/tagging rules.

There are many more intricate variations possible and discussed. Over time a few might crystallize and picked up by so many systems that a convention establishes itself. At that point the problem would be solved for real.

There is also an approach that I would consider a ‘not-so-good idea’. This approach uses a property name as the type designator. For example:

{"date":"7/29/2012"}

This approach has several severe problems. The first problem is that with this approach, an object can only have at most one date. If two dates are necessary, further properties and objects have to be used to encode that case. Second, code accessing the properties of an object has to understand that there is possibly a large set of property names that are ‘reserved’ or have specific meaning, namely, being type names. If 25 types are introduced, there would be 25 property names representing types and any client or consumer of these JSON documents have to be aware of it. And finally, property names are usually carrying some semantics, like a ‘birth date’. If a property name is used as designator, then the fact that a date is a birth date has to be encoded separately, making the structure a lot more complex, in addition to forcing client implementations to be more complex.

Key: Common Understanding

All approaches rely on a common understanding of how to interpret the type extension that is not encoded in JSON itself. So a specification has to be agreed upon by the producer and consumer of the type extensions in order to ensure a common understanding.

The degree of common understanding, however, varies considerably. Formatting rules are a lot harder to specify and comply to compared to a single property name ‘#type’ and an enumeration of possible type names like ‘date’.

Existing Approaches

What do ‘real’ systems actually do? For example, MongoDB has the following approach: http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON. This system provides three different approaches, with only one catering to pure JSON. In this case, unfortunately, they follow the ‘no-so-good idea’ approach by using property names as type designator. The common understanding of the value representation is externally defined by BSON (http://bsonspec.org/).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s