Document Projection (Part 2): Definition

What does projection in the context of JSON structures or documents actually mean? What should the definition of “projection” be? Several possibilities are discussed next.

Document Projection: Complete Branch

Projection in JSON means projecting a branch of the JSON data structure tree, not a single object or value. A projection is requested with a property (projection) path in dot notation, as many systems already do. The result of a projection is a valid JSON document containing the specified branch(es).

An example document is

{"a": {"b": {"c": 3, "d": 4, "e": 5}}}

Let’s go through a series of projections.

  • Projection path: “a.b.c”
  • Result: {“a”: {“b”: {“c”: 3}}}
  • Projection path: “a.b”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4, “e”: 5}}}
  • Projection path: “a.e”
  • Result: {}

The result contains the full path of the projection (or more, but not less). If the requested projection path does not exist, the result is the empty document, as none of its properties matches the projection path. The empty projection path “” is not defined: a projection must name at least one property, and the first property named is always a top-level property of the document.

Several branches can be projected concurrently.

  • Projection paths: “a.b.c”, “a.b.d”
  • Result: {“a”: {“b”: {“c”: 3, “d”: 4}}}

The resulting document contains the combination of all of the branches that result in a valid projection. A redundant projection path specification is possible if one of the projection paths is a sub-path of another one. However, the result document is the same whether or not the redundancy is present.
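The complete-branch definition can be sketched in a few lines of Python. This is only an illustration under my own assumptions: documents are nested dicts, paths use plain dot notation, and the function name project is hypothetical, not taken from any particular system.

```python
import copy


def project(doc, paths):
    """Complete-branch projection of dot-notation paths over nested dicts."""
    result = {}
    for path in paths:
        if not path:
            raise ValueError("the empty projection path is not defined")
        keys = path.split(".")

        # Walk the source document; skip the path if any step is missing.
        node = doc
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:
            # The full path exists: rebuild it in the result, sharing prefixes
            # with branches that earlier paths already added.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = copy.deepcopy(node)
    return result


doc = {"a": {"b": {"c": 3, "d": 4, "e": 5}}}
print(project(doc, ["a.b.c", "a.b.d"]))  # {'a': {'b': {'c': 3, 'd': 4}}}
print(project(doc, ["a.e"]))             # {}
```

Redundant paths such as “a.b” together with “a.b.c” yield the same result as “a.b” alone, because the broader branch already contains the narrower one.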

Document Projection: Partial Branch

It is possible that the whole projection path does not exist, but a leading part of it does. In this case, one option is to return the part of the path that exists (MongoDB follows this approach). This results in partial paths in which the value of the last property is the empty document.

For example, “a.b.f” would result in {“a”: {“b”: {}}}. “a” and “b” exist in the example document, “f”, however, does not.

In my opinion, while this is possibly useful in some cases, I would not make it the default or standard definition. The result returned is incomplete, and I could argue that it is in fact incorrect, since the value of “b” in the example document is not the empty document. (I could envision a configuration setting that provides these partial branches if needed.)
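To make the difference concrete, here is a minimal Python sketch of the partial-branch behavior for a single dot path. It only mirrors the behavior described above and is not how any specific database implements it; the name project_partial is hypothetical.

```python
import copy


def project_partial(doc, path):
    """Project one dot path, keeping the leading part of the path that exists."""
    result = {}
    source, target = doc, result
    keys = path.split(".")
    for i, key in enumerate(keys):
        if not isinstance(source, dict) or key not in source:
            break  # the rest of the path is missing; keep what was built so far
        source = source[key]
        if i == len(keys) - 1:
            target[key] = copy.deepcopy(source)  # the full path exists
        else:
            target[key] = {}                     # placeholder for the next level
            target = target[key]
    return result


doc = {"a": {"b": {"c": 3, "d": 4, "e": 5}}}
print(project_partial(doc, "a.b.f"))  # {'a': {'b': {}}}
print(project_partial(doc, "a.b.c"))  # {'a': {'b': {'c': 3}}}
```

The first result shows the issue: “b” appears with an empty document as its value although it holds three properties in the source.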

Document Projection: Value

Wait a minute, why does the result document have to consist of full paths?

The reason is based on the implicit restriction on JSON documents that there can be only one property of a given name on the same level in a document. “Implicit” because the JSON definition (http://json.org/) does not mandate the restriction, but many implementations do: property names on the same level of embedding have to be unique.

For example:

{"x": {"b": {"c": 3, "d": 4}}, 
 "y": {"e": {"c": 3, "d": 4}}}

is a perfectly valid document where the property names are unique on every level. So let’s get back to projection and let’s for a moment assume that projection actually returns the value at the end of the path, omitting the path to the property value itself. So,

  • Projection path: “x.b.c”
  • Result: {“c”: 3}

So far so good.

  • Projection paths: “x.b.c”, “y.e.c”
  • Result: ?

What should the result be? The basic assumption is that a projection on a document returns a document. However, “x.b.c” and “y.e.c” each return {“c”: 3} as a separate document, and the two cannot be combined into one document because both use the property name “c”.

  • One possible result could be an array with two documents. But arrays are in general not considered valid top level documents (again, the JSON definition would allow that).
  • To mitigate that, the array could be the value of a property: {“result”: [{“c”: 3}, {“c”: 3}]}. But this would conflict with a document that happens to have a property “result” of type array with two identical documents in it.
  • Or, the two documents would have to be embedded in a third one with special names, like {“1”: {“c”: 3}, “2”: {“c”: 3}}. But then, the original document does not have the properties “1” or “2”.

Based on this discussion, having projection results contain full paths is simpler and more direct.
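The collision is easy to reproduce. The following Python sketch assumes a hypothetical value-only projection (project_value is my own name) that drops the path and keeps only the last property, and then tries to combine two results into one document.

```python
def project_value(doc, path):
    """Hypothetical value-only projection: drop the path, keep the last property."""
    keys = path.split(".")
    node = doc
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return {}
        node = node[key]
    return {keys[-1]: node}


doc = {"x": {"b": {"c": 3, "d": 4}},
       "y": {"e": {"c": 3, "d": 4}}}

first = project_value(doc, "x.b.c")   # {'c': 3}
second = project_value(doc, "y.e.c")  # {'c': 3}

# Merging into one document keeps a single "c"; it is no longer possible
# to tell which of the two projection paths it came from.
merged = {**first, **second}
print(merged)  # {'c': 3}
```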

Projection – Result Correspondence Principle

There is also another argument from the user’s viewpoint. If a user wants to project “x.b.c”, then the user might want to access the result document after the query returns with “x.b.c” as the access path. From this viewpoint, the path in the query and the path in the result document should match and not require access path transformation.

Array Projection: Complete Access Path

Documents may contain arrays as well as arrays of arrays, arrays of objects of arrays, etc. In principle, any of these composite types can appear on any level of the document. Projection therefore has to be defined on arrays as well, not just on documents.

The principle of projection paths is extended to include array index specification. For example, let’s consider this document:

{"a": [{"a1": 1}, {"a2": 2}], 
 "b": {"c": [{"c1": 3}, {"c2": 4}, {"c3": 5}]}, 
 "d": [6, 7]}

Let’s do a few projections (arrays are 0-index based):

  • Projection path: a[0]
  • Result: {“a”: [{“a1”: 1}]}
  • Projection path: b.c[1]
  • Result: {“b”: {“c”: [{“c2”: 4}]}}
  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [{“a2”: 2}], “b”: {“c”: [{“c3”: 5}]}}
  • Projection path: a[7]
  • Result: {}

As in the case of documents, full paths are requested and full paths are returned, with several paths possible. A projection path referring to a non-existing property or array index will not contribute to the result.
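A possible Python sketch of this array-aware projection follows. The path syntax (name, dot, [index]) is taken from the examples above; everything else is my own assumption, including the names parse_path, lookup and project and the internal _Slots helper, which remembers the source index of each selected array element so that the elements come out in source order.

```python
import copy
import re

_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path):
    """Split 'b.c[2].c3' into ['b', 'c', 2, 'c3']."""
    return [name if name else int(index) for name, index in _STEP.findall(path)]


def lookup(doc, steps):
    """Return (True, value) if all steps exist in doc, else (False, None)."""
    node = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(node, list) or step >= len(node):
                return False, None
        elif not isinstance(node, dict) or step not in node:
            return False, None
        node = node[step]
    return True, node


class _Slots(dict):
    """Stands in for an array while building: source index -> projected value."""


def project(doc, paths):
    tree = {}
    for path in paths:
        steps = parse_path(path)
        if not steps:
            raise ValueError("the empty projection path is not defined")
        found, value = lookup(doc, steps)
        if not found:
            continue  # non-existing paths do not contribute
        node = tree
        for step, nxt in zip(steps, steps[1:]):
            node = node.setdefault(step, _Slots() if isinstance(nxt, int) else {})
            if isinstance(node, list):
                break  # an earlier, broader path already covers this branch
        else:
            node[steps[-1]] = copy.deepcopy(value)
    return _compact(tree)


def _compact(node):
    """Turn index slots into arrays that hold only the selected elements."""
    if isinstance(node, _Slots):
        return [_compact(node[i]) for i in sorted(node)]
    if isinstance(node, dict):
        return {key: _compact(value) for key, value in node.items()}
    return node


doc = {"a": [{"a1": 1}, {"a2": 2}],
       "b": {"c": [{"c1": 3}, {"c2": 4}, {"c3": 5}]},
       "d": [6, 7]}
print(project(doc, ["a[1]", "b.c[2].c3"]))
# {'a': [{'a2': 2}], 'b': {'c': [{'c3': 5}]}}
```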

So far, so good, except that the results do not yet conform to the “Projection – Result Correspondence” principle from above: the projection “a[1]” resulted in a correct document, but that result document cannot be accessed with “a[1]” to obtain the value.

Array Projection: Padding

In order to support the “Projection – Result Correspondence” principle, array results can be padded with the value “null”. For example:

  • Projection paths: a[1], b.c[2].c3
  • Result: {“a”: [null, {“a2”: 2}], “b”: {“c”: [null, null, {“c3”: 5}]}}

Now it is possible to access the result with “a[1]” or “b.c[2].c3” in order to obtain the proper results. From a user’s perspective this is great as again the paths used to specify the projection can be used to retrieve the values.
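Padding can be sketched in Python by writing each selected element at its original index and filling the gap in front of it with None (Python's counterpart of JSON null). As before, this is a sketch under my own assumptions; project_padded and its helpers are hypothetical names.

```python
import copy
import re

_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path):
    """Split 'b.c[2].c3' into ['b', 'c', 2, 'c3']."""
    return [name if name else int(index) for name, index in _STEP.findall(path)]


def lookup(doc, steps):
    """Return (True, value) if all steps exist in doc, else (False, None)."""
    node = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(node, list) or step >= len(node):
                return False, None
        elif not isinstance(node, dict) or step not in node:
            return False, None
        node = node[step]
    return True, node


def _pad(array, index):
    """Extend the array with None so that array[index] is addressable."""
    array.extend([None] * (index + 1 - len(array)))


def _child(container, step, empty):
    """Return the container stored at step, creating (and padding) if needed."""
    if isinstance(step, int):
        _pad(container, step)
        if container[step] is None:
            container[step] = empty
    else:
        container.setdefault(step, empty)
    return container[step]


def project_padded(doc, paths):
    result = {}
    for path in paths:
        steps = parse_path(path)
        if not steps:
            raise ValueError("the empty projection path is not defined")
        found, value = lookup(doc, steps)
        if not found:
            continue  # non-existing paths do not contribute
        target = result
        for step, nxt in zip(steps, steps[1:]):
            target = _child(target, step, [] if isinstance(nxt, int) else {})
        if isinstance(steps[-1], int):
            _pad(target, steps[-1])
        target[steps[-1]] = copy.deepcopy(value)
    return result


doc = {"a": [{"a1": 1}, {"a2": 2}],
       "b": {"c": [{"c1": 3}, {"c2": 4}, {"c3": 5}]},
       "d": [6, 7]}
print(project_padded(doc, ["a[1]", "b.c[2].c3"]))
# {'a': [None, {'a2': 2}], 'b': {'c': [None, None, {'c3': 5}]}}
```

One caveat of this design: a padded null cannot be told apart from a null that was actually stored in the source array, so a consumer has to rely on the projection paths to know which positions carry data.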

Array Projection: Scalar Values

Scalar values in arrays do not pose a specific problem:

  • Projection paths: a[1], d[1], d[2]
  • Result: {“a”: [null, {“a2”: 2}], “d”: [null, 7]}

And their access can be accomplished using the projection paths.
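As a small check of the correspondence principle, the Python sketch below reads values back out of a result document with the same path syntax that was used for the projection. The accessor get_path is a hypothetical helper, the padded document is the result shown above, and the compact document is what the unpadded definition would return for the same paths.

```python
import re

_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def get_path(doc, path):
    """Follow a path such as 'b.c[2].c3' and return the value found there."""
    node = doc
    for name, index in _STEP.findall(path):
        node = node[name] if name else node[int(index)]
    return node


padded = {"a": [None, {"a2": 2}], "d": [None, 7]}
compact = {"a": [{"a2": 2}], "d": [7]}

print(get_path(padded, "d[1]"))  # 7
print(get_path(padded, "a[1]"))  # {'a2': 2}

# Without padding, the projection path no longer works as an access path.
try:
    get_path(compact, "d[1]")
except IndexError:
    print("d[1] is not addressable in the compact result")
```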

Summary

Initially I thought projection was a straightforward function and not worth a discussion in the context of document-oriented databases, but it turned out to be not that clear-cut. Nevertheless, the above is a starting point for a strict rationalization of projection in document-oriented databases based on the JSON data model.