SQL for JSON Rationalization Part 14: Cartesian Product of Asterisk Queries

After introducing projection and restriction in JSON SQL, the next blogs will discuss Cartesian product initially, and joins down the road.

Example Data Sets

Three collections form the example data set for this blog. The collections are cp_one, cp_two and cp_three.

cp_one:

{"a":"a-value","b":"b-value"}

cp_two:

{"a":{"x":true},"c":{"y":false}}
{"a":{"x":null}}

cp_three:

{"d":[],"e":[]}
{"f":[true],"g":[false]}
{"h":[null],"i":[null]}

Recap: JSON Query Result Representation

To recap, a result of a JSON SQL query can be represented in two different forms. One form is a set of JSON documents, and the other form is a set of rows in a relational table.

For example, the JSON SQL query

select {*} from cp_two

results in

{"a":{"x":true},"c":{"y":false}}
{"a":{"x":null}}

The projection syntax indicates through the use of “{” and “}” that the result of the query is to be represented as JSON documents.

In contrast, the JSON SQL query

select * from cp_two

results in

|a_x  |a          |c_y   |c           |
+-----+-----------+------+------------+
|true |{"x":true} |false |{"y":false} |
|null |{"x":null} |<>    |<>          |

Omitting the “{” and “}” in the projection indicates that the result should be in relational table form. Note that the result in a relational table contains a column for each path found in any of the result JSON documents.

Cartesian Product and Join

In short, the Cartesian product is the cross product of the documents of the collections named in the JSON SQL query. A JSON SQL query can reference two or more collections and the result is the cross product of all documents in all referenced collections.

Since the result of a JSON SQL query is a set of JSON documents, the result of a Cartesian product query must be a set of single JSON documents as well. Each JSON document, however, will be a combination of the JSON documents as produced by the Cartesian product.

For example, the Cartesian product of cp_one and cp_two is:

{"a":"a-value","b":"b-value"} {"a":{"x":true},"c":{"y":false}}
{"a":"a-value","b":"b-value"} {"a":{"x":null}}

The result of the query will be two JSON documents, each is the combination of the document pairs just shown.

A join is a Cartesian product with an applied restriction. The restriction can be simple or complex, depending on the client’s requirements. Joins are not the focus in the next few blogs, but will be front and center down the road.

The definition of the Cartesian product (or join for that matter) is fundamentally not different from the relational equivalent. Instead of creating the cross product of rows, the cross product of JSON documents is computed.

Asterisk Cartesian Product Query

For the benefit of the discussion this blog only discusses Asterisk Cartesian product queries that reference two or more collections. An example query is

select {*} from cp_one as one, cp_two as two

Inherent with Asterisk as projection is the possible duplication of paths in the combined JSON documents. For example, the collections cp_one and cp_two both contain documents with a path “a”.

A duplication of paths is not necessarily the case; if the documents of the referenced collections do not have paths in common, there will be no duplication. However, a duplication of paths is possible in general and this possibility needs to be addressed.

The approach to remove duplicate paths is called duplicate path resolution.

Duplicate Paths in Query Results

JSON document combinations that are the result of a Cartesian product might be disjoint in paths, or might have common partial or full paths. If the paths are not disjoint, the combination of the JSON documents might contain the same path twice.

The Cartesian product of cp_one and cp_two is (as shown above):

{"a":"a-value","b":"b-value"} {"a":{"x":true},"c":{"y":false}}
{"a":"a-value","b":"b-value"} {"a":{"x":null}}

Ignoring duplicate paths, the result of the equivalent JSON SQL query could be composed like this:

{"a":"a-value","b":"b-value","a":{"x":true},"c":{"y":false}}
{"a":"a-value","b":"b-value","a":{"x":null}}

This would simply be the combination of all properties into one JSON document. While the JSON standard does not prohibit duplicate properties in JSON documents, many implementations (e.g., languages, libraries, or storage systems) do not support maintaining duplicate properties consistently. Therefore, to be on the safe side, avoiding duplicate paths is prudent.

For reference, relational systems append e.g. “_1” or “_2” to duplicate column names in order to avoid duplication. However, JSON SQL takes a different approach in order to provide symmetry for the JSON result and the relational result case as both cases have to be addressed.

Automatic Duplicate Path Resolution

Since no schema is in place for any of the involved documents or collections, it is impossible to determine based on a schema if duplicate paths will exist (or not). This means that it is always assumed that there could be duplicate paths.

In order to consistently avoid duplicate paths, several steps are taken.

The first step is requiring a correlation specification for each collection referenced in a JSON SQL query referring to more than one collection. For example,

select * from cp_one as one, cp_two as two

specifies “one” as correlation specification for cp_one, and “two” for cp_two.

The second step is that the results are qualified by the correlation specification. For the result as JSON documents the documents from the Cartesian product become sub-documents where correlation specifications are the top level path.

For example,

select {*} from cp_one as one, cp_two as two

results in

{"one":{"a":"a-value","b":"b-value"},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"one":{"a":"a-value","b":"b-value"},
 "two":{"a":{"x":null}}}

As the results show, single documents are returned and are the combination of the corresponding documents coming from the Cartesian product. The correlation specifications are being used as top level paths and so the origin collections of the results become apparent.

The representation of the result in a relational table is analogous: the column names are prepended with the corresponding correlation specifications followed by an underscore “_”.

For example,

select * from cp_one as one, cp_two as two

results in

|one_a     |one_b     |two_a_x |two_a      |two_c_y |two_c       |
+----------+----------+--------+-----------+--------+------------+
|"a-value" |"b-value" |true    |{"x":true} |false   |{"y":false} |
|"a-value" |"b-value" |null    |{"x":null} |<>      |<>          |

Using the correlation specification as top level properties in the JSON document result format or as prefixes in the relational table result format achieves symmetry in avoiding duplication paths.

Role of Correlation Specification

Summarizing, the approach of mandatory correlation specifications combined with their use as prefix or top level properties achieves robust duplicate path resolution that is independent of the specific collections or the documents involved.

  • Since it is unknown if there will be a duplication of paths, queries representing a Cartesian product must have a correlation specification for each of the referenced tables
  • Every projection path is prefixed even if there is no duplication (since without schema it is not possible to know if there is going to be path duplication or not)
  • Selective use of a prefix is impossible due to a path element possibly being equivalent to a correlation specification

Example Cartesian Product Asterisk Queries

Some additional example queries are shown next.

select {*} from cp_two as two, cp_three as three

results in

{"three":{"d":[],"e":[]},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"three":{"f":[true],"g":[false]},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"three":{"h":[null],"i":[null]},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"three":{"d":[],"e":[]},
 "two":{"a":{"x":null}}}
{"three":{"f":[true],"g":[false]},
 "two":{"a":{"x":null}}}
{"three":{"h":[null],"i":[null]},
 "two":{"a":{"x":null}}}

The query

select {*} from cp_one as one, cp_two as two, cp_three as three

results in

{"one":{"a":"a-value","b":"b-value"},
 "three":{"d":[],"e":[]},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"one":{"a":"a-value","b":"b-value"},
 "three":{"f":[true],"g":[false]},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"one":{"a":"a-value","b":"b-value"},
 "three":{"h":[null],"i":[null]},
 "two":{"a":{"x":true},"c":{"y":false}}}
{"one":{"a":"a-value","b":"b-value"},
 "three":{"d":[],"e":[]},
 "two":{"a":{"x":null}}}
{"one":{"a":"a-value","b":"b-value"},
 "three":{"f":[true],"g":[false]},
 "two":{"a":{"x":null}}}
{"one":{"a":"a-value","b":"b-value"},
 "three":{"h":[null],"i":[null]},
 "two":{"a":{"x":null}}}

Summary

Supporting Cartesian Product in JSON SQL is straight forward and follows the same approach and semantics as in Relational SQL. Aspects like duplicate paths are equivalent to duplicate columns – the same strategies for duplication resolution can be applied.

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Oracle 12c – SQL for JSON (Part 3): Basic Joins

Having JSON support in a relational database means that the join operator is available not only for joining relational data or JSON data, but also for the mixed case: joining JSON and relational data. This opens up a whole new world of data modeling and query execution.

Running Example

This running example creates three tables, “demo”, “city” and “city_rel” and a sample data set in each table. The tables “city” and “city_rel” contain the same data set, once in JSON format, and once in relational format.

DROP TABLE demo;
CREATE TABLE demo
(
  id NUMBER,
  person CLOB 
    CONSTRAINT person_ensure_json 
    CHECK (person IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO demo VALUES
( 1, '{ "name": "Bob", "city": "SF"}' );
INSERT INTO demo VALUES
( 2, '{ "name": "Jake", "city": "PA"}' );
INSERT INTO demo VALUES
( 3, '{ "name": "Alice", "city": "NYC"}' );
INSERT INTO demo VALUES
( 4, '{ "name": "Jenn",  "city": {"name": "Tokyo"}}' );
INSERT INTO demo VALUES
( 5, '{ "name": "Jenn",  "city": ["Tokyo"]}' );
INSERT INTO demo VALUES
( 6, '{ "name": "Jenn",  "city": 66}' );
DROP TABLE city;
CREATE TABLE city
(
  id NUMBER,
  city CLOB 
    CONSTRAINT city_ensure_json 
    CHECK (city IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO city VALUES
( 101, '{"city": "SF", "state": "CA", 
  "country": "US"}' );
INSERT INTO city VALUES
( 102, '{"city": "PA", "state": "CA", 
  "country": "US"}' );
INSERT INTO city VALUES
( 103, '{"city": "NYC", "state": "NY", 
  "country": "US"}' );
INSERT INTO city VALUES
( 104, '{"city": {"name": "Tokyo"}, "state": null, 
  "country": "Japan"}' );
INSERT INTO city VALUES
( 105, '{"city": ["Tokyo"], "state": null, 
  "country": "Japan"}' );
INSERT INTO city VALUE
( 106, '{"city": 66, "state": null, 
  "country": "World"}' );
DROP TABLE city_rel;
CREATE TABLE city_rel
(
  id      NUMBER,
  city    VARCHAR(255),
  state   VARCHAR(255),
  country VARCHAR(255));
INSERT INTO city_rel VALUES
( 1001, 'SF', 'CA', 'US' );
INSERT INTO city_rel VALUES
( 1002, 'PA', 'CA', 'US' );
INSERT INTO city_rel VALUES
( 1003, 'NYC', 'NY', 'US' );
INSERT INTO city_rel VALUES
( 1004, '{"name": "Tokyo"}', NULL, 'World' );
INSERT INTO city_rel VALUES
( 1005, '["Tokyo"]', NULL, 'World' );
INSERT INTO city_rel VALUES
( 1006, '66', NULL, 'World' );

JSON Join

The following SQL statement is a simple join between JSON structures on the property “city”:

SELECT *
FROM demo d, city c
WHERE d.person.city = c.city.city;

This SQL statement projects in addition to joining JSON structures:

SELECT d.person, c.city
FROM demo d, city c
WHERE d.person.city = c.city.city;

The following SQL statement extends the projection:

SELECT 
  d.id,
  d.person.name,
  d.person.city,
  c.id,
  c.city.state,
  c.city.country
FROM demo d, city c
WHERE d.person.city = c.city.city;

JSON – Relational Join

 This SQL statement shows the join between JSON and relational data, combined with a projection:

SELECT 
  d.id,
  d.person.name,
  d.person.city,
  c_r.id,
  c_r.city,
  c_r.state,
  c_r.country
FROM demo d, city_rel c_r
WHERE d.person.city = c_r.city;

Significance of Pure and Mixed JSON Joins

As shown, the join operator is applied easily within JSON tables and across JSON and relational tables. When using Oracle 12c there is no restriction anymore when it comes to the join operator in conjunction of JSON documents.

Pure JSON joins are possible in context of Oracle 12c. This means that developers have a choice to model all data in a pure document form (trying to avoid the need for joins by creating sub-collections – which is almost impossible without denormalization), or to consciously model documents in such a way that the document nature is applied where applicable without having to necessarily de-normalize as the join operator is available.

The mixed case between JSON and relational tables goes a lot further as now data can be modeled according to its nature (not all data is exclusively document-oriented or relational) and its access path requirements without compromising either way.

In addition, the mixed case supports the situation where data is already present in the database in relational form and new data is added in JSON form. This means that even if data is available in relational form, additional data does not have to be in relational form, and the most appropriate representation can be chosen (and no separate document-oriented database has to be deployed, btw).

Go SQL!

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Document-oriented NoSQL Databases: How many Joins will you have to implement?

One of the continuously debated items in context of NoSQL databases is the join operation. Let’s listen in a bit:

and there can be many more variations found on the topic of joins on various levels of technical depth.

So, do we need joins in context of NoSQL databases? Do we do joins implemented by NoSQL databases? Are joins outdated concepts that we can live without in context of NoSQL databases? In this blog I try to rationalize the overarching question in principle. Some fact finding first:

(Database) Data Models and Database Management Systems

Data models, like the relational model, the document-model, the hierarchical model, key-value model, graph model, object-oriented model, XML model, etc., are implementations of data structures in a given database management system. Data models define possible data types and their construction rules for more complex types.

For example, the implementation of a relational model might restrict values in tables to be scalar. Another implementation might allow a table as a value, supporting NF2 relations. One system might support the document-model strictly following the JSON model, while others add additional data types in addition to what JSON defines. Some systems do support the notion of references, other so not. Each database implements a data model in any variation it likes to.

Schemata and Database Management Systems

A schema is a particular extension of a domain model, implemented in context of a data model. For example, a domain model might be suppliers, parts and their relationship. This can be implemented in a relational model, a document model or a graph model or any other supported data model.

There is no ‘best’ way of definition a schema. For the same domain, different schemata can be defined depending on the skill of the creator, the knowledge of query access patterns, the amount of restrictions that should be supervised by the database management system and other factors.

For example, in a document model, suppliers, parts and their relationships can be modeled as three separate documents, or in two documents (suppliers and their relationship to parts), or one document – and there are many more variations possible, of course.

Joins and Database Management Systems

Some database management systems implement the join operation in their query interface, some do not. For example, Oracle, MySQL and FoundationDB implement joins, MongoDB, Oracle NoSQL and Aerospike do not. So joins are not necessarily restricted to the relational data model.

Joins and Data Access Paths

With the fact finding under our belt, how many joins will you have to implement? In principle, this is a function of the required data access based on a specific schema. Different schemata of the same domain will require a different number of joins.

Let’s look at a few examples in the supplier – parts domain.

Example 1: No join required

The documents are structured like this:

{"supplier": "superQuality",
 "parts":[
     {"part_name": "part_lowQual"}, 
     {"part_name": "part_hiQual"}]
}

The query: “find the names of all parts for a supplier” does not require a join as the data is already structured so that each supplier contains the set of all parts it supplies.

Example 2: One join required

The documents are structured like this:

{"supplier": "superQuality",
 "parts": [1, 2]
}
{"part_name": "part_lowQual", "part_id": 1}
{"part_name": "part_hiQual", "part_id": 2}

The query: “find the identifiers and names of all parts for a supplier” requires a join as a supplier only has the identifiers of the parts it ships, not their names.

Example 3: Two joins required

The documents are structured like this:

{"supplier": "superQuality", "supplier_id": "S_55"}
{"part_name": "part_lowQual", "part_id": 1}
{"part_name": "part_hiQual", "part_id": 2}
{"part_id": 1, "supplier_id": "S_55"}

The query: “find the identifiers and names of all parts for a supplier” requires two joins, one to find the objects for a supplier that relate the part identifier to the supplier identifier, and a second one to find the corresponding parts.

Analysis of Examples

The examples have shown empirically that the need for joins is not a function of the data model (document-oriented in this case), but a function of the data access, aka, the number of required data relationship traversals in context of a given schema. If the relationship to be traversed matches the way the data is structured as in Example 1, no join is necessary. As soon as the data is structured differently from the required traversal by the query, joins are necessary (Example 2 and 3).

So, as summary, it is fairly easy to avoid joins. If, and only if, you can structure your data (aka, build your schema) in such a way that it conforms structurally to the queries then you can avoid joins completely (Example 1). I am certain that there are special cases out there for which you can accomplish that, but in general, this is not possible. And, even if it is possible in production, as soon as analysts start analyzing the data sets, they will most likely query along different access paths.

Joins at Query Time vs. Joins at Insert/Update/Delete Time

Above examples clarified that joins are a function of the data access paths. Can joins at query time be avoided entirely by creating data access paths in a certain way?

Yes, it is possible, however, it is a basic trade-off between data query and data manipulation time: reducing the computational effort at run-time, and instead increasing it during insert / update / delete operations. In principle, joins at query time can be avoided if for each access path there is an equivalent data structure in place.

Example 4: Schema refactoring

The documents in this example look like:

{"supplier": "superQuality", "supplier_id": "S_55"}
{"part_name": "part_lowQual", "part_id": 1}
{"part_name": "part_hiQual", "part_id": 2}
{"part_id": 1, "supplier_id": "S_55"}
{"shipper": "fastShipper", "shipper_id": "SH_01"}
{"part_id": 2, "shipper_id": "SH_01"}

Supplier supply parts, however, shippers ship not any part, but only specific parts (maybe for safety reasons). There can be several queries against this document set:

  • Find all parts supplied by a supplier with a given name
  • Find all parts shipped by a shipper with a given name
  • Find all suppliers and shippers for a part with a given name

Each of these queries requires at least one join. The documents can be restructured easily to avoid joins altogether:

{"supplier": "superQuality", "supplier_id": "S_55",
 "parts": [
     {"part_name": "part_lowQual", "part_id": 1}
]}
{"shipper": "fastShipper", "shipper_id": "SH_01",
 "parts": [
     {"part_name": "part_hiQual", "part_id": 2}
]}
{"part_name": "part_lowQual", "part_id": 1,
 "suppliers": [
     {"supplier": "superQuality", "supplier_id": "S_55"}
 ], 
 "shippers": []}
{"part_name": "part_hiQual", "part_id": 2,
 "suppliers": [],
 "shippers": [
     {"shipper": "fastShipper", "shipper_id": "SH_01"}
]}

The idea is clear: structure the data in such a way that a query can be satisfied with a simple selection. And, the consequence is clear, too: data is duplicated, possibly many times. Which means that an insert, update or delete has to know all the locations where to modify the data and has to modify the data consistently (and ideally within a single transaction).

As a side note, this is the situation that normalization tries to address by ensuring that each data item is only once in the database.

Of course, data duplication will have an impact on the size requirements of main memory an disk space. While there is a change in algorithm complexity, there is also a change in the storage and memory size requirements.

Pre-Joining Data

Pre-joining data allows to avoid joins at query time at the cost of duplicating data at data management time. Alternatively expressed, the implementation of duplication at management time is the cost of avoiding normalization combined with query-time joins.

Is there a way to quantify the effort? In principle, there are as many duplications necessary as joins are to be avoided. This is a rough estimate as many joins are the same except for selection and/or projection specifications. If all joins are abstracted to their join criteria (omitting projection and selection), then this is roughly the amount of duplication required.

The article written by Sarah Mei clearly shows the trade-off between data duplication and joins: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/. She clearly describes many of the issues in context of a specific use case.

“Wait a minute, I don’t have joins and it works anyway!”

But, where are the joins? NoSQL databases that do not implement the join operator in their query interface are in use and production.

If not expressed as query, joins are found either in the application system logic or the interface logic, depending on the design. Most likely these are nested-loop joins or hash-based joins (less likely) or a series of selections with the application logic combining the intermediary query results into the final result data set.

And they are not joins on the complete data set either, but usually have some selection criteria. So the application system logic roughly corresponds to the optimized operator tree of a database query sub-system and in all actuality there might be many joins implemented that way throughout the application logic.

The joins are in fact implemented, just not by using a join operator on the database interface, but inside the application logic. This means that the database cannot optimize the execution, plus there are several queries coming from the application logic putting load on the database system.

And this opens up yet another trade-off: data duplication vs. application logic complexity. If the data is structured in such a way that joins are avoided (at the cost of duplication), then the application logic complexity will be reduced also (from algorithms implementing joins to algorithms issuing queries with selections/projections).

Of course, while the application logic complexity is reduced, the data management logic complexity increased as it has to manage duplicate data consistently across the database.

Summary: Are joins required? Yes. Are joins implemented? Yes.

In my mind there is no question that joins are in general needed and actually implemented today, even if the database does not support a join operator directly and even if there are opinions that joins are not needed. I don’t really understand why there is a discussion about this in the first place as the need for a join is a function of the data schema, not the data model.

The fact that a relational database has the capability of joins does not mean you must use it. And the fact that a NoSQL database does not support joins at their query interface does not mean joins are not needed.

At the heart an architecture and engineering decision has to be made (implicitly or explicitly) of how many joins are implemented through data duplication and how many joins are implemented through algorithms in the application logic layer (if there is not join operator available at the database query interface).

It’s that easy.

 

Joins: (Almost) Impossible to Avoid in Document-oriented Databases

There is a lot of ‘chatter’ about the concept and support of joins in document-oriented databases. So what is the underling issue?

Joins in RDBMS

‘Join’ in the relational world is an operation on two relations that relates the tuples in these relations with each other based on some comparison criteria on the tuples’ attributes. For example, the comparison can be ‘R1.a = R2.b’ and so for each tuple from the first relation R1 all tuples from the second relation R2 are retrieved and combined that match the comparison, meaning, the attribute ‘a’ must match the attribute ‘b’. A detailed discussion can be found here: http://en.wikipedia.org/wiki/Join_%28SQL%29.

Joins allow to relate data from different relations and the join operator is supported by a relational database management system. A typical use case is to find all parts that a supplier supplies. And, for a given part, find all its suppliers. The suppliers and parts are usually stored in different relations and the data have an m:n relationship with each other.

Joins across Documents?

So why the chatter, then? If a document-oriented database stores data in different document collections and if the documents need to be related to each other, then a join is in order. The example of suppliers and parts applies here in the exact same way.

Now, if a document-oriented database does not support joins, what to do? Well, in reality the join will be performed in some layer above the database in a programming language. If all suppliers have to be displayed for a given part, then a program that computes this result effectively implements a join; it is not done in the database, though.

Pre-joined data in Documents?

Some optimization is possible. If the access pattern follows an 80-20 rule, then document-oriented databases allow some hard-coded optimization. If in 80% of the cases the suppliers for a part are requested, and only in 20% the opposite, then the designer of the document layout could create for each part document a sub-collection ‘supplier’ that contains the suppliers of this part. In 80% of the cases no join is necessary any more as the suppliers are ‘pre-joined’ with the parts they supply, only in the 20% of the cases a join is necessary.

However, this causes what in the relational world is called anomalies: If a supplier is removed, then all part documents have to be searched for this supplier. Or if a supplier is added, then all those part documents have to be updated that are supplied by this supplier. Updating supplier data also requires to search the part documents. Pre-joining is effectively a specific de-normalization activity for performance reasons.

Does the type of relationship matter?

Are there relationships that by their nature can be pre-joined without penalty? A very specific relationship, the part-of relationship, falls into this category. It is a ‘clean’ approach since the life time of the part-of objects are the exact same as the containing object.

Another relationship that feels as if pre-joining makes sense is the 1:1 relationship where two objects are exclusively related to each other. However, this is not really the case as one object would be a property of the other and that then could be done the other way around, too. So the 80-20 rule case applies here, too.

In reality, however, relationship between data are usually a lot more complex then just part-of relationships. This in turn means that joins will be necessary. The only real exception is if the 80-20 rule is really a 100-0 rule. This would mean that all access are the exact same and no joins are necessary.

Underlying Conceptual Foundation

Conceptually as soon as independent entities (i.e. objects in their own right) are related to each other, and if their relationship is traversed in both directions at some point in time during the execution of the application, a join is necessary and factually taking place.

Pre-joining is the materialization of the traversal in one direction. So two pre-joins, one for each direction, are possible. If the pre-join in both directions takes place, no join has to be performed upon retrieval; however, the join functionality was applied at time of update or insert in order to accomplish the pre-joins.

As soon as pre-joins exist, possible update, insert and delete anomalies have to be carefully taken care of as pre-joins are the equivalent to de-normalization and therefore data redundancy. At insert, update and delete time all redundant copies of the objects have to be found and the appropriate functionality applied.

Pre-joins are for read-performance reasons only; they are not a conceptual matter and in fact cause additional work at insert, update or delete time instead; so the computational work shifted, but is not avoided.

Note

‘Join’ is a database operator. The same functionality can be implemented in application code outside or ‘on top’ of the database. Most likely the method or function is not called ‘join’ even though it in fact implements that functionality. So be aware of the situation that a document-oriented database does not implement a join and the engineers claim not to need one. The functionality of a join might just be there under a different name.