Relational database systems support the normalization of relational schemas; what about document-oriented databases?
Normalization is a data/schema modeling activity that (if fully applied) results in a schema that has no redundancies and the number of dependencies is minimal (see http://en.wikipedia.org/wiki/Database_normalization for a brief discussion).
In very abstract terms, any change in data is only done in one relation and through means of relationships the rest of the schema stays consistent. This means that in order to support a well-normalized schema queries will contain joins to combine data that is spread out over several tables due to the normalization process.
Given the concept of normalization in relational systems: is is possible or necessary to normalize documents, too?
First of all, it is possible to normalize data that is organized in documents. The naive approach is to build up the documents so that only scalar properties are stored, roughly equating to tables. In addition, in order to establish references between documents, either foreign-key type properties or documents denoting relationships between documents are created. The process of normalization can be followed like in a relational system.
However, in general document-oriented systems lack the join query operator and so combining documents when querying is not possible. Therefore the combination of data has to be done in the application layer (usually resulting in several queries). This is definitely a down-side of trying to normalize data in document-oriented systems.
Furthermore, the idea behind using document-oriented databases is for storing data that ‘naturally’ consists of hierarchical structures that are fetched together (the 80% case) as complete hierarchies and not normalized chunks of data. And example is a blog and its comments. Comments belong to the blog and without the blog they do not make a lot of sense. In general, when a blog is queried, its comments often are required, too, and so having the blog and its comments in one document makes a lot of sense. The blog can be retrieved in one query.
However, there are cases where separating data into different documents might make sense also. For example, a company might build an e-commerce web site and provides a shopping cart. Users add items to shopping carts. A shopping cart might be represented as one document. In this case it does not make a lot of sense to store the complete item description in every shopping cart that has that item in it. It might be better to have the item name and identifier in the cart with a reference to the item description residing in a different document.
While in the relational world it is generally advised to normalize a schema, in the document-oriented world the separation is probably driven by the 80/20 rule of data access. Meaning, to structure documents in such a way that 80% of the queries are fast and fetch all required data in one round trip. Over time, more elaborate rules might be established.
In context of MongoDB a very interesting initial discussion can be found in ’50 Tips and Tricks for MongoDB Developers’ (http://shop.oreilly.com/product/0636920019893.do). This book has a few more elaborated rules and it clearly hints that normalization is for sure an important aspect of document-oriented databases.