As information becomes increasingly connected across multiple systems, it is beneficial to be able to quickly gather data from those systems to support various scenarios, such as responding to a search query. On the other hand, the large amount of data available in such systems can make the task of updating data and maintaining data consistency very difficult. To address this problem, data is typically normalized when stored in a data store. Such a normalization process organizes the attributes and relations of a data store to reduce or even eliminate redundancy in the stored data.
Updating normalized data becomes relatively easy because each piece of data is stored in one place and the update only needs to be performed once. Querying the normalized data, however, becomes time consuming because the search typically involves joining multiple tables, which can be very computationally expensive, particularly when a large number of such joining steps are required. For example, a user might submit a query for books written by authors with a certain last name. In a normalized data model, book data might be stored in a table separate from the table storing the author data. The book data might hold a reference to its corresponding authors, but the detailed information about the authors, such as their last names, is stored in the author table. To respond to the search query, the book table and the author table must be joined in order to determine the last names of the authors of each book.
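The normalized book/author scenario above can be sketched concretely. The following is a minimal illustration using Python's standard `sqlite3` module; the table and column names (`author`, `book`, `author_id`, `last_name`) and the sample rows are hypothetical, chosen only to show that answering the query requires a join between the two tables.

```python
import sqlite3

# Hypothetical normalized schema: each book row references an author row
# by id, so an author's last name is stored in exactly one place.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT,
                         author_id INTEGER REFERENCES author(id));
""")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones")])
conn.executemany("INSERT INTO book VALUES (?, ?, ?)",
                 [(1, "A Tale", 1), (2, "B Tale", 2), (3, "C Tale", 1)])

# Answering "books written by authors named Smith" requires joining
# the book table to the author table on the author reference.
rows = conn.execute("""
    SELECT book.title
    FROM book JOIN author ON book.author_id = author.id
    WHERE author.last_name = 'Smith'
    ORDER BY book.title
""").fetchall()
titles = [title for (title,) in rows]
```

Note that updating an author's last name in this schema touches a single `author` row, illustrating why updates on normalized data are cheap even though queries are not.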
To reduce the search time and the number of operations performed on the data, the normalized data can be denormalized to introduce some redundant information in order to support certain scenarios. In the above example, for instance, the book data can be denormalized to include the last names of authors from the author table beforehand. An index based on authors' last names can then be built for the book table so that when the query is received, the results can be quickly identified by a lookup on the index without performing a join operation.
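The denormalized variant can be sketched in the same way. In this illustrative `sqlite3` example (again with hypothetical table, column, and index names), the author's last name is copied into each book row and indexed, so the query is satisfied by an index lookup with no join:

```python
import sqlite3

# Hypothetical denormalized schema: the author's last name is duplicated
# into each book row, trading redundancy for faster queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_last_name TEXT);
    CREATE INDEX idx_book_author_last_name ON book (author_last_name);
""")
conn.executemany("INSERT INTO book VALUES (?, ?, ?)",
                 [(1, "A Tale", "Smith"), (2, "B Tale", "Jones"),
                  (3, "C Tale", "Smith")])

# The query now reads a single table; the index narrows the search
# to matching rows without any join operation.
rows = conn.execute(
    "SELECT title FROM book WHERE author_last_name = 'Smith' ORDER BY title"
).fetchall()
titles = [title for (title,) in rows]

# The query plan shows the engine searching the index rather than
# scanning and joining tables.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT title FROM book WHERE author_last_name = 'Smith'"
).fetchall()
```

The trade-off is visible in the schema itself: if an author's last name changes, every duplicated copy in the `book` table must be updated, which is exactly the consistency burden discussed next.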
Data denormalization, however, is typically performed manually and often by people other than those who perform data updates. This can cause difficulties in updating data and in maintaining data consistency, the very problem data normalization tries to solve in the first place.
It is with respect to these and other considerations that the disclosure made herein is presented.