Field of the Invention
The present invention relates to in-memory data grid (IMDG) utilization and, more particularly, to detecting duplicate data in an IMDG.
Description of the Related Art
Database query processing refers to the receipt and execution of data queries against a database. Flat file databases generally process queries in accordance with a key used to locate matching records and to return the matching records to the requestor. To the extent that data is to be culled from different related records, a series of queries is required to locate different keys in different database tables so as to ultimately return the desired set of data. Relational databases improve upon flat file databases by permitting the logical joining together of different tables so that a single query can be executed against the joined set of tables in order to produce a desired set of data.
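By way of illustration only, the contrast between a series of keyed lookups and a single query against joined tables can be sketched as follows. The sketch uses the sqlite3 module of the Python standard library; the table names, column names and sample rows are hypothetical and are not drawn from any particular database.

```python
import sqlite3

# Hypothetical schema and rows, held in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
""")

# Keyed-lookup style: one query per table, with the caller combining results.
name = conn.execute(
    "SELECT name FROM customers WHERE id = ?", (1,)
).fetchone()[0]
items = [
    row[0]
    for row in conn.execute(
        "SELECT item FROM orders WHERE customer_id = ? ORDER BY id", (1,)
    )
]
keyed_result = [(name, item) for item in items]

# Relational style: a single query executed against the joined tables.
joined_result = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "WHERE c.id = ? ORDER BY o.id",
    (1,),
).fetchall()
```

Both approaches yield the same set of data; the joined query simply moves the work of relating the two tables from the requestor into the database.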
An in-memory data grid (IMDG) is a highly distributable form of a database that permits parallel processing across a set of disparately located computing devices. The use of an IMDG permits substantial parallelization of database operations and, in consequence, efficient utilization of unused processing resources in each host computing device supporting the IMDG. To the extent that data in the IMDG is highly distributed, relational database concepts cannot be effectively applied. Thus, though an IMDG is highly scalable, database operations in an IMDG are substantially more granular and numerous than those of a traditional relational database.
Abstractly, an IMDG is a distributed object store similar in interface to a typical concurrent hash map. In an IMDG, objects are stored with keys, and the interface presented is that of a simple hash map. In this regard, the fundamental IMDG paradigm is a key-value pair, wherein the grid of the IMDG stores each value with an associated key by which the value is subsequently retrieved. The map itself includes entries of such key-value pairs. Therefore, the map provides a picture of the content of the different nodes of the IMDG.
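The key-value paradigm described above can be sketched, purely for illustration, as a set of per-node hash maps behind a single put and get interface. The class name ToyGrid, its methods and the use of a CRC-32 checksum to select a node are assumptions made for this sketch; they do not represent the interface or partitioning scheme of any actual IMDG product.

```python
import zlib


class ToyGrid:
    """Hypothetical single-process stand-in for a grid of nodes."""

    def __init__(self, node_count):
        # Each "node" is modeled as an ordinary dictionary.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # CRC-32 is used because it is stable across runs, unlike the
        # built-in hash() for strings.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        # Store the value under its key on the node selected for that key.
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        # Retrieve the value by the same key from the same node.
        return self._node_for(key).get(key, default)

    def entries(self):
        # Yield (node index, key, value) for every entry in the grid,
        # giving a picture of the content of the different nodes.
        for index, node in enumerate(self.nodes):
            for key, value in node.items():
                yield index, key, value


grid = ToyGrid(node_count=3)
grid.put("user:1", {"name": "Ada"})
grid.put("user:2", {"name": "Grace"})
```

A caller sees only the hash map interface: grid.get("user:1") returns the stored value without the caller needing to know which node holds it.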
Given the nature of an IMDG, data can oftentimes be stored in duplicate in different portions of the IMDG. This duplication of data can arise intentionally in consequence of data map duplication. Alternatively, this duplication can arise unintentionally, in error. In either case, data duplication in an IMDG can have adverse consequences. First, memory can be unnecessarily consumed to accommodate duplicate instances of data. Second, data consistency can become compromised where one instance of duplicate data is updated and a duplicate instance of the same data is not. As such, an inefficiency in the IMDG itself can result.
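The two consequences noted above can be illustrated with a minimal sketch in which the same value is held under different keys on two nodes. The node names, keys, values and the content-fingerprint comparison are all hypothetical and are offered only to make the problem concrete; the sketch is not the duplicate detection technique of the present invention.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(value):
    # Serialize with sorted keys so that equal values hash identically
    # regardless of the order in which their fields were written.
    canonical = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_duplicates(nodes):
    # Group every (node, key) location by the fingerprint of its value
    # and report the groups holding more than one location.
    locations = defaultdict(list)
    for node_name, store in nodes.items():
        for key, value in store.items():
            locations[fingerprint(value)].append((node_name, key))
    return [sorted(places) for places in locations.values() if len(places) > 1]


# Hypothetical grid content: the same customer is stored on two nodes.
nodes = {
    "node-a": {"customer:7": {"name": "Ada", "tier": "gold"}},
    "node-b": {
        "cust-7": {"tier": "gold", "name": "Ada"},
        "customer:8": {"name": "Grace", "tier": "silver"},
    },
}

# Memory is consumed twice for one logical record.
duplicates = find_duplicates(nodes)

# Updating only one instance leaves the other stale, so the two copies
# no longer match and the inconsistency goes unreported by this check.
nodes["node-a"]["customer:7"]["tier"] = "platinum"
diverged = find_duplicates(nodes)
```

Before the update, the check reports the two locations as one duplicate group; after the update, the copies have diverged and the group is no longer reported, which is precisely the consistency hazard described above.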