Field of the Invention
The present invention relates to in-memory data grid (IMDG) utilization and more particularly to the optimization of an IMDG schema.
Description of the Related Art
Database query processing refers to the receipt and execution of data queries against a database. Flat file databases generally process a query by using a key to locate matching records and then returning those records to the requestor. Where data is to be culled from different related records, a series of queries is required to locate different keys in different database tables so as to ultimately return the desired set of data. Relational databases improve upon flat file databases by permitting different tables to be logically joined together, so that a single query can be executed against the joined set of tables to produce the desired set of data.
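The contrast between chained keyed lookups and a single joined query can be sketched as follows. This is an illustrative sketch only: the customers/orders tables, their columns, and the function names are hypothetical, Python dictionaries stand in for flat files, and the standard-library `sqlite3` module stands in for a relational database.

```python
import sqlite3

# Hypothetical related records: each order refers to a customer by customer_id.
customers = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = {100: {"customer_id": 1, "item": "disk"}, 101: {"customer_id": 2, "item": "cable"}}

def flat_file_lookup(order_id):
    """Flat-file style: one keyed query per table, chained by hand."""
    order = orders[order_id]                    # query 1: locate the order by its key
    customer = customers[order["customer_id"]]  # query 2: locate the customer by a second key
    return customer["name"], order["item"]

def relational_lookup(conn, order_id):
    """Relational style: a single query over logically joined tables."""
    return conn.execute(
        "SELECT c.name, o.item FROM orders o JOIN customers c "
        "ON o.customer_id = c.customer_id WHERE o.order_id = ?",
        (order_id,),
    ).fetchone()

# Load the same hypothetical records into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(k, v["name"]) for k, v in customers.items()])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(k, v["customer_id"], v["item"]) for k, v in orders.items()])

print(flat_file_lookup(100))         # ('Ada', 'disk')
print(relational_lookup(conn, 100))  # ('Ada', 'disk')
```

Both calls return the same data; the difference is that the flat-file version issues two separate keyed queries, while the relational version expresses the relationship once in a join.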
An in-memory data grid (IMDG) is a highly distributable form of a database that permits parallel processing across a set of disparately located computing devices. The use of an IMDG permits substantial parallelization of database operations and, consequently, efficient utilization of unused processing resources in each host computing device supporting the IMDG. Because data in the IMDG is highly distributed, relational database concepts cannot be effectively applied. Thus, though highly scalable, database operations in an IMDG are substantially more granular and numerous than those of a traditional relational database.
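The parallel, granular style of operation described above can be illustrated with a simple scatter-gather pattern. This is a minimal sketch under stated assumptions: each inner dictionary is a hypothetical stand-in for the data held in memory by one host, worker threads stand in for separate host computing devices, and the function names are illustrative rather than the API of any particular IMDG product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical grid: each dict stands in for the data held by one host.
partitions = [
    {"a": 5, "b": 12},
    {"c": 7},
    {"d": 20, "e": 1},
]

def local_filter(partition, threshold):
    """Runs against one host's partition: scan only the entries held locally."""
    return {k: v for k, v in partition.items() if v > threshold}

def scatter_gather(partitions, threshold):
    """Send the same small operation to every partition in parallel, then merge the partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map yields results in partition order, regardless of completion order.
        for partial in pool.map(local_filter, partitions, [threshold] * len(partitions)):
            merged.update(partial)
    return merged

print(scatter_gather(partitions, 6))  # {'b': 12, 'c': 7, 'd': 20}
```

One logical query becomes one small operation per partition plus a merge step, which is why IMDG operations tend to be numerous and fine-grained even as they scale out.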
A document-oriented database, by comparison, focuses upon the storage, retrieval and management of document-oriented information, namely semi-structured data. Document-oriented databases are a form of “No-SQL” database and, in contrast to relational database technology and its underlying principle of the “relationship”, are designed around an abstract notion of a document. In particular, a document-oriented database assumes that documents encapsulate and encode data or information in some standard format or encoding. Common encodings include XML, YAML, JSON, and BSON, as well as binary forms such as PDF and proprietary word processing and spreadsheet formats.
Of note, an IMDG as traditionally implemented presents a set of interconnected virtual machines as a single address space for in-memory data access. The data is thus partitioned amongst the virtual machines according to a partitioning scheme, for instance map/shard placement, in order to provide scalability. The data source of an IMDG typically is a back-end relational database; however, with the emergence of “No-SQL” databases, a challenge has arisen in integrating the data model of a document-oriented database with the corresponding data organization and partitioning in the IMDG. In particular, the challenge is not only to load the data from the document-oriented database into the IMDG, but also to establish the requisite constructs in the IMDG and to optimize the data partitioning and distribution so as to ensure performance and scalability.
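A partitioning scheme of the kind referred to above can be sketched with hash-based placement of documents into a fixed number of partitions. This is an illustrative sketch only: the partition count `NUM_PARTITIONS`, the `_id` key field, and the helper names are hypothetical assumptions, not the placement rules of any specific IMDG, and a SHA-256 digest is used so that placement is stable across processes (unlike Python's salted built-in `hash()` for strings).

```python
import hashlib
import json
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical: one partition (shard) per grid virtual machine

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a document key to a partition using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def load_documents(raw_docs):
    """Place JSON documents from a document store into grid partitions by their (hypothetical) _id field."""
    grid = defaultdict(dict)
    for raw in raw_docs:
        doc = json.loads(raw)
        grid[partition_for(doc["_id"])][doc["_id"]] = doc
    return grid

def get(grid, key):
    """A keyed read touches only the single partition that owns the key."""
    return grid[partition_for(key)].get(key)

raw_docs = [json.dumps({"_id": f"cust-{i}", "name": f"customer {i}"}) for i in range(10)]
grid = load_documents(raw_docs)
print({p: len(entries) for p, entries in sorted(grid.items())})
print(get(grid, "cust-3"))  # {'_id': 'cust-3', 'name': 'customer 3'}
```

The sketch covers only loading documents and placing them; it deliberately leaves open the harder questions the passage raises, such as which constructs to create in the grid and how to choose partition keys so that related documents are co-located.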