1. Technical Field
The claims herein generally relate to computer database systems, and more specifically relate to inserting data into an in-memory distributed nodal database, such as one held in the memory of a massively parallel supercomputer.
2. Background Art
Supercomputers and other highly interconnected computers continue to be developed to tackle sophisticated computing jobs. One type of highly interconnected computer system is a massively parallel computer system. A family of such massively parallel computers is being developed by International Business Machines Corporation (IBM) under the name Blue Gene. The Blue Gene/L system is a high-density, scalable system in which the current maximum number of compute nodes is 65,536. Each Blue Gene/L node consists of a single ASIC (application-specific integrated circuit) with two CPUs and memory. The full computer is housed in 64 racks or cabinets, with 32 node boards in each rack.
Computer systems such as Blue Gene have a large number of nodes, each with its own processor and memory. This characteristic makes it possible to provide an in-memory database, where some portions of the database, or the entire database, reside completely in memory. An in-memory database could provide an extremely fast response time for searches or queries of the database. However, an in-memory database poses new challenges for computer database administrators, who must load the data into the memory of the nodes in a way that takes full advantage of the in-memory database.
The prior application referenced above describes an apparatus and method for pre-loading an in-memory database in a parallel computing system. It describes how a node manager uses empirical evidence gained from monitoring prior query execution times and patterns to determine how to effectively load the in-memory database. The structure of the database is analyzed to determine effective ways to pre-load the database.
Another challenge for an in-memory database is how to cluster data records that may span multiple nodes of the in-memory database. The database will need to determine where records, or parts of records, will be stored in the memory of the different nodes.
Without a way to effectively manage record placement in an in-memory database, parallel computer systems will not be able to fully realize the potential of an in-memory database.