1. Technical Field
This disclosure generally relates to a database on a parallel computer system, and more specifically relates to a method and apparatus for data retrieval with a non-unique key from an in-memory database in a parallel computer system.
2. Background Art
Databases are computerized information storage and retrieval systems. A database system is structured to accept commands to store, retrieve and delete data using, for example, high-level query languages such as the Structured Query Language (SQL). The term “query” denominates a set of commands for retrieving data from a stored database. The query language requires the return of a particular data set in response to a particular query. In a typical database structure, data is contained in a flat file partitioned into records or rows, which are further partitioned into fields. A “key” is a string composed of one or more fields from each record or row that can be used to retrieve information from the database using a query. Keys are generally stored in collated order in an “index.” The index is searched for a leading substring of a key and, if that substring is found, the corresponding records/rows are the result. A unique key search is one where the search substring can, because of constraints on the data, match only a single key, so that only one record/row can result. In contrast, a non-unique key is one for which a search can return multiple records.
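The index search described above can be illustrated with a minimal sketch. It assumes a small in-memory table, a key built by joining the chosen fields with a `|` separator, and a binary search over the collated index; the record contents and the names `build_index` and `search` are hypothetical and do not come from this disclosure.

```python
from bisect import bisect_left

# Hypothetical records: (last_name, first_name, city).
records = [
    ("Smith", "Alice", "Rochester"),
    ("Jones", "Bob", "Austin"),
    ("Smith", "Carol", "Boston"),
    ("Smyth", "Dan", "Denver"),
]

def build_index(rows, key_fields):
    """Return a collated (sorted) list of (key_string, row_number) pairs."""
    entries = []
    for row_number, row in enumerate(rows):
        # The key is a string composed of one or more fields of the row.
        key = "|".join(row[f] for f in key_fields)
        entries.append((key, row_number))
    entries.sort()
    return entries

def search(index, rows, prefix):
    """Return every row whose key begins with the leading substring `prefix`."""
    # (prefix,) sorts before any (key, row_number) whose key starts with
    # prefix, so bisect_left lands on the first candidate entry.
    start = bisect_left(index, (prefix,))
    matches = []
    for key, row_number in index[start:]:
        if not key.startswith(prefix):
            break  # keys are collated, so no later entry can match
        matches.append(rows[row_number])
    return matches

index = build_index(records, key_fields=(0, 1))
unique_result = search(index, records, "Smith|Alice")   # at most one row
non_unique_result = search(index, records, "Smith|")    # may be several rows
```

In this sketch the full key `Smith|Alice` behaves as a unique key because the sample data holds only one such row, while the leading substring `Smith|` behaves as a non-unique key and returns both Smith records.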
Databases and database queries are also used in computer systems with a large number of compute nodes. Massively parallel computer systems are one type of parallel computer system that has a large number of interconnected compute nodes. A family of such massively parallel computers is being developed by International Business Machines Corporation (IBM) under the name Blue Gene. The Blue Gene/L system is a scalable system in which the current maximum number of compute nodes is 65,536. The Blue Gene/L node consists of a single ASIC (application specific integrated circuit) with 2 CPUs and memory. The full computer is housed in 64 racks or cabinets with 32 node boards in each rack.
Computer systems such as Blue Gene have a large number of nodes, each with its own processor and memory. This characteristic provides the opportunity to provide an in-memory database, where some portions of the database, or the entire database, reside completely in memory. An in-memory database could provide an extremely fast response time for searches or queries of the database. In-memory databases pose new challenges and opportunities for computer database administrators seeking to utilize the full capability of such a database. In particular, a parallel computer system such as Blue Gene has hardware that supports a global combining network that connects the nodes in a tree where each node has one or two children. The tree network has a built-in arithmetic logic unit (ALU) to perform reductions of data packets as they move along the network.
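The reduction behavior of a tree network can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the `Node` class, the `reduce_up` function, and the per-node integer values are hypothetical, and on real hardware the combining would be performed by the network ALU rather than by recursive function calls.

```python
from dataclasses import dataclass, field
from operator import add
from typing import Callable, List

@dataclass
class Node:
    """Hypothetical compute node holding one local integer value."""
    local_value: int
    children: List["Node"] = field(default_factory=list)

def reduce_up(node: Node, op: Callable[[int, int], int]) -> int:
    """Combine a node's value with the reduced values of its subtrees.

    Each parent applies `op` to its own value and to the partial result
    arriving from each child, so a single value reaches the root.
    """
    result = node.local_value
    for child in node.children:
        result = op(result, reduce_up(child, op))
    return result

# A four-node tree: the root has two children, and one child has one child.
leaf = Node(local_value=5)
left = Node(local_value=0, children=[leaf])
right = Node(local_value=2)
root = Node(local_value=3, children=[left, right])

total = reduce_up(root, add)    # sum of all local values
largest = reduce_up(root, max)  # maximum of all local values
```

Here `total` is 10 and `largest` is 5. The local values could stand for any per-node quantity, for example a count of locally held records; that interpretation is an assumption made for illustration.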
Prior art techniques for searching an in-memory database have not taken advantage of the network structures available in parallel computer systems such as Blue Gene. Without a way to effectively search an in-memory database, such systems will not be able to fully utilize the potential power of an in-memory database.