1. Technical Field
This disclosure generally relates to database query execution and optimization, and more specifically relates to query execution and optimization while utilizing combining network extensions in a parallel computer system of multiple nodes.
2. Background Art
Databases are computerized information storage and retrieval systems. A database system is structured to accept commands to store, retrieve and delete data using, for example, high-level query languages such as the Structured Query Language (SQL). The term “query” denominates a set of commands for retrieving data from a stored database. The query language requires the return of a particular data set in response to a particular query.
Many large institutional computer users are experiencing tremendous growth of their databases. One of the primary means of dealing with large databases is to distribute the data across multiple partitions in a parallel computer system. The partitions over which the data is distributed can be logical or physical.
Massively parallel computer systems are one type of parallel computer system that has a large number of interconnected compute nodes. A family of such massively parallel computers is being developed by International Business Machines Corporation (IBM) under the name Blue Gene. The Blue Gene/L system is a scalable system in which the current maximum number of compute nodes is 65,536. The Blue Gene/L node consists of a single ASIC (application specific integrated circuit) with 2 CPUs and memory. The full computer is housed in 64 racks or cabinets with 32 node boards in each rack. The Blue Gene/L supercomputer communicates over several communication networks. The compute nodes are arranged into both a logical tree network and a 3-dimensional torus network. The logical tree network connects the compute nodes so that each node communicates with a parent and one or two children. The torus network logically connects the compute nodes in a three-dimensional lattice-like structure that allows each compute node to communicate with its six nearest neighbors in a section of the computer.
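The nearest-neighbor relationship in the torus network can be illustrated with a minimal sketch. The function name `torus_neighbors`, the coordinate scheme, and the 64 x 32 x 32 dimensions used in the example are assumptions made for illustration only (the dimensions were chosen because their product is 65,536); the sketch does not represent any Blue Gene system software.

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbors of `coord` in a 3-D torus of size `dims`.

    `coord` and `dims` are (x, y, z) tuples. Both names are hypothetical
    and used only for this illustration.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # Wrap around at the edges so the lattice closes into a torus.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


# Illustrative dimensions: 64 * 32 * 32 = 65,536 nodes.
dims = (64, 32, 32)
corner_neighbors = torus_neighbors((0, 0, 0), dims)
```

Because of the wrap-around, a node at a corner of the lattice, such as (0, 0, 0), still has six neighbors, including (63, 0, 0) on the opposite face.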
Computer systems such as Blue Gene have a large number of nodes, each with its own processor and memory. This characteristic makes it possible to provide an in-memory database, where some portions of the database, or the entire database, reside in memory. An in-memory database provides an extremely fast response time for searches or queries of the database. In-memory databases pose new challenges and opportunities for computer database administrators seeking to utilize the full capability of an in-memory database. In particular, a parallel computer system such as Blue Gene has a combining network, which is hardware that is also referred to as the global combining network or collective network. The global combining network connects the nodes in a tree where each node has one or two children. The global combining network has a built-in arithmetic logic unit (ALU) on each node to perform collective operations on data packets as they move along the tree network. Using the ALU of the global combining network to perform part of a query reduces the load on the node CPUs and thereby increases database performance.
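The effect of combining values as they move up the tree can be sketched in software. The following is a simulation under stated assumptions, not the hardware interface: the function name `tree_reduce` and the array layout in which node i has children 2*i+1 and 2*i+2 are hypothetical, and the combining operation is assumed to be associative and commutative (for example, sum or maximum).

```python
import operator


def tree_reduce(local_values, op):
    """Combine one value per node up a binary tree and return the root's result.

    Assumed layout (hypothetical): node i has children 2*i+1 and 2*i+2,
    so every node has at most two children. `op` must be associative and
    commutative, and `local_values` must not be empty.
    """
    partial = list(local_values)
    # Visit nodes from last to first so that each node's children have
    # already been folded into it before it is folded into its parent.
    for i in range(len(partial) - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = op(partial[parent], partial[i])
    return partial[0]


# Example: each of seven nodes holds a locally computed value, such as
# the number of rows in its partition that match a query predicate.
local_counts = [3, 0, 5, 2, 1, 0, 4]
total = tree_reduce(local_counts, operator.add)
largest = tree_reduce(local_counts, max)
```

In this sketch each node contributes only a locally computed value, and the combining step at each level is the part that an ALU in the network could perform in place of the node CPUs.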
Database query optimizers have been developed that evaluate queries and determine how best to execute the queries based on a number of different factors that affect query performance. On parallel computer systems in the prior art, the database and query optimizer are not able to effectively utilize a combining network while executing a database query. Without a way to more effectively execute and optimize queries, computer systems with multiple networks will continue to suffer from inefficient utilization of system resources to process database queries.