1. Technical Field
The disclosure and claims herein generally relate to multi-node computer systems, and more specifically relate to distribution of join operations on a multi-node computer system to optimize the efficiency of the system.
2. Background Art
Supercomputers and other multi-node computer systems continue to be developed to tackle sophisticated computing jobs. One type of multi-node computer system is a massively parallel computer system. A family of such massively parallel computers is being developed by International Business Machines Corporation (IBM) under the name Blue Gene. The Blue Gene/L system is a high density, scalable system in which the current maximum number of compute nodes is 65,536. The Blue Gene/L node consists of a single ASIC (application specific integrated circuit) with 2 CPUs and memory. The full computer is housed in 64 racks or cabinets with 32 node boards in each rack.
Computer systems such as Blue Gene have a large number of nodes, each with its own processor and local memory. The nodes are connected with several communication networks. One communication network connects the nodes in a logical tree network. In the logical tree network, the nodes are connected to an input-output (I/O) node at the top of the tree. The nodes are also connected with a three-dimensional torus network.
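As an illustration of the torus network described above, the following minimal Python sketch computes the minimum number of hops between two nodes of a three-dimensional torus, where each dimension wraps around. The function name torus_hops, the coordinate representation, and the dimensions in DIMS are assumptions made for illustration only and are not taken from the disclosure; DIMS is chosen so that the node count matches the 65,536 compute nodes mentioned above.

```python
def torus_hops(a, b, dims):
    """Return the minimum hop count between nodes a and b on a torus.

    a, b -- node coordinates, one integer per dimension
    dims -- size of the torus in each dimension
    """
    hops = 0
    for x, y, size in zip(a, b, dims):
        direct = abs(x - y)
        # A torus has wraparound links, so the shorter way around
        # in each dimension may cross the edge of the grid.
        hops += min(direct, size - direct)
    return hops


# Hypothetical dimensions for illustration: 64 * 32 * 32 = 65,536 nodes.
DIMS = (64, 32, 32)

# Nodes at opposite edges of a dimension are direct neighbors.
edge_hops = torus_hops((0, 0, 0), (63, 0, 0), DIMS)

# The farthest node is half of each dimension away.
far_hops = torus_hops((0, 0, 0), (32, 16, 16), DIMS)
```

In this sketch the wraparound links bound the distance in each dimension to half the dimension size, which is why data movement cost on such a network depends on where the communicating nodes sit relative to one another.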
Multi-node computer systems with nodes that each have a processor and memory allow the system to provide an in memory database. For an in memory database, portions of the database, or the entire database, reside completely in memory. An in memory database provides an extremely fast response time for searches or queries of the database. However, an in memory database poses new challenges for database administrators, who must determine where to load computing processes to take full advantage of the in memory database. A query of an in memory database in a multi-nodal environment will often involve many nodes. Having the data split apart on many nodes complicates the decision of where to execute a join operation.
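The difficulty of choosing where to execute a join when data is split across nodes can be illustrated with the following minimal Python sketch. It is a deliberately naive example and not the method of the disclosure: it assumes that the only cost is the number of rows that must be shipped to the node performing the join, and the names rows_to_move and cheapest_join_node, the node numbers, and the row counts are all hypothetical.

```python
def rows_to_move(candidate, left_rows, right_rows):
    """Count rows that must be shipped to `candidate` to join there.

    left_rows, right_rows -- dicts mapping node id to the number of rows
    of each table held in that node's memory.
    """
    total = 0
    for node in set(left_rows) | set(right_rows):
        if node != candidate:
            # Rows already resident on the candidate node need not move.
            total += left_rows.get(node, 0) + right_rows.get(node, 0)
    return total


def cheapest_join_node(left_rows, right_rows):
    """Pick the node that minimizes rows shipped (ties go to the lowest id)."""
    nodes = sorted(set(left_rows) | set(right_rows))
    return min(nodes, key=lambda n: rows_to_move(n, left_rows, right_rows))


# Hypothetical placement: two tables partitioned over three nodes.
left = {0: 1000, 1: 200}
right = {1: 50, 2: 4000}

best = cheapest_join_node(left, right)
```

Even in this simplified model the answer depends on how every partition is distributed. A realistic decision would also have to weigh factors the sketch ignores, such as network distance between nodes and the memory and processor load on each node.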
Without a way to effectively determine where to execute query joins, multi-node, parallel computer systems will not be able to fully utilize the potential power of an in memory database.