Field of the Invention
Embodiments of the invention relate to database storage. More specifically, embodiments of the invention are directed to multi-nodal compression techniques for an in-memory database.
Description of the Related Art
Powerful computers may be designed as highly parallel systems where the processing activity of hundreds, if not thousands, of processors (CPUs) is coordinated to perform computing tasks. These systems are highly useful for a broad variety of applications, including financial modeling, hydrodynamics, quantum chemistry, astronomy, weather modeling and prediction, geological modeling, prime number factoring, and image processing (e.g., computer-generated imagery, animations, and renderings), to name but a few examples.
One family of parallel computing systems has been (and continues to be) developed by International Business Machines (IBM) under the name Blue Gene®. The Blue Gene®/L architecture provides a scalable, parallel computer that may be configured with a maximum of 65,536 (2^16) compute nodes. Each compute node includes a single application specific integrated circuit (ASIC) with two CPUs and memory. The Blue Gene®/L architecture has been successful, and on Oct. 27, 2005, IBM announced that a Blue Gene®/L system had reached an operational speed of 280.6 teraflops (280.6 trillion floating-point operations per second), making it the fastest computer in the world at that time. Further, as of June 2005, Blue Gene®/L installations at various sites worldwide accounted for five of the ten most powerful computers in the world.
IBM is currently developing a successor to the Blue Gene®/L system, named Blue Gene®/P. Blue Gene®/P is expected to be the first computer system to operate at a sustained 1 petaflops (1 quadrillion floating-point operations per second). Like the Blue Gene®/L system, the Blue Gene®/P system is scalable, allowing for configurations that include different numbers of racks.
In addition to the Blue Gene® architecture developed by IBM, other highly parallel computer systems have been (and are being) developed. For example, a Beowulf cluster may be built from a collection of commodity off-the-shelf personal computers. In a Beowulf cluster, individual systems are connected using local area network technology (e.g., Ethernet), and system software is used to execute programs written for parallel processing on the cluster of individual systems. Another approach to parallel computing includes large distributed or grid-type computing systems, which pool the computing power of hardware distributed across widely separated locations.
Often, a database may be too large to store in the memory of a single node of a parallel system. In such a case, the database needs to be mapped to multiple nodes of the parallel system. The amount of physical memory available on a given compute node limits how large a portion of an in-memory database may be stored on that node. One approach to increasing this limit is to compress portions of data from the in-memory database on a given compute node, thereby increasing the total amount of data stored on that node. However, this approach incurs certain costs; namely, decompressing the data on a node takes time. At the same time, storing data on multiple nodes may also result in increased processing costs. For example, an application running on a parallel system may require access to only a subset of the data from the in-memory database. In such a case, having that data span many nodes may also degrade performance, as crossing logical or physical boundaries present in the architecture of a particular parallel computing system may increase processing time due to the overhead of input-output (I/O), network communications, etc.
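By way of illustration only, the trade-off described above may be sketched in a few lines of Python. The sketch is not drawn from any particular system: the function names nodes_needed and measured_ratio, the use of the zlib compressor, and the example sizes are all assumptions chosen for the illustration. It shows how a given compression ratio reduces the number of compute nodes a database must span, and that the compressed data must be decompressed before it can be used.

```python
import math
import zlib


def nodes_needed(db_bytes: int, node_capacity_bytes: int,
                 compression_ratio: float = 1.0) -> int:
    """Hypothetical helper: nodes required to hold a database of db_bytes.

    compression_ratio is uncompressed size divided by compressed size;
    1.0 means the data is stored uncompressed.
    """
    if db_bytes < 0 or node_capacity_bytes <= 0 or compression_ratio <= 0:
        raise ValueError("sizes and ratio must be positive")
    stored_bytes = db_bytes / compression_ratio
    return max(1, math.ceil(stored_bytes / node_capacity_bytes))


def measured_ratio(sample: bytes) -> float:
    """Hypothetical helper: compression ratio zlib achieves on a sample."""
    compressed = zlib.compress(sample)
    # Decompression is the step that costs time when the data is accessed.
    assert zlib.decompress(compressed) == sample
    return len(sample) / len(compressed)


GIB = 2 ** 30
uncompressed_nodes = nodes_needed(10 * GIB, 1 * GIB)       # no compression
compressed_nodes = nodes_needed(10 * GIB, 1 * GIB, 4.0)    # assumed 4:1 ratio
```

Under these assumed figures, a 10 GiB database spans ten 1 GiB nodes uncompressed but only three nodes at a 4:1 ratio, at the price of a decompression step on every access. The ratio actually achieved depends on the data and on the compressor used.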
Accordingly, there remains a need in the art for multi-nodal compression techniques for an in-memory database.