Executing queries on large amounts of data (colloquially known as “big data”) poses a great challenge for database management systems (DBMS). Conventional methods generally require the data for a query to be loaded from persistent memory into operational memory to be processed. With the data for a single query execution reaching the scale of terabytes, the operational memory may not be able to hold the full data set required for the query execution. In such a scenario, the spillover data may extend into slower memory that has higher storage capacity but much slower input/output (I/O) speed. Processing operations by shuttling data to and from slower memory would substantially decrease the performance of the query and degrade the user experience.
For example, when a query is executed on a table with big data, all rows from the big data table are loaded into a buffer. As a result, the buffer may contain billions of rows and may span multiple types of memory, most likely including slower memories such as disk memory. The rows in the buffer may be operated on, for example joined or sorted. Performing such operations on so many rows, with random accesses to slower memory, consumes enormous amounts of the computing resources of the DBMS, while the use of slower memory in turn introduces substantial latency in the processing.
To handle the challenge of big data, new architectures have been developed for computer hardware to process big data in parallel. For example, today's computing nodes usually utilize multiple multi-core processors, in which each multi-core processor consists of multiple independent processing units that execute instructions in parallel. Further, multiple computing nodes, each containing such processing units, can be networked into a cluster of computing nodes, each node, in addition to processing units, having a modest amount of memory and non-persistent storage for storing table data accessed by query processing. A cluster can have a very large number of computing nodes, in fact thousands. The total memory and processing power of the large number of nodes of a cluster provide an advantage over conventional systems, particularly when nodes perform operations for query processing in parallel. Such a cluster of computing nodes may be used for a DBMS and is referred to herein as a “cDBMS.”
Although a cDBMS provides the capability of great parallelization in query processing, a computing node of a cluster is still limited by the amount of fast access memory within the node. Indeed, each node has access to multiple types of memory having different speeds and storage capacities, with higher speed memory having lower storage capacity. For example, data operations on cache memory are orders of magnitude faster than data operations on disk memory, while the capacity of the disk memory is generally many orders of magnitude greater than that of the cache memory. Therefore, it is critical for a computing node to ensure that data operations are performed on a smaller data set that can fit into the higher speed, lower storage capacity memory. Accordingly, the big data database objects or portions thereof have to be distributed not only for parallel processing but also to minimize access to slower memory in the cDBMS.
However, the operation to distribute the big data among the nodes and the processing units of the nodes itself consumes a great amount of computing resources. In many cases, the partition operation (as referred to herein) is the costliest operation in a query execution plan and may even negate the gains in resource utilization obtained by parallelizing the execution of queries and minimizing I/O for the big data.
For example, using range distribution for partitioning may introduce additional computational steps for each tuple in the data to be partitioned. Not only does the cDBMS have to compute the ranges based on which the data is to be partitioned, but for each tuple to be distributed, the cDBMS also has to calculate to which range the tuple belongs. Furthermore, the cDBMS has to have prior knowledge of the value distribution in a partitioning key, that is, a column in the data set of tuples based on which the tuples are to be distributed. Only with such knowledge can the cDBMS evenly distribute tuples among processing units.
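By way of illustration only, the two costs of range distribution described above may be sketched as follows. This is a minimal, single-process sketch and not an implementation of any particular cDBMS; the function names compute_boundaries and range_partition, the sample rows, and the choice of four partitions are all hypothetical. The sketch assumes the key values are known in advance, which is precisely the prior knowledge of the value distribution noted above.

```python
from bisect import bisect_right


def compute_boundaries(known_keys, num_partitions):
    """Derive num_partitions - 1 split points from known key values.

    Hypothetical helper: it needs the key values up front, which stands in
    for prior knowledge of the value distribution of the partitioning key.
    """
    ordered = sorted(known_keys)
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def range_partition(tuples, key_index, boundaries):
    """Assign every tuple to the range that contains its partitioning key.

    The per-tuple cost is one binary search over the range boundaries.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in tuples:
        target = bisect_right(boundaries, row[key_index])
        partitions[target].append(row)
    return partitions


# Illustrative rows: (partitioning key, payload).
rows = [(k, f"payload-{k}") for k in (5, 42, 17, 88, 3, 61, 29, 74)]
boundaries = compute_boundaries([r[0] for r in rows], num_partitions=4)
parts = range_partition(rows, key_index=0, boundaries=boundaries)

for number, part in enumerate(parts):
    print(number, [r[0] for r in part])
```

In this sketch the boundaries are computed once, but the range lookup is repeated for every tuple, so its cost grows with the number of tuples to be distributed. If the boundaries were instead chosen without knowledge of the key values, the resulting partitions could be arbitrarily uneven.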
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.