Executing analytical queries on large amounts of data (colloquially known as “big data”) poses a great challenge for database management systems (DBMS). Conventional methods generally require data for analytical queries to be loaded into operational memory from persistent memory to be processed. With data for a single query execution reaching large scales of terra bytes, the operational memory may not be able to hold the full data source required for the query execution. In such a scenario, the spill over data may extend into slower memory that has higher storage capacity but much slower input/output (I/O) speed. In addition to slower I/O, more complex operations such as a sort or aggregation performed on big data may further exacerbate the draw on the computational resources of the DBMS. Particularly, comparing and re-arranging terra bytes of data spread across slower memory would substantially decrease the performance of the query and affect the user experience.
The challenge is particularly evident with top-N analytical queries. The term “top-N” query refers to analytical queries that select and sort a data source from a DBMS and output only the first N (top-N) number of rows of the sorted data as the result. The DBMS loads the target data source of a top-N query into a buffer, sorts the buffer according to one or more columns and respective orders specified in the top-N query, and selects the top-N rows. The one or more columns that are specified for the sort of a top-N query data source are referred herein as “keys.” When multiple keys are specified, then whichever key is specified first is used for the sorting of rows. The next key is used to sort rows which have the same values for the earlier specified key and so forth.
When the top-N query is executed on a table with big data, all rows from the big data table may be loaded into a buffer. The buffer, which at that point may span over multiple types of memory and may include slower memories like disk memory, may contain millions of rows that have to be sorted. Sorting so many rows consumes enormous amounts of resources of the DBMS and causes over utilization of computational resources, while the use of slower memory introduces substantial latency in the processing.
To handle the challenge, new functionalities have been developed for computer hardware to process big data. For example, today's computing nodes usually utilize multiple multi-core processors, in which each multi-core processor consists of multiple independent processing units to execute instructions in parallel manner. However, the multi-core architecture requires new techniques to fully leverage the inherent parallelism in the hardware.
Furthermore, modern computer architecture includes multiple types of memory having different speed and storage capacity, with higher speed memory having lower storage capacity. For example, data operations on cache memory are magnitudes faster than the data operations on disk memory, while the capacity of the disk memory is generally many magnitudes more than the cache memory. Therefore, it is critical for the new techniques to ensure that the data operations are performed on a smaller data that can fit into higher speed lower storage capacity memory.
Not only smaller data chunks are preferred for the multi-node/core hardware, but also processing that is based on columns rather than individual tuples is desirable. In the traditional row-based pipeline model, in which a whole row or full tuple is read into memory to process, significant computing resources are spent on query plan evaluation rather than on the processing of data itself. Furthermore, the row-based query execution usually incurs a high cache miss rate, and significant interpretation overheads because the tuples generally carry the full row data, which occupies greater memory space and incurs greater cost in extracting the necessary data from the tuple.
With the columnar vectorized execution model, as opposed to row-based execution model, query data source is processed column by column instead of a full row by full row. In such a model, the input to a query operator is a column (or a portion thereof), which is a single dimensional array, also referred herein as a vector. Compared with the traditional tuple-at-a-time model, columnar vectorized execution model significantly improves the query execution performance because many performance-critical optimizations such as loop-unrolling can be applied. Modern CPUs also have specific column-optimized features such as Single Instruction, Multiple Data (SIMD) instructions that optimally use cache hierarchy reducing the interpretation overheads.
However, vectorized execution imposes further challenges on a top-N query execution. For example, after processing one vector for a top-N selection, the result is intermediate because other vectors of the same key are to be processed. Thus, the result needs to be preserved from one vector processing to the next vector processing. The techniques need to insure that the result is small enough to be transferred between computing nodes/cores and be preserved in cache without being pushed out to a lower-level memory.
Another distinct challenge with a top-N query execution in a vectorized execution environment arises when the top-N query specifies multiple keys for sorting to select top-N rows. Because, in the columnar vectorized execution environment, the data is processed in column-wise manner, only a single vector of only a single key is processed at a time. Such processing introduces difficulty in comparing two key values which are equal on the same key. The next key specified by the query has to be used to differentiate them further. However, interrupting the current vector processing to fetch the next key vector incurs prohibitive overheads and thus, should be avoided.