Database servers that execute on multi-core processors perform data manipulation operations on large amounts of tabular data. Tabular data is data that is logically organized as rows and one or more columns, each column having a certain size, each row including each column. Logically, tabular data resides in a table-like structure, such as a spreadsheet or relational table. However, the actual physical storage of the tabular data may take a variety of forms. For example, in row-major format, tabular data may be stored as rows that are stored contiguously within a memory address space, each row including each column and a given column occupying the same number of bytes within a row. In column-major format, each column may be stored separately from other columns as a column vector stored contiguously within a memory address space. Unless otherwise indicated, the term column refers to a column stored in column-major format, in one or more column vectors.
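The difference between the two storage formats can be sketched as follows. This is a minimal illustration only; the table contents, the names `row_major`, `col_id`, and `col_qty`, and the helper functions are hypothetical and do not come from the described system.

```python
from array import array

# Hypothetical 3-row table with two fixed-width integer columns: id and qty.
rows = [(1, 10), (2, 20), (3, 30)]
NUM_COLS = 2

# Row-major: rows are contiguous, and each row holds every column.
row_major = array("i", [value for row in rows for value in row])

# Column-major: each column is its own contiguous column vector.
col_id = array("i", [row[0] for row in rows])
col_qty = array("i", [row[1] for row in rows])


def read_row_major(row_index, col_index):
    """Locate a value by its offset within the flat row-major buffer."""
    return row_major[row_index * NUM_COLS + col_index]


def read_col_major(column, row_index):
    """Locate a value by its index within a single column vector."""
    return column[row_index]
```

In both layouts the same logical cell is reachable; only the address arithmetic differs. For example, `read_row_major(2, 1)` and `read_col_major(col_qty, 2)` both refer to the qty value of the third row.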
To perform data manipulation operations on tabular data efficiently, tabular data is moved from main memory to a memory closer to a core processor, where the operations can be performed more efficiently by the core processor. Thus, the movement of tabular data between the memory closer to a core processor and main memory is the type of operation that is performed frequently by database servers.
However, approaches for moving tabular data to a memory closer to the core processor add overhead that significantly offsets or eliminates any advantage gained by the movement of tabular data to the memory closer to the core processor. Even direct memory access (DMA) engines capable of offloading the task of moving data cannot offer a sufficient increase in processor efficiency, for several reasons. Tabular data processed by database operations is not organized or formatted in a way that is optimal for a DMA engine to move.
Additionally, the memory closer to the core processor is typically small in size. Therefore, a DMA engine will be able to move only a small portion of data into the local memory before that memory is full and needs to be emptied before it can be written to again. This results in the DMA engine repeating the process multiple times and issuing an interrupt each time the DMA engine moves data to the memory closer to the core processor, resulting in a large number of interrupts. A large number of interrupts degrades core processor performance because every time the core processor is interrupted, the core processor must determine the source of the interrupt and how to handle the interrupt.
Database Tuple-Encoding-Aware Direct Memory Access Engine For Scratchpad-Enabled Multi-Core Processors describes a hardware accelerated data movement system that is on a chip and that efficiently moves tabular data to multiple core processors. To perform data manipulation operations on tabular data efficiently, the data manipulation operations are performed in-flight while moving tabular data to the core processors. The data movement system includes multiple data movement engines, each dedicated to moving and transforming tabular data from main memory to a subset of the core processors. Each data movement engine is coupled to an internal memory that stores data/control structures (e.g. a bit vector) that dictate how data manipulation operations are performed on tabular data moved from a main memory to the memories of a core processor. The internal memory of each data movement engine may be private to the data movement engine.
There are scenarios where a copy of the same data/control structure can be used by multiple data movement engines. Under such scenarios, a copy of the data is needed in the internal memory of each data movement engine. A copy of the data can be moved from main memory via a DMA engine to the internal memory of each data movement engine. To avoid multiple movements of the copies from main memory to the multiple internal memories and thereby improve efficiency of copying data, techniques are described herein for internally copying data between internal memories within a data movement system.
There are also scenarios where a copy of the same data is transferred from main memory to the memories of multiple core processors. If the multiple cores are served by different data movement engines, each copy may have to be transferred in separate data movements, one for each data movement engine, each data movement entailing a transfer from main memory via a DMA engine. To avoid such multiple data movements and improve efficiency of transferring data to memories of multiple core processors, described herein are techniques for a data movement engine to internally broadcast data to other data movement engines, which then transfer the data to the respective core processors.
Partitioning
Certain operations performed by database servers that execute on multi-core processors, such as joins, aggregations and sorts, frequently need to partition tabular data across computing nodes. The cost of performing such partitioning is a significant proportion of the overall execution time of a query. As a result, partitioning data efficiently is key to achieving high performance and scalability in distributed query processing. Described herein are hardware accelerated approaches for achieving such high performance and scalability.
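As a point of reference, partitioning can be sketched in software as follows. This is an illustrative sketch only: the function name `partition_by_key`, the sample data, and the choice of a modulo on an integer key are assumptions made for the example, not the partitioning scheme of the hardware accelerated approaches described herein.

```python
def partition_by_key(keys, payloads, num_partitions):
    """Split two row-aligned columns into per-partition column pairs.

    Assumption: rows are assigned by key modulo num_partitions. Real systems
    may use hash, range, or other partitioning schemes.
    """
    partitions = [([], []) for _ in range(num_partitions)]
    for key, payload in zip(keys, payloads):
        part_keys, part_payloads = partitions[key % num_partitions]
        part_keys.append(key)
        part_payloads.append(payload)
    return partitions


# Hypothetical key column and payload column, row aligned with each other.
keys = [7, 2, 9, 4, 5]
payloads = ["a", "b", "c", "d", "e"]
parts = partition_by_key(keys, payloads, 2)
```

Every row is visited and copied once, which suggests why partitioning can account for a significant share of query execution time when the tables are large.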
Altering Row Alignment
Columns storing rows can be row aligned. When rows stored in a set of columns are row aligned, the same row is stored in the same relative position or index in each column of the set of columns.
Row alignment enables row resolution. Row resolution refers to the operation of identifying, for a row in a column, at which index or relative position in another column the row resides. For example, suppose a set of rows is stored in multiple columns that are row aligned. For a particular row stored at the third index or position within one column, row resolution involves recognizing that the same row is also stored at the third index or position in each of the other columns.
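Row resolution over row-aligned columns can be sketched as below. The column names, the sample values, and the helper `resolve_row` are hypothetical and serve only to illustrate the idea.

```python
# Three hypothetical columns that are row aligned: index i in each column
# holds a value belonging to the same row.
names = ["Ann", "Bob", "Cy", "Dee"]
ages = [31, 45, 28, 52]
cities = ["Oslo", "Lima", "Pune", "Rome"]


def resolve_row(index, *columns):
    """With row alignment, the same index identifies the row in every column."""
    return tuple(column[index] for column in columns)


# The row at the third position (index 2) of one column is found at the
# third position of every other row-aligned column.
third_row = resolve_row(2, names, ages, cities)
```

No lookup structure is needed here; the shared index alone identifies the row.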
Various data manipulation operations, such as a partition operation, manipulate a “source column” to generate one or more “resultant columns”. A resultant column may not be row aligned with the source column. Thus, row alignment by itself cannot be relied upon to perform row resolution between the source column and any resultant column.
To illustrate, a source column may be partitioned into two resultant columns, such that elements in the odd ordinal positions of the source column are stored in a first resultant column and elements in the even ordinal positions are stored in a second resultant column. Neither the first nor the second resultant column is row aligned with the source column. For example, the fourth element in the source column and the second element in the second resultant column belong to the same row. However, the index or position of the row differs between the source column and the second resultant column.
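The loss of alignment in this illustration can be reproduced with a short sketch. The element labels `r1` through `r6` are hypothetical placeholders for row values.

```python
# Hypothetical source column; "r4" is the element at the fourth ordinal position.
source = ["r1", "r2", "r3", "r4", "r5", "r6"]

# Odd ordinal positions (1st, 3rd, 5th) go to the first resultant column.
first_resultant = source[0::2]

# Even ordinal positions (2nd, 4th, 6th) go to the second resultant column.
second_resultant = source[1::2]

# The same row sits at different positions in the source and resultant columns.
position_in_source = source.index("r4") + 1            # ordinal position 4
position_in_second = second_resultant.index("r4") + 1  # ordinal position 2
```

The fourth element of the source column ends up as the second element of the second resultant column, so the shared-index rule no longer applies.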
Because neither the first nor the second resultant column is row aligned with the source column, row alignment by itself cannot be used to perform row resolution. Described herein are approaches that enable row resolution when row alignment is lost between a source column and resultant columns after performance of a data manipulation operation.
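One generic way to recover row resolution, shown purely as an assumption for illustration, is to carry each element's original source index alongside the resultant column. The function `partition_with_row_ids` and the side columns of indices are hypothetical; this passage does not state that the approaches described herein work this way.

```python
def partition_with_row_ids(source):
    """Odd/even partition that also records each element's source index.

    Assumption: a side column of source indices is kept per resultant column.
    This is an illustrative technique, not necessarily the described approach.
    """
    first, first_ids, second, second_ids = [], [], [], []
    for index, value in enumerate(source):
        if index % 2 == 0:  # odd ordinal position (1st, 3rd, ...)
            first.append(value)
            first_ids.append(index)
        else:               # even ordinal position (2nd, 4th, ...)
            second.append(value)
            second_ids.append(index)
    return (first, first_ids), (second, second_ids)


source = ["r1", "r2", "r3", "r4", "r5", "r6"]
other_column = [100, 200, 300, 400, 500, 600]  # row aligned with source

(first, first_ids), (second, second_ids) = partition_with_row_ids(source)

# Resolve the second element of the second resultant column back to the
# row-aligned column via its recorded source index.
resolved = other_column[second_ids[1]]
```

The recorded index stands in for the alignment that the partition removed, at the cost of storing and moving an extra column per resultant column.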