Database servers that execute on multi-core processors perform data manipulation operations on large amounts of tabular data. Tabular data is data that is logically organized as rows and one or more columns, each column having a certain size and each row including each column. Logically, tabular data resides in a table-like structure, such as a spreadsheet or relational table. However, the actual physical storage of the tabular data may take a variety of forms. For example, in row-major format, tabular data may be stored as rows that are stored contiguously within a memory address space, each row including each column and a given column occupying the same number of bytes within each row. In column-major format, each column may be stored separately from the other columns as a column vector stored contiguously within a memory address space, the entry for a given row being stored at the same relative position, or index, in each column vector.
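The difference between the two formats can be illustrated with a minimal sketch. The table contents, the column names (`id`, `qty`), the column widths, and the helper functions below are hypothetical and chosen only for illustration; they are not taken from any particular database server.

```python
import struct

# Hypothetical table: three rows, two columns.
# Column "id" is a 4-byte integer, column "qty" is a 2-byte integer.
rows = [(1, 10), (2, 20), (3, 30)]

ROW_FMT = "<ih"                      # little-endian, no padding: 4 + 2 bytes
ROW_SIZE = struct.calcsize(ROW_FMT)  # 6 bytes per row

# Row-major: whole rows are stored one after another in a single buffer.
row_major = b"".join(struct.pack(ROW_FMT, *r) for r in rows)

# Column-major: each column is its own contiguous column vector.
id_vector = struct.pack("<3i", *(r[0] for r in rows))
qty_vector = struct.pack("<3h", *(r[1] for r in rows))


def read_row_major(buf, row_index):
    """Read one row; its columns sit next to each other in memory."""
    return struct.unpack_from(ROW_FMT, buf, row_index * ROW_SIZE)


def read_column_major(row_index):
    """Read one row; the same index is used in every column vector."""
    (id_value,) = struct.unpack_from("<i", id_vector, row_index * 4)
    (qty_value,) = struct.unpack_from("<h", qty_vector, row_index * 2)
    return (id_value, qty_value)
```

In the row-major buffer, a row is located by multiplying the row index by the fixed row size. In the column-major layout, the same row index is applied to each column vector, scaled by that column's width.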
To perform data manipulation operations on tabular data efficiently, tabular data is moved from main memory to a memory closer to a core processor, where the operations can be performed more efficiently by the core processor. Thus, the movement of tabular data between the memory closer to a core processor and main memory is a type of operation that is performed frequently by database servers.
However, approaches for moving tabular data to a memory closer to the core processor add overhead that significantly offsets or eliminates any advantage gained by the movement of tabular data to the memory closer to the core processor. Even direct memory access (DMA) engines capable of offloading the task of moving data cannot offer a sufficient increase in processor efficiency, for several reasons. Tabular data processed by database operations is not organized or formatted in a way that is optimal for a DMA engine to move.
Additionally, the memory closer to the core processor is typically small in size. Therefore, a DMA engine will be able to move only a small portion of data into the local memory before that memory is full and must be emptied before it can be written to again. This results in the DMA engine repeating the process multiple times and issuing an interrupt each time the DMA engine moves data to the memory closer to the core processor, resulting in a large number of interrupts. A large number of interrupts degrades core processor performance because every time the core processor is interrupted, the core processor must determine the source of the interrupt and how to handle the interrupt.
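The scale of the problem can be estimated with a short calculation. The sketch below assumes one interrupt per transfer, as described above, and uses hypothetical sizes (a 1 GiB table and a 32 KiB local memory) that are illustrative only; the function name is likewise an assumption.

```python
def count_transfers(table_bytes, local_memory_bytes):
    """Number of DMA transfers needed to stream a table through a local
    memory of the given size, filling the local memory once per transfer."""
    if table_bytes < 0 or local_memory_bytes <= 0:
        raise ValueError("sizes must be non-negative and capacity positive")
    # Ceiling division without floating point.
    return -(-table_bytes // local_memory_bytes)


# Hypothetical sizes, chosen only for illustration.
TABLE_BYTES = 1 << 30          # 1 GiB of tabular data
LOCAL_MEMORY_BYTES = 32 << 10  # 32 KiB of memory close to the core

# With one interrupt per transfer, this is also the interrupt count.
interrupts = count_transfers(TABLE_BYTES, LOCAL_MEMORY_BYTES)  # 32768
```

Under these assumed sizes, the core processor would be interrupted tens of thousands of times to process a single table, and each interrupt carries the handling cost described above.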
Furthermore, in multi-core processor architectures, where each core has its own local memory, a DMA engine is required per core in order to access the local memory of that core. Implementing a DMA engine per core dramatically increases the cost of such multi-core processor architectures in terms of gate count, area, and power.