1. Technical Field
This invention relates to the organization of data in a database. More specifically, the invention introduces a hybrid layout for data in the database that supports efficient block-oriented query processing.
2. Description of Related Art
A database is a collection of information organized in such a way that a computer program can quickly and efficiently select desired pieces of data. It is known in the art that data are distinct pieces of formatted information. In electronic form, data are bits and bytes stored in electronic memory. Traditional databases are organized by fields, records, and files. A field is a piece of information; a record is one complete set of fields; and a file is a collection of records. To access information from a database, a program in the form of a database management system is employed.
Computer hardware does not directly support the concept of multidimensional arrays. Computer memory is one-dimensional, providing memory addresses that start at zero and increase serially to the highest available location. A multidimensional array is therefore a software concept that maps the elements of the array into a contiguous linear span of memory addresses. There are two ways that such an array can be represented in one-dimensional linear memory. These two options, which are explained below, are commonly called row-major and column-major. All programming languages that support multidimensional arrays choose one of these two possibilities.
Column-major is a manner of contiguously storing all elements of the first dimension of an array in memory. As you move linearly through the memory in such an array, the first dimension changes the fastest. However, there are issues that arise with query processing associated with the column-major structure. Columns are often not aligned with register boundaries of modern processors. A hardware register is a high speed storage area within a central processing unit. All data must be represented in a register before it can be processed. In one embodiment, the register can contain the address of a memory location where data is stored rather than the actual data itself. Most modern processors have one or more 64-bit registers, and some have 128-bit registers. However, column-major data represented in the registers commonly do not fill the full width of the register.
Conversely, row-major is a manner of contiguously storing all elements of the second dimension of an array in memory. In the row-major structure, as you move linearly through the memory, the second dimension changes the fastest. However, there are issues that arise with query processing of the row-major structure as well. In the current state of the art, a query executor must touch every field of each row, even if the query only touches some fields. Accordingly, processing a query in a row-major structure requires processing of extraneous data.
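The two mappings described above can be illustrated with a minimal sketch for a two-dimensional array. The function names (row_major_offset, column_major_offset, linearize) and the sample table are hypothetical and chosen only for illustration; they do not come from any particular database system.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Linear offset of element (i, j) when the second index changes fastest."""
    return i * n_cols + j


def column_major_offset(i, j, n_rows, n_cols):
    """Linear offset of element (i, j) when the first index changes fastest."""
    return j * n_rows + i


def linearize(table, offset_fn):
    """Copy a 2-D table (list of rows) into a flat list using offset_fn."""
    n_rows, n_cols = len(table), len(table[0])
    memory = [None] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            memory[offset_fn(i, j, n_rows, n_cols)] = table[i][j]
    return memory


# Hypothetical table: two records (rows), three fields (columns a, b, c).
table = [
    ["a0", "b0", "c0"],
    ["a1", "b1", "c1"],
]
row_major = linearize(table, row_major_offset)
column_major = linearize(table, column_major_offset)
print(row_major)     # ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
print(column_major)  # ['a0', 'a1', 'b0', 'b1', 'c0', 'c1']
```

In the row-major result, the values of field b sit at offsets 1 and 4, so a scan that needs only field b still steps over the a and c fields of every record. In the column-major result, the same values are adjacent at offsets 2 and 3.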
Recent practice in data warehousing is to store tables in compressed form in main memory. Accessing such a table is costly due to predicate evaluation, wherein a predicate is a logical operator that returns either true or false. Examples of predicates include, but are not limited to, AND, OR, NOR, XOR, etc. Organizing the tables stored in main memory in either the row-major or the column-major format is not optimal. A row-major format requires queries to scan columns that are not necessary for the query. Similarly, efficient query processing of the column-major format requires the columns to be padded to word boundaries, as the columns are frequently not aligned with register boundaries.
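The cost of padding a column to word boundaries can be estimated with simple arithmetic. The sketch below assumes a 64-bit word and a fixed-width compressed column whose values are not allowed to straddle a word boundary; the function names and the 12-bit width are hypothetical and used only for illustration.

```python
WORD_BITS = 64  # assumed register/word width


def values_per_word(width_bits, word_bits=WORD_BITS):
    """Number of whole fixed-width values that fit in one word."""
    if not 0 < width_bits <= word_bits:
        raise ValueError("width_bits must be between 1 and word_bits")
    return word_bits // width_bits


def padding_bits_per_word(width_bits, word_bits=WORD_BITS):
    """Unused bits left in each word after packing whole values."""
    return word_bits - values_per_word(width_bits, word_bits) * width_bits


def words_needed(n_values, width_bits, word_bits=WORD_BITS):
    """Words required to store n_values when values never straddle a word."""
    per_word = values_per_word(width_bits, word_bits)
    return -(-n_values // per_word)  # ceiling division


# Hypothetical column compressed to 12-bit codes.
print(values_per_word(12))        # 5
print(padding_bits_per_word(12))  # 4
print(words_needed(1000, 12))     # 200
```

Under these assumptions, a 12-bit column leaves 4 of every 64 bits unused, and 1000 values occupy 200 words, compared with 188 words if the values were packed without regard to word boundaries.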
Accordingly, there is a need for a table format whose structure supports efficient query processing.