Table data is conventionally viewed both pictorially and conceptually as being two dimensional. For example, the sample table below contains data related to a plurality of employees.
Employee ID   Name   Salary   Department
1234          John   75000    Finance
1235          Tom    65000    Finance
1236          Kate   85000    Legal

When stored in a computer memory, however, this data will be stored as a one-dimensional string of values. A first technique for storing the values of the table is referred to as a row-major orientation, which stores all the values for a first row and then stores the values for the next row, e.g., “ . . . 1234; John; 75000; Finance; 1235; Tom; 65000; Finance; 1236; Kate; 85000; Legal . . . ” A second technique for storing the values of the table is referred to as column-major orientation, which stores all the values for a first column and then stores the values for a second column, e.g., “ . . . 1234; 1235; 1236; John; Tom; Kate; 75000; 65000; 85000; Finance; Finance; Legal . . . ” In other words, data from a table may be stored in a storage format that is either the column type (i.e., column major) or the row type (i.e., row major).
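The two orientations can be illustrated with a minimal Python sketch. The function names row_major and column_major are hypothetical and chosen only for this illustration, and the table is modeled as a simple in-memory list of rows rather than as any particular storage format.

```python
# Sample table from the text, modeled as a list of rows (an illustrative assumption).
rows = [
    (1234, "John", 75000, "Finance"),
    (1235, "Tom", 65000, "Finance"),
    (1236, "Kate", 85000, "Legal"),
]


def row_major(table):
    """Flatten row by row: all values of the first row, then the next row, and so on."""
    return [value for row in table for value in row]


def column_major(table):
    """Flatten column by column: all values of the first column, then the next column."""
    # zip(*table) transposes the list of rows into a sequence of columns.
    return [value for column in zip(*table) for value in column]


if __name__ == "__main__":
    print(row_major(rows))
    print(column_major(rows))
```

The first list begins 1234, John, 75000, Finance, matching the row-major string above; the second begins 1234, 1235, 1236, matching the column-major string.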
When stored data is retrieved, it is moved from a permanent memory such as a hard disk drive to a short-term memory such as a cache in units of blocks, where a block may be a fixed size such as 32K. When the data being retrieved is stored sequentially in the permanent memory, the number of blocks that need to be transferred to short-term memory is minimized. For example, if the table above is stored in a row-major format, then retrieving the record for employee 1234 comprises retrieving data that is stored sequentially on the permanent memory (i.e., “1234; John; 75000; Finance”). Accordingly, that data is likely to all be contained within one block or to span only a few blocks. If, however, a function to calculate an average salary were executed on the data stored in a row-major format, the function would have to retrieve all the salaries stored in the table, which are not stored sequentially, thus necessitating the transfer of many more blocks of data to short-term memory than retrieving the employee record required.
If the table above is stored in a column-major format, then the scenario is reversed. The salary data is stored sequentially (“75000; 65000; 85000”), and determining an average salary will only require transferring a few blocks of data to the short-term memory. The values that make up the complete record for an employee, however, will not be stored sequentially, thus requiring the transfer of many blocks of data.
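The trade-off described above can be sketched with a toy block-counting model in Python. All names and numbers here are assumptions made for illustration: the table is taken to have 1000 rows and 4 columns, every value is assumed to occupy one equally sized slot, and a block is assumed to hold 100 slots. Real storage systems use variable-width values and byte-sized blocks, so only the relative counts are meaningful.

```python
def row_major_index(r, c, n_rows, n_cols):
    """Slot position of cell (r, c) when the table is stored row by row."""
    return r * n_cols + c


def column_major_index(r, c, n_rows, n_cols):
    """Slot position of cell (r, c) when the table is stored column by column."""
    return c * n_rows + r


def blocks_touched(positions, block_size):
    """Number of distinct blocks that contain at least one requested slot."""
    return len({p // block_size for p in positions})


# Hypothetical sizes chosen only for this illustration.
N_ROWS, N_COLS, BLOCK_SIZE = 1000, 4, 100
SALARY = 2  # Salary is the third column (index 2) of the sample table.

if __name__ == "__main__":
    layouts = (("row-major", row_major_index), ("column-major", column_major_index))
    for name, index in layouts:
        # Slots needed to read one complete employee record (row 0).
        record = [index(0, c, N_ROWS, N_COLS) for c in range(N_COLS)]
        # Slots needed to read every salary, e.g. to compute an average.
        salaries = [index(r, SALARY, N_ROWS, N_COLS) for r in range(N_ROWS)]
        print(name,
              "record blocks:", blocks_touched(record, BLOCK_SIZE),
              "salary blocks:", blocks_touched(salaries, BLOCK_SIZE))
```

Under these assumptions, the row-major layout touches 1 block for the record and 40 blocks for the salaries, while the column-major layout touches 4 blocks for the record and 10 blocks for the salaries.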
Transferring numerous blocks of data is more time-consuming and more resource-intensive than transferring only a few blocks of data. Accordingly, the average salary operation described above would be completed more quickly and with fewer system resources on the data stored in a column-major format, but the retrieval of an employee record would be executed more quickly and with fewer system resources on the data stored in a row-major format. A database designer can choose the preferable storage technique based on the types of operations that will be performed most frequently, but neither format will be able to efficiently execute all types of operations. Accordingly, there exists in the art a need for a data storage technique that combines the benefits of both column-major and row-major storage techniques.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.