The present invention relates to storage of data in databases, and in particular, to the storage of database information in a hybrid table format.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
A database is an electronic filing system that stores data in a structured way. The primary storage structure in a database is a table. A database may contain multiple tables and each table may hold information of a specific type. Database tables store and organize data in horizontal rows and vertical columns. Rows typically correspond to real-world entities or relationships that represent individual records in a table. Columns may denote specific attributes of those entities or relationships, such as “name,” “address” or “phone number.” For example, Company X may have a database containing a “customer” table listing the names, addresses and phone numbers of its customers. Each row may represent a single customer and the columns may represent each customer's name, address and phone number.
Databases are generally stored in computer memory, which is one-dimensional (linearly addressed). Two-dimensional database tables must therefore be mapped onto a one-dimensional data structure in order to be stored. As shown in FIG. 17, one mapping approach involves storing a table in a database row-by-row (i.e., a row-oriented storage model). This approach keeps information about a single entity together. For example, row-by-row storage may store all information about a first customer first, then all information about a second customer, and so on. Alternatively, as also shown in FIG. 17, a table may be stored in a database column-by-column (i.e., a column-oriented storage model). This approach keeps like attributes of different entities together. For example, column-by-column storage may store all customer names first, then all customer addresses, and so on.
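The two mappings can be illustrated with a minimal sketch that linearizes a small customer table both ways and computes where a given cell (r, c) lands in each one-dimensional layout. The customer values and all function names below are hypothetical and serve only to show the row-by-row and column-by-column orderings.

```python
# Minimal sketch: mapping a two-dimensional table onto one-dimensional storage.
# The customer data and all function names are hypothetical.

def row_major(table):
    """Row-oriented layout: every field of row 0, then every field of row 1, ..."""
    return [value for row in table for value in row]


def column_major(table):
    """Column-oriented layout: every value of column 0, then of column 1, ..."""
    num_cols = len(table[0])
    return [row[c] for c in range(num_cols) for row in table]


def row_major_offset(r, c, num_cols):
    """Index of cell (r, c) within the row-oriented layout."""
    return r * num_cols + c


def column_major_offset(r, c, num_rows):
    """Index of cell (r, c) within the column-oriented layout."""
    return c * num_rows + r


customers = [
    ["Alice", "1 Main St", "555-0100"],
    ["Bob", "2 Oak Ave", "555-0101"],
]

rows = row_major(customers)     # Alice's name, address, phone, then Bob's
cols = column_major(customers)  # both names, then both addresses, then both phones
```

In the row layout the fields of one customer are adjacent; in the column layout the values of one attribute are adjacent, which is exactly the trade-off the two storage models make.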
Data must generally be accessed from a table in the same manner that it was stored. That is, conventional computer storage techniques require dedicated query operators that can access specific types of storage models. For example, row query operators are used to process data stored in a database in row-formatted storage models and column query operators are used to process data stored in column-formatted storage models.
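The coupling between query operators and storage models can be sketched by implementing the same selection twice: once as an operator over a row-formatted layout and once as an operator over a column-formatted layout. This is a hypothetical illustration, not the operator design of any particular database; the table contents, column names, and function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical customer table with the schema (name, address, phone).
Record = Tuple[str, ...]
COLUMNS = ("name", "address", "phone")

# Row-formatted storage: one tuple per customer.
row_store: List[Record] = [
    ("Alice", "1 Main St", "555-0100"),
    ("Bob", "2 Oak Ave", "555-0101"),
]

# Column-formatted storage: one list per attribute, aligned by position.
column_store: Dict[str, List[str]] = {
    "name": ["Alice", "Bob"],
    "address": ["1 Main St", "2 Oak Ave"],
    "phone": ["555-0100", "555-0101"],
}


def row_select_by_name(rows: List[Record], name: str) -> Optional[Record]:
    """Row operator: scans whole records and returns the first match."""
    for record in rows:
        if record[0] == name:
            return record
    return None


def column_select_by_name(columns: Dict[str, List[str]], name: str) -> Optional[Record]:
    """Column operator: locates the position in the name column, then
    reassembles the record by reading that position from every column."""
    try:
        pos = columns["name"].index(name)
    except ValueError:
        return None
    return tuple(columns[col][pos] for col in COLUMNS)
```

Both operators return the same record, but neither can be applied to the other layout without first converting the data, which is the dependency on the storage model described above.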
Choosing which storage model to use thus often depends on how the data will be used. Row-oriented storage models are generally well suited for transactional queries. The row-store format emphasizes the database row as the manipulable element and is typically used for On-Line Transaction Processing (OLTP), which involves a relatively large number of on-line transactions (rows), each characterized by a relatively large number of data types (columns).
By contrast, column-oriented storage models are generally well suited for analytical queries. The column-store format emphasizes the database column as the manipulable element and is typically used for On-Line Analytical Processing (OLAP) of a subset of the total number of transactions (rows) over a smaller number of data types (columns), which may include aggregations of basic data types. A database table in the column-store format is typically used to interrogate and analyze raw data for the problem-solving and planning that form part of Business Intelligence (BI) efforts.
In summary, a row store may be useful for retrieving individual records having many columns by a primary key condition. A column store may be useful for performing more complex functions, such as aggregation or join, over a relatively small number of columns.
Accordingly, conventional query processing schemes are bound to the underlying storage model of the database being queried. In reality, however, a database having certain data stored in a column-formatted storage model may be asked to handle a transactional query relating to that data, or a database having certain data stored in a row-formatted storage model may be asked to handle an analytical query relating to that data. For example, a database having data stored in a row-formatted storage model may receive a mixed set of queries requiring transactional and analytical processing of that data.
Both the row-store and column-store database table formats offer various benefits. For example, the row-store format offers ready scalability, since more and more transactions are expected to require storage in additional rows. The row-store table format is, however, relatively memory-intensive for analytic queries (e.g., aggregation, join): such queries scan the table vertically while the data is stored horizontally, incurring cache misses.
Conversely, the column-store format offers flexibility in allowing complex manipulation of data involving table joins and aggregation, as well as relatively low memory consumption, because dictionary encoding can compress the values of a single data type across multiple entries. The column-store format, however, typically does not allow ready manipulation of the same volumes of data as the row-store table.
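Dictionary encoding can be sketched as follows: each distinct value of a column is stored once in a dictionary, and the column itself keeps only small integer codes pointing into that dictionary. Keeping the dictionary sorted is an assumption made here (it makes the codes order-preserving); the city values and function names are hypothetical.

```python
# Minimal sketch of dictionary encoding for a single column.
# Assumption: the dictionary is kept sorted; the data is hypothetical.

def dictionary_encode(column):
    """Store each distinct value once and replace entries with integer codes."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Recover the original column values from the codes."""
    return [dictionary[code] for code in codes]


city = ["Berlin", "Paris", "Berlin", "Berlin", "Paris", "Rome"]
dictionary, codes = dictionary_encode(city)
# Each repeated string is stored once; the column holds only small integers.
```

Because all entries in a column share one data type and often repeat, a column store can apply this per column; in a row store, the values of one attribute are interleaved with other attributes, which makes such per-column compression less natural.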
Thus, a row-store table is more effective at serving row-wise record access, such as single-record selection with a primary key lookup. A column-store table is better at serving column-wise record access, such as single-column aggregation. For row-wise record access, a column-store table becomes memory-intensive, because its data is organized vertically, so cache misses occur while record values are accessed horizontally. For column-wise record access, a row-store table becomes memory-intensive, because its data is organized horizontally, so cache misses occur while specific column values are read.
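The cache-miss argument can be made concrete by counting how many distinct cache lines each access pattern touches under each layout. This sketch assumes fixed-width 8-byte values, 64-byte cache lines, and a table of 1,000 rows by 16 columns; all of these are illustrative assumptions, and the sketch counts lines touched rather than modeling a real cache hierarchy.

```python
# Minimal sketch: cache lines touched by row-wise vs column-wise access.
# Assumptions: fixed 8-byte values, 64-byte cache lines, hypothetical table size.
CACHE_LINE_BYTES = 64
VALUE_BYTES = 8
NUM_ROWS, NUM_COLS = 1000, 16


def row_offset(r, c, num_rows, num_cols):
    """Byte offset of cell (r, c) in a row-store layout (num_rows unused)."""
    return (r * num_cols + c) * VALUE_BYTES


def col_offset(r, c, num_rows, num_cols):
    """Byte offset of cell (r, c) in a column-store layout (num_cols unused)."""
    return (c * num_rows + r) * VALUE_BYTES


def lines_touched(cells, offset_fn, num_rows, num_cols):
    """Number of distinct cache lines covering the requested cells."""
    return len({offset_fn(r, c, num_rows, num_cols) // CACHE_LINE_BYTES
                for r, c in cells})


record = [(42, c) for c in range(NUM_COLS)]  # row-wise access: one full record
column = [(r, 3) for r in range(NUM_ROWS)]   # column-wise access: one attribute

for name, fn in (("row store", row_offset), ("column store", col_offset)):
    print(name,
          "record:", lines_touched(record, fn, NUM_ROWS, NUM_COLS),
          "column:", lines_touched(column, fn, NUM_ROWS, NUM_COLS))
```

With these parameters, reading one full record touches 2 cache lines in the row layout versus 16 in the column layout, while reading one column across all rows touches 1000 lines in the row layout versus 125 in the column layout.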
Despite the respective advantages of each table type, a table conventionally resides in either a row store or a column store, but not both, at any given point in its business life cycle. Accordingly, the present disclosure addresses this and other issues with systems and methods for implementing a hybrid database table stored as both a row store and a column store.