Traditional relational database systems, such as IBM's DB2, use a row-oriented storage system, in which the values of the different attributes of the same data record are stored consecutively (i.e., row by row). When writing data, this row-store architecture achieves high performance, since a single disk write suffices to push all of the fields of a single record out to disk; a database management system with a row store can therefore be considered a write-optimized system.
As long as the database is not accessed, it may reside on a storage medium such as a disk. However, when an application is run on the database, large numbers of rows have to be loaded into main memory. For data sets containing a multitude of attributes, this requires a large amount of I/O, making queries to the database, as well as data modification statements and operations, cumbersome and inefficient. As a consequence, database systems oriented toward ad-hoc querying of large amounts of data should be optimized with respect to read operations. This applies particularly to applications such as data warehousing and business intelligence, which rely on the efficiency the database system can provide when running complex queries on large data repositories.
In an effort to create a read-optimized relational database management system, column-based data storage architectures have been suggested (see, for example, “C-Store: A Column-oriented DBMS” by Mike Stonebraker et al., Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005). A column-store stores each attribute of a database table separately, such that successive values of that attribute are stored consecutively. As a consequence of this data storage concept, column-stores offer improved bandwidth utilization, since only those attributes that are accessed by a query need to be read off disk. However, column-stores suffer from several disadvantages. In particular, column-stores perform poorly for insert operations, since multiple distinct locations on disk (one for each attribute) have to be updated for each inserted tuple. In addition, in order for column-stores to offer a standards-compliant relational database interface, they must at some point in a query plan stitch values from multiple columns together into a row-store style tuple to be output from the database.
Thus, while column-stores make queries and predicate evaluation more efficient in terms of I/O, they often require considerably more CPU time than row-stores.
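The trade-off described above can be illustrated with a minimal Python sketch. It is not taken from any particular database system: the class names RowStore and ColumnStore, the three-attribute example table, and the counters for writes and fields read are assumptions introduced purely for illustration, and in-memory lists stand in for disk locations.

```python
# Hypothetical three-attribute table used only for illustration.
ATTRIBUTES = ("id", "name", "price")


class RowStore:
    """Row-oriented layout: all fields of a record are kept together."""

    def __init__(self):
        self.rows = []    # one entry per record, holding every attribute
        self.writes = 0   # number of distinct locations written (assumed cost model)

    def insert(self, record):
        self.rows.append(tuple(record))
        self.writes += 1  # a single contiguous write covers the whole record

    def scan(self, attribute):
        """Return the values of one attribute and the number of fields read."""
        idx = ATTRIBUTES.index(attribute)
        values, fields_read = [], 0
        for row in self.rows:
            fields_read += len(row)  # the whole row is fetched to get one field
            values.append(row[idx])
        return values, fields_read


class ColumnStore:
    """Column-oriented layout: each attribute is stored in its own sequence."""

    def __init__(self):
        self.columns = {name: [] for name in ATTRIBUTES}
        self.writes = 0

    def insert(self, record):
        for name, value in zip(ATTRIBUTES, record):
            self.columns[name].append(value)
            self.writes += 1  # one separate location is updated per attribute

    def scan(self, attribute):
        """Return the values of one attribute and the number of fields read."""
        values = list(self.columns[attribute])
        return values, len(values)  # only the requested column is touched

    def reconstruct(self, position):
        """Stitch the values at one position back into a row-style tuple."""
        return tuple(self.columns[name][position] for name in ATTRIBUTES)


if __name__ == "__main__":
    records = [(1, "bolt", 0.10), (2, "nut", 0.05), (3, "gear", 4.50)]
    row_store, column_store = RowStore(), ColumnStore()
    for record in records:
        row_store.insert(record)
        column_store.insert(record)
    print(row_store.writes, column_store.writes)
    print(row_store.scan("price"), column_store.scan("price"))
    print(column_store.reconstruct(1))
```

Under this simplified cost model, inserting three records costs three writes in the row layout and nine in the column layout, whereas scanning the single attribute "price" reads nine fields in the row layout and only three in the column layout. The reconstruct method corresponds to the tuple stitching step, which is where the additional CPU work of a column-store arises.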
In view of these disadvantages of both row-oriented and column-oriented database architectures, it would be desirable to have a data storage concept which enables efficient access to the database while keeping CPU expenditure low.