There are many different types of database systems that store the data in different ways. Generally, the database systems have a logical table view such as shown in FIG. 1 in which one or more records is each a row of the table. The one or more attributes of the database (and hence the one or more attributes of each record) each is a column of the table. Each record (row) has a set of attributes associated with the record (such as Val1.1, Val2.1, Val3.1, Val4.1, Val5.1 and Val6.1 for the first record.
There are multiple ways to store a given logical table structure in a physical storage. The most popular way is “N-ary Storage Model” (NSM, also called “row-stores”) presented in FIG. 2. In this model, all attributes of a given record are stored together on disk. Multiple such records form a disk “page”.
As an example of the different storage, some database systems use column stores (aka Decomposed Storage Model (DSM) as shown in FIG. 3) that have different input/output (“IO”) and memory characteristics than traditional row-stores. On one hand, storing data in columns results in reduced IO volume (only relevant columns are fetched) and columnar storage results in increased potential of data compression, as values from the same domain are stored together. At the same time, fetching multiple attributes from disk can result in an increased number of IO requests. In addition, when performing multiple requests to read the attributes from the same table, multiple blocks need to be cached in memory at the same time, significantly increasing pressure on buffer pool. This is especially true since column databases typically use large block (disk-page) sizes. In extreme cases, this can lead to inability of the system to process a query within a given memory constraint. Finally, if there are relatively few records in a given table, allocating a disk block for each column separately can lead to significant waste of disk space and buffer memory.
Some database system use PAX storage as shown in FIG. 4. The PAX storage was proposed in an article by Ailamaki et al. entitled “Weaving Relations for Cache Performance”, Proceedings of the 27th Very Large Data Bases (“VLDB”) Conference, Roma, Italy (2002), pp, 1-12, which is incorporated herein by reference. The PAX storage allows the storage of multiple attributes in a single disk block, like in NSM system in FIG. 2, but in each block data is stored in columns, like in DSM.
Vectorwise uses a generalized PAX concept allowing each table to consist of multiple PaxGroups, where each PaxGroup is a subset of table attributes stored together using the PAX storage. Different approaches for dividing a table into PaxGroup are possible, for example: many groups each storing just one attribute (Vectorwise default, explicit when used with the WITH STRUCTURE=VECTORWISE clause on CREATE TABLE statement, property-wise equivalent to DSM) or one group with all attributes (when used with WITH STRUCTURE=VECTORWISE_ROW, IO-wise equivalent to NSM). This can lead to problems described above.
In summary, the main problems are: 1) in DSM, with very many scanned columns, very many IOs need to be performed; 2) in DSM, with very many scanned columns, many blocks need to be cached in memory at the same time to allow tuple reconstruction; 3) in DSM, tables with few records can waste significant disk and buffer space due to allocating a block per each attribute; 4) in NSM/PAX, with very many or very wide columns, only a few tuples will fit in a single disk block that reduces the opportunity for data compression; and 5) in NSM/PAX, with very many or very wide columns it might be that a single tuple won't even fit in a block. Note that solving all these problems at the same time is impossible assuming non-duplicated data storage, as optimizing for problems 1-3 leads to a full NSM storage, while optimizing for problems 4-5 leads to a full DSM storage.
Automatic DSM storage (aka AutoDSM or AutoPAX) attempts to balance the above problems by avoiding situations where problems might occur, and at the same time trying to assure that benefits of column stores are maintained.
There are also some databases that provide ways to partition table attributes into sub-groups. One example is a manual vertical partitioning, such as was proposed by Vertica with its Flexstore technology that is described at http://www.vertica.com/2010/08/18/vertica-flash-runaway-performance-at-a-low-price/ which is incorporated herein by reference. This can be used to manually partition tables into something that AutoDSM would produce. But since it's an arbitrary mechanism, it allows anything one imagines. An extra benefit here is that it allows putting some attributes on faster devices (e.g., solid-state drives) and some on slower (e.g. standard HDDs).
Some other solutions propose automatic vertical fragmentation based on looking at the query load, such as those disclosed in the Hankins et al. article entitled “Data Morphing: An Adaptive, Cache-Conscious Storage Technique, Proceedings of the 29th VLDB Conference, Berlin, Germany (2003), pp. 1-12, which is incorporated herein by reference, and an article by Grund et al., “HYRISE—A Main Memory Hybrid Storage Engine”, VLDB '10 in Singapore (2010), pp. 1-12, which is incorporated herein by reference. The goals of such algorithms are to put columns accessed together close to each other. While the above techniques are useful, they are mostly targeted at optimizing performance. AutoDSM addresses a different goal: reducing resources, improving compression rates and actually allowing very wide tables to be used. It also is not clear how well the automatic vertical fragmentation algorithms would handle 100s or 1000s of columns.
The automatic vertical partitioning of the data morphing article and the HYRISE article can provide much better results than other systems, but require a'priori knowledge of queries and it is unclear if these systems are meant to work for schemas with 1000+ columns. There may also be manual partitioning that is possible but it is fraught with human error and very slow.
Thus, it is desirable to provide a system and method for improving input/output and memory utilization in a database that overcomes the limitations of the above techniques and systems and it is to this end that the disclosure is directed.