State-of-the-art database management systems (DBMS's), like the underlying data files out of which and on top of which they historically grew, continue to store and manipulate data in a manner that closely mirrors the users' view of the data. Users typically think of data as a sequence of records (or “tuples”), each logically composed of a fixed number of “fields” (or “attributes”) that contain specific content about the entity described by that record. This view is naturally represented by a logical table (or “relation”) structure (referred to herein as a “record-based table”), such as a rectilinear grid, in which the rows represent records and the columns represent fields.
The long-standing existence of record-based tables and their correspondence to a conventional user view, in the absence of generally recognized drawbacks, has led to their nearly universal acceptance as the principal underlying internal representation of databases. Yet record-based tables contain key structural weaknesses, including high levels of unorderedness and redundancy, that have traditionally been regarded as unavoidable. For example, such tables can be sorted or grouped (i.e., arranged so that identical values are positioned contiguously) on at most one criterion, where that criterion is based upon the values of a column or upon some function of the values of one or more columns. This limitation renders essential database functions, such as querying and updating, awkward and overly resource-intensive on all criteria other than this privileged one.
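The single-criterion limitation can be illustrated with a minimal sketch. The three-record table, its field layout (name, city, age) and the helper `is_sorted` are hypothetical and chosen only for illustration; they do not come from any particular DBMS.

```python
# Hypothetical record-based table: each tuple is one record (name, city, age).
records = [
    ("Smith", "Boston", 52),
    ("Adams", "Denver", 31),
    ("Jones", "Austin", 47),
]

def is_sorted(values):
    """Return True if the values appear in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Sort whole records on the name field, the one "privileged" criterion.
by_name = sorted(records, key=lambda r: r[0])

names = [r[0] for r in by_name]
cities = [r[1] for r in by_name]

print(is_sorted(names))   # the name column is ordered
print(is_sorted(cities))  # the city column is dragged along and is not
```

Because every field must stay in the same row as the rest of its record, ordering the records by name fixes the position of each city value as well, so the city column cannot be ordered independently.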
The above deficiencies inhere in the fundamental properties of the record-based table structure, in particular, the requirement that each field be positioned co-linearly with all other fields in the same record. This arbitrary positioning of fields in record-based table structures excludes all other arrangements. It thus obscures natural and exploitable latent data relationships that are revealed by more ordered, condensed and efficient data arrangements. Moreover, the inability of record-based tables to effectively group or sort data on more than one criterion leads to negative characteristics of state-of-the-art DBMS's, such as unorderedness, redundancy, cumbersomeness, algorithmic inefficiencies and performance instabilities.
Database research provides palliatives for these problems, but fails to uncover and address their underlying cause (i.e., the reliance on record-based table structures). For example, the inability to represent a natural, multi-dimensional grouping within the confines of a record-based table structure has led to the creation of index-based data structures. These supplementary structures are inherently and often massively redundant, but they establish groupings and orderings that cannot be directly represented using a conventional table. Index-based structures typically grow to be overly lengthy and convoluted, and are cumbersome to maintain, optimize and especially update. Examples of common indexes are B-trees, T-trees, star-indexes, and various bitmaps.
Other supplementary structures developed in the prior art have different drawbacks. For example, hash tables can provide rapid querying of individual data items, but their lack of sort ordering renders them unsuitable for range queries or for any other operation that requires returning data in a specific order.
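The contrast between point lookups and range queries can be sketched as follows. The sample data and the function names `range_scan_hash` and `range_scan_sorted` are hypothetical; Python's built-in `dict` stands in for a hash table and the standard `bisect` module for binary search over a sorted column.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: a hash table mapping a name to an age.
ages = {"Adams": 31, "Jones": 47, "Smith": 52, "Young": 28}

# A point lookup on the hash key is fast and needs no ordering.
jones_age = ages["Jones"]

def range_scan_hash(table, lo, hi):
    """Range query on a hash table: every entry must be examined,
    and the matches must be sorted afterwards to return them in order."""
    return sorted(v for v in table.values() if lo <= v <= hi)

# The same values kept as a sorted column.
sorted_ages = sorted(ages.values())

def range_scan_sorted(values, lo, hi):
    """Range query on sorted data: two binary searches locate the
    boundaries, and the slice between them is already in order."""
    return values[bisect_left(values, lo):bisect_right(values, hi)]

print(range_scan_hash(ages, 30, 50))
print(range_scan_sorted(sorted_ages, 30, 50))
```

Both functions return the same values, but the hash-table version inspects every entry and then sorts its result, whereas the sorted version locates only the two boundaries of the range.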
The ability to maintain an ordered, non-redundant, multi-dimensional data set, using flexible sorting and/or grouping criteria, is extremely useful to database management. Sorted data makes rapid searching and updating possible via, for example, binary search algorithms and insertion sorts. Grouped data enables condensation that reduces space requirements and further increases the speed of, for example, searching and updating.
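As a minimal sketch of these two benefits, the following assumes a single hypothetical column of city names that is held in sorted order. The helpers `contains` and `condense` are illustrative names only; the condensation shown is a simple run-length grouping, offered as one possible form of condensation rather than as the particular technique of any system.

```python
from bisect import bisect_left, insort
from itertools import groupby

# Hypothetical column of values, held in sorted (and therefore grouped) order.
column = ["Austin", "Boston", "Boston", "Boston", "Denver", "Denver"]

def contains(sorted_values, target):
    """Binary search: locate target in O(log n) comparisons."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target

# Updating: find the insertion point by binary search and insert there,
# so the column remains sorted after the update.
insort(column, "Chicago")

def condense(sorted_values):
    """Grouping: store each run of identical values once, with its count."""
    return [(value, len(list(run))) for value, run in groupby(sorted_values)]

condensed = condense(column)
print(contains(column, "Chicago"))
print(condensed)
```

The condensed form holds four entries in place of seven values, and a search over it needs to consider only the distinct values.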
A system of data storage in which most or all columns of a data table can be stored in grouped and/or sorted order is thus extremely desirable. Previous studies have investigated “fully inverted databases,” which index each column through traditional methods and thereby preserve all the inadequacies of records and indexes. Additionally, the bloated storage requirements necessary to accommodate complete indexing tend to make fully inverted databases impractical, especially, but not only, in main-memory databases.