1. Field
The field of the present invention relates to systems and methods for storing and accessing data, and more particularly to data storage, database queries and data retrieval.
2. Background
As the quantity and types of data collected by businesses have increased, the size and complexity of the databases used to manage and analyze that data have expanded dramatically. Substantial efforts have been made to improve the access methods and performance of these databases. One technique for improving the performance of large databases is to partition tables or other data sets into smaller data sets, sometimes referred to as partitions. Partitioning can improve performance by reducing the amount of data that needs to be retrieved to respond to a query. For example, a query may request data from a data set where a specified attribute is within a certain range. If the data set is partitioned into smaller data sets based on ranges of values for that attribute, only a subset of the partitions may need to be retrieved to respond to the query. While partitioning may be used to improve performance in many database systems, the flexibility and extent of data partitioning and other optimization may be limited by the structure imposed on the data when it is received or stored. Many database and data storage systems have predetermined schemas that may not capture information regarding the structure of data as it is originally provided. As a result, the extent to which partitioning and other optimization is performed may be limited in many systems.
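The range-partitioning idea described above can be illustrated with a minimal sketch. It assumes rows held in memory as dictionaries and a sorted list of boundary values. The names partition_by_range, query_range, and the "amount" attribute are hypothetical and chosen only for illustration; they do not describe any particular database system.

```python
from bisect import bisect_right


def partition_by_range(rows, key, boundaries):
    """Split rows into len(boundaries) + 1 partitions by the value of row[key].

    boundaries must be sorted. Partition i holds values v with
    boundaries[i-1] <= v < boundaries[i] (open-ended at both extremes).
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions


def query_range(partitions, key, boundaries, low, high):
    """Return rows with low <= row[key] <= high and the number of partitions read.

    Only partitions whose value range can overlap [low, high] are scanned.
    """
    first = bisect_right(boundaries, low)
    last = bisect_right(boundaries, high)
    scanned = partitions[first:last + 1]
    matches = [row for part in scanned for row in part if low <= row[key] <= high]
    return matches, len(scanned)


# Hypothetical data: six rows partitioned on "amount" at boundaries 100 and 200.
rows = [{"id": i, "amount": a} for i, a in enumerate([5, 150, 250, 120, 99, 200])]
boundaries = [100, 200]
partitions = partition_by_range(rows, "amount", boundaries)
matches, partitions_read = query_range(partitions, "amount", boundaries, 120, 180)
print(matches, partitions_read)
```

In this sketch a query restricted to the range 120 to 180 reads one of the three partitions rather than the whole data set, which is the reduction in retrieved data that the passage describes.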
Some systems capture additional information as data is received that can be used for optimization. For example, U.S. Pat. Nos. 8,032,509, 7,877,370, 7,613,734, 7,769,754, 7,720,806, 7,797,319 and 7,865,503 describe systems and methods in which algebraic relations may be composed from statements received by the system and stored in an algebraic cache for use in responding to subsequent queries. In responding to a query, an optimizer can retrieve and generate alternative collections of algebraic relations, each equal to the requested data set. The collections of algebraic relations can then be evaluated, and the lowest cost collection can be used to calculate and return the requested data set. The system may also perform comprehensive optimization by analyzing the algebraic cache to generate additional relations and data sets. For example, by inspecting the algebraic cache, an optimizer may identify a significant number of restrictions against a specific set that use a range of values. From these entries, the optimizer may determine ranges of values to use for partitioning the data set into subsets. The optimizer may insert the appropriate relations into the algebraic cache for each of the partitioning subsets and also insert a relation indicating that the union of the subsets equals the set. This type of partitioning allows less data to be examined in responding to queries, reducing the calculation time and resources required.
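The selection of a lowest cost alternative described above can be sketched in simplified form. This is not the implementation of the cited patents. It assumes that a relation stating that a set equals the union of its partitioning subsets is already known, and it uses the number of rows examined as a stand-in cost measure. The names Subset, alternatives, and cheapest, along with the row counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    name: str
    low: int        # inclusive lower bound of the attribute values in the subset
    high: int       # exclusive upper bound
    row_count: int  # assumed cost of scanning the subset


def alternatives(full_set_rows, subsets, q_low, q_high):
    """List (description, cost) plans that each yield the restriction
    q_low <= value < q_high on the full set.

    Plan 1 scans the full set. Plan 2 relies on the assumed relation
    "full set = union of subsets" and scans only overlapping subsets.
    """
    overlapping = [s for s in subsets if s.low < q_high and q_low < s.high]
    union_name = " U ".join(s.name for s in overlapping) or "nothing"
    return [
        ("scan full set", full_set_rows),
        ("scan " + union_name, sum(s.row_count for s in overlapping)),
    ]


def cheapest(plans):
    """Pick the plan with the lowest assumed cost."""
    return min(plans, key=lambda plan: plan[1])


# Hypothetical subsets whose union equals a 1000-row set.
subsets = [
    Subset("S1", 0, 100, 400),
    Subset("S2", 100, 200, 350),
    Subset("S3", 200, 300, 250),
]
print(cheapest(alternatives(1000, subsets, 120, 180)))
```

A real optimizer would weigh many more alternatives and a richer cost model. The sketch only shows how a union relation over partitioning subsets gives the optimizer a cheaper equivalent to scanning the whole set.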