Present invention embodiments relate to database query evaluation, and more specifically, to limiting scans of loosely ordered data and/or grouped relations based on query predicate evaluation.
Searching for information using a query may result in a search of a large database table when an evaluation of the query indicates that the large database table should be scanned. In such a situation, it may be beneficial to eliminate rows (e.g., individual data records) in the large database table from consideration early in the scanning sequence, before an unnecessarily large processing overhead has been incurred. Some database management systems maintain metadata about each storage region in the form of range values or range maps that define the minimum and maximum values in a given storage region, in order to filter storage regions before actually reading and searching the stored data. For example, if a storage region is known to contain records with column values between 100 and 200 (e.g., as stored in the range map metadata), then when a query with values outside of that known range (e.g., a query with a value of 500) is evaluated, the evaluation can eliminate that storage region from the scan.
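The range-map filtering described above can be illustrated with a minimal sketch. The names StorageRegion and regions_to_scan are hypothetical and chosen only for illustration; an actual database management system would keep this metadata in its own catalog structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageRegion:
    """Hypothetical storage region with range-map metadata for one column."""
    region_id: int
    min_value: int   # smallest column value stored in the region
    max_value: int   # largest column value stored in the region
    rows: List[int]  # the stored column values themselves


def regions_to_scan(regions: List[StorageRegion], lo: int, hi: int) -> List[StorageRegion]:
    """Keep only regions whose [min, max] range overlaps the query range [lo, hi]."""
    return [r for r in regions if r.max_value >= lo and r.min_value <= hi]


regions = [
    StorageRegion(0, 100, 200, [100, 150, 200]),
    StorageRegion(1, 450, 600, [450, 500, 600]),
]

# Query: column value = 500. Region 0 (range 100..200) is eliminated using
# metadata alone; only region 1 is actually read and searched.
candidates = regions_to_scan(regions, 500, 500)
matches = [v for r in candidates for v in r.rows if v == 500]
```

In this sketch the region holding values between 100 and 200 is never read, since its range map shows that it cannot contain the value 500.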
In a hybrid column store, column data for data records (e.g., rows) in a storage region are not necessarily stored as a group of rows, but are grouped into blocks of column data for permanent storage (i.e., as stored on a drive or disk). Example hybrid column store techniques include a Partition Attributes Across (PAX) data store, fractured mirrors, fine-grained hybrids, and variations thereof. Each of these techniques has corresponding advantages and disadvantages; in particular, each has disadvantages with respect to input/output (I/O) volume, CPU and/or storage allocation. In a hybrid column store, further benefits may be obtained by avoiding reading columns that should not be processed, beyond the benefits of avoiding reading entire regions of rows that should not be processed.
Furthermore, certain forms of data tend to be loosely ordered. For example, when mapping highway traffic, data may be loosely ordered based on a time of day. That is, heavy traffic may correlate to a morning rush hour or an evening rush hour, in which traffic loads are much higher than the average traffic load. Similarly, data may exhibit a grouped relationship. For example, a given geographic region such as a county, city, state or country may have attributes such as a latitude and a longitude that may correlate to a given temperature or amount of rainfall at any given time of day or date. A hybrid column store may provide a level of data granularity that can avoid the reading of column data when the data are loosely ordered or relationally grouped, even when the values of those columns are ostensibly required for query evaluation, and even when the region containing those values contains rows which must be processed.
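One way a column read might be avoided even though the column appears in a query predicate is sketched below. This is an illustrative assumption, not a description of any particular embodiment: per-block minimum and maximum values for a loosely ordered column (here, a hypothetical hour-of-day column) are compared against the predicate range, and when every value in the block is known to satisfy the predicate, the block's rows qualify without the column data being read. The names Verdict, ColumnBlockMeta, and classify are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    NO_ROWS = "no row can match; the block's rows are skipped"
    ALL_ROWS = "every row matches; the column block need not be read"
    SOME_ROWS = "undetermined; the column block must be read"


@dataclass
class ColumnBlockMeta:
    """Hypothetical range metadata for one block of column data."""
    min_value: int
    max_value: int


def classify(meta: ColumnBlockMeta, lo: int, hi: int) -> Verdict:
    """Compare a block's [min, max] range with the predicate range [lo, hi]."""
    if meta.max_value < lo or meta.min_value > hi:
        return Verdict.NO_ROWS    # ranges are disjoint
    if lo <= meta.min_value and meta.max_value <= hi:
        return Verdict.ALL_ROWS   # block range lies inside the predicate range
    return Verdict.SOME_ROWS      # partial overlap


# Loosely ordered hour-of-day data: each block covers a narrow band of hours.
blocks = [ColumnBlockMeta(0, 5), ColumnBlockMeta(6, 9), ColumnBlockMeta(8, 12)]

# Predicate: hour BETWEEN 6 AND 9 (e.g., a morning rush hour).
verdicts = [classify(b, 6, 9) for b in blocks]
```

Under these assumptions, only the third block requires its hour-of-day column data to be read. The second block contains rows which must be processed, yet the predicate column itself does not need to be read for them.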