This disclosure relates generally to database management systems in a data processing system and more specifically to granularity of information represented in metadata stored in a managed object of the database management system of the data processing system.
A typical problem in database environments is a request to find elements of a long list of values that match a given value, or that belong to a given set of values. A brute force approach typically involves scanning the whole list, which is often inefficient. In some situations, an index is created and/or the values are sorted so that a query can then be performed more efficiently, but often that is infeasible due to the overhead of generating and maintaining the index or the sorted elements, or due to constraints on how the data is stored.
One existing approach involves partitioning the data into zones and maintaining, for each zone, a modest amount of metadata that can be used to eliminate many of the zones from consideration, thereby reducing the number of zones that need to be scanned. With the increasing use of synopsis tables, also often referred to as zone maps, to provide metadata describing underlying regions of a table, there is increasing demand for additional capabilities and for use in a widening range of areas. Zone maps, however, typically offer limited information on the content of the zone or stride. For example, the most commonly tracked metadata is a high value and a low value for the zone, which bracket the range of values present in a particular region of a table. The high value and low value per column in the zone may form a very useful coarse-grained filter when they span a limited range, and such a filter is typically better than no information at all. These high and low values are used to determine whether a particular region of the table needs to be accessed, and thus to reduce input/output operations and processing requirements for a query. Often, however, the level of detail in the zone map is not sufficient to eliminate ranges of a table that do not contain the target column value(s), which causes extra input/output operations and consumes additional processor resources to decompress the data.
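As an illustration only, the high/low filtering described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular database system: the names `Zone`, `build_zone_map` and `find_equal`, the fixed zone size, and the use of an in-memory list of integers in place of table pages are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Zone:
    """Hypothetical per-zone metadata: the row offset plus low/high values."""
    start: int  # index of the first row in the zone
    low: int    # smallest value stored in the zone
    high: int   # largest value stored in the zone


def build_zone_map(values: Sequence[int], zone_size: int) -> List[Zone]:
    """Partition the values into fixed-size zones and record low/high per zone."""
    zones = []
    for start in range(0, len(values), zone_size):
        chunk = values[start:start + zone_size]
        zones.append(Zone(start, min(chunk), max(chunk)))
    return zones


def find_equal(values: Sequence[int], zones: List[Zone],
               zone_size: int, target: int) -> Tuple[List[int], int]:
    """Return (matching row indexes, number of zones actually scanned)."""
    matches = []
    scanned = 0
    for zone in zones:
        # Skip the zone when the target lies outside its [low, high] bracket.
        if target < zone.low or target > zone.high:
            continue
        scanned += 1
        end = min(zone.start + zone_size, len(values))
        for i in range(zone.start, end):
            if values[i] == target:
                matches.append(i)
    return matches, scanned


# Assumed sample data: three zones of four values each.
values = [1, 2, 3, 4, 10, 11, 12, 13, 2, 20, 3, 19]
zones = build_zone_map(values, zone_size=4)
matches, scanned = find_equal(values, zones, 4, target=11)
```

In this assumed data set, the third zone spans the values 2 through 20, so a search for 11 still scans it even though it holds no match. This is the kind of wasted access that arises when the high/low range of a zone is wide.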
However, the metadata available at a page level is rarely sufficient to avoid decompression and/or decryption before predicate application, and the resulting search of a list of values is expensive in terms of input/output operations and processor resources. Further, with the use of encryption and compression at a column level, row level and page level, examining the columns on a page to determine whether a particular row satisfies a predicate is typically very expensive in terms of computing resources.
Other solutions using indexes typically require large amounts of storage and processing resources to maintain. Column stores partly solve this resource usage problem by breaking tables vertically to create separate copies of all columns, which enables predicates to be applied to a single column in the store without touching other columns that are not required to respond to the query. Other solutions typically involve applying the predicates to compressed data, or potentially to data that has been partially decompressed. Conventional use of Bloom filters is evident in previous solutions as well.