Metadata of data containers (such as files and logical units stored at a storage system) includes multiple attributes (e.g., size, owner of the data container, location on a storage device, modification time, etc.). Storage administrators can query metadata to obtain information about data containers that helps them make decisions regarding various storage management tasks. Typical queries may ask questions such as “which data container type takes the most space?”, “which data containers have been deleted since yesterday?”, or “how many data containers stored on expensive disks have not been accessed since last year?” Quick answers to these questions can help storage administrators understand how storage resources are used. Since a large number of attributes (e.g., from millions to billions) is stored on storage devices, searching the attributes in a timely fashion can become a challenging task.
Current solutions to accelerate search for multi-attribute queries (e.g., queries based on more than one attribute) can be classified into two categories: those using single attribute indexes and those using multi-attribute indexes. An index is a data structure having a plurality of keys, with each key identifying a data record. A single attribute index is an index created based on a single attribute. Thus, if a query includes more than one attribute on which the query needs to match (such as data container size and modification time) and an index exists for each searchable attribute (one for the data container size attribute and the other for the modification time attribute), each index is searched separately. Then, a logical operation, such as intersection, is applied to the results of both searches to find the records that match both predicates in the query. In another approach, if an index is created for only one searchable attribute, that index is searched first. Then, the second predicate is used to filter out the records that do not match the second predicate. Both approaches are inefficient, however, in metadata attribute searches in which the number of data containers that match multiple predicates (or conditions) is less than the number of data containers that match a single predicate, because each approach must still scan every index entry that matches an individual predicate. Thus, these approaches do not take advantage of the combined selectivity of multiple predicates in a query (i.e., the fraction of records that satisfy all of the predicates), which is usually lower than the selectivity of any single predicate. Lower selectivity means fewer index entries to scan and process, thereby reducing the need to search unwanted data records and to make unnecessary, time-consuming disk requests.
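The two single attribute index approaches described above can be illustrated with a minimal Python sketch. The record layout, the sample values, and the names `build_index`, `range_scan`, `query_intersect`, and `query_filter` are hypothetical and chosen only for illustration; each index is modeled as a sorted in-memory list rather than an on-disk structure.

```python
from bisect import bisect_left, bisect_right

# Hypothetical metadata records: (container_id, size_bytes, mtime_day).
records = [
    (0, 100, 1),
    (1, 5000, 9),
    (2, 7000, 2),
    (3, 8000, 9),
    (4, 200, 9),
    (5, 9000, 3),
]
records_by_id = {rec[0]: rec for rec in records}

def build_index(records, attr_pos):
    """Single attribute index: sorted list of (attribute value, container id)."""
    return sorted((rec[attr_pos], rec[0]) for rec in records)

def range_scan(index, low, high):
    """Return the ids whose key lies in [low, high] and the entries scanned."""
    start = bisect_left(index, (low, -1))               # ids are non-negative
    end = bisect_right(index, (high, float("inf")))
    return {cid for _, cid in index[start:end]}, end - start

def query_intersect(size_index, mtime_index, min_size, mtime_day):
    """Approach 1: search each index separately, then intersect the results."""
    by_size, n_size = range_scan(size_index, min_size, float("inf"))
    by_mtime, n_mtime = range_scan(mtime_index, mtime_day, mtime_day)
    return by_size & by_mtime, n_size + n_mtime

def query_filter(size_index, records_by_id, min_size, mtime_day):
    """Approach 2: search the only index, then filter on the second predicate."""
    by_size, n_size = range_scan(size_index, min_size, float("inf"))
    matches = {cid for cid in by_size if records_by_id[cid][2] == mtime_day}
    return matches, n_size

size_index = build_index(records, 1)
mtime_index = build_index(records, 2)

# Query: size >= 5000 AND modification day == 9.
intersect_result = query_intersect(size_index, mtime_index, 5000, 9)
filter_result = query_filter(size_index, records_by_id, 5000, 9)
print(intersect_result)  # ({1, 3}, 7): two matches, seven index entries scanned
print(filter_result)     # ({1, 3}, 4): two matches, four index entries scanned
```

In this sample data only two records satisfy both predicates, yet the first approach scans seven index entries and the second scans four and fetches four records, which illustrates why neither approach benefits from the combined selectivity of the predicates.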
Multi-attribute indexes, also known as multicolumn or composite indexes, are indexes built on multiple attributes. They can be viewed as sorted arrays containing values that are created by concatenating the values of the indexed columns/attributes. Therefore, the concatenation order of the columns determines how “important” a column is. Typically, the first column is the most important one. One of the limitations of existing composite indexes is that a composite index first uses predicates on the first attribute to find a list of records that match the first attribute, and then uses predicates on the other attributes to obtain a final matching list. For queries that have predicates on the first column, especially when those predicates are selective (i.e., they better eliminate unwanted records), multi-attribute indexes are useful and can help reduce the number of index entries to be scanned and examined. However, for other queries, in which the first predicate is not the most selective one, many index entries are searched first based on the first attribute in the composite index. Then, the resulting list is searched based on the subsequent portion of the composite index. As a result, many index entries are searched unnecessarily. Due to this limitation, multi-attribute indexes are typically used sparingly.
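The behavior of a composite index described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the index is modeled as a sorted list of tuples, with tuple ordering standing in for concatenated key values, and the sample records and the function name `composite_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical metadata records: (container_id, size_bytes, mtime_day).
records = [
    (0, 100, 1),
    (1, 5000, 9),
    (2, 7000, 2),
    (3, 8000, 9),
    (4, 200, 9),
    (5, 9000, 3),
]

# Composite index on (size, mtime): sorting the tuples orders entries by the
# first column, then by the second, like sorting concatenated key values.
composite_index = sorted((size, mtime, cid) for cid, size, mtime in records)

def composite_query(index, size_low, size_high, mtime_low, mtime_high):
    """Scan the range selected by the first attribute, then filter on the second.

    Returns the matching container ids and the number of index entries scanned.
    """
    start = bisect_left(index, (size_low,))
    end = bisect_right(index, (size_high, float("inf")))
    scanned = index[start:end]
    matches = {cid for _, mtime, cid in scanned
               if mtime_low <= mtime <= mtime_high}
    return matches, len(scanned)

INF = float("inf")

# Selective predicate on the first column: size == 8000 AND mtime >= 5.
selective_first = composite_query(composite_index, 8000, 8000, 5, INF)

# Non-selective predicate on the first column: size >= 200 AND mtime == 3.
unselective_first = composite_query(composite_index, 200, INF, 3, 3)

print(selective_first)    # ({3}, 1): one entry scanned for one match
print(unselective_first)  # ({5}, 5): five entries scanned for one match
```

Both queries return a single record, but the second scans five of the six index entries because the selective predicate is on the second column, where the sort order of the composite index cannot narrow the scan.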
Accordingly, what is needed is a mechanism that will reduce inefficiencies of existing mechanisms for searching metadata of data containers.