Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a particular database query are examined to determine the most efficient way to execute the query. This may or not result in the absolute best strategy for the query because there are many ways to create plans, and some plans require more time and resources than are available to execute the query. There is a trade off between the amount of time spent figuring out the best plan and the amount of time executing the plan. Different qualities of database management systems have different methods of balancing the trade off. For example, cost based query optimizers may evaluate the resource footprint of various query plans and use this as the basis for plan selection. Generally the resources which are costed include CPU path length, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism. The set of query plans examined may be formed by examining possible access paths (e.g., primary index access, secondary index access, full file scan) and various relational table join techniques (e.g., merge join, hash join, product join). The search space comprising the query plans may become quite large depending on the complexity of the database query.
To assist with complex and nested queries, query optimizers may create temporary objects or index objects as part of the optimization process. These temporary objects may be optimized subsets of the overall optimization including partially optimized tables, or specially created indexes. For example, the query optimizer may analyze a row selection in a query and use that analysis to determine if creation of a separate object containing an index may be beneficial. These objects are generally temporary in nature, meaning that they may persist until the system is restarted, until the underlying data has changed, or until the object is no longer needed. Index objects are more often of a permanent nature (across system restarts). Both temporary objects and temporary index objects are generally stored in query plan caches which allow for the use and reuse of the temporary objects in an effort to save time and processing for subsequent queries.
One such index object is an encoded vector index (“EVI”). An EVI is a data structure that is made up of two primary components: a symbol table and a vector. The symbol table contains the distinct key values in the rows covered, as well as statistical information about each key. The statistical information typically includes a numeric ‘gray’ code identifying the key, the first and last rows where the key is found, and the number of times the key appears in the table. The vector corresponds to the actual rows in the table and contains a list of byte codes indicating which key each row contains. EVIs are a complementary alternative to existing index objects, such as binary radix tree structure-logical file or SQL index, and are a variation on bitmap indexing. EVIs are usually recommended for larger tables with a low number of distinct values for the index key. Because of their compact size and relative simplicity, EVIs provide for faster scans of a table that may also be processed in parallel. The database engine uses the vector portion of the EVI to build a dynamic bitmap that contains one bit for each row in the table. If the row satisfies a query selection, the bit is set on. If the row does not satisfy the query selection, the bit is set off. Similar to a bitmap index, intermediate dynamic bitmaps can be AND'ed and OR'ed together to satisfy an ad hoc query.
Even independent from the vector, however, an EVI symbol table may be extremely useful for understanding cardinality, data clustering, data skew, and frequency of values. Use of EVIs for commonly used local selection and foreign key columns of larger tables, where the column cardinality (number of distinct values) is relatively small compared to the table's overall record count may be very beneficial and may be requested by some users. However, this requires user intervention to create EVIs and also comes with the additional cost associated with creation of the vector. The creation and maintenance of this vector structure may be rather difficult and time-consuming compared to the relatively simply maintenance of the symbol table portion of the EVI, especially in rapidly changing environments. However, in many cases, the EVI symbol table alone can provide a great deal of information that can be used for statistics as well as execution of certain queries.
What is needed therefore is an index that can be used in a database query containing the statistical and cardinality content of the EVI, but that does not require the time and resources to generate and maintain the vector.