Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database, increases, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS's), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
One area that improves query performance is the use of an index over a field or fields of a table. A table is a logical structure maintained by the database management system. Tables are made up of columns and rows. There is no inherent order of the rows within a table. At the intersection of every column and row is a specific data item called a value. A column, or field, is a set of values of the same data type. A row is a sequence of values such that the nth value is a value of the nth column of the table. An index is a set of pointers to a table that has an entry for each record of the table. This entry is dependent on the value of that record in one or more columns of the table.
One use of an index is to create an order for the records of the table. For example if one of the columns of the table is a person's last name then building an index over that column would create an index that is ordered according to the alphabetical order of the last names in the table. Another type of index is used to indicate the value of a field of a particular record. For example, in a database that pertained to automobile inventory, there may be a field that indicates the color of the automobile and the allowed values for this field are only red, green, blue, black and white. An index could be created over this column that assigns a unique value to each of these five possible colors (i.e., a binary number consisting of 3 bits). With such an index created, a query that includes selection criteria that involves the color of the automobile can benefit from this index, such as a query which finds all the green cars. Without the index, each record of the table would need to be retrieved and the “color” field would be scanned to see if it matches “green”. This process is time consuming, especially with a large table. With the index created, however, the database engine can scan the index to find the rows containing the “green” value and only retrieve the corresponding records from the table. Scanning the index is much faster than touching every record in a table and the amount of records retrieved is significantly reduced as well.
An index as just described works well with a column that has a small number of possible values. However, there are other data types such as a “character-field” over which indexing has not appeared to be useful. These character-field, or variable character, data types allow, in many instances, up to 32K characters of a character string to be stored. One common activity which database users frequently perform on such character strings is to search the text in the string for a particular matching sub-string using the LIKE predicate. An example SQL statement might resemble:
SELECT LastName FROM Customers
WHERE Lastname LIKE ‘Mar %’
This query would return all the last names of all the records in the table that start with the letters “Mar”. Performing searching on a large field of characters using the LIKE predicate is very time consuming as it requires opening every record in the table and then performing a search of a large value of that record (i.e., the character-string value). Thus, there remains an unmet need for using an index to help with searching of a character-string value, such as by the LIKE predicate.