Web server applications are increasingly being used to provide users with access to data stored in databases over the Internet using web browsers. These web server applications respond to incoming user requests by providing concurrent threads of execution, each of which responds to an individual request, while maintaining per-user web server application access information. These requests often require different types of searches, calculations or modifications of data stored in large databases.
As a result of this environment, database oriented web server applications are required to logically maintain numbers of large result sets, and to be able to perform multiple types of calculations or insertions with high efficiency to maintain a reasonable performance level for the users.
Database designs have addressed the demand for increasing the performance of database operations, specifically searches and queries, by introducing indexes (also called inverted indexes). Each index is defined and exists within the context of a table in the database. Most indexes are optional, and are created by the user to enhance the speed of one or more queries performed on the table. The user can define more than one index for the same table, basing the indexes on different fields defined in the table. When the user defines an index based on a field in the table, the user is requesting the database to create a separate sorted list of all values of that field in that table, with a link from each value to the location of the corresponding record in the table. Thus, the database concept of an index is similar to the concept of indexes used, for example, in books.
As an example, consider a table of records arranged in rows and consisting of the names of all people in a company, with the first and last names saved in different fields (columns). As new names are added to the table, they can be added to the end of the stack on the disk in no specific order in relation to the value of the first or last name in the record. If the user knows that there will be frequent queries on the basis of the last names, the user can define an index for the last names in the table. As a result, the database creates a separate sorted list of all last names in the database, and includes within each record in the list a pointer to the location of the corresponding record in the table. In this way, whenever responding to a query for a specific last name (e.g. “Smith”), instead of walking through each record and performing a comparison of the value of the last name in that record with the desired last name (a method called full table scan), the database engine can search through the sorted index of last names and locate the records with the desired last name with fewer steps and then use their pointers to find the corresponding record(s) in the table. This is similar to the way one can locate all occurrences of a word in a book in much less time by using the book index instead of browsing through the whole book.
The index defined over the last names field, is an example of a simple index, defined over a single field of a table. A user may define multiple simple indexes for the same table to improve queries on those fields. On the other hand, one can also define a composite (multi-field) index, which is defined based on a combination of two or more fields in a table. For example, for the above table, assume that the database is frequently queried for records with specific conditions on last names and first names, e.g. all records with a specific last name (“Smith”) where the first name starts with a specific letter (say “P”). With this information the user can define a multi-field index for this table, based on the values of the first name appended at the end of the value of the last name for each record. This index makes such a query easier.
Indexes are usually sorted in specific balanced tree structures of linked records to facilitate the search mechanism. While creating a new index for a table adds the benefit of efficiency and higher performance for specific queries that the index is designed for, it can also introduce costs. One cost is due to extra space needed to store the index tree. This storage space has to be allocated in addition to the space needed to store the records of the table and corresponding metadata. Another cost is incurred in performance. When a record (row) is inserted into or deleted from a table, a corresponding record must be inserted into or deleted from each index list defined for that table. Alternatively, when the value of an indexed field is altered for a record in the table, a corresponding record must be altered in the index, which means the database engine must delete the index record from one location in the sorted index list and insert the new record into another location. Since each index list is a sorted structure, often in the form of a tree, these additions or deletions may require a rebalancing of the index structure. Thus, while creating a new index improves the performance of specific queries for which it is designed, it might degrade the performance of other database operations, specifically the performance of insertion, modification and deletion operations. This extra cost may be significant for databases where there are many updates in the records.
In database queries, one problem is how to efficiently determine the position of an index entry within its index. This problem has been addressed with a concept referred to here as “positional awareness.” Positional awareness is an index's capability wherein every index entry knows its relative position within the index. This capability is achieved by introducing counters as one of the contents of nodes in the index tree. During a look up operation, positional awareness allows the query engine to quickly determine the position of any index entry within the index.