A database is a collection of stored data that is logically related and that is accessible by one or more users. A popular type of database is the relational database, which is managed by a relational database management system (RDBMS) and stores data in relational tables made up of rows and columns (also referred to as tuples and attributes, respectively). Each row represents an occurrence of an entity defined by a table, where an entity is a person, place, thing, or other object about which the table contains information.
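To make the row/column terminology concrete, the following Python sketch models a small relational table in memory. The `Employee` attributes and sample rows are hypothetical, chosen only for illustration; an actual RDBMS stores and manages its tables itself rather than as Python objects.

```python
from typing import NamedTuple


# Hypothetical "employee" relation: each field is an attribute (column),
# and each instance is a tuple (row) describing one employee entity.
class Employee(NamedTuple):
    emp_id: int
    name: str
    dept: str


employee_table = [
    Employee(1, "Ada", "Engineering"),
    Employee(2, "Grace", "Research"),
    Employee(3, "Alan", "Engineering"),
]

# The attribute names are the table's columns.
columns = Employee._fields

# Without an index, finding rows by value means examining every tuple.
engineers = [e.name for e in employee_table if e.dept == "Engineering"]
print(columns, engineers)
```

Here `columns` holds the attribute names and `engineers` holds the names from the two matching rows.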
To extract data from, or to update, a relational table, a database management system executes statements written in a standard database query language, such as Structured Query Language (SQL). Examples of SQL statements include INSERT, SELECT, UPDATE, and DELETE.
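As a minimal sketch of these four statement types, the example below uses Python's built-in `sqlite3` module with an in-memory SQLite database. SQLite stands in for a generic RDBMS here, and the `employee` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# INSERT adds new rows.
cur.executemany(
    "INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
    [(1, "Ada", "Engineering"), (2, "Grace", "Research"), (3, "Alan", "Engineering")],
)

# UPDATE modifies existing rows that satisfy the WHERE condition.
cur.execute("UPDATE employee SET dept = ? WHERE emp_id = ?", ("Research", 3))

# DELETE removes rows that satisfy the WHERE condition.
cur.execute("DELETE FROM employee WHERE emp_id = ?", (1,))

# SELECT retrieves rows; ORDER BY makes the result order deterministic.
rows = cur.execute("SELECT emp_id, name, dept FROM employee ORDER BY emp_id").fetchall()
conn.commit()
print(rows)  # [(2, 'Grace', 'Research'), (3, 'Alan', 'Research')]
```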
To improve performance of database management systems, indexes can be defined. An index is a structure that provides relatively rapid access to the rows of a table based on the values of one or more columns. An index stores data values and pointers to the rows where those data values occur. An index can be arranged in ascending or descending order, so that the database management system can quickly search the index to find a particular value. The database management system can then follow the corresponding pointer to locate the row containing the value.
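The lookup procedure described above (search an ordered index for a value, then follow the stored pointer to the row) can be sketched in Python. This is a simplified model, assuming the table is a Python list whose positions serve as row pointers and the index is a single sorted list; real systems typically use on-disk structures such as B-trees. The names `build_index` and `lookup` are hypothetical.

```python
import bisect

# Hypothetical table stored as a list of rows; a row's list position
# acts as the "pointer" (row ID) stored in the index.
table = [
    ("Ada", "Engineering"),
    ("Grace", "Research"),
    ("Alan", "Engineering"),
    ("Barbara", "Sales"),
]


def build_index(rows, col):
    """Return (value, row_id) pairs sorted in ascending order of value."""
    return sorted((row[col], row_id) for row_id, row in enumerate(rows))


def lookup(index, rows, value):
    """Binary-search the index for value, then follow each pointer to its row."""
    # (value,) sorts before every (value, row_id) pair, so bisect_left
    # lands on the first index entry whose key equals value, if any.
    i = bisect.bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        row_id = index[i][1]
        matches.append(rows[row_id])  # follow the pointer to the row
        i += 1
    return matches


dept_index = build_index(table, 1)
print(lookup(dept_index, table, "Engineering"))
# [('Ada', 'Engineering'), ('Alan', 'Engineering')]
```

The binary search takes logarithmic time in the number of index entries, instead of examining every row of the table.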
The main advantage of an index is that it speeds up the execution of database queries whose search conditions refer to the indexed column or columns. Generally, it is desirable to create indexes on columns that are used frequently in search conditions (such as in the WHERE clause of a SELECT statement).
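The effect of indexing a column used in a WHERE clause can be observed with SQLite's `EXPLAIN QUERY PLAN`, which reports whether a query scans the whole table or searches it through an index. The sketch below assumes a hypothetical `employee` table and index name `idx_employee_dept`. The exact plan text varies between SQLite versions, so the comments describe it only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [(f"emp{i}", f"dept{i % 50}") for i in range(1000)],
)

query = "SELECT name FROM employee WHERE dept = ?"


def plan(sql, params):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(query, ("dept7",))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = plan(query, ("dept7",))   # the WHERE column is now indexed
print(before)
print(after)
```

Before the index exists, the plan reports a scan of `employee`. Afterward, it should report a search of `employee` using `idx_employee_dept`.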
Proper selection of indexes is important for optimal database performance. Typically, index selection is performed based on a workload that contains logged database queries. In many cases, a workload is defined by logging the SQL queries that execute on a database management system during a given period of time, so for a large database management system the workload on which index selection is performed can be quite large. Examining a large workload can be computationally intensive, particularly because doing so involves detailed analysis of each SQL query.
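To illustrate workload-based index selection, the sketch below applies a deliberately naive heuristic: it counts how often each column appears in a comparison inside the WHERE clause of logged queries, then proposes the most frequent columns as index candidates. The workload, the regular expressions, and `candidate_columns` are all hypothetical. The regex is a crude stand-in for a real SQL parser, and the heuristic ignores query cost and index maintenance overhead. It is not the method of any particular system. Even this simple pass must examine every logged query, so its running time grows with the size of the workload.

```python
import re
from collections import Counter

# Hypothetical workload: SQL text logged over some period of time.
workload = [
    "SELECT name FROM employee WHERE dept = 'Research'",
    "SELECT * FROM employee WHERE dept = 'Sales' AND salary > 50000",
    "UPDATE employee SET salary = salary * 1.1 WHERE emp_id = 42",
    "SELECT emp_id FROM employee WHERE dept = 'Engineering'",
    "DELETE FROM employee WHERE hire_year < 2000",
]

# Text after the WHERE keyword (a simplification: assumes one WHERE per query).
WHERE_CLAUSE = re.compile(r"\bWHERE\b(.*)", re.IGNORECASE | re.DOTALL)
# An identifier followed by a comparison operator, e.g. "dept =" or "salary >".
PREDICATE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*(?:=|<>|!=|<=|>=|<|>)")


def candidate_columns(queries, top_k=2):
    """Return the top_k (column, count) pairs referenced in WHERE predicates."""
    counts = Counter()
    for sql in queries:  # every logged query must be examined
        match = WHERE_CLAUSE.search(sql)
        if match:
            counts.update(PREDICATE.findall(match.group(1)))
    return counts.most_common(top_k)


print(candidate_columns(workload))  # [('dept', 3), ('salary', 1)]
```

The `SET salary = ...` assignment in the UPDATE statement is not counted because only the text after WHERE is scanned.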