The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware, such as semiconductors and circuit boards, and software, also known as computer programs.
Fundamentally, computer systems are used for the storage, manipulation, and analysis of data, which may be anything from complicated financial information to simple baking recipes. It is no surprise, then, that the overall value or worth of a computer system depends largely upon how well the computer system stores, manipulates, and analyzes data. One mechanism for managing data is called a database management system (DBMS), which may also be called a database system or simply a database.
Many different types of databases are known, but the most common is usually called a relational database (RDB), which organizes data in tables that have rows, which represent individual entries or records in the database, and columns, which define what is stored in each entry or record. Each table has a unique name within the database and each column has a unique name within the particular table. The database also has an index, which is a data structure that informs the database management system of the location of a certain row in a table given an indexed column value, analogous to a book index informing the reader on which page a given word appears.
To be useful, the data stored in databases must be capable of being retrieved in an efficient manner. The most common way to retrieve data from a database is through statements called database queries, which may originate from user interfaces, application programs, or remote systems, such as clients or peers. A query is an expression evaluated by the database management system. As one might imagine, queries range from being very simple to very complex. Although the query requires the return of a particular data set in response, the method of query execution is typically not specified by the query. Thus, after the database management system receives a query, the database management system interprets the query and determines what internal steps are necessary to satisfy the query. These internal steps may include an identification of the table or tables specified in the query, the row or rows selected in the query, and other information such as whether to use an existing index, whether to build a temporary index, whether to use a temporary file to execute a sort, and/or the order in which the tables are to be joined together to satisfy the query.
When taken together, these internal steps are referred to as an execution plan. The execution plan is typically created by a software component of the database management system that is often called a query optimizer. The query optimizer may be part of the database management system or separate from, but in communication with, the database management system. When a query optimizer creates an execution plan for a given query, the execution plan is often saved by the database management system in the program object, e.g., the application program, that requested the query. The execution plan may also be saved in an SQL (Structured Query Language) package or an execution plan cache. Then, when the user or program object repeats the query, which is a common occurrence, the database management system can find and reutilize the associated saved execution plan instead of undergoing the expensive and time-consuming process of recreating the execution plan. Thus, reusing execution plans increases the performance of queries when performed by the database management system.
Many different execution plans may be created for any one query, each of which returns the required data set, yet the different execution plans may provide widely different performance. Thus, especially for large databases, the execution plan selected by the database management system needs to provide the required data at a reasonable cost in terms of time and hardware resources. Hence, the query optimizer often creates multiple prospective execution plans and then chooses the best, or least expensive one, to execute.
One factor that contributes to the cost of executing a particular execution plan is the way in which the database table or tables to which the query is directed are partitioned. Partitioning allows for table data to be stored using more than one physical data space, but the table appears as one object for data manipulation operations, such as queries, inserts, updates, and deletes. Partitioning can significantly improve performance if it is done properly, but partitioning also has the potential to decrease performance if done improperly. Partitioning has two fundamental types: horizontal and vertical. Horizontal partitioning allows tables to be partitioned into disjoint sets of rows, which are physically stored and accessed separately in different data spaces. In contrast, vertical partitioning allows a table to be partitioned into disjoint sets of columns, which are physically stored and accessed separately in different data spaces.
Because partitioning is so important to query performance, in either a positive or a negative way, users would like to have information that would aid them in making decisions about how to partition their tables, so that they make take appropriate action to tune performance. But, no current tools provide user with partition performance information, so users are frustrated in their efforts to performance tune their queries that use partitioned database tables.