Fundamentally speaking, today's computer systems are primarily used for storage, manipulation, and analysis of information. This information, called data, can be anything from complicated financial information to simple baking recipes. It is no surprise, then, that the overall value, or worth, of a computer system depends largely upon how well the computer system stores, manipulates, and analyzes data. This patent pertains to the mechanism used on a computer system to perform these functions. This mechanism is formally referred to herein as a Data Management System, although the terms “database system,” “database,” and Data Management System (DMS) are used interchangeably throughout this patent.
At the most basic level, the data of a database is stored as a series of logical tables. Each table is made up of rows and columns. Each table has a unique name within the database and each column has a unique name within the particular table. Different statements, called queries, allow the user to obtain data from the database. Queries range from very simple to very complex. When a database receives a query, the database interprets the query and determines what internal steps are necessary to satisfy the query. These internal steps may include identification of the table or tables specified in the query, the row or rows selected in the query, and other information such as the order in which the tables are to be joined together to satisfy the query. When taken together, these internal steps are referred to as an execution, or access, plan. When an access plan is created for a given query, it is often saved by the DMS. Then, when the user repeats the query, as users often do, the database can reuse the saved access plan instead of undergoing the expensive process of recreating it from scratch.
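By way of illustration only, the saving and reuse of access plans described above might be sketched as follows. The names AccessPlan, PlanCache, and toy_build_plan are hypothetical and do not come from any particular database product; the "planner" merely extracts table names and stands in for the expensive plan-creation step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPlan:
    """Hypothetical, greatly simplified access plan."""
    tables: tuple       # tables identified in the query
    join_order: tuple   # order in which the tables are to be joined


def toy_build_plan(query):
    """Stand-in for the expensive planning step (an assumption for this sketch).

    It only collects the words that follow FROM or JOIN.
    """
    words = query.replace(",", " ").split()
    tables = tuple(
        word
        for prev, word in zip(words, words[1:])
        if prev.upper() in ("FROM", "JOIN")
    )
    return AccessPlan(tables=tables, join_order=tables)


class PlanCache:
    """Saves each access plan so that a repeated query can reuse it."""

    def __init__(self, build_plan):
        self._build_plan = build_plan
        self._plans = {}
        self.builds = 0  # how many times a plan was created from scratch

    def get_plan(self, query):
        # Collapse whitespace so trivially different spellings share one plan.
        key = " ".join(query.split())
        plan = self._plans.get(key)
        if plan is None:
            plan = self._build_plan(query)
            self.builds += 1
            self._plans[key] = plan
        return plan
```

In this sketch, a second call to get_plan with the same query text returns the saved plan and leaves the builds counter unchanged, which corresponds to avoiding the cost of recreating the plan.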
As a more advanced capability, current databases are also able to discard saved access plans when it is sensible to do so. One example is the deletion of old, unused access plans. Another, more complicated example involves the notion of “data skew.” As used here, data skew refers to significant, non-uniform data distribution. For instance, consider a customer table having customer names organized in alphabetical order. If the table includes a few names that begin with each letter of the alphabet, the table would be said to contain more or less uniform data. However, if for some reason the table also included one hundred names starting with the letter “S,” the table would be said to contain skewed data. Data skew is important to recognize because a different access plan will almost certainly be needed to best handle the query when data skew is involved. Thus, when data skew is encountered, certain systems discard the access plan stored for a particular query and recreate the access plan so as to best handle the skewed case. The newly created access plan is then stored for reuse.
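As a rough sketch only, the customer-name example and the discard-and-recreate behavior described above might look like the following. The function skewed_letters, the class SingleSlotPlanCache, the threshold factor of 5, and the plan names "index_lookup" and "full_scan" are all assumptions made for illustration; real systems typically rely on far richer statistics.

```python
from collections import Counter


def skewed_letters(names, factor=5.0):
    """Return first letters that occur far more often than average.

    The factor of 5 is an arbitrary threshold chosen for this sketch.
    """
    counts = Counter(name[0].upper() for name in names if name)
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)
    return {letter for letter, n in counts.items() if n > factor * mean}


class SingleSlotPlanCache:
    """Keeps one saved plan per query and replaces it when the case changes."""

    def __init__(self, skewed):
        self._skewed = skewed
        self._plan = None   # (plan name, whether it was built for skew)
        self.rebuilds = 0   # how many times a plan was created from scratch

    def plan_for(self, letter):
        wants_skew = letter.upper() in self._skewed
        if self._plan is None or self._plan[1] != wants_skew:
            # Discard the saved plan and recreate one suited to this case.
            name = "full_scan" if wants_skew else "index_lookup"
            self._plan = (name, wants_skew)
            self.rebuilds += 1
        return self._plan[0]


# A few names for every letter, plus one hundred extra names under "S".
names = [f"{chr(c)}name{i}" for c in range(ord("A"), ord("Z") + 1) for i in range(3)]
names += [f"S{i}" for i in range(100)]
cache = SingleSlotPlanCache(skewed_letters(names))
```

Under these assumptions only the letter "S" is flagged as skewed. A lookup for "S" replaces the previously saved plan, and the next lookup for any other letter forces yet another rebuild, since only one plan is kept per query.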
While this approach handles situations involving data skew efficiently, it can end up being inefficient overall because an otherwise valid access plan can be discarded to accommodate an infrequent situation (i.e., data skew). Without a Data Management System that can provide intelligent access plan caching in an environment where data skew is present, database performance will continue to be constrained by less sophisticated access plan caching strategies.