The present invention relates to database management systems generally, and more particularly the present invention relates to a method, a data processing system and an article of manufacture for prioritizing maintenance of database tables.
Modern database schemas can be very large and complex, with a database management system (“DBMS”) managing perhaps tens, hundreds, or even thousands of tables. The tables may contain tens or hundreds of columns each and millions of records collectively. As known to those skilled in the art, it is necessary regularly to perform maintenance tasks on the database tables to achieve satisfactory database performance over an extended period of time.
Consider a specific example. Many modern databases use a standardized query language known as Structured Query Language (“SQL”). With SQL, a user is able to specify a database query using declarative language, leaving it up to the DBMS to figure out how to best access and join tables in the database in order to extract matching records.
For any given SQL query, there may be numerous ways in which tables may be joined and accessed to obtain the requested data. Many modern DBMS products include query optimizers to evaluate alternative query execution plans (“QEPs”) and to select one that is suitable. Evaluating many possible QEPs may involve estimating costs (i.e. resources consumed) for each possible QEP based on mathematical models. These mathematical models typically require input from various statistics collected for each database table.
Thus, for the purposes of obtaining cost estimates for alternative QEPs, it is desirable to perform table statistics collection regularly so that the table statistics are up-to-date. Otherwise, the cost estimates for QEPs may be inaccurate, and may lead to reduced database query performance.
In addition to maintenance tasks involving the collection of up-to-date table statistics, various other table maintenance tasks may also be required, such as backup, recovery, reorganizing of records, log file maintenance, and so on.
While it may be desirable to be able to perform these various table maintenance tasks at any time, and as often as required, this is often not possible. More typically, service level agreements or other database performance considerations may impose significant restrictions on when table maintenance may be performed. Thus, a limited window of opportunity for performing various maintenance tasks may be made available from time to time (e.g. overnight or during off-peak hours). For maintenance scheduling purposes, a continuous series of these maintenance windows must be considered, since a maintenance task that is not scheduled in a current maintenance window may need to be rescheduled for a future maintenance window. (In the context of the current discussion, the table maintenance tasks performed during a maintenance window is referred to as a maintenance “iteration”.)
For a database schema with a relatively modest number of tables, it may be possible for a human database administrator to analyze the maintenance requirements for each table, and manually schedule the maintenance tasks in upcoming maintenance iterations. However, as the number of tables increases, manual scheduling may quickly become impractical, if not impossible.
There are some known techniques and tools for autonomically scheduling maintenance for databases. However, these known techniques tend to exhibit certain limitations, such as a tendency to schedule maintenance unevenly over available maintenance windows, or to make changes in scheduling which may not be suitable. Over time, such limitations may lead to degradation of database performance due to less than optimal maintenance.
What is needed is an improved technique for scheduling and prioritizing maintenance for a plurality of database tables.