1. Field of the Invention
The present invention generally relates to data processing, and more particularly, to scheduling the performance of units of work in a data processing system.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, a DBMS can be structured to support a variety of different types of operations for a requesting entity (e.g., an application, the operating system or an end user). Such operations can be configured to retrieve, add, modify and delete information being stored and managed by the DBMS. Standard database access methods support these operations using high-level query languages, such as the Structured Query Language (SQL). The term “query” denotes a set of commands that cause execution of operations for processing data from a stored database. For instance, SQL supports four types of query operations, namely SELECT, INSERT, UPDATE and DELETE. A SELECT operation retrieves data from a database, an INSERT operation adds new data to a database, an UPDATE operation modifies data in a database and a DELETE operation removes data from a database.
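By way of illustration only, the four query operations may be sketched as follows. The sketch assumes Python's built-in sqlite3 module and an in-memory database; the table name, column names and function name are hypothetical and are not part of the described system.

```python
import sqlite3


def demo_query_operations():
    """Run INSERT, SELECT, UPDATE and DELETE against a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )

    # INSERT adds new data to the database.
    conn.execute("INSERT INTO patients (name, age) VALUES (?, ?)", ("Ann", 41))
    conn.execute("INSERT INTO patients (name, age) VALUES (?, ?)", ("Bob", 67))

    # SELECT retrieves data from the database.
    select_sql = "SELECT name, age FROM patients ORDER BY id"
    after_insert = conn.execute(select_sql).fetchall()

    # UPDATE modifies data already in the database.
    conn.execute("UPDATE patients SET age = ? WHERE name = ?", (42, "Ann"))
    after_update = conn.execute(select_sql).fetchall()

    # DELETE removes data from the database.
    conn.execute("DELETE FROM patients WHERE name = ?", ("Bob",))
    after_delete = conn.execute(select_sql).fetchall()

    conn.close()
    return after_insert, after_update, after_delete
```

Each returned list reflects the table contents after the corresponding operation, so the effect of every statement can be inspected separately.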
In some environments, it is desirable to schedule queries. Scheduling queries allows users to specify the times and/or frequencies at which queries are run. Query schedules are appropriate in environments where the underlying data is constantly changing (i.e., being updated or augmented with additional data). For example, a researcher runs a query once to obtain an initial list of candidates for a research study. Over the following weeks, months or years, however, the researcher wants to know whether other people develop conditions that would satisfy the query and therefore make them candidates for similar research. Similarly, summary tables or normalized values might be periodically updated via expensive SQL operations. Thus, some classes of queries are scheduled to be run multiple times over long periods of time.
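The researcher example above may be sketched, purely for illustration, as a query paired with a run frequency. The class ScheduledQuery, its fields, the subjects table and the logical clock values are all assumptions introduced for this sketch; a practical scheduler would use wall-clock times and persistent storage.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ScheduledQuery:
    """Hypothetical record pairing a query with a run frequency."""
    sql: str
    interval: float        # logical time units between runs
    next_run: float = 0.0  # earliest time at which the next run may occur

    def run_if_due(self, conn, now):
        """Run the query if its scheduled time has arrived, else return None."""
        if now < self.next_run:
            return None
        rows = conn.execute(self.sql).fetchall()
        self.next_run = now + self.interval
        return rows


def demo_schedule():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subjects (name TEXT, condition TEXT)")
    conn.execute("INSERT INTO subjects VALUES ('Ann', 'asthma')")
    job = ScheduledQuery(
        "SELECT name FROM subjects WHERE condition = 'asthma' ORDER BY name",
        interval=7,
    )

    first = job.run_if_due(conn, now=0)        # initial candidate list
    # The underlying data changes between scheduled runs.
    conn.execute("INSERT INTO subjects VALUES ('Cal', 'asthma')")
    too_early = job.run_if_due(conn, now=3)    # not yet due, so skipped
    second = job.run_if_due(conn, now=7)       # due again, sees the new row
    conn.close()
    return first, too_early, second
```

The second run returns a candidate who did not satisfy the query when it was first executed, which is the reason such queries are scheduled repeatedly.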
While the ability to schedule queries is a useful tool for users, indiscriminate scheduling can result in substantial system performance degradation. Left unmanaged, a system running scheduled queries can easily tend towards chaos. As more queries are scheduled, the system's performance becomes more unpredictable. It is common to deal with this situation by having an administrator determine why the system has become unresponsive and selectively terminate queries identified as being problematic. Alternatively, to prevent any one user or group of users from destabilizing a system, query execution limits are often placed on users. For example, a given user may be limited to running queries that take less than 20 minutes to run, or may not be allowed to run queries that consume more than 20% of the CPU power at a time. Typically, these rules are administrative actions enforced at runtime, and the action commonly taken is to terminate any offending query.
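The conventional runtime enforcement described above may be sketched as follows. This is a minimal illustration only, assuming Python's sqlite3 module and its set_progress_handler callback; the function name run_with_time_limit and the limit values are hypothetical, and the 20 minute limit of the example is scaled down to a fraction of a second.

```python
import sqlite3
import time


def run_with_time_limit(conn, sql, limit_seconds):
    """Run sql, terminating it if it exceeds limit_seconds of wall-clock time.

    Returns the result rows, or None if the query was terminated.
    """
    deadline = time.monotonic() + limit_seconds

    def over_limit():
        # A non-zero return value tells SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Invoke the check every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(over_limit, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Simplification: any OperationalError is treated as a termination.
        # A real system would distinguish an abort from other failures.
        return None
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler


# A deliberately expensive query used only to trigger the limit.
EXPENSIVE_SQL = (
    "WITH RECURSIVE c(n) AS "
    "(SELECT 1 UNION ALL SELECT n + 1 FROM c WHERE n < 100000000) "
    "SELECT count(*) FROM c"
)
```

In this sketch the offending query is simply discarded after it has already consumed resources, and the requesting entity receives no result.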
The problem with the foregoing approach is that the users and administrators of a system have to recover from the system action. Because a query is terminated only after it has already consumed time and resources, the ramifications include lost time, frustration on the part of users, and lost profits for the system owner, who is likely paying the users running the queries. Therefore, what is needed is a more intelligent approach to scheduling units of work, such as queries.