1. Field of the Invention
The invention relates generally to computer-implemented database systems, and specifically, to an automatic error recovery mechanism for a database system.
2. Description of the Related Art
During the processing of queries in a database system, any of a substantial number of errors, problems or failures may cause the system to cancel a query. In today's environment, and particularly from a user's point of view, such failures are a constant source of frustration and delay.
Problems are reported to the vendor's customer service personnel, who investigate each instance and, for many instances, create incident reports. Such incident reports are then forwarded to the vendor's development personnel, who may take some time to respond to the incident report and resolve the customer's problem. Indeed, some incidents may go unanswered, and some problems may remain unresolved, for extended periods of time.
Often, a workaround is available (e.g., by deactivating or activating certain components, features or code paths), but it may take a substantial period of time to communicate the workaround from the vendor to the customer, so that the workaround can be implemented. Indeed, there may be situations where the workaround could be automatically implemented by the database system itself, in a real-time environment, and without the intervention of a user, database administrator (DBA), or other personnel. Such workarounds can be used for long periods of time, even across several releases or updates of the system, thereby allowing the system to provide better query plans (i.e., query plans that execute without faults). Moreover, workarounds could be manually or automatically disabled once a “fix” is implemented, so that components, features or code paths do not remain deactivated or activated longer than necessary.
What is needed, then, is a database system that can automatically or manually activate and/or deactivate components, features and code paths based on the analysis of diagnostics, so that errors, problems or failures may be bypassed.
The present invention provides such a system, wherein an active system management capability can resubmit a query following its execution failure, but using a different set of components, features or code paths than the set that resulted in the failure. Moreover, this active system management capability can be used to alert users, DBAs and other personnel, including vendor personnel, to potential problems or issues in components, features or code paths, and to communicate ways to avoid those problems or issues.
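By way of illustration only, the resubmission behavior described above may be sketched as follows. This is a minimal sketch and not the claimed implementation. It assumes, hypothetically, that a failed execution can name the feature implicated in the failure. The names QueryFailure, run_with_recovery and flaky_execute, and the feature names "hash_join" and "index_scan", are invented for this example and do not appear in the specification.

```python
class QueryFailure(Exception):
    """Hypothetical error raised when the executor cancels a query."""

    def __init__(self, feature):
        super().__init__(f"failure in feature {feature!r}")
        self.feature = feature  # feature implicated by diagnostics (assumed)


def run_with_recovery(execute, query, features, max_retries=3):
    """Run `query`; on failure, deactivate the implicated feature and resubmit.

    Returns (result, disabled), where `disabled` lists the features that
    were deactivated, in order, so they can be reported to a DBA or vendor.
    """
    enabled = set(features)
    disabled = []
    for attempt in range(max_retries + 1):
        try:
            return execute(query, frozenset(enabled)), disabled
        except QueryFailure as err:
            # Give up if the feature is already off or retries are exhausted.
            if err.feature not in enabled or attempt == max_retries:
                raise
            enabled.discard(err.feature)
            disabled.append(err.feature)


def flaky_execute(query, enabled):
    """Stand-in executor (assumption): fails whenever 'hash_join' is active."""
    if "hash_join" in enabled:
        raise QueryFailure("hash_join")
    return f"rows for {query!r} using {sorted(enabled)}"


result, disabled = run_with_recovery(
    flaky_execute, "SELECT 1", {"hash_join", "index_scan"}
)
print(result)
print("deactivated:", disabled)
```

In this sketch the list of deactivated features doubles as the alert payload: the same record that drives the resubmission could be forwarded to users, DBAs or vendor personnel.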
These and other aspects of the present invention are described in more detail below.