The ability to monitor a database server is an important aspect of database administration. Monitoring is desirable in a variety of scenarios such as detecting performance problems, bottlenecks, or abnormal conditions as well as auditing usage of system resources and tracking trends for capacity planning. In most of these scenarios, monitoring should be accompanied by an ability to take some action such as correcting a performance problem or generating a report of system usage. The need for automated monitoring becomes greater as databases are used by more complex applications with advanced configurations. Some automated monitoring systems can reduce total cost of ownership and increase efficiency of a database manager (DBA).
Today's commercial database systems have support for gathering counters about system usage such as CPU usage, number of active connections, currently executing queries, locks held by currently executing threads, etc. These counters are exposed to clients using two basic mechanisms: polling and event recording. Polling allows a snapshot of these counters at various points in time by polling the server. Event recording enables counters associated with a system event to be written into a file or table. Events of interest include SQL statement execution begin and end, lock acquire and release, and user login and logout. With each event several counters associated with the event may be recorded such as: time, database identification, an application that causes the event, and/or duration of the event. While these mechanisms are commonly available in conventional systems, they have drawbacks. If polling is performed infrequently, then the monitoring application can lose valuable information. On the other hand if polling is performed very frequently it can impose significant CPU overheads on the server. Similarly, event recording (although not lossy), can incur significant overhead on the server since a potentially large number of events needs to be written out to the file or table or sent over the network. The following examples of common monitoring tasks illustrate these drawbacks.
A common task for DBAs is to detect “outlier” invocations of a stored procedure (P) that are much slower to execute than other instances of (P). This is useful since the DBA can later analyze these outliers to determine reasons for underperforming. Similarly another common task is to detect total delay due to blocking caused by update statements issued by applications over a measured interval of time. This task can be helpful in detecting locking problems due to poor application design or unanticipated interactions across applications. For both of these tasks, event recording would cause a very large volume of monitored data to be written out by the server including stored procedure completion events in the first example and blocking/release events in the second example. The large amount of data is written even though the amount of data that the DBA needs to review is considerably smaller. If the example tasks are performed using a polling approach, the answers obtained by infrequent polling may be compromised if outlier occurrences are missed. On the other hand, very frequent polling incurs significant load on the server due to repeated traversal of a lock resource dependency graph to detect total delay due to blocking.
Typical commercial database systems have support for event logging as well as for obtaining a snapshot of system counters by polling the server. IBM DB2 Health Center is a tool that continuously monitors “health” of the database system and alerts DBA by email or pager or by logging the problem. An alert is raised when the value of the counter being monitored crosses a threshold. There are several third party monitoring tools for today's commercial database systems with similar functionality. These systems are client side applications that are added to the database server by the client. Some database systems feature inbuilt support for triggers in database systems or provide general purpose notification services. These mechanisms provide limited monitoring features and flexibility while incurring substantial overhead on the database system. Also, because these systems often obtain information about database events after the events have occurred, these systems have only limited ability to alter database functioning as monitored events are occurring.