Conventional database management applications organize and store large amounts of data according to a predetermined normalization and indexing manner to allow efficient update and retrieval by a user. In a typical relational database, the data is organized according to tables on a logical level, and further subdivided into physical disks, files, extents and segments. Further, a particular database operation often includes multiple tables via joining or linking the tables by using key fields. Also, a single table may be spread across multiple disks or files, due to volume or access contention constraints.
Accordingly, a single database query may invoke a number of database I/O requests to a plurality of database resources, such as disks, files, tables and extents. Further, I/O requests triggering a physical device access tend to be particularly burdensome operations in a database management application because of disk seek and latency time. Contention among multiple users attempting to simultaneously access a database, therefore, can result in contention for the common database resources, such as disks and files, resulting in a bottleneck for access to the common resource.
While proper database design and resource allocation purports to balance the expected demand load, at least initially, database contention and the resulting bottlenecks represent an ongoing maintenance issue. Changes in the number of users, increases in the quantity of data stored, and the method of access (i.e. LAN throughput, remote Internet access) can affect the demand load placed on database resources. Further, disks and files can become fragmented and extended over time, thereby causing a table or file to migrate to different physical areas, and increasing the likelihood of incurring additional read and latency access time.
Accordingly, conventional methods are known for tracking database access attempts and providing output indicative of database operations. Conventional systems employ event logging, log files, media utilization graphs, high water marks and CPU utilization graphs to track database usage and isolate potential or actual bottlenecks. These conventional methods typically provide a graphical or textual output format that an operator or field service technician can interpret in an attempt to assess database resource contention.
Conventional database analysis methods suffer from a variety of deficiencies. In general, conventional methods typically generate output that is too voluminous and unweildly to be analyzed effectively, or are prohibitively intrusive such that normal database throughput suffers from the monitoring overhead. In particular, the methods outlined above tend to generate log files, which dump an indication of each database access attempt. A typical conventional event logger or log file will generate a very large text or other type of file identifying each transaction over a data gathering period. Typically, these conventional files contain extraneous data such as system operations and access to tables or files which are not the subject of the analysis, in addition to pertinent database table accesses. Additionally, the conventional systems perform a subsequent analysis operation of the raw data that imposes a lag time on the output result, hindering any ability to obtain real time feedback on database performance.
Often, conventional database tracking entries are written with such frequency that the CPU overhead required hinders overall system performance. Conventional graphical analysis, such as CPU utilization graphs and disk utilization graphs, can also entail substantial overhead. Also, during processing of conventional database statistics systems, other computer system activities tend to affect CPU usage in addition to access to the database tables or files for which information is sought, thereby skewing the results of such a CPU or disk graph.
To illustrate an example of deficiencies posed by conventional database analysis methods, consider an operator establishing a log file for access to a database table. The operator designates an hour of log time. The operator is focused on database accesses to a certain table, but many tables are frequently accessed in the logged database instance. Consider further that each user access transaction results in an acknowledgement from the disk and a confirmatory update to an index. Accordingly, the conventional logging process generates three entries for each access transaction for all tables, resulting in a large, unwieldy log file.
The operator can access the resulting unwieldy data in the log file several ways using conventional systems. One conventional technique involves manual inspection of the log file by table name and may yield the transactions initiated for the particular table, but the operator will need to examine many other entries and may inadvertently skip entries in the voluminous hardcopy while scanning for the proper table name. A conventional parser could analyze the log file automatically, but the operator must manually develop the procedure to parse the log file and look for the table name. The operator may be able to modify the conventional logging procedure to selectively log certain entries, however, this approach also requires manual coding of procedures.
Embodiments of the invention are based in part, on the observation that it would be beneficial to provide a database performance gathering and analysis tool to retrieve database requests without gathering substantial extraneous data and without unduly burdening the database or the CPU with the resources required to execute the tool itself. Configurations of the present invention significantly overcome deficiencies with the above conventional methods and provide such a solution. In particular, embodiments of the invention provide mechanism and techniques that include a method for processing database performance statistics that includes periodic sampling of pending database requests, rather than exhaustively monitoring and capturing all database access traffic, to identify areas of contention. The sampling is done in sample/sleep intervals that occur for a predefined time period such as 20 seconds for each database instance. The cycle of sampling different database instance can repeat, for example, every two minutes for a total sampling time of 30 minutes. By using a unique embedded set of sample sequences for different instances of a database, embodiments of the invention can obtain an accurate indication of performance bottlenecks to various database resources of different database instances.
During this sampling process, the system of the invention periodically samples or scans a database access queue to gather samples of pending requests corresponding to database transactions. An aggregating component receives the sampled requests and aggregates the samples with previous samples corresponding to the same transaction. Correlating the aggregated samples identifies transactions that have been pending the longest and identifies database objects, such as files, tables and segments, which have a relatively high number of pending transactions. By periodically sampling, rather than exhaustively logging all requests, embodiments of the invention significantly reduce or minimize CPU intrusiveness and significantly eliminate trivial and benign transactions from the output. Further still, embodiments of the invention identify the most burdened database objects to enable a database administrator to make informed decisions about remedial actions to correct database performance issues. Also, by sampling using a sampling structure that is then xe2x80x9cdumpedxe2x80x9d out to the aggregating structure, continuously pending transaction progress can be tracked over multiple sample iterations.
The database performance gathering and analysis tool of this invention therefore substantially pinpoints areas of contention in the database, allowing a database administrator or other operator to pursue quantitative and deterministic remedial actions, rather than trial-and-error load balancing or priority adjustments based on perceived visual trends in voluminous amounts of data. The data so gathered can be targeted so as to collect information about tables, DB instances, particular users, or other granularity so as to enable database administrators detect patters of usage and trends for tuning the database accordingly.
Specifically, one particular configuration and embodiment provides a toolkit that performs a method for gathering and analyzing database performance events comprising sampling, at a predetermined sampling time interval, metrics of database performance statistics (database performance statistics) indicative of pending database requests. The method further includes building, for each of the pending database requests, a scope structure including the sampled database performance statistics in the pending database requests to generate a set of snapshots of database requests. A global accumulator accumulates transaction entries indicative of the database transactions, and accumulates the scope structures of database requests to correlate the current pending database requests. A transaction ID identifies current pending transaction with transaction entries in the global accumulator corresponding to previous pending database requests for the same transaction.
Embodiments of the invention further generate the scope structure snapshots from queues of database requests. The sampling operation comprises scanning a queue for the pending database requests, and reading database performance statistics for the entries in the scanned queue. An aggregating component employs a transaction identifier indicative of the pending transaction and correlates, by indexing into the global accumulator, the pending transaction to determine if a transaction entry in the global accumulator corresponds to the pending transaction. The aggregating component then creates, if a corresponding transaction entry is not found in the global accumulator, a transaction entry corresponding to the pending transaction.
If a corresponding transaction entry is found, aggregating further includes comparing database performance statistics corresponding to the same transaction to compute aggregate parameters for the database performance statistics corresponding to the same transaction, such as the queue wait time and the number of pending requests in the queue.
At predetermined report intervals, typically 30 minutes, a report archive receives the accumulated transactions in the global accumulator, and a report output writer generates a report indicative of the analysis.
There may be several database instances analyzed simultaneously, each having a plurality of queues, wherein the sampling further comprises sampling a subset of selected queues from a plurality of queues corresponding to the database instances. The gathering component samples iteratively according to a predetermined sampling frequency cycle indicative of a periodic window in which to retrieve the database performance statistics. The samples are separated by sleep time interval such that the gathering component retrieves, for a predetermined sampling time, the database performance statistics. Following the sampling time, the gathering component idles for the duration of the sleep time interval such that database performance statistics are not retrieved during the idling to minimize CPU intrusion.
A data collection policy (DCP) specifies the predetermined intervals for the sampling, building, and the aggregating to mitigate the intrusiveness of the sampling. The DCP specifies the predetermined intervals such that transactions having a substantial impact on database throughput are identified in the scope snapshot and transactions having an insubstantial impact on database throughput are omitted from the scope structure snapshot. Accordingly, the DCP directs the system to provide real time analysis of data on a variety of granularity levels (e.g. per device, file, table or segment) over a variety of timing intervals (e.g. hourly, daily, weekly).
Therefore, the DCP is indicative of a series of nested timing intervals, in which the steps of sampling, building and aggregating repeat according to an instance sampling sequence and for a respective database instance. Each instance sampling sequence includes successive samples taken iteratively from the DB instance during a sampling frequency cycle. The global accumulator then indicates performance for transactions occurring during for the instance sampling sequence for that respective database instance. The instance sampling time sequence repeats for multiple database instances during a database sampling frequency cycle, thereby allowing collection of samples of database performance statistics for multiple database instances, and for respective database objects, and further, occurs repetitively over a nested database collection interval.
An output report, such as a histogram, graphically illustrates an aggregate result of the sampled database performance statistics to allow a user to identify performance characterized by server configuration, or database files or objects. A report writer coalesces aggregate parameters for multiple transactions for respective database objects from the global accumulator to produce an aggregate result of transaction performance for each respective database object for which performance characteristics are sought according to the DCP.
Embodiments of the invention include software programs that carry out the operations explained herein, as well as apparatus such as a computer system configured to perform the operations disclosed herein as embodiments of the invention. The invention may be embodied as software alone, or as software and hardware, or as a process executing or otherwise performing within a processor in a computer system.
It is to be understood that the system of the invention, although described above as a software toolkit, can be embodied strictly as a software program, as software and hardware, or as hardware alone, and may be implemented as a standalone process or set of processes, or linked and merged as part of a larger system. Example embodiments of the invention may be implemented within EMC""s Control Center software application, marketed commercially from EMC Corporation of Hopkinton, Mass., which provides graphical management functionality for storage area network resources and in computerized devices that operate the Control Center software.