In the late 1980's, database designers began to borrow parallel processing techniques from scientific super-computing to meet growing demand for higher performance. Since 1993, database vendors, such as Oracle's parallel server and parallel query initiatives, have enabled this migration by adapting their products to make parallel application implementation easier for users. A parallel architecture is superior to a traditional single processor because the workload is divided into multiple simultaneous processes for faster execution. The same technique is used to speed access to disk storage. When additional capacity or performance is needed, the architecture is scaled by adding additional resources to those already in place, allowing for more parallel processes to occur simultaneously.
Additionally, the cost of hard disk storage has plummeted in the last few years. As a result, database applications requiring very large amounts of storage which would have been prohibitively expensive only a few years ago are now commonplace. A database of 20 gigabytes (GB) of storage would have been considered a very large database (VLDB) in 1993, but now database applications requiring terabytes of storage are becoming commonplace.
Microprocessor power doubles every year or two. Performance of scalable parallel server architectures combining multiple processors is growing even faster. In contrast, while the capacity of disk storage devices has increased greatly, the improvement in hard disk access times has been comparatively small. Disk performance development is constrained by mechanical rotational and seek latencies. Disk drives have moving parts, and mechanical movement takes time. High performance processors have cycle times in nanoseconds while the fastest disk drives have access times measured in thousands of microseconds or more--a factor of six to seven orders of magnitude larger.
Currently the best performance that can be expected of a single disk drive in database application is 50 to 100 access per second. Accessing a single item in a database may require several disk reads, and it is not uncommon to find high performance computing resources idling, waiting for the mechanical components to seek, rotate and find the information requested by a process. Disks involved in one step of a process must often wait for other disks accessing data for another step.
Disk caching is of limited use as a solution to this problem. Caching provides performance advantages where access to stored data can be predicted. In a typical database application, a single transaction may require 50 to 100 disk accesses within a few seconds involving a couple of dozen database objects, such as tables, indexes, logs, etc. These accesses typically retrieve relatively small amounts of data randomly distributed through the database structure. In these types of applications, disk caching provides little or no improvement in the speed of access. Accessing such data serially takes too long for many database applications which require near instantaneous response to a human operator.
Designers of VLDBs have addressed this problem by storing data on multiple drives and by distributing the data across the drives in a manner that reduces the probability that a single drive will need to be repeatedly accessed to retrieve a database object. For example a 23 GB database stored on a single 23 GB drive would be limited to 50 to 100 accesses per second. A typical on-line transaction processing (OLTP) operation would require this performance just to support one operator. By storing the data on 12 smaller 2 GB drives, the effective performance of the database can be increased. For optimally placed data, the improvement in throughput can approach the factor of 12 by which the number of drives was increased.
The process of storing database objects across multiple drives to increase performance is called "striping." Optimal or near-optimal physical placement is important to improving VLDB performance by striping database objects across a large number of drives. Experienced database administrators (DBAs) stripe active database objects to keep related tables, indexes, redo logs, swap files, etc. separated in order to speed up parallel access. This is not much of a challenge for a modest system with few objects and disks, but difficulties expand exponentially as database size grows to 20 GB and larger. As a database gets larger, its components are often haphazardly placed on existing drives, simply being placed where space is available. This leads to hot-spots (over-loaded disks) that are hard to detect and impossible to predict. Worse, in a disk-striped environment, the effect of a single hot-spot can spread to affect many objects, robbing the whole system and all its processes of performance.
Diagnosing these types of problems is not easy. When dozens or hundreds of related database objects must be optimally placed on a large disk array that may include tens or hundreds of drives, the number of possible arrangements of the database objects is huge and visualization of solutions becomes impossible. Because of the large number of possibilities, trial and error methods are inadequate. Frequently, these databases are in use almost all the time, and the access of a DBA to the database for experimentation and analysis may be limited. Accordingly, there is a need for tools which will aid a DBA in analyzing the operation of a active VLDB to enable the DBA to optimize data placement and performance. The present invention provides such a tool.