This application relates generally to analyzing software and more particularly to analyzing software behavior.
Computer systems generally include a central processing unit, a memory system, and a data storage system. An enterprise data storage system (EDSS), such as the SYMMETRIX Enterprise Storage Platform (ESP) by EMC Corp., is a versatile data storage system having the connectivity and functionality to simultaneously provide storage services to different types of host computers (e.g., mainframes and open system hosts). A large number of main, physical storage devices (e.g., an array of disk devices) may be used by an EDSS to provide data storage for several hosts. The EDSS is typically connected to the associated host computers via dedicated cabling or a network. Such a model allows for the sharing of centralized data among many users and also allows a single point of maintenance for the storage functions associated with the many computer systems.
Disk drive systems continue to grow in size and sophistication. These systems can typically include many large disk drive units controlled by a complex, multi-tasking, disk drive controller such as the EMC SYMMETRIX disk drive controller by EMC Corp. A large scale disk drive system can typically receive commands from a number of host computers and can control a number of disk drive mass storage devices.
As these systems increase in complexity, so does the user's reliance upon the systems for fast and reliable access, recovery, and storage of data. Accordingly, the user typically uses data throughput and speed of response as primary criteria for evaluating performance of the disk drive systems. As a result, mass storage devices and the controllers that drive them have become quite sophisticated in efforts to improve command response time. Systems such as the EMC SYMMETRIX disk drive controller system thus incorporate a large cache memory and other techniques to improve system throughput.
As the systems grow in complexity, interrupting failures at either the disk drive or the controller level become increasingly undesirable. As a result, systems have become more reliable and the mean time between failures continues to increase. Nevertheless, it is more than an inconvenience to the user if the disk drive system goes “down” or off-line, even if the problem is corrected relatively quickly, i.e., within hours. The resulting lost time adversely affects not only system throughput performance, but also user application performance. Further, the user is typically not concerned whether it is a physical disk drive or its controller that fails; it is the inconvenience and failure of the system as a whole that causes user difficulties.
Many disk drive systems, such as the EMC SYMMETRIX disk drive system, rely upon standardized buses to connect the host computer to the controller, and to connect the controller and the disk drive elements. Thus, should the disk drive controller connected to the bus fail, the entire system, as seen by the host computer, may fail, and the result is, as noted above, unacceptable to the user.
As computer systems become more complex, and as businesses rely more upon their computer systems, any performance problem is troublesome, and a performance problem that requires the system to shut down becomes a major and potentially disastrous event.
A failure of, or a decrease in, input/output behavior in, for example, a memory system could become a bottleneck to efficiency and throughput in the overall operation of the system. Thus, much effort and customer engineering have been directed to resolving problems that occur in the input/output process. A problem may be addressed by taking over the system having the performance problem, determining precisely where and what the problem is by recreating the problem at the customer site, for example by running the applications and data that led to the problem, and then resolving the problem.
Such methods of problem isolation and correction accordingly require the customer's system to be off-line for a period of time, and further can require intensive customer engineer time at the customer site. Aside from being relatively costly in customer time, this method of solving a performance issue can further adversely affect the customer's operations.