This invention relates generally to systems and methods for preparation of workload data from a data storage environment for replaying, and more particularly to a system and method that may access trace data of workload activity produced in a data storage system, prepare it, and then replay the trace data in the same or a different environment for benchmark testing or other reasons.
Testing the workload environment of a data storage environment including at least one data storage system and at least one software application operating on a host computer in communication with the data storage system is a complex task. It often requires that the business have a separate test-bed that contains a duplicate set of hardware where such tests take place. Large companies such as telecommunications companies, airlines, banks, and insurance companies routinely populate a test lab with a large amount of equipment including software applications for emulating production conditions. Other companies rely on vendors providing systems and software to run tests for them but sometimes the various vendors are unable to replicate the myriad of configurations that a particular customer may encounter within their own data storage environment.
The actual execution of application load-tests requires that a copy of the production database(s) be loaded on the storage systems and that a workload driver be created to generate either batch jobs or transactions that attempt to duplicate the production workload. Setup times and the analysis of the test results make such an effort extremely complex and limit such activities to the very few businesses that can afford the time and personnel costs.
The complexity of such a task often reduces these tests to various levels of simplicity where the results do not reflect the actual application. Furthermore, it becomes even more complicated to experiment with alternative configurations and map them onto the production system. Add to this the common requirement to see the effect of multiple applications on the same storage system and the problem is even further compounded.
Data storage owners who try to shortcut this effort often resort to general-purpose Input/Output (I/O) drivers that are available in the marketplace. Such drivers do not attempt to duplicate an existing workload. They simply provide the user with the ability to direct a specified stream of I/Os to specific data volumes or logical devices.
It would be an advancement in the computer arts, and particularly the data storage arts, to have a solution that could duplicate a workload in a data storage environment but would reduce the complexity of existing systems. Further, if such a solution significantly increased the accuracy and flexibility of such tests, that would also be a significant advantage over prior art techniques.
One area wherein duplicated workloads are useful is that of benchmark testing. The prior art benchmarking approach in the storage industry, however, has been to run static (i.e., canned), idealized, uniform I/O workloads. In many cases these benchmarks have no bearing on the actual environment for which benchmark results are desired. It would be an advancement in the arts to provide an invention with a new methodology for benchmarking storage by replaying exact I/O traces of customer workloads on different storage hardware and software platforms. It would be a further advancement if such a solution could customize the benchmark workload based on a customer's real production workload.
It would also be an advancement in the computer arts if an invention having the advantages above was also capable of being used for comparing alternative algorithms from a performance perspective. It would also be advantageous if such an invention could be used for consolidation and capacity planning, i.e., allowing engineers to size new implementations with workload data collected from existing storage implementations.
Further, it would be advantageous to have an invention that could be used for problem recreation and troubleshooting by recreating the problem workload and carrying out various “what-if” scenarios.
To overcome the problems of the prior art mentioned above and to provide advantages also described above, this invention is a system and method for preparing captured traces of workload data for replaying that duplicates or selectively varies a workload scenario operating in a data storage environment.
The method includes preparing a trace of workload activity experienced on one or more data storage volumes included with a first data storage system, for playing a replication of the trace of workload data on one or more data storage volumes included with a second data storage system. The first and second systems can be the same system or different systems, i.e., the workload activity is replayed on the same system as, or a different system from, that on which it was captured. Preferably the workload activity is accessed in the form of I/O activity.
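By way of illustration only, the preparation step might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the record layout (`TraceRecord`), the function name `prepare_trace`, and the volume-mapping dictionary are hypothetical and do not appear in this description. The sketch assumes a captured trace is a list of timestamped I/O records, and that preparation consists of ordering the records by time, rebasing timestamps to start at zero, and mapping each volume of the first system onto a volume of the second.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class TraceRecord:
    """One captured I/O (hypothetical layout, assumed for illustration)."""
    timestamp: float  # seconds, as recorded on the first system
    volume: str       # volume identifier on the system where it was captured
    op: str           # "read" or "write"
    offset: int       # byte offset within the volume
    length: int       # transfer size in bytes


def prepare_trace(records: List[TraceRecord],
                  volume_map: Dict[str, str]) -> List[TraceRecord]:
    """Return a replay-ready copy of a captured trace.

    Records are sorted by time, timestamps are rebased so the first I/O
    occurs at t = 0, and each source volume is mapped to a target volume.
    An identity map corresponds to replaying on the same system.
    """
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r.timestamp)
    t0 = ordered[0].timestamp
    prepared = []
    for r in ordered:
        if r.volume not in volume_map:
            raise KeyError(f"no target volume mapped for {r.volume!r}")
        prepared.append(
            replace(r, timestamp=r.timestamp - t0, volume=volume_map[r.volume])
        )
    return prepared


# Example: two I/Os captured out of order on volumes "src0" and "src1",
# prepared for replay on volumes "dst0" and "dst1" of a second system.
captured = [
    TraceRecord(10.5, "src0", "write", 4096, 512),
    TraceRecord(10.0, "src1", "read", 0, 8192),
]
prepared = prepare_trace(captured, {"src0": "dst0", "src1": "dst1"})
```

Keeping the volume mapping separate from the trace data means that, in this sketch, the same captured trace could be prepared for either the original system or a different one without modifying the capture itself.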
In another embodiment, a system is provided that is configured for performing the steps of preparing a trace of workload activity experienced on one or more data storage volumes included with a first data storage system, for playing a replication of the trace of workload data on one or more data storage volumes included with a second data storage system.
In another embodiment, a program product is provided that is configured for performing the steps of preparing a trace of workload activity experienced on one or more data storage volumes included with a first data storage system, for playing a replication of the trace of workload data on one or more data storage volumes included with a second data storage system.