During a software development process, code under development goes through a series of tests, typically starting with unit testing and ending with performance/scalability testing. These tests are conducted in a series of environments (systems or logical partitions (LPARs)) that simulate a real production system. A system may be described as a node with one or more processors, memory, and Input/Output (I/O) capabilities. A logical partition may be described as a virtual computer that has access to a subset of a physical computer's hardware resources (i.e., the physical computer may be partitioned into multiple logical partitions, each running a separate operating system).
Each test simulates a defined set of actions over a defined period of time. For example, a “query and retrieval” test may simulate some number of users (e.g., several hundred) concurrently accessing the system for a finite amount of time (e.g., 24 hours).
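Such a “query and retrieval” test can be sketched as a simple concurrent driver. The following is a minimal illustration, not any particular product's test harness: the function names (`run_query_test`, `query_fn`) and the scaled-down user count and duration are assumptions for demonstration.

```python
import threading
import time

def run_query_test(num_users, duration_s, query_fn):
    """Drive query_fn concurrently from num_users simulated users for duration_s seconds."""
    counts = [0] * num_users                 # per-user completed-query counters
    deadline = time.monotonic() + duration_s

    def worker(idx):
        # Each simulated user issues queries back-to-back until the deadline passes.
        while time.monotonic() < deadline:
            query_fn()
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)                       # total queries completed by all users

# Scaled-down run: 8 simulated users for 0.2 seconds against a stub query
# that sleeps 1 ms to stand in for real query latency.
total = run_query_test(8, 0.2, lambda: time.sleep(0.001))
```

A production-scale run would simply raise `num_users` to the hundreds and `duration_s` to 24 hours, and replace the stub with real queries against the system under test.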
Another test may be a run that “loads/stores” data into the system. Such a test may be designed so that a given amount of data (e.g., 200 gigabytes (GB)) is stored within a given time period (e.g., a 3-hour period). This test verifies the system's ability to load data within the given time period.
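The pass/fail criterion of such a load test reduces to a sustained-throughput requirement. A quick sketch of the arithmetic (using decimal units, 1 GB = 1000 MB, which is an assumption; the function name is illustrative):

```python
def required_load_rate_mb_s(total_gb, hours):
    """Minimum sustained rate (MB/s) needed to load total_gb within hours."""
    total_mb = total_gb * 1000   # decimal units assumed: 1 GB = 1000 MB
    seconds = hours * 3600
    return total_mb / seconds

# 200 GB in a 3-hour window requires roughly 18.5 MB/s sustained.
rate = required_load_rate_mb_s(200, 3)
```

If the system's measured load rate falls below this figure at any sustained stretch, the 3-hour window cannot be met.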
While all of this testing is useful, the problem is that in “real life” there are:
1) varying transaction rates throughout the day, throughout the month, and throughout one or more years, as well as one-time events (e.g., system audits); and
2) varying transaction types over those same time periods. For example, the data “query and retrieval” rates may vary over the various time horizons, and the data “load/store” rates may likewise vary over those horizons.
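The time-varying rates described above can be modeled as a rate profile. The following sketch assumes a purely hypothetical diurnal (daily) pattern, with the peak at 14:00 and the trough at 02:00; the function name, the base rate, and the 50% swing are all illustrative assumptions, not measured production data.

```python
import math

def diurnal_rate(base_rate, hour):
    """Hypothetical daily transaction-rate profile.

    base_rate: average transactions per second over the day.
    hour: hour of day, 0-23.
    Returns a rate that peaks at hour 14 and bottoms out at hour 2.
    """
    # Sinusoid shifted so sin(...) = 1 at hour 14 and -1 at hour 2,
    # giving a +/-50% swing around the base rate.
    return base_rate * (1.0 + 0.5 * math.sin((hour - 8) * math.pi / 12))

# One value per hour of the day, for a 100 tx/s average workload.
profile = [round(diurnal_rate(100, h), 1) for h in range(24)]
```

A test harness that replays such a profile, rather than a single constant rate, would more closely approximate the varying production workload the passage describes.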
As a result of this variance, over extended time periods the workload on the production system does not match the workloads induced during testing.
The need to test the workload effects (of one or more subsystem components) on a “production-like” system over a long time period (or a replication of a long time period) has become more and more critical. This problem is becoming more severe as system workloads increase with the move to big data and cloud implementations.