There is an ongoing need to provide quantitative evaluations of computing systems to assess characteristics such as performance, dependability (e.g., H. Madeira and P. Koopman, “Dependability benchmarking: making choices in an n-dimensional problem space,” First Workshop on Evaluating and Architecting Systems for Dependability, Göteborg, Sweden, 2001), security, and configurability (e.g., A. Brown and J. L. Hellerstein, “An Approach to Benchmarking Configuration Complexity,” SIGOPS, 2004, or U.S. patent application Ser. No. 11/205,972, filed Aug. 17, 2005, entitled “System and Methods for Quantitatively Evaluating Complexity of Computing System Configuration”).
Quantitative assessments provide statistics such as response times for performance, failure rates for dependability, intrusion probabilities for security, and configuration complexity for configurability. The statistics resulting from these evaluations are used in many ways, including decisions about hardware and software purchases and vendors' internal assessments of alternative designs.
A common approach to such evaluations is to run benchmarks against production systems. For example, the Transaction Processing Performance Council (TPC) has developed a set of performance benchmarks for web, database, and other applications. U.S. Pat. No. 5,245,638 (“Method and System for Benchmarking Computers”) suggests that the important elements of a benchmark are storing instructions to execute, timing benchmark runs, and storing the statistics produced.
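The three elements named above (stored instructions, timed runs, and stored statistics) can be illustrated with a minimal sketch. It is not the method of the cited patent; the function name `run_benchmark`, the two toy instructions, and the chosen statistics are all assumptions made for illustration.

```python
import statistics
import time


def run_benchmark(instructions, repetitions=5):
    """Time each stored instruction and return summary statistics per instruction."""
    results = {}
    for name, func in instructions.items():
        samples = []
        for _ in range(repetitions):
            start = time.perf_counter()
            func()  # execute the stored instruction
            samples.append(time.perf_counter() - start)
        # Store the statistics produced by the timed runs.
        results[name] = {
            "runs": repetitions,
            "mean_s": statistics.mean(samples),
            "max_s": max(samples),
        }
    return results


# Stored instructions: hypothetical stand-ins for real benchmark work.
instructions = {
    "sum_range": lambda: sum(range(10_000)),
    "sort_list": lambda: sorted(range(1_000, 0, -1)),
}
stored_statistics = run_benchmark(instructions)
```

Here `stored_statistics` maps each instruction name to its run count and timing summary; a real benchmark would typically persist such results rather than hold them in memory.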
A benchmark includes several components. The first component, the system under test (SUT), is the system being measured. Typically, the SUT consists of product-level hardware and software that are assembled for the purpose of running the benchmark. There is considerable cost associated with creating a SUT for a benchmark.
The second component, the benchload generator, provides the work or disturbances needed to assess the SUT characteristics of interest. In a performance benchmark, the benchload generator is a workload generator that creates synthetic requests, such as those for a web server. In a dependability benchmark, the benchload consists of component failures that are induced. Considerable care is required in the construction of the benchload generator so that the work or disturbances it creates are representative, consistent, and appropriate for assessing the SUT characteristics of interest. Indeed, external bodies such as the Transaction Processing Performance Council (TPC) and the Standard Performance Evaluation Corporation (SPEC) maintain detailed specifications for performance workload generators to ensure consistency in benchmark measurements.
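The distinction between a performance benchload and a dependability benchload can be sketched as two generators driving the same SUT. Everything in this sketch is hypothetical: `ToySUT`, `workload_benchload`, and `faultload_benchload` are illustrative names, and the SUT is a toy model with redundant components rather than product-level hardware and software.

```python
import random


class ToySUT:
    """Hypothetical stand-in for a system under test with redundant components."""

    def __init__(self, components=3):
        self.up = [True] * components

    def handle(self, request):
        # A request succeeds if at least one component is still up.
        return any(self.up)

    def fail_component(self, index):
        self.up[index] = False


def workload_benchload(sut, n_requests, seed=0):
    """Performance-style benchload: issue synthetic requests, count successes."""
    rng = random.Random(seed)  # fixed seed keeps the benchload consistent across runs
    ok = 0
    for _ in range(n_requests):
        request = {"page": rng.randint(1, 100)}
        ok += sut.handle(request)
    return ok


def faultload_benchload(sut, failures):
    """Dependability-style benchload: induce component failures, then probe the SUT."""
    for index in failures:
        sut.fail_component(index)
    return sut.handle({"page": 1})


sut = ToySUT()
served = workload_benchload(sut, 100)             # all components up
survives_two = faultload_benchload(sut, [0, 1])   # one component remains
survives_all = faultload_benchload(sut, [2])      # no components remain
```

The fixed random seed reflects the consistency requirement described above: repeated runs present the SUT with the same sequence of synthetic requests.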
Traditionally, benchmarks are configured and run in a standalone manner, and they produce a narrowly focused set of statistics about the SUT. For example, the SPECjAppServer benchmark requires a SUT consisting of a web server running on a Java Virtual Machine with appropriate programs (servlets, Enterprise JavaBeans) and a database server loaded with appropriate database tables. The benchload generator creates web requests, and statistics are produced that measure the performance of the SUT in processing these requests. To provide information on the dependability of the SUT, a separate benchload generator must be used, potentially with different configurations of the SUT.