The present invention relates to digital data processing, and more particularly to measuring performance aspects of complex data processing systems.
A number of techniques exist for analyzing the performance of software systems. Generally, the performance of different software and hardware systems can be compared using benchmarks. A benchmark, in the context of computer systems, typically includes a performance metric for determining a value that quantifies the performance of a specific aspect of the computer system and a specified set of system conditions that need to be maintained to ensure fair performance comparisons.
One of the first measures of system performance was the time required to perform a single processor instruction such as add or subtract. Because almost all of the instructions of early computers required the same amount of time to execute, the instruction execution time was sufficient to completely characterize the performance of the system.
As instruction sets became more diverse and complicated, instruction-execution profiles were developed, e.g., the Gibson instruction mix (see below). Instruction profiles are lists of the relative frequencies of processor instructions obtained when executing a number of real-world applications. The processor instructions are arranged into classes depending on the number of processor cycles required to execute the instructions. The average instruction time $T_{\mathrm{avg}}$ is then calculated as:
$$T_{\mathrm{avg}} = \sum_{i=1}^{n} \mathrm{CPI}_i \cdot p_i \cdot T_{\mathrm{clock}},$$
where $n$ is the total number of classes, $\mathrm{CPI}_i$ is the number of clock cycles required to execute an instruction of class $i$, $p_i$ is the relative frequency of the instruction class, and $T_{\mathrm{clock}}$ is the period of one processor clock cycle. Simplified, the average instruction time for the Gibson instruction mix is:
$$T_{\mathrm{avg}} = \sum_{i=1}^{n} p_i \cdot t_i, \qquad \text{(Eq. 1)}$$
where $t_i$ is the time required to execute an instruction of class $i$. A lower value of $T_{\mathrm{avg}}$ indicates a better overall performance.
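As an illustration only, the average instruction time can be computed from an instruction profile as in the following minimal Python sketch. The class names, cycle counts, relative frequencies, and clock period are hypothetical values chosen for the example; they are not the actual Gibson instruction mix. The sketch evaluates both forms of the formula, taking the per-class time to be the class's cycle count multiplied by the clock period, and the two results agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionClass:
    name: str
    cpi: int          # clock cycles per instruction of this class (CPI_i)
    frequency: float  # relative frequency of the class (p_i)


def average_instruction_time(classes, t_clock):
    """T_avg = sum over classes of CPI_i * p_i * T_clock."""
    total = sum(c.frequency for c in classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("relative frequencies must sum to 1")
    return sum(c.cpi * c.frequency * t_clock for c in classes)


def average_instruction_time_eq1(classes, t_clock):
    """Eq. 1: T_avg = sum over classes of p_i * t_i, with t_i = CPI_i * T_clock."""
    return sum(c.frequency * (c.cpi * t_clock) for c in classes)


# Hypothetical profile for illustration; NOT the real Gibson mix.
EXAMPLE_MIX = [
    InstructionClass("load/store", cpi=2, frequency=0.50),
    InstructionClass("add/subtract", cpi=1, frequency=0.25),
    InstructionClass("multiply", cpi=4, frequency=0.25),
]
T_CLOCK_NS = 10.0  # assumed clock period in nanoseconds

if __name__ == "__main__":
    t_avg = average_instruction_time(EXAMPLE_MIX, T_CLOCK_NS)
    print(f"T_avg = {t_avg} ns")  # (2*0.5 + 1*0.25 + 4*0.25) * 10 = 22.5 ns
```

With these assumed values, a change that lowers the cycle count of a frequently executed class reduces the average instruction time more than the same change to a rarely executed class, which is why the profile weights each class by its relative frequency.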
Another approach to analyzing performance is the use of synthetic benchmark programs, such as the Dhrystone and Whetstone benchmarks, which are artificial programs that do no real, useful work. Instead, they are modeled in such a way that their instruction mixes represent the relative mix of instructions observed in real-world applications. The execution time of a synthetic benchmark program is thus representative of the runtime of a real application.
A microbenchmark is a program that has been designed to test only some particular portion of a system. It is not intended to be representative of the whole system and therefore does not have all the characteristics of a good benchmark.
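For illustration, a microbenchmark of a single elementary operation might look like the following Python sketch, which uses the standard-library `timeit` module. The helper name `time_operation`, the measured statement, and the iteration counts are assumptions made for this example. The result characterizes only the measured statement under the conditions of the run and says nothing about the performance of a complete application.

```python
import timeit


def time_operation(stmt, setup="pass", number=100_000, repeat=5):
    """Return the best observed average time, in seconds, per execution of stmt.

    The statement is executed `number` times per run and the run is repeated
    `repeat` times; taking the minimum reduces the influence of interference
    from other activity on the machine.
    """
    timings = timeit.repeat(stmt=stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    # Measure one narrow portion of the system: an integer addition.
    per_call = time_operation("a + b", setup="a = 3; b = 4")
    print(f"integer add: {per_call * 1e9:.1f} ns per execution")
```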
Finally, application benchmark programs (or application benchmarks) represent sets of standardized real application programs that can be used to analyze system performance for a particular class of applications.
Current software systems can be very complex: they are typically composed of a very large number of interacting agents, and they undergo continuous, highly parallelized improvement and development. As a result, it is often difficult to anticipate the performance effects (whether positive or negative) of even well-delimited changes to the software systems, or to detect the causes of creeping performance deterioration. The situation is aggravated when different software layers are developed in different systems and deployed into a joint system only at a late stage of development. It is worsened further when there is a massive granularity mismatch between applications (whose performance is ultimately the measure that matters) and the underlying elementary software building blocks, such as the instructions of a virtual machine, and when both the applications and the building blocks change in parallel.