The present invention relates generally to computer systems and more specifically to a method for measuring performance (execution time) of code sequences on a production system and optimizing such code sequences.
Programmers, hardware developers, and performance analysts have a need to determine the execution time in central processing unit (CPU) cycles associated with particular instructions in a code sequence to determine the optimum way to code the sequence. One method currently used to measure code sequence performance incorporates instrumentation to measure the period of performance for a code sequence. The instrumentation approach requires, however, a stand-alone run for the code sequence to be tested, access to an instrumented machine, and access to reduction tools to calculate performance. In addition, the instrumentation method can result in large errors in cycle time assignment because the instrumentation cannot always determine which operation is executing during any given cycle.
U.S. Pat. No. 5,949,971, issued to Levine et al. and assigned to the assignee of the subject invention, teaches a method for measuring the length of time to execute a serialization of instructions in a pipeline of a processing system, for use in optimizing software. This patent achieves an end result similar to that of the present invention, namely measuring execution time for a code sequence, but does so by using special hardware and monitors. Thus, the patent is one example of the instrumentation method. The required use of special hardware and monitors is undesirable.
Another approach to performance measurement is disclosed by U.S. Pat. No. 5,671,402 issued to Nasu et al. Nasu et al. disclose a method of counting cumulative clock cycles during simulation of operation of a data processor for executing a program. Nasu et al. do not provide performance measurement on a non-dedicated production system. Instead, the simulation method requires a dedicated system to run the simulation, as well as access to processing algorithms. Dedicated access for simulation can be a problem in typical system applications. Also, the disclosed method cannot provide performance measurement of individual instructions in a sequence or goodness of fit testing. Yet another problem is that the test cases must meet strict requirements to be run and the results must be interpreted. The interpretation of results often involves selections for which the criteria are ambiguous.
The deficiencies of the conventional methods for measuring performance of code sequences show that a need still exists for improvement. To overcome the shortcomings of the conventional methods, a new method for measuring code sequence performance in a non-dedicated system environment is needed.
Therefore, it is an object of the present invention to provide a method for measuring the performance (i.e., execution time) of individual test points, each comprising one or more instructions, in a non-dedicated production system. It is a further object of the present invention to provide a method for measuring the performance of test points, comprising one or more instructions, without relying solely on averaging to determine the execution time of the test point. It is yet another object of the present invention to provide a way to check the goodness of performance measurement data.
To achieve these and other objects, and in view of its purposes, the present invention provides a method for measuring performance of code sequences and individual instructions or groups of instructions within a code sequence on a non-dedicated production system.
The invention uses a test case program and a driver to set up a sequence of instructions (test case sequence) to be measured and systematically determines the number of hardware cycles to attribute to each test point (one or more instructions for which execution time is to be measured) in that sequence. Because hardware optimizations can make an instruction or group of instructions run faster in certain commonly occurring environments or sequences, the cycle time of each test point can only be computed within the context of a specific sequence of instructions.
The algorithm uses a store clock to determine the number of cycles required for each test point. The (system) clock value is captured at the initiation of the test case program; the test case program is run a preselected number of times (e.g., 1,000 times) for the initial test case sequence (test points 1 through n); and the ending value of the clock is captured. The difference between the starting clock value and the ending clock value is saved to a text file as the first test case sequence time. Then, one test point (test point n) is removed from the sequence of test points to create a second test case sequence, and the test case program is run again using the new test case sequence (test points 1 through (n−1)), again capturing the starting clock value and the ending clock value. The difference between the starting clock value and the ending clock value for the second test case sequence is saved to the text file as the second test case sequence time. The difference between the first sequence time and the second sequence time is attributed to the test point removed (test point n), as the number of cycles used to execute that test point. The process is repeated, removing a second test point (test point (n−1)) to determine the time for executing test point (n−1), removing a third test point (test point (n−2)) to determine the time for executing test point (n−2), and so on until no test points are left in the test case sequence. Thus, test point n in the sequence has an execution time equal to the time for the test case sequence (test points 1 through n) minus the time for the test case sequence (test points 1 through (n−1)).
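The successive-removal procedure described above can be illustrated by the following minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the disclosed test case program or driver. The names time_sequence, attribute_times, and FakeClock are hypothetical; the clock is passed in as a callable, with Python's time.perf_counter_ns standing in for the store clock; and the time attributed to test point 1 is assumed to be the time for test point 1 alone minus the time for the empty sequence. A simulated clock is used in the example so that the arithmetic is reproducible.

```python
import time
from typing import Callable, List, Sequence

# A test point is modeled as a callable that executes one or more instructions.
TestPoint = Callable[[], None]


def time_sequence(points: Sequence[TestPoint],
                  clock: Callable[[], int] = time.perf_counter_ns,
                  repetitions: int = 1000) -> int:
    """Capture the clock, run the sequence `repetitions` times, return the elapsed time."""
    start = clock()
    for _ in range(repetitions):
        for point in points:
            point()
    return clock() - start


def attribute_times(points: Sequence[TestPoint],
                    clock: Callable[[], int] = time.perf_counter_ns,
                    repetitions: int = 1000) -> List[float]:
    """Attribute a per-repetition time to each test point by successive removal."""
    n = len(points)
    # sequence_times[k] holds the elapsed time for test points 1 through k.
    sequence_times = [0] * (n + 1)
    for k in range(n, -1, -1):  # k = n, n-1, ..., 0: the last test point is removed each pass
        sequence_times[k] = time_sequence(points[:k], clock, repetitions)
    # Test point k is charged the time for points 1..k minus the time for points 1..(k-1).
    return [(sequence_times[k] - sequence_times[k - 1]) / repetitions
            for k in range(1, n + 1)]


class FakeClock:
    """Hypothetical simulated clock: each test point advances it by a fixed cycle count."""

    def __init__(self) -> None:
        self.cycles = 0

    def __call__(self) -> int:
        return self.cycles

    def burn(self, cycles: int) -> TestPoint:
        def point() -> None:
            self.cycles += cycles
        return point


clock = FakeClock()
points = [clock.burn(3), clock.burn(5), clock.burn(2)]
per_point = attribute_times(points, clock=clock, repetitions=10)
print(per_point)  # [3.0, 5.0, 2.0]
```

With a real clock in place of the simulated one, the attributed values would include measurement noise and loop overhead, which the subtraction of adjacent sequence times is intended to cancel.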
The present invention provides considerable improvement over the prior art. One key advantage is that performance data can be generated for code sequences in a production environment. Because the execution times are based on minimum times rather than averages, the present invention can account for times when the processor is interrupted during the test case sequence. Another advantage is that the present invention can determine execution time for each test point within a code sequence. This ability is useful in optimizing code sequences, especially in a compiler where a high-level instruction is translated into the most efficient assembler or machine code. By assigning execution time for each instruction or group of instructions, the invention can be used to quantify the benefits of one specific instruction sequence over another and to predict the overall system performance improvement resulting from a switch between them. Also, the present invention can provide a goodness check of the execution time data.
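The benefit of basing execution times on minimum times rather than averages can be seen in the short Python sketch below. It is an illustration only: this summary does not state how the minimum is gathered, so the sketch assumes that the same test case sequence is timed several times and the smallest elapsed time is kept. The function name minimum_time and the sample values are hypothetical.

```python
from statistics import mean
from typing import Iterable, List


def minimum_time(samples: Iterable[int]) -> int:
    """Return the smallest elapsed time among repeated timings of one sequence."""
    values: List[int] = list(samples)
    if not values:
        raise ValueError("at least one timing sample is required")
    return min(values)


# Hypothetical elapsed times (in cycles) for five runs of the same sequence.
# The third and fifth runs are assumed to have been interrupted by other work.
samples = [100, 100, 250, 100, 130]

best = minimum_time(samples)
average = mean(samples)
print(best, average)  # 100 136
```

An interruption can only lengthen a run, so the minimum stays at the uninterrupted value of 100 cycles while the average is pulled up to 136 cycles by the two interrupted runs.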
It should be understood that both the foregoing general description and the following detailed description are exemplary, but are not restrictive, of the invention.