This invention relates to the field of analysis of computer program code executing on a processor. In particular, the invention relates to non-intrusive monitoring and profiling of the time occupied by a processor executing various instructions and portions of programs, particularly application programs.
Processors, particularly microprocessors, are widely used today in a vast variety of applications to perform information processing. One area, in particular, in which processors today proliferate and perform a central function is in digital signal processor (DSP) systems. In such systems, data representing physical signals such as voice or video signals or medical telemetry or instrumentation input are digitized and then subjected to various algorithms to extract information, to create new information or new signals, or to transmit information. There is an ever-present market pressure to improve the operation of such DSP-based systems, to make them faster, less expensive, etc. The pursuit of increased speed results in a focus not only on hardware improvements but also on achieving a more efficient utilization of processor hardware. Consequently, one step in the process of developing application software (computer programs) to run on digital signal processors or, indeed, processors in general, is to attempt to evaluate the efficiency with which the system is operated by the software, and to determine what parts of the program consume the most processing time, such that their improvement is most likely to improve significantly the performance of the system.
Methods commonly used to collect performance data on processor execution of program code generally have an effect on the system under test. For example, they may require the insertion of breakpoint instructions in the program code at various points so that the progress of a program may be measured from breakpoint to breakpoint. Stated another way, the conventional approach requires the program compiler and assembler (via user-installed directives) to add instrumentation code (breakpoint instructions) at all entry and exit points of subroutines. This additional instrumentation code can be used to capture the current count of the processor's program counter on subroutine (or, more generally, process) entry. On exit, it would update a record with various information about the subroutine's (process's) operation (for example, the number of times the subroutine (process) has been entered, the total time spent in the subroutine, the maximum single time in a pass through the subroutine, the minimum single time in a pass through the subroutine, and so forth). Using this information, the user can reach some conclusions as to which routines (processes) are consuming the most time on the processor and, thus, identify subroutines whose improvement is most likely to yield significant overall improvements in program performance. Of course, one drawback is that the "target" program code has to be modified, which means the system performance running the instrumented code is different from its performance running the uninstrumented code. Altering the code and the natural flow of the program under test inherently results in production of inaccurate data, since the program analyzed is not the actual program as it will exist in normal usage. Some of the dynamics of the production version of the system may be lost.
As a result, particularly on systems that are carefully adjusted for best performance (typically referred to as "highly tuned"), the resulting inaccuracy can lead to incorrect reports as to the parts of the system that need additional tuning.
Additionally, some of these systems strive for near 100% processor utilization. In such situations, the insertion of additional instructions into the program code to collect performance data can actually make the system non-functional.
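For illustration only, the conventional entry/exit instrumentation described above can be sketched as follows. The names `RoutineRecord` and `InstrumentedProfile`, and the use of an explicit `now` timestamp argument, are hypothetical and chosen for this sketch; they do not come from any particular tool. Every call to `enter` and `exit` represents code added to the target program, which is the source of the disturbance discussed above.

```python
import math
from dataclasses import dataclass


@dataclass
class RoutineRecord:
    """Per-subroutine record updated by the exit instrumentation (hypothetical layout)."""
    entries: int = 0            # number of times the subroutine was entered
    total_time: int = 0         # total time spent in the subroutine
    max_time: int = 0           # longest single pass
    min_time: float = math.inf  # shortest single pass

    def update(self, elapsed):
        self.entries += 1
        self.total_time += elapsed
        self.max_time = max(self.max_time, elapsed)
        self.min_time = min(self.min_time, elapsed)


class InstrumentedProfile:
    """Models instrumentation code inserted at subroutine entry and exit points."""

    def __init__(self):
        self.records = {}
        self._stack = []  # (name, entry timestamp) of active subroutines

    def enter(self, name, now):
        # Entry instrumentation: remember when the subroutine started.
        self._stack.append((name, now))

    def exit(self, now):
        # Exit instrumentation: update the record of the subroutine being left.
        name, start = self._stack.pop()
        self.records.setdefault(name, RoutineRecord()).update(now - start)


profile = InstrumentedProfile()
profile.enter("fir", now=0)
profile.exit(now=10)
profile.enter("fir", now=20)
profile.exit(now=25)
```

In this sketch the record for `fir` ends with two entries, a total time of 15, a maximum of 10, and a minimum of 5, all in whatever time unit the timestamps use.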
Accordingly, a need exists for a method and apparatus which provide non-invasive monitoring of a processor as it executes program code, without requiring that program code to be altered (e.g., additional instructions added).
The foregoing needs are addressed, and advantages obtained, by the use of a statistical profiling method which non-intrusively samples the processor's program counter in a random manner. The collected data allows a system developer to analyze (even visualize) each routine's utilization of processor time, so the developer can identify which routines are consuming the majority of the processor's performance and optimize those routines. The random sampling is obtained by taking advantage of a shift register commonly provided on processors, particularly DSPs, that have an industry-standard port known as a JTAG port. JTAG ports conventionally are provided for use in so-called boundary scanning operations.
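As a minimal host-side sketch of how sampled program counter values might be turned into a per-routine utilization estimate, the following assumes a symbol table given as a sorted list of (start address, routine name) pairs. The function names `attribute_samples` and `utilization`, the addresses, and the routine names are all hypothetical.

```python
import bisect
from collections import Counter


def attribute_samples(pc_samples, symbols):
    """Count samples per routine.

    symbols: list of (start_address, name) pairs sorted by start_address.
    A sample is attributed to the routine with the greatest start address
    that does not exceed the sampled program counter value.
    """
    starts = [start for start, _ in symbols]
    counts = Counter()
    for pc in pc_samples:
        i = bisect.bisect_right(starts, pc) - 1
        name = symbols[i][1] if i >= 0 else "<unknown>"
        counts[name] += 1
    return counts


def utilization(counts):
    """Convert sample counts into estimated fractions of processor time."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


symbols = [(0x100, "main"), (0x200, "fir"), (0x300, "fft")]
samples = [0x104, 0x210, 0x220, 0x2FF, 0x050]
counts = attribute_samples(samples, symbols)
shares = utilization(counts)
```

With these five made-up samples, three fall in `fir`, one in `main`, and one below every known routine, so `fir` is estimated at 60% of processor time. A real estimate would need far more samples to be meaningful.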
If the sampling is not performed with random timing relative to the times at which the program counter contents change (i.e., timing independent of the processor core clock), the information collected could inaccurately show certain program counts or routines executing more or less often than actually is the case. This may occur when the sampling is somehow synchronized with the program execution. To avoid this, a system according to the invention typically uses, at least, a clock which is independent of the processor core clock to effect the sampling of the program counter contents.
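The synchronization hazard described above can be illustrated with a toy model, which is an assumption of this sketch and not a description of any real processor. A program loop takes 10 core cycles per pass, spending 7 cycles in routine A and 3 in routine B. Sampling with a period locked to the loop length always lands at the same point in the loop, whereas a period unrelated to the loop length (7 cycles here, standing in for an independent clock) visits every point in the loop equally often.

```python
from collections import Counter

LOOP_CYCLES = 10   # length of one pass through the toy program loop
CYCLES_IN_A = 7    # cycles 0..6 execute routine A, cycles 7..9 routine B


def routine_at(cycle):
    """Return which routine the toy program is executing at a given core cycle."""
    return "A" if cycle % LOOP_CYCLES < CYCLES_IN_A else "B"


def sample_counts(period, n_samples, start=0):
    """Sample the executing routine every `period` cycles and count the hits."""
    return Counter(routine_at(start + k * period) for k in range(n_samples))


# Sampling period equal to the loop length: synchronized with the program.
locked = sample_counts(period=10, n_samples=100)

# Sampling period unrelated to the loop length.
independent = sample_counts(period=7, n_samples=100)
```

Here `locked` attributes all 100 samples to routine A, overstating its share, while `independent` yields 70 samples in A and 30 in B, matching the true 70/30 split of the toy loop.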
To provide a greater assurance of randomness, two different methods may be employed to effectuate random sampling. First, the clock used for the register in the sampling port operates independently from the clock that operates the processor, as described in the preceding paragraph. Second, an external host computer (e.g., a personal computer or workstation) is used to signal the sampling port as to when to sample the program counter. The signal from the host is established asynchronously from the clock in the sampling port. This process adds random delays between the command to sample and the initiation of sampling. The command to effect sampling may specify or persist for a defined interval and then cease for another interval, allowing multiple successive samples and giving a "bursty" quality to the sampling process, if desired. Alternatively, only one sample may be taken at a time.
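The timing relationship between a host command and the resulting samples might be modeled as below. This is a sketch under stated assumptions: the function name `sample_times`, the integer time units, and the rule that sampling begins on the first sampling-clock edge at or after the command has been processed are all hypothetical simplifications.

```python
def sample_times(command_time, processing_delay, port_clock_period, burst_len=1):
    """Return the instants at which samples are taken for one host command.

    command_time:       when the host issues the command (host time base)
    processing_delay:   delay before the sampling port can act on the command
    port_clock_period:  period of the sampling port clock
    burst_len:          number of successive samples taken for this command

    All values are integers in the same arbitrary time unit. Sampling is
    assumed to start on the first port clock edge at or after the command
    becomes ready, then continue on successive edges for the burst.
    """
    ready = command_time + processing_delay
    # Round up to the next multiple of the port clock period.
    first_edge = -(-ready // port_clock_period) * port_clock_period
    return [first_edge + k * port_clock_period for k in range(burst_len)]


burst = sample_times(command_time=103, processing_delay=4,
                     port_clock_period=10, burst_len=3)
single = sample_times(command_time=100, processing_delay=0,
                      port_clock_period=10)
```

In this model the gap between `command_time` and the first sample varies from command to command, because the host issues commands without regard to the port clock edges. That variation is the added delay the paragraph above describes.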
According to a first aspect, the invention involves a system for monitoring a processor when it executes software code for a computer program. The system includes a register, operatively connected to the processor, that collects information regarding instructions executed by the processor; and a sampler, operatively connected to the register, that, asynchronously from the operation of the processor, samples the contents of the register. The system may include a host computer receiving said samples and providing a statistical record of the instructions executed by the processor.
According to a second aspect, the invention involves a system for monitoring a processor when it executes software code for a computer program, and includes: a register, operatively connected to the processor, that collects information regarding instructions executed by the processor; a sampler, operatively connected to the register, that, asynchronously from the operation of the processor, samples the contents of the register; and a host computer that issues commands to initiate operation of the sampler to sample the contents of the register. The host computer may receive said samples and provide a statistical record of the instructions executed by the processor.
According to a third aspect, the invention involves a system for monitoring a processor when it executes software code for a computer program, the processor being clocked by a first clock and including a program counter also clocked by said first clock, comprising: a register operatively connected to the program counter and clocked by the first clock and receiving the contents of the program counter synchronously with the first clock except when disabled; and a latch operatively connected to the register to sample the register contents in response to a second clock which is independent of the first clock. There may also be provided logic adapted to disable the register while it is being sampled by the latch. A shift register may be included, operatively connected to receive the contents of the latch and communicate them to a user device. There may also be included a host computer clocked by a third clock which is independent of the first clock and the second clock; and an interface unit coupled between the host computer and the latch and arranged to communicate sampling commands from the host computer to the latch to initiate sampling by the latch. The interface unit may issue to the latch commands to sample synchronously with the second clock. The latch may be controlled at times to receive in sequence from the register a series of processor instruction contents from the program counter. Preferably, but optionally, there also may be included logic which disables the register when conditions for execution of an instruction in the program counter have not been satisfied.
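A behavioral sketch of the register, latch, disable logic, and shift register of this aspect is given below. It is a software model only, not a hardware description. The class name `PcSamplePath`, the 16-bit width, the method names, and the least-significant-bit-first shift order are assumptions made for illustration.

```python
class PcSamplePath:
    """Toy model of a register that tracks the program counter and a latch that samples it."""

    def __init__(self, width=16):
        self.width = width
        self.register = 0      # follows the program counter on the first (core) clock
        self.latch = 0         # captures the register on the second (sampling) clock
        self.sampling = False  # True while the latch is sampling the register

    def core_clock(self, pc, executes=True):
        """First-clock edge: load the program counter unless the register is disabled.

        The register is disabled while it is being sampled, and when the
        conditions for executing the instruction have not been satisfied.
        """
        if self.sampling or not executes:
            return
        self.register = pc & ((1 << self.width) - 1)

    def begin_sample(self):
        """Second-clock edge: disable the register and capture its contents."""
        self.sampling = True
        self.latch = self.register

    def end_sample(self):
        """Re-enable the register once the latch has captured its value."""
        self.sampling = False

    def shift_out(self):
        """Serialize the latch contents, least significant bit first (assumed order)."""
        return [(self.latch >> i) & 1 for i in range(self.width)]


path = PcSamplePath()
path.core_clock(0x1234)
path.core_clock(0x1236, executes=False)  # not executed, so not captured
path.begin_sample()
path.core_clock(0x1238)                  # ignored: register disabled during sampling
bits = path.shift_out()
path.end_sample()
path.core_clock(0x123A)                  # register follows the program counter again
```

Holding the register steady while the latch samples it means the latch never sees a value that is partway through changing, which matters because the two clocks are independent.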
According to yet another aspect, the invention involves a method for monitoring a processor when it executes software code for a computer program, the processor being clocked by a first clock and including a program counter also clocked by said first clock, comprising: capturing the instruction contents of the program counter in a register, non-invasively, synchronously with the first clock, except not capturing the instruction contents for instructions which will not be executed; and sampling the register contents in response to a second clock which is independent of the first clock. Such a method may further include disabling the register while it is being sampled. Sampling may include delivering the program counter contents to a shift register operatively connected to receive the contents of the latch and communicate them to a user device. Preferably, the shift register is in a JTAG port. The user device can be a host computer and the host computer can be operated to generate a statistical analysis of the captured program counter instruction contents. Optionally, the method may further include clocking the host computer by a third clock which is independent of the first clock and the second clock; and using an interface unit coupled between the host computer and the latch, communicating sampling commands from the host computer to the latch to initiate sampling by the latch.
The method also may include the interface unit issuing to the latch commands to sample synchronously with the second clock and the latch responding after a delay related to the difference between the second and third clocks and processing delays. The method may include controlling the latch, at least at times, to receive in sequence from the register a series of processor instruction contents from the program counter. It also may include disabling the register when conditions for execution of an instruction in the program counter have not been satisfied.
According to still another aspect, the invention involves a method for monitoring a processor having a program counter, when it executes software code for a computer program, comprising: non-invasively collecting information regarding instructions executed by the processor, from the program counter; and asynchronously from the operation of the processor, sampling the collected information. This method may further include providing the samples to a host computer via a shift register in a JTAG port; in turn, the host computer may be operated to provide a statistical record of instructions executed by the processor.