Code profiling involves sampling instructions of a target computer program executed in a processor, and building a statistical profile for the instructions. The statistical profile typically indicates which program instructions of the target computer program have executed and the frequency of execution for those instructions. A programmer analyzes the statistical profile to determine whether certain program instructions have executed, which portions of code executed most frequently, and where bottlenecks are present in the execution of the target computer program. The portions of the target computer program that are executed most frequently are then typically optimized to improve the overall execution time of the target computer program.
The instructions are typically sampled by reading the addresses of the instructions as they are in the process of execution. In one method of code profiling, a program counter is sampled to read the address of an instruction as the instruction is fetched from a memory. A count of each sampled address is kept and updated while the target computer program executes. The sampled addresses are associated with instructions in memory and a statistical profile is then built for the instructions by using the counts of the sampled instructions.
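The counting step described above can be illustrated with a minimal software sketch. It assumes the sampled program counter values are already available as a list, and that a table of address ranges maps addresses to named code regions. The function name `build_profile`, the `symbols` table, and the sample addresses are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_profile(sampled_addresses, symbols):
    """Count sampled addresses and attribute them to named code regions.

    symbols is a list of (start, end_exclusive, name) tuples (hypothetical
    layout). Returns a dict mapping region name to its fraction of samples.
    """
    # Keep a count of each sampled address.
    address_counts = Counter(sampled_addresses)

    # Associate each sampled address with a code region.
    region_counts = Counter()
    for address, count in address_counts.items():
        for start, end, name in symbols:
            if start <= address < end:
                region_counts[name] += count
                break
        else:
            region_counts["<unknown>"] += count

    total = sum(address_counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in region_counts.items()}


# Illustrative data only: four samples falling in two made-up regions.
samples = [0x1000, 0x1004, 0x1004, 0x2000]
symbols = [(0x1000, 0x1010, "loop_body"), (0x2000, 0x2010, "cleanup")]
profile = build_profile(samples, symbols)
```

In this sketch, three of the four samples fall in the hypothetical `loop_body` region, so that region would be the first candidate for optimization.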
The statistical profile built by sampling the program counter for fetched instructions may not accurately reflect the instructions executed on the processor because some instructions that are fetched from memory do not complete execution in the processor. For example, instructions executed in pipelined processors may not complete execution because of exceptions, branches, and execution mispredictions that occur during execution of the target computer program. If instructions are sampled while being fetched from memory but do not complete execution, the counts and the statistical profile will not accurately reflect the instructions actually executed by the processor. Instructions should be sampled only after they actually execute, but an instruction in a pipelined processor may terminate without actually executing as late as the last stage of the instruction pipeline. Only instructions that have successfully passed through all instruction pipeline stages should be placed into a profiling register where they can be sampled to build the statistical profile.
In another method of code profiling, a chain of program counters is formed by duplicating the program counter for each stage of the instruction pipeline. The chain of program counters is a shift register that is synchronized with the instruction pipeline stages. The program counter chain maintains the address of the instruction as the instruction executes in the instruction pipeline stages. The addresses of instructions shift through the program counters as instructions move through instruction pipeline stages during execution of the instructions. The last program counter in the chain is sampled, or copied into a hardware profiling register and sampled, only after the instruction in the last stage of the instruction pipeline has completed execution. The resulting statistical profile more accurately reflects the instructions executed than a statistical profile formed by a method that samples addresses from the program counter while instructions are fetched from the memory.
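The behavior of such a program counter chain can be approximated in software as a shift register with one slot per pipeline stage. The sketch below is a simplified model under stated assumptions, not a description of any particular hardware: the class `ProgramCounterChain`, the `squash_youngest` parameter (which models instructions discarded after a misprediction or exception), and the addresses are all hypothetical. Stalls are not modeled, and every completed address is recorded rather than sampled periodically.

```python
from collections import Counter


class ProgramCounterChain:
    """Toy model: one program counter slot per pipeline stage."""

    def __init__(self, num_stages):
        self.stages = [None] * num_stages  # None marks an empty slot
        self.profiling_register = None

    def clock(self, fetched_address, squash_youngest=0):
        """Advance one cycle and return the address that completed, if any.

        squash_youngest invalidates that many of the youngest stages
        before the shift (hypothetical model of a pipeline flush).
        """
        for i in range(squash_youngest):
            self.stages[i] = None
        completed = self.stages[-1]
        # Shift every address one stage along; the new fetch enters stage 0.
        self.stages = [fetched_address] + self.stages[:-1]
        if completed is not None:
            self.profiling_register = completed
        return completed


# Illustrative trace: (fetched address, stages squashed) per cycle.
trace = [(0x100, 0), (0x104, 0), (0x108, 0), (0x200, 2),
         (None, 0), (None, 0), (None, 0)]

chain = ProgramCounterChain(num_stages=3)
fetched, completed = Counter(), Counter()
for address, squash in trace:
    if address is not None:
        fetched[address] += 1
    done = chain.clock(address, squash)
    if done is not None:
        completed[done] += 1
```

In this trace the instructions at 0x104 and 0x108 are fetched but squashed, so they appear in `fetched` but not in `completed`. A profile built from `completed` therefore reflects only the instructions that passed through every stage.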
The addition of a program counter chain to a pipelined processor consumes area in the processor and may impact performance of the processor. In particular, the addition of the program counter chain may impact the operating frequency of the processor. In larger pipelined processors having instruction execution prediction resources and high clock rates, the area consumed and the performance cost are unacceptable. Additionally, the function of the program counter chain is complex and prone to bugs because many execution slots in the instruction pipeline may not have active instructions, or the location of an instruction may slide in the instruction pipeline while the rest of the instruction pipeline is stalled.
It is these and other considerations that have given rise to the present invention.