Data processors typically perform a series of operations to execute each instruction. For example, execution of a single instruction can result in accesses to an instruction cache, a translation lookaside buffer (TLB), external memory, a data cache, and other portions of the data processor. Each operation can result in one or more performance events, such as a cache miss, a TLB access, and the like. In addition, performance of the data processor can typically be improved by increasing the efficiency of the operations that result from a series of instructions. For example, a data processor can be made more efficient by increasing how often a series of instructions retrieves data from a local cache rather than from external memory. Accordingly, it can be useful for programmers to know which instructions cause particular performance events. However, because each task of the data processor can require the execution of thousands or even millions of instructions, it can be difficult to determine which performance events occur during execution of individual instructions. Accordingly, an improved performance monitoring technique is needed.