Optimizing an application's overall performance on a given processor micro-architecture can be difficult. Challenges include the ever-growing complexity of processor micro-architectures, the diversity of workloads, and the large volume of data produced by performance tools. Typical processors may include functionality to provide performance data, such as by counting occurrences of micro-architectural events to characterize and profile the performance of application code. However, the functionality provided by some processors may be inadequate to provide accurate event data for certain types of events that occur relatively frequently, such that information relating to those events has a relatively short life span in the processor. For instance, with respect to instruction retired events and branch retired events, as two nonlimiting examples, a delay between the occurrence of an event and the recording of the processor state may cause the event to be attributed to a section of application code that executed multiple cycles after the section of code that actually corresponded to the event.