The present disclosure relates generally to microarchitecture, and, in particular, to techniques for efficient data gathering from trace arrays.
A major problem in processor design is keeping usage of hardware resources to a minimum. In a number of designs, trace arrays are used to facilitate performance monitoring. The trace arrays provide data that is used for performance analysis. This data needs to be read out by firmware at regular intervals to prevent the trace arrays from overflowing (that is, overwriting old data with new data before the old data has been saved). Thus, the read process has to run fast enough to gather the instrumentation data in a number of cycles that does not disturb the running measurement.
Efficient implementation of trace arrays is complicated by the fact that trace arrays are usually spread throughout the microprocessor, typically located as near as possible to the source of the captured signals. Current microprocessors can easily contain more than a dozen such arrays. It is prohibitively expensive to connect all of the trace array outputs to the main dataflow of the microprocessor, as this would consume a large amount of wiring resources around areas that are traditionally already critical.
Some existing solutions multiplex the outputs of sixty-four (64) bit wide trace arrays down to eight (8) bit wide data buses. A trace read control block is then used to read the data from the trace arrays in eight (8) bit blocks and deliver it to firmware, which stores it to memory, possibly after analyzing it. Delivery to firmware is realized by connecting the eight (8) bit wide return data buses from the trace arrays to the trace read control block, where the data is provided in a register that can be read by firmware. This provides the path into the main dataflow of the microprocessor.
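The byte-serial readout described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of any actual hardware: the names TraceArray, TraceReadControl, select, data_register and firmware_read_entry are hypothetical, and the byte ordering (most significant byte first) is assumed for illustration, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import List

ENTRY_BITS = 64                            # trace array entry width, from the text
BUS_BITS = 8                               # return data bus width, from the text
BEATS_PER_ENTRY = ENTRY_BITS // BUS_BITS   # bus transfers needed per entry


@dataclass
class TraceArray:
    """Hypothetical model of one trace array holding 64-bit entries."""
    entries: List[int]

    def read_byte(self, row: int, byte_sel: int) -> int:
        # Multiplex one 8-bit slice out of the selected 64-bit entry.
        # Assumption: byte_sel 0 selects the most significant byte.
        shift = ENTRY_BITS - BUS_BITS * (byte_sel + 1)
        return (self.entries[row] >> shift) & 0xFF


class TraceReadControl:
    """Hypothetical trace read control: latches one byte for firmware."""

    def __init__(self, arrays: List[TraceArray]) -> None:
        self.arrays = arrays
        self.data_register = 0  # firmware-readable register (assumed name)

    def select(self, array_id: int, row: int, byte_sel: int) -> None:
        # Drive the selected array's 8-bit return bus into the register.
        self.data_register = self.arrays[array_id].read_byte(row, byte_sel)


def firmware_read_entry(ctrl: TraceReadControl, array_id: int, row: int) -> int:
    """Reassemble one 64-bit entry from successive 8-bit register reads."""
    value = 0
    for byte_sel in range(BEATS_PER_ENTRY):
        ctrl.select(array_id, row, byte_sel)
        value = (value << BUS_BITS) | ctrl.data_register
    return value


# Example data (made up for illustration only).
arrays = [
    TraceArray([0x0123456789ABCDEF, 0xFFFFFFFF00000000]),
    TraceArray([0xDEADBEEFCAFEF00D]),
]
ctrl = TraceReadControl(arrays)
entry = firmware_read_entry(ctrl, 0, 0)
```

In this model each sixty-four (64) bit entry costs eight separate register reads, which illustrates why the narrow return bus limits how quickly firmware can drain the trace arrays.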
What are needed are techniques for efficient gathering of data from a set of trace arrays in a microprocessor.