Embodiments of the present invention relate to operation of a processor, and more particularly to obtaining data for use in a processor.
When data needed for an operation is not present in the processor, a latency occurs, namely the time it takes to load the data into the processor. Such a latency may be low or high, depending on where within the various levels of a memory hierarchy the data is obtained. In some systems, prefetching schemes are used to generate and transmit prefetch requests corresponding to data or instructions that are predicted to be needed by a processor in the near future. When the prediction is correct and data is readily available to an execution unit, latencies are reduced and performance is increased. However, prefetching can require significant computational resources, and can consume chip area and power. Furthermore, prefetch requests, like actual memory requests, still suffer from latencies.
That is, in addition to a memory latency incurred in requesting data from a remote location (e.g., memory, mass storage or the like), in many systems a processor socket may have its own latency associated with generating and sending transactions outside the processor socket. These delays, which may be applicable to all transactions, can be associated with serialization/de-serialization, related protocol processing, and so forth. For example, in systems implementing a serial point-to-point (PTP) distributed interconnect system, latencies can occur in transaction processing through protocol and link layers in each of multiple agents through which a transaction passes. Such delays can consume a significant number of cycles before a request is even sent out of the processor socket, and can negatively impact performance.
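For illustration only, the way such delays accumulate can be sketched as a simple additive model: the latency of a request is taken as the memory access latency plus the protocol layer and link layer delays of each agent through which the transaction passes. The names `Agent` and `request_latency_cycles`, the additive model itself, and all cycle counts below are hypothetical assumptions chosen for the example; they are not measurements of any actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """One agent on the transaction path (all cycle counts are hypothetical)."""
    name: str
    protocol_layer_cycles: int
    link_layer_cycles: int  # assumed to include serialization/de-serialization


def request_latency_cycles(agents, memory_cycles):
    """Add each agent's protocol and link layer delay to the memory latency."""
    interconnect_cycles = sum(
        a.protocol_layer_cycles + a.link_layer_cycles for a in agents
    )
    return interconnect_cycles + memory_cycles


# Hypothetical path: the request leaves the processor socket and
# passes through one further agent before reaching memory.
path = [
    Agent("processor socket", protocol_layer_cycles=20, link_layer_cycles=30),
    Agent("chipset", protocol_layer_cycles=15, link_layer_cycles=25),
]

total = request_latency_cycles(path, memory_cycles=200)

# Fraction of the total spent before the request even leaves the socket.
socket_cycles = path[0].protocol_layer_cycles + path[0].link_layer_cycles
socket_share = socket_cycles / total
```

Under these assumed numbers, the cycles spent inside the processor socket are a fixed cost paid by every transaction, whether the request is a prefetch or an actual memory request.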
As more systems are architected with a distributed memory system and serial interconnects, latencies of such serial-based protocols can affect performance. Although improvements in memory latency generally translate into performance improvements across all applications, improvements in memory bandwidth typically benefit only certain classes of applications. As an example, depending on the type of application, only limited bandwidth may be used on the serial interconnects from a processor socket to other system components, such as chipsets, memory and so forth. However, such applications may still suffer from increased latencies due to serial-based interconnects.