1. Field of the Invention
The present invention is related to processing systems and processors, and more specifically to techniques for supporting stream prefetching directed by cache control logic.
2. Description of Related Art
Stream prefetching provides an efficient use of resources in processors and processing systems. When sequential accesses to two or more adjacent locations are detected, one or more additional cache lines can be prefetched from lower levels of the memory hierarchy, in an attempt to have data and/or instructions ready before the processor needs them. A “stream” is a contiguous set of cache lines containing instructions or data (or, in some specialized processor architectures, both instructions and data). The sequential fetching described above is referred to as stream prefetching or a stream prefetch.
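As an illustration only, the adjacency check and the resulting prefetch addresses can be sketched in Python. The 128-byte line size, the default prefetch depth, and the names `line_addr` and `prefetch_candidates` are assumptions made for this sketch, not details taken from the description.

```python
LINE_SIZE = 128  # hypothetical cache-line size in bytes (must be a power of two)


def line_addr(addr: int) -> int:
    """Return the base address of the cache line that contains addr."""
    return addr & ~(LINE_SIZE - 1)


def prefetch_candidates(prev_addr: int, cur_addr: int, depth: int = 2) -> list[int]:
    """If two successive accesses touch adjacent cache lines, return the next
    `depth` line addresses in the direction of travel; otherwise return []."""
    base = line_addr(cur_addr)
    delta = base - line_addr(prev_addr)
    if delta not in (LINE_SIZE, -LINE_SIZE):
        return []  # same line or non-adjacent lines: no stream detected
    # Step one line at a time in the apparent direction, never below address 0.
    return [base + delta * i for i in range(1, depth + 1) if base + delta * i >= 0]
```

For example, accesses at 0x1000 and then 0x1080 fall in adjacent ascending lines, so the sketch proposes lines 0x1100 and 0x1180; reversing the order yields a descending stream instead.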
Some existing stream prefetch schemes include a load-miss queue (LMQ) that tracks “load misses”, which are attempts to access a line that is not present in the particular level of cache memory associated with the LMQ. The LMQ entries are filtered to detect misses to adjacent cache lines, and if such adjacent misses are detected, an entry corresponding to them is allocated in a stream table (stream queue). The prefetch engine then prefetches at least one cache line ahead of the most recent cache-line miss, in the apparent direction in which the stream is progressing through the cache.
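The prior-art flow just described (record misses in the LMQ, filter for adjacent misses, allocate a stream-table entry, prefetch ahead of the latest miss) can be modeled in software roughly as follows. This is a simplified behavioral sketch under stated assumptions: the class and method names, the LMQ and stream-table sizes, the FIFO eviction policy, and the window used to match a new miss to an existing stream are all hypothetical, and hardware concerns such as LMQ port arbitration are not modeled.

```python
from collections import deque
from dataclasses import dataclass

LINE_SIZE = 128  # hypothetical cache-line size in bytes


@dataclass
class StreamEntry:
    last_line: int  # line number of the most recent miss in this stream
    direction: int  # +1 for ascending addresses, -1 for descending


class LmqStreamPrefetcher:
    """Hypothetical model of an LMQ-filtered stream-table prefetcher."""

    def __init__(self, lmq_size: int = 8, table_size: int = 4, depth: int = 1):
        self.lmq = deque(maxlen=lmq_size)  # recent load-miss line numbers
        self.streams: list[StreamEntry] = []  # the stream table
        self.table_size = table_size
        self.depth = depth  # lines to prefetch ahead of the latest miss

    def on_load_miss(self, addr: int) -> list[int]:
        """Record a load miss and return the line addresses to prefetch."""
        line = addr // LINE_SIZE
        # 1. A miss at or just ahead of an allocated stream advances it.
        for s in self.streams:
            if 0 <= (line - s.last_line) * s.direction <= self.depth + 1:
                s.last_line = line
                self.lmq.append(line)
                return self._prefetch(s)
        # 2. Otherwise, filter the LMQ for a miss on an adjacent line.
        prev = next((p for p in self.lmq if abs(line - p) == 1), None)
        self.lmq.append(line)
        if prev is None:
            return []
        # 3. Allocate a stream-table entry, evicting the oldest if full.
        if len(self.streams) == self.table_size:
            self.streams.pop(0)
        entry = StreamEntry(last_line=line, direction=line - prev)
        self.streams.append(entry)
        return self._prefetch(entry)

    def _prefetch(self, s: StreamEntry) -> list[int]:
        # Prefetch `depth` lines beyond the latest miss, in the stream direction.
        lines = (s.last_line + s.direction * i for i in range(1, self.depth + 1))
        return [n * LINE_SIZE for n in lines if n >= 0]
```

In this model, misses at 0x1000 and then 0x1080 allocate an ascending stream, and with `depth=2` the second call returns the addresses of the next two lines. Note that both the LMQ update and the stream-table lookup happen on every miss, which is the shared-resource traffic discussed below.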
While such architectures can detect streams and direct their prefetching, they involve some inefficiencies, particularly in out-of-order superscalar processors and simultaneous multithreading (SMT) processors, in which multiple load-store units (LSUs) may be present. The LSUs compete with the prefetch engine for access to the LMQ, because LMQ entries must be updated on each cache miss and the prefetch engine relies on the LMQ to maintain information about which lines are being prefetched. When multiple LSUs are present, as in SMT processors, this contention decreases efficiency further. An LSU typically inserts a reject cycle for each missed fetch attempt until the LMQ becomes available, and also inserts a reject cycle for each prefetch request made by the prefetch engine. Further, the intermediate tables used for stream filtering and the stream table itself consume power and occupy die area.
Therefore, it would be desirable to provide a stream detection and prefetch mechanism that does not require a stream table or the associated filtering resources, and that removes conflicts between the LSUs and the prefetch engine.