The present application relates generally to an improved data processing apparatus and method, and more specifically to an apparatus and method for loading data to vector renamed registers from across multiple cache lines.
A microprocessor is the heart of a modern computer, a chip made up of millions of transistors and other elements organized into specific functional operating units, including arithmetic units, cache memory and memory management, predictive logic, and data movement. Processors in modern computers have grown tremendously in performance, capabilities, and complexity over the past decade.
A memory cache is a memory bank that bridges main memory and the central processing unit (CPU). A cache is faster than main memory and allows instructions to be executed and data to be read and written at higher speed. Instructions and data are transferred from main memory to the cache in fixed blocks, known as cache “lines.”
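Because data moves between main memory and the cache in fixed-size lines, any byte address belongs to exactly one line, and a multi-byte access may fall within one line or straddle two. The following is a minimal sketch of that address arithmetic. It assumes a 64-byte line size purely for illustration (actual line sizes vary between processors), and the names `LINE_SIZE`, `line_address`, and `lines_touched` are hypothetical.

```python
# Hypothetical line size, chosen only for illustration; real processors vary.
LINE_SIZE = 64


def line_address(addr: int, line_size: int = LINE_SIZE) -> int:
    """Return the starting address of the cache line that contains addr."""
    return addr - (addr % line_size)


def lines_touched(addr: int, length: int, line_size: int = LINE_SIZE) -> list:
    """Return the start address of every cache line covered by bytes [addr, addr + length)."""
    if length < 1:
        raise ValueError("length must be at least 1 byte")
    first = line_address(addr, line_size)
    last = line_address(addr + length - 1, line_size)
    # Step through consecutive line-aligned addresses from first to last, inclusive.
    return list(range(first, last + 1, line_size))


if __name__ == "__main__":
    print(lines_touched(0x100, 16))  # aligned 16-byte access: [256]
    print(lines_touched(0x138, 16))  # crosses a line boundary: [256, 320]
```

Under these assumptions, a 16-byte access starting at 0x138 needs bytes from two separate lines, whereas the same access starting at 0x100 is satisfied by a single line.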
Caches take advantage of “temporal locality,” which means that the same data item is often reused many times. Caches also benefit from “spatial locality,” which means that the next instruction to be executed or the next data item to be processed is likely to be located next in line in memory after the one just used. The more often the same data item is processed, or the more sequential the instructions or data, the greater the chance of a “cache hit.” If the next item is not in the cache, a “cache miss” occurs, and the CPU must retrieve the item from a higher cache level or from main memory.
A level 1 (L1) cache is a memory bank typically built into the CPU chip. A level 2 (L2) cache is a secondary staging area that feeds the L1 cache. Increasing the size of the L2 cache may speed up some applications but may have no effect on others. The L2 cache may be built into the CPU chip or may reside on a separate chip or a separate bank of chips.
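The effect of temporal and spatial locality on hits and misses can be illustrated with a toy cache model. The sketch below rests on several simplifying assumptions, each hypothetical: the cache is fully associative with least-recently-used (LRU) replacement, lines are 64 bytes, and the capacity is 8 lines. Real L1 and L2 caches are usually set-associative and much larger. The names `LRUCache` and `run` are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, num_lines: int, line_size: int = 64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line number -> True, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Look up addr; return True on a hit and False on a miss."""
        line = addr // self.line_size
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        # Miss: the whole line is brought in from the next level (L2 or main memory).
        self.misses += 1
        self.lines[line] = True
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def run(addresses, num_lines: int = 8, line_size: int = 64):
    """Replay an address trace and return (hits, misses)."""
    cache = LRUCache(num_lines, line_size)
    for addr in addresses:
        cache.access(addr)
    return cache.hits, cache.misses


if __name__ == "__main__":
    sequential = [4 * i for i in range(256)]  # spatial locality: consecutive 4-byte words
    strided = [64 * i for i in range(256)]    # no spatial locality: one word per line
    reused = [0, 64, 128] * 100               # temporal locality: same three words
    print("sequential:", run(sequential))     # (240, 16)
    print("strided:   ", run(strided))        # (0, 256)
    print("reused:    ", run(reused))         # (297, 3)
```

In this model, the sequential trace misses only once per 64-byte line (16 words per line), and the reused trace misses only on the first touch of each line. The strided trace never reuses a fetched line, so every access misses.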