To reduce the performance penalties caused by accesses to relatively slow system memory, modern data processing systems employ memory caches, constructed from high-speed memory cells, as an intermediate memory store between a central processing unit (CPU) and system memory. The cache memory may be internal to the CPU, a so-called level one (L1) cache, or external to the CPU, a so-called level two (L2) cache. Data and instructions are loaded from system memory into the cache, and then fetched from the cache by the CPU.
A typical cache organization holds its contents, which in a mixed cache may be data or instructions, on a fixed alignment boundary. In such a cache organization, the first data value in the cache has an address that is a multiple of a predetermined value. For example, a cache that is double-word aligned, in a data processing system having a word length of four bytes, has a first data value with an address that is a multiple of thirty-two. In other words, the first data value may have a relative address of 0, 32, 64, 96, etc. The subsequent bytes have relative addresses increasing consecutively, up to the width of the cache.
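The boundary arithmetic described above can be illustrated with a minimal sketch. It assumes the predetermined value of thirty-two from the example; the names ALIGNMENT, aligned_base, and is_aligned are hypothetical and are introduced only for illustration.

```python
# Assumption: the "predetermined value" is 32, as in the example above.
ALIGNMENT = 32


def aligned_base(address: int, alignment: int = ALIGNMENT) -> int:
    """Return the largest multiple of `alignment` that does not exceed `address`."""
    return address - (address % alignment)


def is_aligned(address: int, alignment: int = ALIGNMENT) -> bool:
    """Return True when `address` falls exactly on an alignment boundary."""
    return address % alignment == 0


# Boundaries fall at relative addresses 0, 32, 64, 96, and so on.
boundaries = [aligned_base(a) for a in (0, 31, 32, 95, 96)]
```

Under this assumption, any address between two boundaries maps back to the lower boundary, which is the address of the first data value.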
To effect an unaligned read, that is, a read in which the first data value does not fall on the cache boundary, an entire cache line is accessed. Thus, for example, in a cache having a width of 64 bytes, an unaligned double-word read of eight consecutive bytes necessitates reading the entire 64-byte cache line. The 64-byte cache line is read out, and formatting circuitry outside of the cache then extracts the required double-word byte sequence. The remaining 56 bytes, in the present example, are effectively thrown away.
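The read-then-extract sequence described above can be modeled in software as a rough sketch. The 64-byte line width and eight-byte double word come from the example; the function names, the byte-array model of memory, and the refusal to handle reads that span two lines are assumptions made for illustration, not a description of any particular circuit.

```python
LINE_WIDTH = 64   # cache line width in bytes, from the example above
DOUBLE_WORD = 8   # double-word size in bytes (two four-byte words)


def read_cache_line(memory: bytes, address: int) -> bytes:
    """Model the cache access: return the full line that contains `address`."""
    base = address - (address % LINE_WIDTH)
    return memory[base:base + LINE_WIDTH]


def unaligned_read(memory: bytes, address: int, size: int = DOUBLE_WORD):
    """Read a whole line, then extract `size` bytes as the formatter would.

    Returns the requested bytes and the number of bytes read but discarded.
    """
    offset = address % LINE_WIDTH
    if offset + size > LINE_WIDTH:
        # Assumption: reads that cross a line boundary are outside this sketch.
        raise ValueError("read crosses a cache line boundary")
    line = read_cache_line(memory, address)   # all 64 bytes are read out
    wanted = line[offset:offset + size]       # formatter extracts the double word
    discarded = len(line) - len(wanted)       # remainder is thrown away
    return wanted, discarded


# Hypothetical memory image in which each byte holds its own address.
memory = bytes(range(256))
data, wasted = unaligned_read(memory, 70)
```

In this model, a read starting at relative address 70 pulls the line spanning addresses 64 through 127, keeps bytes 70 through 77, and discards the other 56 bytes.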
Such unaligned cache reads in the prior art are costly in terms of clock cycles and power consumption. Power is consumed in the cache read, and therefore power is wasted in reading the entire cache line to perform an unaligned read. Additionally, power is consumed in the formatting circuitry. The use of clock cycles to extract the desired data in the formatter increases the latency associated with an unaligned read. Thus, there is a need in the art for methods and apparatus that increase the speed of an unaligned read and, additionally, alleviate the wasted consumption of power associated with an unaligned read according to the prior art.