1. Field of the Invention
This invention relates in general to the field of data retrieval in computers, and more specifically to an improved method and apparatus for loading aligned and misaligned data from a cache within a microprocessor.
2. Description of the Related Art
Within a processing system, a common operation performed during execution of a software program is the retrieval of data from memory. To overcome timing bottlenecks associated with retrieval of the data from memory, whose access time is comparatively slow with respect to microprocessor speeds, typical microprocessors retain a copy of frequently accessed data in a cache. A cache is a memory structure fabricated to provide data to a processor much faster than conventional memory. Thus, when a data load instruction references a memory address of a data entity that is also present in the cache, the data is retrieved from the cache rather than from memory.
Although access to data contained in a cache is indeed much faster than access to memory, microprocessors do not access a specific memory address within a cache. Rather, data is retrieved from the cache in defined subdivisions of the cache, called cache sub-lines. If the data is stored entirely within a single cache sub-line (i.e., the data is aligned), then it may be retrieved in a single load operation. If, however, a first part of the data is located in a first cache sub-line and a remaining part of the data is located in a second cache sub-line (i.e., the data is misaligned), then two sequential loads must be executed to retrieve the data.
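For purposes of illustration only, the distinction between aligned and misaligned data may be sketched as follows. The sub-line size of 8 bytes and the names SUB_LINE_BYTES, sub_lines_touched, and is_aligned are assumptions made for this sketch; the description above does not specify them.

```python
SUB_LINE_BYTES = 8  # assumed sub-line size; not specified in the description


def sub_lines_touched(address: int, length: int,
                      sub_line_bytes: int = SUB_LINE_BYTES) -> range:
    """Return the indices of the cache sub-lines that the data occupies."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = address // sub_line_bytes                 # sub-line holding the first byte
    last = (address + length - 1) // sub_line_bytes   # sub-line holding the last byte
    return range(first, last + 1)


def is_aligned(address: int, length: int,
               sub_line_bytes: int = SUB_LINE_BYTES) -> bool:
    """Data is aligned, in the sense used here, if it lies within one sub-line."""
    return len(sub_lines_touched(address, length, sub_line_bytes)) == 1
```

Under these assumptions, a 4-byte entity at address 0x1004 lies within one sub-line and needs a single load, whereas the same entity at address 0x1006 spans two sub-lines and needs two sequential loads.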
Thus, whether or not a microprocessor retrieves data from a cache in one or two load operations depends on whether the data is in the cache, and whether the data is aligned or misaligned. Therefore, before a data load from a cache is initiated, a calculation must first be made to determine whether the desired data is aligned or misaligned, that is, whether one or two load operations will be required to load the desired data.
To determine whether data is aligned, the memory address of the data and its associated length must be known. Within an x86-compatible microprocessor, calculation of a memory address often requires a 3-way addition, for example, the sum of a base, a displacement, and a segment base. This 3-way addition is time consuming and cannot be completed soon enough to allow a second load instruction to be generated if the data is misaligned. Hence, microprocessors typically insert a mandatory "slip" in a load operation to allow the alignment determination to complete before the load operation is allowed to continue. If the data requested by a load is aligned, then two cycles are needed: one for the mandatory slip and one for the load. If the data is misaligned, then three cycles are needed: one for the mandatory slip, and one for each of the two partial load operations. In addition, the slip is required for a "tickle" to ensure that the second access will not cause a protection fault or a page fault. More specifically, the first half of the access cannot be permitted to finish before it is known whether the second access will create a page or protection fault.
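By way of example only, the address calculation and the resulting cycle counts described above may be modeled as follows. This is a simplified sketch, not a description of any particular microprocessor: the 32-bit address width, the 8-byte sub-line size, the restriction that an entity is no longer than one sub-line, and the names linear_address and load_cycles are all assumptions.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assumed 32-bit addresses
SUB_LINE_BYTES = 8         # assumed sub-line size; not specified in the description


def linear_address(segment_base: int, base: int, displacement: int) -> int:
    """3-way add of segment base, base, and displacement, wrapped to 32 bits."""
    return (segment_base + base + displacement) & ADDRESS_MASK


def load_cycles(segment_base: int, base: int, displacement: int, length: int,
                sub_line_bytes: int = SUB_LINE_BYTES) -> int:
    """Cycles for a load that hits the cache, including the mandatory slip."""
    if not 0 < length <= sub_line_bytes:
        raise ValueError("length must be between 1 and the sub-line size")
    address = linear_address(segment_base, base, displacement)
    first = address // sub_line_bytes
    last = (address + length - 1) // sub_line_bytes
    partial_loads = 1 if first == last else 2  # aligned: 1 load, misaligned: 2
    return 1 + partial_loads                   # 1 slip cycle plus the loads
```

With these assumptions, a 4-byte load with segment base 0x10000, base 0x2000, and displacement 0x0 takes two cycles, while the same load with displacement 0x6 crosses a sub-line boundary and takes three.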
Since load operations are ubiquitous in software programs, the time delays incurred in executing the load operations, as described above, can significantly increase the time required to execute the program.
Therefore, what is needed is an apparatus and method that allows a load of a data entity from cache to be executed faster than has heretofore been provided. In addition, what is needed is a microprocessor that executes a load of a data entity from cache without requiring insertion of a mandatory pipeline slip.