1. Field of the Invention
The invention relates generally to logic circuits, and more particularly relates to a design and method for implementing shift-based access control for a sequential data buffer in which data in the buffer is accessed by selectively incrementing an access window. In an exemplary embodiment, the invention is used in the design of a prefetch queue to permit accessing misaligned data (i.e., variable length instructions) for transfer to an instruction decoder.
2. Description of Related Art
Sequential data buffers are accessed by specifying an access index, which points to an initial byte, and an access size (or mask). For each new access, the index is incremented by a selected positive offset, thereby advancing a window of access-size bytes sequentially through the buffer.
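The generic access scheme can be sketched as follows. This is a minimal illustration only, assuming the buffer is a plain byte string and using hypothetical function names (`access`, `windows`) that do not come from the original text. Each access returns `size` bytes starting at the current index, and the index is then advanced by a positive offset.

```python
from typing import List


def access(buffer: bytes, index: int, size: int) -> bytes:
    """Return the window of `size` bytes that starts at `index`."""
    if index < 0 or index + size > len(buffer):
        raise IndexError("access window runs past the end of the buffer")
    return buffer[index:index + size]


def windows(buffer: bytes, size: int, offsets: List[int]) -> List[bytes]:
    """Read one window per offset, advancing the index after each access."""
    index = 0
    result = []
    for offset in offsets:
        if offset <= 0:
            raise ValueError("the index advances by a positive offset")
        result.append(access(buffer, index, size))
        index += offset  # move the window forward through the buffer
    return result


buf = bytes(range(16))
print(windows(buf, 4, [3, 1, 4]))
```

With offsets of 3, 1 and 4, the three windows start at bytes 0, 3 and 4, so successive windows may overlap.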
Without limiting the scope of the invention, this background information is provided in the context of a specific problem to which the invention has application: improving the design of an instruction prefetch queue that stores variable length instructions to permit faster byte transfers to the instruction decoder.
Instruction prefetch is commonly used in microprocessors to optimize computer execution time. Prefetched instructions are input to a decoder for decoding, and then provided to the CPU (central processing unit).
Prefetch logic includes a prefetch queue, which typically is a data buffer, such as a sequential or circular buffer, that holds a predetermined number of prefetched instruction bytes. For 32-bit microprocessor architectures (such as the 386) in which internal buses are 32 bits or four bytes (one double word) wide, typical sizes for prefetch queues are 16 bytes corresponding to four 4 byte double words (or in the case of prefetch from a cache, four 4 byte cache lines).
Instructions are typically of variable length: for the 386 and 486 architectures, instructions can be from 1 to 15 bytes in length. Thus, instructions in the prefetch queue will be misaligned. That is, for a 32-bit architecture in which the prefetch queue transfers 4 bytes at a time to the instruction decoder, except for instructions that are multiples of four bytes, the decoder will discard 1 to 3 bytes for each instruction.
Typically, the decoder does not store the discarded bytes, even though those bytes represent the initial bytes of the next instruction. Thus, for each 4 byte transfer from the instruction prefetch queue, the decoder must advise the buffer control logic (BCL) of the number of bytes used (from 0 to 4 bytes). In response, the BCL will increment the index (initial byte) of the 4 byte transfer window such that the first byte of the next 4 byte transfer to the decoder corresponds to the byte after the last one used by the decoder from the previous transfer.
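The bytes-used feedback loop between the decoder and the BCL can be modeled in a few lines. This is a hypothetical sketch, not the circuit itself: it assumes the 16 byte queue always holds valid bytes (refill is not modeled), that the decoder is given the instruction lengths up front, and that it never stalls (so a bytes-used value of 0 does not occur here). The names `simulate`, `QUEUE_SIZE` and `TRANSFER_SIZE` are illustrative.

```python
from typing import List, Tuple

QUEUE_SIZE = 16     # 16 byte prefetch queue, as in the text
TRANSFER_SIZE = 4   # 4 byte transfer window to the decoder


def simulate(instruction_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (window index, bytes used) for each 4 byte transfer.

    Assumed model: the decoder keeps at most the bytes remaining in the
    current instruction and discards the rest of the transfer (the start of
    the next instruction); the BCL then advances the index by bytes used.
    """
    log = []
    index = 0
    for length in instruction_lengths:
        remaining = length
        while remaining > 0:
            bytes_used = min(TRANSFER_SIZE, remaining)  # 1..4 in this model
            log.append((index, bytes_used))
            remaining -= bytes_used
            index = (index + bytes_used) % QUEUE_SIZE  # circular queue wraps
    return log


print(simulate([3, 6, 1, 2]))
```

For instruction lengths 3, 6, 1 and 2, the transfers start at indices 0, 3, 7, 9 and 10, and only the middle transfer of the 6 byte instruction uses all 4 bytes. In the other four transfers the decoder discards bytes that belong to the next instruction, and those bytes must be fetched again in the next transfer.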
The 4 byte transfer window is incremented through the prefetch queue by (a) encoding a bytes-used value, (b) adding this bytes-used code to the previous index code to increment the index, and then (c) decoding the new index code to increment the transfer window to the new index. That is, the instruction decoder provides a bytes-used code (typically, three bits) to the BCL, which adds the bytes-used code to the index code for the previous index, and then decodes the index code to determine the new index. The BCL increments to the new index, and correspondingly moves the transfer window in preparation for the next 4 byte transfer to the decoder.
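The three serial steps (encode, add, decode) can be sketched as a software model of a 16 byte queue. This is an illustration under stated assumptions: the index is a 4-bit binary code, the bytes-used code is plain binary (0 to 4), and the transfer window is expressed as a 16-bit mask of byte enables. All function names are hypothetical. A software model cannot show gate delays. What it makes visible is that the adder's carry chain and the 4-to-16 decoder sit in series on the path that moves the window.

```python
from typing import Tuple

QUEUE_SIZE = 16                 # 16 byte queue, so the index code is 4 bits
FULL = (1 << QUEUE_SIZE) - 1    # all 16 byte-select lines


def encode_bytes_used(n: int) -> int:
    """(a) Decoder side: report bytes used as a small binary code (0 to 4)."""
    if not 0 <= n <= 4:
        raise ValueError("bytes used must be 0 to 4")
    return n


def add_index(index_code: int, used_code: int) -> int:
    """(b) 4-bit add; the carry out is dropped, so the index wraps modulo 16."""
    return (index_code + used_code) & (QUEUE_SIZE - 1)


def decode_index(index_code: int) -> int:
    """(c) 4-to-16 decode: one-hot line marking the first byte of the window."""
    return 1 << index_code


def rotate_left(value: int, k: int) -> int:
    """Rotate a 16-bit value left by k positions."""
    return ((value << k) | (value >> (QUEUE_SIZE - k))) & FULL


def window_mask(one_hot: int, size: int = 4) -> int:
    """Byte enables for `size` consecutive bytes from the one-hot start, wrapping."""
    mask = 0
    for k in range(size):
        mask |= rotate_left(one_hot, k)
    return mask


def next_window(index_code: int, bytes_used: int) -> Tuple[int, int]:
    """Encode, add, then decode: new index code and its transfer window mask."""
    new_index = add_index(index_code, encode_bytes_used(bytes_used))
    return new_index, window_mask(decode_index(new_index))


print(next_window(14, 3))  # prints (1, 30): index 1, bytes 1 to 4 enabled
```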
Using the repositioned transfer window, the corresponding 4 bytes in the prefetch queue are selected for transfer to the instruction decoder. Three techniques are commonly employed: (a) single port: reading two 4 byte lines in sequence and selecting the appropriate 4 transfer bytes; (b) two port: reading two lines in parallel and selecting the appropriate 4 transfer bytes; and (c) byte selection: the prefetch queue includes a read line for each byte, enabling the direct selection of the appropriate 4 transfer bytes from the prefetch queue.
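The line-based and byte-based selection techniques can be compared in a hypothetical sketch. It assumes a 16 byte queue organized as four 4 byte lines and uses illustrative function names. Techniques (a) and (b) differ only in whether the two lines are read one after the other or at the same time. That is a timing difference that software cannot show, so both are modeled by the same function. Either way, the result must equal direct per-byte selection (c).

```python
QUEUE_SIZE = 16
LINE_SIZE = 4
LINES = QUEUE_SIZE // LINE_SIZE


def read_line(queue: bytes, line: int) -> bytes:
    """Read one 4 byte line; line numbers wrap around the circular queue."""
    start = (line % LINES) * LINE_SIZE
    return queue[start:start + LINE_SIZE]


def select_two_lines(queue: bytes, index: int) -> bytes:
    """Techniques (a)/(b): read the line holding `index` and the next line,
    then pick the 4 transfer bytes from the 8 bytes read."""
    line = index // LINE_SIZE
    pair = read_line(queue, line) + read_line(queue, line + 1)
    offset = index % LINE_SIZE
    return pair[offset:offset + LINE_SIZE]


def select_bytes(queue: bytes, index: int) -> bytes:
    """Technique (c): one read line per byte, so each byte is selected directly."""
    return bytes(queue[(index + k) % QUEUE_SIZE] for k in range(LINE_SIZE))


queue = bytes(range(QUEUE_SIZE))
print(select_two_lines(queue, 14), select_bytes(queue, 14))
```

At index 14 the window wraps from the last line back to the first, so both functions return bytes 14, 15, 0 and 1.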
The current approach to indexing the transfer window--encoding a bytes-used increment code, adding/incrementing the index code for the previous index, and decoding the new index code to increment the transfer window--is disadvantageous in terms of the time required to complete the transfer operation. As a result, the instruction decode operation, including transfers from the prefetch queue, may create a bottleneck that limits operational frequency.
Accordingly, a specific need exists for an improved design for the buffer control logic for a prefetch queue to enable faster byte transfers to the instruction decoder.