The present invention relates to digital information processors, and more particularly, to methods and apparatus for use in digital information processors that support digital memory buffers.
Many digital information processors provide digital memory buffers to temporarily store information. A digital memory buffer may be constructed of dedicated hardware registers wired together or it may simply be a dedicated section of a larger memory.
One type of digital memory buffer is referred to as a circular buffer. In a circular buffer, the first location of the buffer is treated as if it follows the last location of the buffer. That is, when accessing consecutive locations in the buffer, the first location automatically follows the last location.
It is desirable to quickly access the information that is stored in a circular buffer. For example, a digital information processor may have an execution pipeline to enhance throughput, yet information must be accessed quickly in order to take full advantage of the pipeline. Consequently, memory addresses (which are used to access the locations of the buffer) are often generated using a hardware-implemented address generator. One type of hardware-implemented address generator for a circular buffer maintains four registers for each circular buffer: (1) a base register, B, containing the lowest numbered address in the buffer, (2) an index register, I, containing the next address to be accessed in the buffer, (3) a modify register, M, containing the increment (or the decrement) value, and (4) a length register, L, containing the length of the buffer.
FIG. 1 shows an example of a circular buffer, incorporated as a part of a larger memory, and address registers that may be maintained in association with the memory buffer. The lowest numbered address in the buffer, i.e., address 19, is referred to as the base address. The base address is stored in a base register, B. The highest address in the buffer, i.e., address 29, is referred to as the end address and is indicated as E. The length of the buffer is stored in a length register, L. An index register, indicated at I, is a pointer into the buffer. The index register typically contains the address of the next location to be accessed, e.g., address 26. After each access, the pointer is incremented or decremented by a predetermined number of addresses so as to be prepared for the next access into the circular buffer. The number of addresses by which the pointer is incremented or decremented is the modify amount and is stored in a modify register, M. The modify amount is commonly a fixed number, although there are applications in which it may be varied.
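The address update described above can be illustrated with a short software model. The following C sketch is not taken from any particular address generator; the type name circ_regs, the function name circ_next, and the restriction that the magnitude of M be smaller than L are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four per-buffer registers described above. */
typedef struct {
    uint32_t B; /* base: lowest numbered address in the buffer */
    uint32_t I; /* index: next address to be accessed */
    int32_t  M; /* modify: signed step applied after each access */
    uint32_t L; /* length: number of locations in the buffer */
} circ_regs;

/* Return the address to access now, then post-modify I with wraparound.
   Assumption: |M| < L, so a single correction by L is sufficient. */
static uint32_t circ_next(circ_regs *r)
{
    assert(r->L > 0);
    assert(r->M > -(int32_t)r->L && r->M < (int32_t)r->L);

    uint32_t addr = r->I;
    int64_t next = (int64_t)r->I + r->M;
    int64_t end = (int64_t)r->B + r->L;  /* one past the end address E */

    if (next >= end)
        next -= r->L;                    /* stepped past E: wrap to the start */
    else if (next < (int64_t)r->B)
        next += r->L;                    /* stepped below B: wrap to the end */

    r->I = (uint32_t)next;
    return addr;
}
```

With the values of FIG. 1 (B = 19, E = 29, hence L = 11, and I = 26) and an assumed modify amount of M = 4, the first call returns address 26 and leaves I at 19, because 26 + 4 = 30 lies one location past the end address and wraps to the base address.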
Many digital information processing routines make use of memory buffers. One such routine is commonly referred to as a Fast Fourier Transform (FFT). FFT routines use a series of “butterfly” computations to generate a result. The results from one butterfly computation are used as the input data for the next butterfly computation.
Most FFT routines are written such that the input data for each butterfly computation is read from a particular memory buffer (referred to herein as an input buffer) and the results from each butterfly computation are stored in another memory buffer (referred to herein as an output buffer). Since the results of each butterfly computation are used as the input data for the next butterfly computation, the results must be “loaded” into the input buffer before the next butterfly computation can begin.
There are various ways that one could go about “loading” the results into the input buffer. One way is to simply copy the results from the output buffer to the input buffer. However, copying the results from one buffer to another may require a significant amount of time, relatively speaking, which adds significant overhead and thereby reduces the performance of the FFT routine.
Another way is to redirect the address registers associated with the input buffer so as to point to the addresses in the output buffer where the results from the previous butterfly computation are stored. In conjunction, the registers associated with the output buffer are typically redirected so as to point to the addresses previously used for the input buffer. This is done so that the results of a given butterfly computation can be stored without overwriting the input data for that butterfly computation. The overall effect of redirecting the address registers associated with the input and output buffers is the same as if the contents of the input buffer had been swapped with the contents of the output buffer.
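The redirection approach can be sketched in software as an exchange of two pointers between computation passes. In the following C sketch, the function stage is a hypothetical stand-in for one pass and is not an actual butterfly computation; the names stage and run_stages are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for one computation pass: reads `in`, writes `out`.
   It is not an FFT butterfly; it only gives the passes something to do. */
static void stage(const int *in, int *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = in[k] + in[(k + 1) % n];
}

/* Run `passes` passes, exchanging the roles of the two buffers between
   passes instead of copying results back into the input buffer.
   Returns the buffer that holds the final results. */
static int *run_stages(int *a, int *b, size_t n, int passes)
{
    int *in = a;
    int *out = b;

    for (int p = 0; p < passes; p++) {
        stage(in, out, n);
        int *tmp = in;  /* redirect: the results become the next input */
        in = out;
        out = tmp;      /* the old input area receives the next results */
    }
    return in;
}
```

No element of either buffer is copied between passes; only the two pointers change, which corresponds to redirecting the address registers rather than moving the data.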
The redirecting of the address registers is commonly carried out as follows: (1) the contents of the base register for the input buffer are swapped with the contents of the base register for the output buffer, and (2) the contents of the index register for the input buffer are swapped with the contents of the index register for the output buffer.
FIG. 2A is a representation of the contents of the base and index registers for the input and output buffers before the contents are swapped. Before the contents are swapped, the base register, B0, and the index register, I0, which in this example are associated with the input buffer, point to the input data used for butterfly computation #1. The base register, B1, and the index register, I1, which are associated with the output buffer, point to the results from butterfly computation #1.
FIG. 2B is a representation of the contents of the base and index registers for the input and output buffers after the contents are swapped. After the contents of the registers are swapped, the base register, B0, and the index register, I0, associated with the input buffer, point to the results for butterfly computation #1. The base register, B1, and the index register, I1, associated with the output buffer, point to the input data used for butterfly computation #1.
FIG. 3 shows a routine that is commonly used to swap the contents of the index and base registers of the input buffer with the contents of the index and base registers of the output buffer. This routine includes six instructions and uses temporary registers R0, R1.
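A swap of this kind can be modeled as follows. The C sketch below shows one plausible six-move sequence using two temporaries; it is not necessarily the sequence of FIG. 3, and the structure regfile and the function swap_with_temps are hypothetical names introduced for illustration only. Each assignment stands for one register-to-register move instruction.

```c
#include <stdint.h>

/* Hypothetical register model: B0/I0 belong to the input buffer,
   B1/I1 to the output buffer, and R0/R1 are temporary registers. */
typedef struct {
    uint32_t B0, I0, B1, I1;
    uint32_t R0, R1;
} regfile;

/* One plausible six-move swap (an assumption, not necessarily FIG. 3). */
static void swap_with_temps(regfile *r)
{
    r->R0 = r->I0;  /* move 1: save the input index */
    r->R1 = r->B0;  /* move 2: save the input base */
    r->I0 = r->I1;  /* move 3: input index receives the output index */
    r->B0 = r->B1;  /* move 4: input base receives the output base */
    r->I1 = r->R0;  /* move 5: depends on the value saved by move 1 */
    r->B1 = r->R1;  /* move 6: depends on the value saved by move 2 */
}
```

In this sketch the previous contents of R0 and R1 are lost, and moves 5 and 6 cannot proceed until the values saved by moves 1 and 2 are available.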
Notwithstanding the performance level of current digital information processors, further improvements are needed.
According to one aspect of the present invention, a method is used in a digital information processor having a first address register for storing a first address and having a second address register for storing a second address. The method includes responding to a swap instruction, which specifies a swap operation for at least two address registers that are identified explicitly or implicitly, by swapping the contents of the first address register and the second address register.
According to another aspect of the present invention, a digital information processor includes a first address register for storing a first address, a second address register for storing a second address, and a circuit that receives a swap instruction, which specifies a swap operation for at least two address registers that are identified explicitly or implicitly, and responds to the swap instruction by swapping the contents of the first address register with the contents of the second address register.
According to another aspect of the present invention, a digital information processor includes a first address register for storing a first address, a second address register for storing a second address, and means, responsive to a swap instruction, which specifies a swap operation for at least two address registers that are identified explicitly or implicitly, for swapping the contents of the first address register and the second address register.
According to another aspect of the present invention, a data address generator (DAG) includes a first address register containing a first address corresponding to a location in a first circular buffer, a second address register containing a second address corresponding to a location in a second circular buffer, and a circuit that receives a signal that indicates a swap instruction and responds to the signal by swapping the contents of the first address register and the second address register.
Depending on the implementation, a swap instruction may completely eliminate the need for temporary registers to carry out the swap, which in turn reduces the register pressure and helps to reduce the possibility of delays due to excessive register demand (delays can reduce the execution speed and level of performance of the routine running on the processor). Again, depending on the implementation, the swap instruction may reduce or completely eliminate data dependencies like those in the routine of FIG. 3 and any associated wait cycles (data dependencies and wait cycles can reduce the execution speed and level of performance of a routine running on the processor).
According to another aspect of the present invention, a method for use in a digital information processor includes swapping the contents of a first address register and a second address register in a future file in response to a swap instruction, generating one or more control signals in response to the swap instruction and sending them from the future file to an architecture file, and swapping the contents of the first address register and the second address register in the architecture file in response to the one or more control signals.
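The division of work between the two files can be illustrated with a simple software model. The C sketch below is an assumption-laden illustration, not a description of any actual circuit: the register count NUM_ADDR_REGS, the types addr_file and swap_ctrl, and the functions do_swap, future_swap, and arch_commit are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_ADDR_REGS 4  /* assumed register count, for illustration only */

/* Each file holds its own copy of the address registers. */
typedef struct { uint32_t reg[NUM_ADDR_REGS]; } addr_file;

/* Hypothetical control signal: it names the registers to swap and
   carries none of their contents. */
typedef struct { bool swap; uint8_t a, b; } swap_ctrl;

/* Swap two registers within a single file. */
static void do_swap(addr_file *f, uint8_t a, uint8_t b)
{
    assert(a < NUM_ADDR_REGS && b < NUM_ADDR_REGS);
    uint32_t t = f->reg[a];
    f->reg[a] = f->reg[b];
    f->reg[b] = t;
}

/* Earlier stage: swap in the future file and produce the control signal. */
static swap_ctrl future_swap(addr_file *future, uint8_t a, uint8_t b)
{
    do_swap(future, a, b);
    swap_ctrl c = { true, a, b };
    return c;
}

/* Later stage: the architecture file repeats the swap on its own copy
   when the control signal arrives. */
static void arch_commit(addr_file *arch, swap_ctrl c)
{
    if (c.swap)
        do_swap(arch, c.a, c.b);
}
```

In this model only the small swap_ctrl value passes from the earlier stage to the later stage; the swapped register contents themselves are never transferred, because each file performs the swap on its own copy.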
It has been recognized that the latter mentioned aspect of the present invention is not limited to swap instructions, but rather may be applied to pipelined data processors in general, particularly in a situation where the results of an operation are needed at more than one stage in the pipeline. For example, rather than performing an operation at one stage and pipelining the results to subsequent stage(s), the capability to actually carry out the operation may be provided at more than one stage in the pipeline. Thereafter, only control signals (and not the results) need be provided to subsequent stage(s). Depending on the embodiment, this may lead to a reduction in the required area and/or power.
Notwithstanding the potential advantages, discussed above, of one or more embodiments of one or more aspects of the present invention, it should be understood that there is no absolute requirement that any embodiment of any aspect of the present invention address the shortcomings of the prior art.