The invention is generally related to electronic circuit arrangements and integrated circuits incorporating the same, and in particular to address generation logic used to generate addresses for accessing a memory space.
As semiconductor fabrication technology advances, designers of integrated circuits are able to integrate more and more functions into a single integrated circuit device, or chip. As such, electronic designs that once required several integrated circuits electrically coupled to one another on a circuit board or module may now be integrated into a single integrated circuit, thereby increasing performance and reducing cost.
One function that has been migrated from discrete circuits to integrated circuits is digital signal processing, which is generally the application of mathematical operations to digitally represented signals. Digital signal processing is utilized in a number of applications, such as to implement filters for audio and/or video signals, to decode information from communications signals such as in wireless or other cellular networks, etc.
Semiconductor fabrication technology has advanced to the point where digital signal processing may be carried out by dedicated digital signal processors that execute software programs, referred to herein as DSP programs, to implement specialized DSP algorithms. Moreover, digital signal processors may be embedded in integrated circuits, or chips, with additional logic circuitry to further provide improvements in performance while lowering costs.
Many digital signal processing tasks are characterized by a need to quickly perform repetitive, but relatively simple, mathematical calculations on a large amount of digital data. Multiply-Accumulate (MAC) operations, for example, perform multiplication of two operands and add the result to a running accumulator, and can often be implemented in hardware logic to be performed in a single clock cycle. Multiple MAC units may even be provided so that multiple MAC operations can occur within any given clock cycle. However, some complex filtering operations may require hundreds or thousands of MAC operations to be performed just to calculate one output value at a single point in time.
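As an illustration of why MAC counts grow so quickly, the following minimal sketch models a MAC operation in software and uses it to compute a single output value of a filter. The choice of a finite impulse response (FIR) filter, and the names mac and fir_output, are assumptions made for illustration only; a hardware MAC unit would perform each step in a single clock cycle rather than in a software loop.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: multiply two operands, add to the accumulator."""
    return acc + a * b


def fir_output(coeffs, samples):
    """Compute ONE output value of an FIR filter.

    Returns the output value together with the number of MAC
    operations that were needed to produce it.
    """
    acc = 0
    mac_count = 0
    for c, x in zip(coeffs, samples):
        acc = mac(acc, c, x)
        mac_count += 1
    return acc, mac_count


# A 4-tap filter needs 4 MAC operations per output value.
value, count = fir_output([1, 2, 3, 4], [10, 20, 30, 40])
print(value, count)
```

In this model the number of MAC operations per output value equals the number of filter taps, so a filter with several hundred taps requires several hundred MAC operations for every output sample.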
Given the repetitive nature of many DSP operations, the rate at which a digital signal processor can retrieve input data from memory and write output data back into memory after processing (often referred to as memory bandwidth) often has a significant impact on the overall performance of a DSP system.
One manner of increasing memory bandwidth is to utilize multiple communication paths, or buses, to communicate different types of data with a digital signal processor. As an example, a number of conventional DSP designs separate DSP program data and signal data into separate memory spaces, such that separately-accessible program and data memories are used to store DSP program instructions and signal data. Furthermore, digital signal data may be partitioned into multiple memory spaces (often referred to as “X” and “Y” memory spaces) so that multiple data points can be transferred to or from a given memory at a time. Multiple ports, or access paths, into a memory may also be provided, such that multiple access operations can occur in parallel within a given memory.
As an example, a number of conventional DSP designs incorporate dual MAC units, and as such, require four paths into a data memory space (two each in the “X” and “Y” memory spaces) to maintain maximum efficiency. To access four memory locations per cycle, therefore, four addresses must be generated and output to memory in each access cycle.
While generating four addresses typically does not present a significant problem from a circuitry standpoint, encoding four addresses within a processor instruction such as a DSP instruction presents a comparatively greater concern. DSP instructions, like most processor instructions, typically include an opcode field that specifies the type of instruction, and often the addressing mode to be used by the instruction, as well as one or more operand fields that identify either the data to be processed or where such data is located. Since the number of bits required to encode DSP instructions affects the number of instructions available in an instruction set, the width of the interconnects, logic units, and registers that are required to process those instructions, and the size of the program memory space, it is highly desirable to minimize the number of bits required to encode addresses in any given DSP instruction.
In a number of conventional DSP designs that incorporate dual MAC units, for example, 32-bit instructions are used. For dual MAC operations, 14 bits of a MAC instruction are allocated to opcodes (7 bits for each MAC unit), leaving a total of 18 bits (9 bits for each MAC unit) to specify the locations of the four operands and where to store the results.
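The bit budget described above can be worked through with a short calculation. The instruction width, opcode width, and number of MAC units are taken from the text; the register bank size of 8 is purely a hypothetical value chosen to show how quickly the operand bits are consumed, and the helper names are illustrative.

```python
INSTRUCTION_BITS = 32      # from the text
OPCODE_BITS_PER_MAC = 7    # from the text
MAC_UNITS = 2              # dual MAC design, from the text


def operand_bits_total():
    """Bits left over for operands and results after both opcodes."""
    return INSTRUCTION_BITS - MAC_UNITS * OPCODE_BITS_PER_MAC


def operand_bits_per_mac():
    """Operand bits available to each MAC unit."""
    return operand_bits_total() // MAC_UNITS


def selector_bits(bank_size):
    """Bits needed to name one register out of bank_size (ceiling of log2)."""
    return (bank_size - 1).bit_length()


# ASSUMPTION: a bank of 8 indirect address registers (not stated in the text).
ASSUMED_BANK_SIZE = 8
bits_for_two_selectors = 2 * selector_bits(ASSUMED_BANK_SIZE)
bits_left_per_mac = operand_bits_per_mac() - bits_for_two_selectors

print(operand_bits_total(), operand_bits_per_mac(), bits_left_per_mac)
```

Under this assumed bank size, naming just two registers per MAC unit would consume 6 of the 9 available bits, leaving 3 bits for everything else the MAC unit must specify, such as the destination of the result.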
Typically, to minimize the number of required bits to encode addresses, a form of indirect addressing is used, where a bank of separate indirect address registers is preloaded with the desired addresses of operands, and where a MAC instruction specifies the locations of one or more indirect address registers from the bank from which to load the desired addresses. Also, it is often desirable to support address post-modification, where the addresses stored in indirect address registers are automatically modified (e.g., incremented or decremented by a fixed value) after the addresses are output from the registers.
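Indirect addressing with post-modification may be modeled in software roughly as follows. This is only a behavioral sketch: the register bank is represented as a Python list, and the function name fetch_address and the example addresses are hypothetical.

```python
def fetch_address(regs, index, postmod=0):
    """Output the address stored in regs[index], then post-modify the register.

    The address returned is the value held BEFORE modification, which
    is what distinguishes post-modification from pre-modification.
    """
    address = regs[index]
    regs[index] = address + postmod
    return address


# A bank of indirect address registers preloaded with operand addresses.
regs = [0x100, 0x200, 0x300, 0x400]

# The instruction only names register 0; the full address is not encoded.
first = fetch_address(regs, 0, postmod=+1)
second = fetch_address(regs, 0, postmod=+1)
print(hex(first), hex(second), hex(regs[0]))
```

Because the register is updated automatically after each use, repeated execution of the same instruction steps through consecutive memory locations without any additional address bits in the instruction.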
Despite the use of indirect addressing, and in part due to the need to support enhancements such as post-modification, it is often not feasible to support the encoding of four independent addresses within a given DSP instruction. As a consequence, a technique known as address correlation is often used, where only two addresses are independently encoded and generated, with the remaining two addresses being generated by modifying the encoded addresses (e.g., by adding fixed offsets to the encoded addresses).
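Address correlation may be sketched as follows. Only two register selectors are supplied by the instruction, and the remaining two addresses are derived from them. The function name, the particular offsets, and the example addresses are assumptions for illustration; an actual design fixes the offsets in hardware.

```python
def correlated_addresses(regs, i, j, postmod_i, postmod_j, offset_i, offset_j):
    """Generate four addresses from only two encoded register selectors.

    Returns (addr_i, addr_j, corr_i, corr_j), where the correlated
    addresses are the encoded addresses plus fixed offsets.
    """
    addr_i, addr_j = regs[i], regs[j]
    corr_i = addr_i + offset_i
    corr_j = addr_j + offset_j
    # Post-modify the two encoded registers after their addresses are output.
    regs[i] = addr_i + postmod_i
    regs[j] = addr_j + postmod_j
    return addr_i, addr_j, corr_i, corr_j


regs = [0x100, 0x200]
addrs = correlated_addresses(regs, 0, 1, postmod_i=2, postmod_j=2,
                             offset_i=1, offset_j=1)
print([hex(a) for a in addrs])
```

The sketch makes the constraint visible: the third and fourth addresses can never be chosen freely, so the data they refer to must be placed at the fixed offsets from the data referenced by the first two.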
As an example, one of the aforementioned conventional DSP designs utilizes indirect-address MAC instructions having the following syntax:
MAC (ri)+postmod,(rj)+postmod,am ∥ MAC (ri˜),(rj˜),an
where (ri) and (rj) specify indirect addressing via selected indirect address registers ri and rj, postmod specifies the post-modification to apply to each stored address, am and an specify the accumulators to add the results to, and (ri˜) and (rj˜) specify the correlated addresses.
The above MAC instruction is encoded in a 32-bit instruction as shown in table I below:
Other types of instructions may also use the aforementioned techniques to generate four addresses in a given cycle. A number of drawbacks, however, exist with respect to the use of such techniques.
First, the use of correlated addressing significantly limits the data organization inside the data memory space, since the data needs to be carefully organized to ensure that the data addressed via the correlated addresses is arranged in appropriate offsets from the data addressed via the encoded addresses. Often, hand optimization of program code is also required to minimize the number of processor cycles lost to inefficient data transfer.
Second, the aforementioned techniques typically only support either all reads or all writes to the independent and correlated addresses. Non-standard combinations such as 3 reads and 1 write, 3 writes and 1 read, etc., are typically not supported. As a consequence, if any such combinations are required, multiple instructions are typically required to process such combinations serially instead of in parallel. In addition to slower processing due to the need for additional instructions, valuable memory bandwidth is underutilized at times, thereby reducing the throughput of the processor below its maximum operating efficiency.
Therefore, a significant need continues to exist in the art for a manner of minimizing the number of bits required to generate and access multiple memory addresses with a processor instruction.
The invention addresses these and other problems associated with the prior art by providing a circuit arrangement and method that utilize “scheme” registers to select among a plurality of indirect address registers from which to retrieve a stored memory address. As such, rather than identifying within an instruction the location of a particular indirect address register within which is stored an address to be used during processing of the instruction, an instruction may specify the location of a scheme register that identifies which of a plurality of available indirect address registers should be accessed to retrieve a stored address. The second level of indirection provided by scheme registers provides significantly greater flexibility in terms of generating multiple independent addresses with a minimal number of bits in an instruction.
For example, while the invention is not limited solely to digital signal processing applications, the invention does provide in such applications the ability to efficiently encode multiple independent addresses within a given DSP instruction. For a dual MAC application, as an example, four independent addresses may be generated by storing the four addresses in separate indirect address registers, and then identifying the four different indirect address registers within four different scheme registers, such that the four independent addresses may be retrieved by identifying the four scheme registers.
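The two levels of indirection may be modeled behaviorally as shown below. This is a simplified sketch rather than a description of any particular embodiment: the register counts, the stored values, and the name generate_addresses are assumptions, and the lookups that hardware would perform in parallel are performed here in a loop.

```python
def generate_addresses(scheme_regs, indirect_regs, scheme_selectors):
    """Resolve each scheme register selector to a memory address.

    First level:  the instruction names a scheme register.
    Second level: the scheme register names an indirect address register,
                  which holds the address pointer.
    """
    addresses = []
    for s in scheme_selectors:
        reg_index = scheme_regs[s]
        addresses.append(indirect_regs[reg_index])
    return addresses


# Four independent, unrelated addresses preloaded into indirect registers.
indirect_regs = [0x1000, 0x2340, 0x0042, 0x7F00]

# Each scheme register identifies a different indirect address register.
scheme_regs = [3, 0, 2, 1]

# The instruction identifies the four scheme registers.
addrs = generate_addresses(scheme_regs, indirect_regs, [0, 1, 2, 3])
print([hex(a) for a in addrs])
```

Unlike correlated addressing, none of the four addresses in this model is derived from another, so the data may be placed anywhere in the memory space.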
In addition, while a scheme register consistent with the invention may store nothing more than a location or identification of an indirect address register to be used to generate an address, in some embodiments additional information may be stored within a scheme register to provide enhanced functionality, and further maximize the efficiency of processor instructions. For example, it may be desirable to store post-modification information in a scheme register such that independent post-modification operations may be performed on each independently-generated address. Furthermore, it may be desirable to store access type information within a scheme register, e.g., specifying whether an operation is a read or write operation, such that separate types of accesses may be performed with each separate independently-generated address.
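A scheme register carrying post-modification and access type information may be sketched as follows. The field names, the dictionary used as a stand-in for memory, and the function execute are all hypothetical, and the accesses are modeled sequentially although hardware would perform them in the same cycle.

```python
from dataclasses import dataclass


@dataclass
class SchemeRegister:
    reg_select: int   # which indirect address register to use
    postmod: int      # signed amount added to the pointer after use
    is_write: bool    # access type: True for a write, False for a read


def execute(schemes, indirect_regs, memory, selectors, write_values):
    """Perform one access per selected scheme register; return values read."""
    reads = []
    writes = iter(write_values)
    for s in selectors:
        scheme = schemes[s]
        address = indirect_regs[scheme.reg_select]
        if scheme.is_write:
            memory[address] = next(writes)
        else:
            reads.append(memory[address])
        # Each address receives its own, independent post-modification.
        indirect_regs[scheme.reg_select] = address + scheme.postmod
    return reads


# Three reads and one write issued by a single instruction.
indirect_regs = [10, 20, 30, 40]
memory = {10: 1, 20: 2, 30: 3}
schemes = [
    SchemeRegister(0, +1, False),
    SchemeRegister(1, -1, False),
    SchemeRegister(2, +2, False),
    SchemeRegister(3, 0, True),
]
reads = execute(schemes, indirect_regs, memory, [0, 1, 2, 3], [99])
print(reads, memory[40], indirect_regs)
```

Because the access type and post-modification travel with the scheme register rather than with the instruction, a mixed combination such as three reads and one write requires no additional instruction bits in this model.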
Therefore, consistent with one aspect of the invention, a circuit arrangement is provided, including a plurality of indirect address registers, each configured to store an address pointer, and a plurality of scheme registers, each configured to store an indirect address register selector that selects a selected indirect address register from the plurality of indirect address registers. The circuit arrangement further includes address generation logic that is configured to, in response to an instruction that selects a selected scheme register from the plurality of scheme registers, generate a memory address for use in accessing a memory from the address pointer stored in the selected indirect address register selected by the indirect address register selector stored in the selected scheme register.
Consistent with another aspect of the invention, a method of accessing a memory is provided. The method includes receiving an instruction that selects a selected scheme register from a plurality of scheme registers, accessing the selected scheme register to obtain an indirect address register selector that selects a selected indirect address register from a plurality of indirect address registers, accessing the selected indirect address register to obtain an address pointer stored therein, and accessing a memory using the address pointer.
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and of the advantages and objectives attained through its use, reference should be made to the Drawings, and to the accompanying descriptive matter, in which there are described exemplary embodiments of the invention.