The invention is generally related to digital signal processor (DSP) circuit arrangements and integrated circuits incorporating the same, and in particular to the interconnection of a digital signal processor with memories and other external devices.
As semiconductor fabrication technology advances, designers of integrated circuits are able to integrate more and more functions into a single integrated circuit device, or chip. Consequently, electronic designs that once required several integrated circuits electrically coupled to one another on a circuit board or module may now be integrated into a single integrated circuit, thereby increasing performance and reducing cost.
One function that has been migrated from discrete circuits to integrated circuits is digital signal processing, which is generally the application of mathematical operations to digitally represented signals. Digital signal processing is utilized in a number of applications, such as implementing filters for audio and/or video signals and decoding information from communications signals, such as those carried in cellular or other wireless networks.
Semiconductor fabrication technology has advanced to the point where digital signal processing may be performed by dedicated digital signal processors that execute software programs, referred to herein as DSP programs, to implement specialized DSP algorithms. Moreover, digital signal processors may be embedded in integrated circuits, or chips, together with additional logic circuitry, further improving performance while lowering costs.
Many digital signal processing tasks are characterized by a need to quickly perform repetitive, but relatively simple, mathematical calculations on a large amount of digital data. Multiply-Accumulate (MAC) operations, for example, multiply two operands and add the product to a running accumulator, and can often be implemented in hardware logic so as to be performed in a single clock cycle. Multiple MAC units may even be provided so that multiple MAC operations can occur within any given clock cycle. However, some complex filtering operations may require hundreds or thousands of MAC operations to be performed just to calculate one output value at a single point in time.
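To make this cost concrete, the following Python sketch computes a single output of a finite impulse response (FIR) filter, a common example of such a filtering operation, as a sequence of MAC operations. The function name `fir_output` and the treatment of out-of-range samples as zero are illustrative assumptions; the point is only that an N-tap filter performs N MACs for each output value.

```python
def fir_output(coeffs, samples, n):
    """Compute one FIR output y[n] = sum over k of coeffs[k] * samples[n - k].

    Returns (output, mac_count). Each loop iteration is a single
    multiply-accumulate: two operands are multiplied and the product is
    added to a running accumulator. Samples outside the list are treated
    as zero, which is an assumption of this sketch.
    """
    acc = 0
    mac_count = 0
    for k, coeff in enumerate(coeffs):
        idx = n - k
        sample = samples[idx] if 0 <= idx < len(samples) else 0
        acc += coeff * sample  # one MAC operation
        mac_count += 1
    return acc, mac_count


# A 3-tap example, then a 1000-tap filter to show the per-output MAC cost.
print(fir_output([1, 2, 3], [1, 1, 1, 1], 3))             # (6, 3)
print(fir_output([0.001] * 1000, [1.0] * 2000, 1500)[1])  # 1000
```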
Given the repetitive nature of many DSP operations, the rate at which a digital signal processor can retrieve input data from memory and write processed output data back into memory (often referred to as memory bandwidth) typically has a significant impact on the overall performance of a DSP system.
One manner of increasing memory bandwidth is to utilize multiple communication paths, or buses, to communicate different types of data with a digital signal processor. For example, memory bandwidth can be effectively doubled by providing separate read and write paths with a memory, such that data can be written into a memory at the same time that other data is retrieved from the memory.
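The doubling follows from a simple model in which each path moves one word per cycle. In the hypothetical Python function below (its name and the assumption that reads and writes overlap perfectly are illustrative, not taken from any particular design), a stream that reads and writes equal amounts of data finishes in half the cycles once reads and writes no longer share a single path.

```python
def transfer_cycles(num_reads, num_writes, separate_paths):
    """Estimate cycles needed to complete a batch of memory reads and writes.

    Simplified model (an assumption of this sketch): every path moves one
    word per cycle, and with separate read and write paths the reads and
    writes overlap perfectly.
    """
    if separate_paths:
        return max(num_reads, num_writes)  # reads and writes proceed in parallel
    return num_reads + num_writes  # one shared path: reads and writes take turns


# Streaming 1000 samples in and 1000 results out.
print(transfer_cycles(1000, 1000, separate_paths=False))  # 2000
print(transfer_cycles(1000, 1000, separate_paths=True))   # 1000
```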
Memory bandwidth can also be increased by storing DSP program instructions and signal data in separate memory spaces, such that separately accessible program and data memories are used. Furthermore, digital signal data may be partitioned into multiple memory spaces so that multiple data points can be transferred to or from memory at the same time. Many conventional DSP systems, for example, partition a data memory into separate X and Y memory spaces so that pairs of operands for DSP operations, such as MAC operations, can be retrieved at the same time.
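The benefit of an X/Y split for MAC operations can be sketched as follows. The Python functions below are hypothetical and assume that each memory space delivers one word per cycle: with separate X and Y memories both operands of a MAC arrive in the same cycle, whereas a single-port unified memory must deliver them one after the other.

```python
def mac_xy(x_mem, y_mem, n):
    """Sum of x_mem[i] * y_mem[i] using separate X and Y memory spaces.

    Model (an assumption): each memory space delivers one word per cycle,
    so both MAC operands are fetched in the same cycle.
    Returns (result, fetch_cycles).
    """
    acc = 0
    cycles = 0
    for i in range(n):
        a, b = x_mem[i], y_mem[i]  # one read from X and one from Y, concurrently
        acc += a * b
        cycles += 1
    return acc, cycles


def mac_unified(mem, x_base, y_base, n):
    """Same computation with both operand arrays in one single-port memory.

    The two operands must be read one after the other, costing two cycles.
    """
    acc = 0
    cycles = 0
    for i in range(n):
        a = mem[x_base + i]  # first read
        b = mem[y_base + i]  # second read, on the same port
        acc += a * b
        cycles += 2
    return acc, cycles


coeffs, samples = [1, 2, 3], [4, 5, 6]
print(mac_xy(coeffs, samples, 3))              # (32, 3)
print(mac_unified(coeffs + samples, 0, 3, 3))  # (32, 6)
```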
Often, performance is maximized when program and data memories are integrated onto the same integrated circuit as a digital signal processor, such that direct, high-speed links may be provided between the digital signal processor and the most frequently used information. Nonetheless, a digital signal processor, like nearly any other logic circuit, typically needs to support some form of external communication, e.g., so that real-world data obtained by other logic circuits can be retrieved and processed, and/or so that DSP results can be returned to other logic circuits for real-world use. Such other logic circuits, which are generically referred to herein as external devices, may be located on the same integrated circuit, or may be located on other integrated circuits interfaced with the DSP. Typically, external communication with a digital signal processor is provided via a separate communication path, since such communications often must proceed at a slower rate than the maximum rate supported between the digital signal processor and its dedicated memories.
Despite the performance improvements enabled by the use of multiple communication paths to handle program, data and external device communication, multiple communication paths do have an associated cost in terms of connectivity. In particular, each communication path with a digital signal processor typically requires a relatively large number of electrical conduction paths. Incorporating more communication paths into a design can therefore significantly increase the overall number of conduction paths the design requires.
For example, a digital signal processor that supports separate X and Y data spaces with simultaneous bi-directional capability, each data space utilizing 16-bit addressing and 32-bit data, would require 192 data and address lines, along with a number of additional control signals. Adding a separate program memory space with 20-bit addressing and an external device address space with 16-bit addressing, each also with simultaneous bi-directional capability, raises the total number of data and address lines required to 392.
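The line counts above can be reproduced with the arithmetic below, which models simultaneous bi-directional capability as an independent read path and write path, each with its own address and data lines. The text does not state the data width of the program memory or external device paths; this sketch assumes 32-bit data for both, which is consistent with the stated total of 392 lines. The helper name `bidirectional_lines` is hypothetical.

```python
def bidirectional_lines(addr_bits, data_bits):
    """Address plus data lines for one memory space with simultaneous read/write.

    Modeled (an assumption) as an independent read path and write path,
    each carrying its own address and data lines. Control signals are ignored.
    """
    return 2 * (addr_bits + data_bits)


xy_lines = 2 * bidirectional_lines(16, 32)    # X and Y data spaces
program_lines = bidirectional_lines(20, 32)   # 32-bit program data assumed
external_lines = bidirectional_lines(16, 32)  # 32-bit external data assumed
total = xy_lines + program_lines + external_lines
print(xy_lines, total)  # 192 392
```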
When a digital signal processor is not integrated with any of the associated memories on the same integrated circuit, a large number of external interconnects, typically pins, are required. Integrated circuits, however, are often extremely limited in the number of available interconnects, and the use of additional interconnects can increase manufacturing costs. Moreover, even when one or more of the associated memories are integrated onto the same integrated circuit device as the digital signal processor, the need for excessive interconnects can complicate the placement and routing of logic circuitry and interconnects in a design, often increasing design costs or requiring additional layers of circuitry to accommodate all interconnects, which increases manufacturing costs as well. Additional interconnects may also adversely impact performance if they require any interconnect to be lengthened, since longer interconnects increase propagation delay in an integrated circuit and thereby limit the maximum operating speed of the design.
Therefore, a significant need continues to exist in the art for a manner of better balancing system performance and memory bandwidth in a digital signal processor design with the interconnectivity requirements of the design.
The invention addresses these and other problems associated with the prior art by providing a circuit arrangement and method that utilize a shared bus to interconnect a digital signal processor to both a program memory and at least one external device, thereby reducing the number of interconnects required for the digital signal processor without significantly impacting memory bandwidth. An instruction cache is utilized in association with the shared bus to cache selected instructions from a DSP program such that, whenever a cached copy of a DSP program instruction is available in the instruction cache, the cached copy can be fetched from the instruction cache instead of the program memory, thereby freeing the shared bus for an access to the external device.
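The effect of the instruction cache on the shared bus can be illustrated with a simple cycle-counting model. The Python function below is a hypothetical sketch, not the circuit arrangement itself: it assumes one instruction fetch per cycle, lets a pending external-device access use the shared bus only in cycles where the fetch is served by the instruction cache, and drains any remaining external accesses afterward.

```python
def shared_bus_cycles(fetch_addrs, cached_addrs, external_accesses):
    """Count shared-bus cycles needed for instruction fetches plus external accesses.

    Hypothetical model: one instruction is fetched per cycle. A cache hit is
    served by the instruction cache, leaving the shared bus free for one
    pending external-device access in that cycle. A miss occupies the shared
    bus with a program-memory fetch, so external accesses must wait.
    Accesses still pending after the last fetch use the bus one per cycle.
    """
    pending = external_accesses
    cycles = 0
    for addr in fetch_addrs:
        cycles += 1
        if addr in cached_addrs and pending > 0:
            pending -= 1  # bus is free: overlap an external access with the cached fetch
    return cycles + pending


program = list(range(8))  # eight sequential instruction addresses
print(shared_bus_cycles(program, set(), 4))         # 12: no hits, nothing overlaps
print(shared_bus_cycles(program, {0, 1}, 4))        # 10: two external accesses overlap
print(shared_bus_cycles(program, set(program), 4))  # 8: every external access overlaps
```

Under this model the shared bus costs nothing extra whenever enough fetches hit in the cache to cover the external traffic.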
Consistent with one aspect of the invention, a data bus interface is also provided to separately interface the digital signal processor with a data memory, such that the digital signal processor is capable of concurrently fetching an instruction from the instruction cache, communicating with the data memory over the data bus interface, and communicating with the external device over the shared bus.
Consistent with another aspect of the invention, instructions are fetched from the instruction cache responsive to detection of a loop during execution of a DSP program. In response to detection of the loop, a subset of the instructions in the loop is cached in the instruction cache during a first pass through the loop. Then, during a subsequent pass through the loop, instructions from that subset are fetched from the instruction cache instead of the program memory.
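The loop-caching behavior described above can be sketched as follows. This hypothetical Python model assumes the loop is detected on entry (for example, from a loop instruction) and that the cache holds only the first `cache_capacity` instructions of the loop body; both the detection mechanism and the choice of which subset to cache are assumptions made for illustration.

```python
def loop_fetch_sources(loop_body, iterations, cache_capacity):
    """Return the source ('cache' or 'memory') of every instruction fetch in a loop.

    Hypothetical model: the loop is detected on entry, so during the first
    pass the first `cache_capacity` instructions of the body are copied into
    the instruction cache as they are fetched from program memory. On later
    passes those instructions come from the cache; the rest of the body is
    still fetched from program memory.
    """
    cache = set()
    sources = []
    for iteration in range(iterations):
        for addr in loop_body:
            if addr in cache:
                sources.append("cache")
            else:
                sources.append("memory")
                if iteration == 0 and len(cache) < cache_capacity:
                    cache.add(addr)  # cache a copy during the first pass only
    return sources


src = loop_fetch_sources([10, 11, 12, 13], iterations=3, cache_capacity=2)
print(src.count("cache"), src.count("memory"))  # 4 8
```

Each fetch labeled "cache" corresponds, under this model, to a cycle in which the shared bus is available for an external-device access.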
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and of the advantages and objectives attained through its use, reference should be made to the Drawings and to the accompanying descriptive matter, in which exemplary embodiments of the invention are described.