The invention relates generally to systems and methods for allocating memory space on integrated circuits and more particularly to systems and methods for allocating memory accessible by multiple digital signal processing devices.
Integrated circuits employing digital signal processors (DSPs), such as digital audio processing devices and other processing devices, may have a limited, fixed amount of memory integrated onto a common integrated circuit or located on a shared circuit board containing the digital signal processor(s). The memory for the digital signal processor(s) may be relatively expensive compared to other circuits and components in the system. Improper use of memory can result in system inefficiencies and poor system performance. Hence, the trade-off between cost and performance is a constant challenge for computer system designers.
DSP cores are optimized to provide high-performance arithmetic functions. However, DSP core performance is only one part of an optimized DSP architecture. Most DSP algorithms require a large amount of data to be moved between the DSP and its memories in order to keep up with DSP execution and the storing of results. Therefore, the organization and architecture of the memories are as vital to overall DSP performance as the DSP core. From an economic view, DSP software applications will require more and more memory, and the ratio of area between the DSP and its memories will widen in the future. Therefore, optimized memory usage for both data and program should be a major consideration in the DSP and memory architecture.
One type of known architecture that attempts to strike a suitable balance between cost and performance includes a digital signal processor with one memory having different mapped locations. A single address bus and a single data bus are used to communicate information between the DSP and the memory. With such systems, however, processing of audio information using digital filters that require many multiply and accumulate calculations typically operates too slowly, since the digital signal processor is unable to do simultaneous calculations. Hence, such systems are not typically flexible enough for many applications.
To overcome the problems associated with such designs, another known design uses a multiport DSP architecture and allocates separate buses linked with separate types of memory spaces. For example, such a system may use a dedicated port for accessing dedicated memory space for storing the program code, a dedicated port for accessing dedicated memory space for storing left data, and another dedicated port for accessing dedicated memory space for storing right data. As such, the digital signal processor typically has a program port, a left data port and a right data port to facilitate multi-operand multiplication in one clock cycle. When the DSP is used for different software applications, such as when digital audio is in an MPEG or an AC-3 format, the new application must be partitioned properly and loaded into the appropriate DSP memory block for processing each type of audio format. Although such systems may be more flexible and deliver better performance than single port systems, they can be inefficient in memory usage when supporting different types and sizes of applications, since the memory blocks are fixed in size and dedicated for usage by a given port.
For example, where digital audio streams are being processed in real time, the memory may be allocated solely for the processing of the real time information, thereby precluding use of the memory for any other background application. Also, a memory block may not be fully utilized where the application requires very little memory, and any unutilized memory is not accessible by any other ports or even other DSPs. This problem may become a serious issue when there are multiple digital signal processors, each requiring dedicated but limited size blocks of memory for its own ports.
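The inefficiency of fixed, port-dedicated memory blocks can be illustrated with a minimal sketch. The block sizes, the application's requirements and all function names below are hypothetical and chosen only for illustration; the description above gives no concrete figures. The sketch compares whether an application fits when each port is limited to its own block against whether it would fit if the same total memory could be shared.

```python
# Hypothetical sizes (in words) of the three dedicated blocks of a multiport DSP.
FIXED_BLOCKS = {"program": 4096, "left_data": 4096, "right_data": 4096}


def fits_fixed(need, blocks=FIXED_BLOCKS):
    """True only if each port's demand fits in that port's dedicated block."""
    return all(need.get(port, 0) <= size for port, size in blocks.items())


def fits_pooled(need, blocks=FIXED_BLOCKS):
    """True if the total demand fits in the total memory, ignoring port boundaries."""
    return sum(need.values()) <= sum(blocks.values())


def unused_words(need, blocks=FIXED_BLOCKS):
    """Words left idle in each dedicated block (never negative)."""
    return {port: max(size - need.get(port, 0), 0) for port, size in blocks.items()}


# Hypothetical application: large program, small data sets.
app = {"program": 6144, "left_data": 1024, "right_data": 1024}

print(fits_fixed(app))    # False: the program does not fit its 4096-word block
print(fits_pooled(app))   # True: 8192 words are needed, 12288 words exist
print(unused_words(app))  # the data blocks each leave 3072 words idle
```

In this assumed case the application is rejected even though a third of the total memory would remain free, because the idle words in the data blocks cannot be reached through the program port.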
In the most primitive single block memory architecture, a finite impulse response (FIR) filter operation will require four cycles to complete, one for each memory access (fetching the instruction, fetching the data, fetching the coefficient, and storing the result). The Harvard architecture refers to a memory architecture where the processor can access two independent memory blocks (one for program and the other for data) via two independent sets of buses. Two memory accesses can be performed in a single cycle, and thus the cycle requirement is reduced by 50%. In the Modified Harvard architecture, one memory block is for both program and data, while the other is for data only. Many DSP cores have taken the Modified Harvard architecture concept a bit further by providing another data memory block in addition to the program/data memory block and the data memory block. Therefore, three independent memory accesses can be performed in a single cycle, which allows the fetching of the instruction, the fetching of the data, and the fetching of the coefficient to all be done in a single cycle. Storing the result of the execution is then performed in the next cycle. One disadvantage of the Modified Harvard architecture is the segmentation of the memory blocks. Different DSP applications have different program and data memory requirements, and therefore such architectures can make inefficient use of memory.
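The cycle counts discussed above can be reproduced with a simplified model. It assumes that one FIR operation needs four memory accesses (instruction fetch, data fetch, coefficient fetch, result store), that each independent memory block serves one access per cycle, and that pipelining and other effects are ignored. The function and constant names are hypothetical.

```python
import math

# The four memory accesses assumed for one FIR operation.
FIR_ACCESSES = ("fetch instruction", "fetch data", "fetch coefficient", "store result")


def cycles_per_operation(independent_blocks, accesses=len(FIR_ACCESSES)):
    """Cycles needed when each independent block serves one access per cycle."""
    if independent_blocks < 1:
        raise ValueError("at least one memory block is required")
    return math.ceil(accesses / independent_blocks)


for name, blocks in [
    ("single block", 1),
    ("Harvard (program + data)", 2),
    ("extended Modified Harvard (three blocks)", 3),
]:
    print(f"{name}: {cycles_per_operation(blocks)} cycle(s)")
# single block: 4 cycle(s)
# Harvard (program + data): 2 cycle(s)
# extended Modified Harvard (three blocks): 2 cycle(s)
```

Under this model the three-block arrangement also needs two cycles, which matches the description: the three fetches share the first cycle and the result is stored in the next.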
Accordingly, it would be desirable to have a memory allocation system and method that facilitates improved use of available memory to accommodate software applications of different sizes.