The present invention is directed, in general, to digital signal processors (DSPs) and, more specifically, to an efficient memory management mechanism for a DSP and a method of prefetching instructions for execution in a DSP.
Over the last several years, DSPs have become an important tool, particularly in the real-time modification of signal streams. They have found use in all manner of electronic devices and will continue to grow in power and popularity.
Those skilled in the art are familiar with DSP architecture in general. Conventional DSPs employ a pipeline through which pass data representing a signal to be processed. An execution core performs various mathematical and logical operations on the data to effect changes therein. Memory is coupled to the execution core. The memory contains not only instructions concerning the way in which the data are to be modified, but also further data that may be employed in conjunction with executing the instructions.
It becomes important at this point to discuss two details with respect to the way in which DSP memory may be architected. First, two fundamental DSP architectures exist that are distinguished from one another by how they interact with memory. So-called "von Neumann" architecture DSPs unify instructions and data in a single memory and a single bus. So-called "Harvard" architecture DSPs split instructions and data between two separate memories and buses. The tradeoff is simplicity (von Neumann) versus speed (Harvard).
Second, more sophisticated DSPs stratify memory in an effort to balance speed, cost and power consumption. In a perfect and simple world, a DSP's memory would be extremely fast, low power, arbitrarily large and on the same physical substrate. Unfortunately, very fast memory is very expensive and requires lots of power, and arbitrarily large memory takes an arbitrarily large amount of room on a given substrate. Tempering those requirements with today's commercial concerns regarding both chip and system cost, flexibility and power consumption, modern DSP architecture calls for memory to be stratified, perhaps into three or more layers.
Assuming for the moment that three layers are desired, those might be (1) an extremely small, fast cache, located on the same physical substrate as the processing core of the DSP, that contains very few, but highly relevant, instructions or data, (2) a somewhat larger, somewhat slower memory, still located on the same physical substrate as the processing core of the DSP, that contains relevant instructions or data and (3) an external memory that is as large as need be to contain the entirety of a program and data that the DSP is to use, but that is located on a separate physical substrate and accessible only through a comparatively slow external memory interface. While keeping the external memory on a separate substrate increases flexibility in system design and allows the DSP's chip size to remain small, external memory requires its own power. Therefore, every external memory access comes at the cost of some power consumption that should be minimized in power-consumption-sensitive (typically battery-powered) systems. It should also be noted that processors of all types, including ubiquitous microprocessors, employ the same stratification strategy to balance their speed and cost goals.
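The cost of reaching each successive layer can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the latencies and hit rates are hypothetical figures chosen for illustration, not values taken from this disclosure, and the names `Level` and `average_access_cycles` are invented here. The hit rates treat the three layers as if each serviced whatever the layer before it could not, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    latency_cycles: int  # hypothetical access latency of this layer
    hit_rate: float      # fraction of accesses reaching this layer that it satisfies


def average_access_cycles(levels):
    """Expected cycles per access: a layer's latency is paid by every access that reaches it."""
    expected = 0.0
    reach = 1.0  # probability that an access gets as far as the current layer
    for level in levels:
        expected += reach * level.latency_cycles
        reach *= 1.0 - level.hit_rate
    return expected


# Hypothetical three-layer arrangement; the last layer always satisfies the access.
hierarchy = [
    Level("instruction cache", 1, 0.90),
    Level("on-chip memory", 4, 0.80),
    Level("external memory", 40, 1.00),
]

average = average_access_cycles(hierarchy)  # 1 + 0.1*4 + 0.1*0.2*40, about 2.2 cycles
```

With these assumed figures, only 2% of accesses reach the external memory, yet they account for roughly a third of the average access time, which suggests why avoiding unnecessary external accesses is worthwhile.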
Given this memory stratification, designers have for years sought to increase performance by developing schemes that avoid the latencies and power consumption associated with gaining access to more distant echelons of memory for purposes of loading instructions or loading and storing data. Intelligent guesses concerning instructions and data that may be useful in the near future can be employed to great advantage to retrieve ahead of time (or "prefetch") such instructions or data into faster memory. As effective as prefetching is, more can be done to reduce the bottlenecks that exist between a digital signal processor and its off-chip external memory.
Accordingly, what is needed in the art is a better way to manage stratified memory to increase processor performance. More specifically, what is needed is a mechanism to improve overall DSP performance.
To address the above-discussed deficiencies of the prior art, the present invention provides, for use in a processor having an instruction cache, an instruction memory and an external memory, a memory management mechanism, a method of managing memory and a digital signal processor incorporating the mechanism or the method. In one embodiment, the mechanism includes: (1) an external memory request abort circuit, coupled to an external memory interface, that aborts a request to load an instruction from the external memory before the instruction is loaded into the instruction cache and (2) an instruction cache invalidator, associated with the external memory request abort circuit, that invalidates the instruction cache when address spaces of the instruction memory and the external memory overlap and the processor switches between the instruction memory and the external memory.
The present invention therefore introduces a mechanism that reduces the bottlenecks existing in conventional processors that employ external memory. As previously described, those bottlenecks are caused by limited external memory speed and external memory bus bandwidth. The present invention addresses these limitations by avoiding unnecessary loads and by easing the switch between internal instruction memory and external memory. More specifically, the present invention aborts unnecessary loads when advantageous to do so, and employs a hardware scheme to invalidate the instruction cache when necessary to do so. Using hardware, rather than a software routine, to invalidate the instruction cache frees the DSP to perform other tasks concurrently with the invalidation and thereby improves the overall performance of the DSP.
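The need to invalidate the instruction cache when the address spaces overlap can be illustrated with a toy software model. This is a sketch only, not the disclosed hardware: the classes `InstructionCache` and `Fetcher`, the flag `invalidate_on_switch` and the example addresses and mnemonics are all hypothetical. The model assumes a cache tagged by address alone, so a line cached from one memory is indistinguishable from a line at the same address in the other.

```python
class InstructionCache:
    """Toy cache tagged by address only; it cannot tell which memory a line came from."""

    def __init__(self):
        self.lines = {}

    def invalidate(self):
        self.lines.clear()


class Fetcher:
    def __init__(self, internal, external, invalidate_on_switch):
        self.memories = {"internal": internal, "external": external}
        self.source = "internal"
        self.cache = InstructionCache()
        self.invalidate_on_switch = invalidate_on_switch

    def switch_to(self, source):
        # With overlapping address spaces the same address names a different
        # instruction in each memory, so lines cached from the old source are stale.
        if source != self.source and self.invalidate_on_switch:
            self.cache.invalidate()
        self.source = source

    def fetch(self, address):
        if address not in self.cache.lines:  # miss: load from the active memory
            self.cache.lines[address] = self.memories[self.source][address]
        return self.cache.lines[address]


internal = {0x100: "ADD"}
external = {0x100: "MUL"}  # same address, different instruction

stale = Fetcher(internal, external, invalidate_on_switch=False)
stale.fetch(0x100)
stale.switch_to("external")
stale_result = stale.fetch(0x100)  # hits the line cached from internal memory

fixed = Fetcher(internal, external, invalidate_on_switch=True)
fixed.fetch(0x100)
fixed.switch_to("external")
fixed_result = fixed.fetch(0x100)  # misses, then loads from external memory
```

Without invalidation the model returns the instruction cached from the instruction memory even though the external memory is now selected; with invalidation the first fetch after the switch misses and is loaded from the correct memory.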
In one embodiment of the present invention, the external memory is synchronous memory. Those skilled in the art will understand, however, that other forms of external memory may benefit from application of the present invention.
In one embodiment of the present invention, the mechanism further includes an instruction prefetch mechanism that prefetches instructions from a selected one of the instruction memory and the external memory into the instruction cache. As described above, prefetching can be employed to avoid latencies normally associated with loads from slower memory. The present invention can advantageously be used with prefetching, although this need not be the case.
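By way of illustration only, prefetching into the instruction cache might be modeled as follows. The sequential (next-address) policy, the `depth` parameter and the function name `fetch_with_prefetch` are assumptions made for this sketch; the text above does not specify how prefetch candidates are chosen. The `memory` argument stands for whichever of the instruction memory and the external memory is currently selected.

```python
def fetch_with_prefetch(cache, memory, address, depth=2):
    """Return the instruction at `address`, then prefetch the next `depth` addresses."""
    if address not in cache:
        cache[address] = memory[address]  # demand miss: load from the selected memory
    for ahead in range(1, depth + 1):
        candidate = address + ahead
        if candidate in memory and candidate not in cache:
            cache[candidate] = memory[candidate]  # speculative fill
    return cache[address]


# Hypothetical four-instruction program held in the selected memory.
program = {0: "LOAD", 1: "MAC", 2: "STORE", 3: "JMP"}
cache = {}
first = fetch_with_prefetch(cache, program, 0)
prefetched = sorted(cache)  # addresses now resident in the cache
```

After the first fetch, addresses 1 and 2 are already resident, so subsequent sequential fetches would not wait on the slower memory.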
In one embodiment of the present invention, the instruction cache is direct mapped. A direct mapped instruction cache offers certain architectural advantages, primarily simplicity. Of course, other cache architectures are within the broad scope of the present invention.
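The simplicity of a direct mapped cache follows from the fact that each address maps to exactly one cache line, so a lookup requires a single tag comparison. A minimal sketch, assuming a hypothetical geometry of 64 lines of 16 bytes each (neither figure comes from this disclosure):

```python
LINE_BYTES = 16  # hypothetical line size
NUM_LINES = 64   # hypothetical number of lines


def split_address(address):
    """Split a byte address into (tag, line index, byte offset within the line)."""
    offset = address % LINE_BYTES
    index = (address // LINE_BYTES) % NUM_LINES
    tag = address // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES  # None marks an invalid line

    def access(self, address):
        """Return True on a hit; on a miss, install the tag, evicting the previous occupant."""
        tag, index, _ = split_address(address)
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


cache = DirectMappedCache()
first = cache.access(0x1234)           # cold miss
second = cache.access(0x1234)          # hit on the same line
conflict = cache.access(0x1234 + LINE_BYTES * NUM_LINES)  # same index, different tag
```

The third access maps to the same line index with a different tag and therefore evicts the earlier line, which is the principal cost of the direct mapped organization.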
In one embodiment of the present invention, the external memory request abort circuit is associated with a request arbiter in the processor. The request arbiter arbitrates requests from a data unit and an instruction unit of the processor. Of course, this need not be the case.
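The association between the abort circuit and the request arbiter might be pictured with the following toy model. It is a sketch under assumptions: the policy of granting data-unit requests ahead of instruction-unit requests, and the names `RequestArbiter`, `grant` and `abort_instruction_requests`, are hypothetical and are not described in the text above.

```python
from collections import deque


class RequestArbiter:
    """Toy arbiter; data-unit requests are granted first (an assumed policy)."""

    def __init__(self):
        self.data_queue = deque()
        self.instr_queue = deque()

    def request(self, unit, address):
        queue = self.data_queue if unit == "data" else self.instr_queue
        queue.append(address)

    def abort_instruction_requests(self):
        """Drop instruction loads not yet granted; return how many were dropped."""
        dropped = len(self.instr_queue)
        self.instr_queue.clear()
        return dropped

    def grant(self):
        """Return (unit, address) for the next external access, or None if idle."""
        if self.data_queue:
            return ("data", self.data_queue.popleft())
        if self.instr_queue:
            return ("instruction", self.instr_queue.popleft())
        return None


arbiter = RequestArbiter()
arbiter.request("instruction", 0x200)
arbiter.request("data", 0x80)
arbiter.request("instruction", 0x210)

granted = arbiter.grant()                       # the data request goes first
aborted = arbiter.abort_instruction_requests()  # e.g. the loads are no longer needed
idle = arbiter.grant()                          # nothing remains to be issued
```

In this model the two aborted instruction loads never occupy the external memory interface, so neither their latency nor their power cost is incurred.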
In one embodiment of the present invention, the instruction cache invalidator comprises a programmable control register. Alternatively, the cache invalidator may assert a signal directly into the instruction cache to flush it. All hardware means of invalidating the instruction cache are within the broad scope of the present invention.
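A programmable control register used as an invalidator might behave as in the following sketch. The bit position `INVALIDATE_BIT`, the self-clearing behavior of that bit and the class name `CacheControlRegister` are assumptions made for illustration; the text above does not define a register layout.

```python
INVALIDATE_BIT = 0x1  # hypothetical bit position


class CacheControlRegister:
    def __init__(self, valid_bits):
        self.valid_bits = valid_bits  # list of per-line valid flags shared with the cache model
        self.value = 0

    def write(self, value):
        """Writing a 1 to INVALIDATE_BIT clears every valid flag in one operation."""
        if value & INVALIDATE_BIT:
            for i in range(len(self.valid_bits)):
                self.valid_bits[i] = False
            value &= ~INVALIDATE_BIT  # assumed self-clearing bit
        self.value = value


valid = [True, False, True, True]
register = CacheControlRegister(valid)
register.write(INVALIDATE_BIT)
```

Software performs a single register write; clearing the valid flags is left to the modeled hardware, consistent with the stated aim of relieving the processor of a software invalidation routine.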
In one embodiment of the present invention, the processor is a digital signal processor. The teachings and principles of the present invention may, however, be applied to processors in general, including microprocessors.
The foregoing has outlined, rather broadly, preferred and alternative features of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.