The field of the present invention relates generally to instruction processing, and more particularly to a system for processing instructions using macros.
The trend toward low cost processors is often a driving force in the design of many embedded processor systems. Typically, custom application specific integrated circuit (ASIC) devices are built around a central processor core along with interface units and on-chip memory to form a complete system. Such systems often have an external memory interface which is used to access a memory space much larger than that which can be placed in embedded RAM on-chip. Due to the need for higher performance and increased software functionality, designers must rely on fast off-chip RAMs, which can meet the low latency demands of high speed processors. Even with fast RAMs, the off-chip access requirements often constitute a bottleneck that compromises the overall speed of the processor, which could often run at 100 MHz or more were it not for the slow external memory interface.
Moreover, in small environments where board real-estate is at a premium (such as mini-PCI and Cardbus form factors), off-chip memories take up valuable area. Since RAMs require parallel address and data lines, as well as various control signals, they often pose a further challenge to board routing, which adds to the overall system size. In portable environments, off-chip RAMs which are clocked at high speeds often consume a great deal of power. Also, off-chip RAM is expensive, often meeting or exceeding the cost of the ASIC processor. Cost sensitivity in the consumer market is yet another reason that off-chip memories are less attractive. However, increasing functionality results in larger and larger software programs requiring more and more memory to hold instructions as well as data.
Embedded processor systems therefore often strive to reduce or eliminate the amount of off-chip memory required. This reduces overall cost and saves valuable board area. Also, any reduction in the amount of off-chip traffic will allow for improved throughput, as well as reduced power consumption.
Typical embedded processor systems commonly rely on instruction and data caches to reduce the amount of data traffic. Caches are typically on-chip RAMs which contain frequently accessed instructions or data. These are quite effective if designed properly. However, instruction and data cache designs do add to overall system complexity due to the need to support algorithms used for cache replacement and cache coherency.
Cache designs also are not well suited for context switching applications. Cache designs depend on locality of reference for good performance. Locality of reference refers to the property that future instructions in the code stream come from a location near the current instruction fetch. Therefore, there is a higher probability of cache hit (i.e., having the next instruction fetch already in the cache line). This is normally the case with conventional code streams, since execution order is largely sequential in nature and hence the cache can react effectively to the deterministic behavior.
However, code that rapidly context switches behaves in a random, non-deterministic way. A context switch may involve a code fetch from a completely different address which is nowhere near the current instruction fetch. Often, when two or more processes cannot fit in the cache together, thrashing may result. Thrashing is overhead caused by replacing and restoring cached data to the main off-chip memory in a rapid fashion. As a result, the processor wastes many CPU cycles just managing the cache and thus does not perform useful work. Caching in that regard can actually hurt performance due to the extreme overhead seen with fruitless cache updates.
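The contrast between sequential code and rapidly context-switching code can be illustrated with a small simulation. The following is only a sketch under stated assumptions: it models a hypothetical direct-mapped cache with 8 lines of 4 addresses each, and the function name hit_rate, the cache geometry, and the address ranges are all chosen for illustration and do not come from any particular processor.

```python
def hit_rate(addresses, num_lines=8, line_size=4):
    """Return the hit rate of a hypothetical direct-mapped cache for a fetch trace."""
    tags = [None] * num_lines          # tag currently held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # cache line that the block maps to
        tag = block // num_lines       # identifies which block occupies the line
        if tags[index] == tag:
            hits += 1                  # the fetch is already in the cache line
        else:
            tags[index] = tag          # miss: the line is replaced
    return hits / len(addresses)

# Sequential code stream: one miss per cache line, then hits within that line.
sequential = list(range(64))

# Two contexts whose code maps to the same cache lines, switching on every fetch
# (an assumed worst case, chosen to make the thrashing visible).
context_a = range(0, 32)
context_b = range(1024, 1056)
switching = [addr for pair in zip(context_a, context_b) for addr in pair]

sequential_rate = hit_rate(sequential)
switching_rate = hit_rate(switching)
print(sequential_rate, switching_rate)
```

In this model the sequential trace hits on three of every four fetches, while the alternating trace misses on every fetch because each context evicts the line the other context is about to use.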
Furthermore, the caching of high priority instructions may result in lower performance. This is especially true in real time operating system environments. A cache miss penalty at an inopportune time can slow down the performance of execution where timing is crucial in order to achieve a real time response. In addition, there is the cache overhead of flushing and reading/writing cached data to keep it consistent and coherent with the memory system. This impacts overall system performance since cache updates waste network or bus bandwidth. Distributed systems also have issues with respect to cache coherency and broadcasting of cache data updates. Embedded processor systems also have cost issues which may prohibit the use of very large caches and associated hardware.
Other approaches use other types of on-chip memories which are not associated with caching. Some systems use embedded scratch pads to store commonly accessed instructions in a buffer, rather than going off chip for this data. These buffers are not managed by the cache hardware, but rather the compiler or software governs the memory allocation. These scratch pads are useful but limited in effectiveness. In this regard, code must be specially written to utilize the scratch pad, and hence is not portable like cache methods. The scratch pad method is less effective for instruction storage than for data storage, since code must be hard compiled into those locations and there is a limit to the amount of code that can be stored on-chip.
The present invention addresses these and other drawbacks of the prior art.
According to the present invention there is provided a method for processing instructions in a data processing system including a first memory, a first program counter, an instruction decoder, and an execution unit, the method comprising the steps of: retrieving instructions from the first memory using the first program counter; decoding the instructions retrieved from the first memory using the instruction decoder; generating a macro execution signal in response to decoding of a MACRO CALL instruction; executing one or more subsequent instructions stored in a second memory using the execution unit and a second program counter, in response to generation of the macro execution signal, wherein said MACRO CALL instruction identifies a macro starting address in the second memory and an offset value indicative of a macro ending address; and completing execution of instructions stored in the second memory when the second program counter reaches the macro ending address.
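The steps of the foregoing method can be sketched in software as follows. This is a minimal illustration only, not the claimed hardware: the tuple instruction encoding, the function name run, and the opcode names other than the MACRO CALL are hypothetical. The sketch also assumes that the macro ending address equals the starting address plus the offset value, and that the instruction at the ending address is the last one executed; the text above does not fix either detail.

```python
def run(first_memory, second_memory):
    """Return the operations executed, in order, for a program in first_memory."""
    trace = []
    pc1 = 0                                    # first program counter
    while pc1 < len(first_memory):
        instr = first_memory[pc1]              # retrieve from the first memory
        pc1 += 1
        if instr[0] == "MACRO_CALL":           # decoding raises the macro execution signal
            _, start, offset = instr
            end = start + offset               # assumed: ending address = start + offset
            pc2 = start                        # second program counter
            while True:
                trace.append(second_memory[pc2])   # fetch and execute from the second memory
                if pc2 == end:                 # second counter reached the ending address
                    break
                pc2 += 1
        else:
            trace.append(instr)                # ordinary instruction from the first memory
    return trace

# Hypothetical contents: the second memory holds frequently used sequences.
second_memory = [("LOAD",), ("ADD",), ("STORE",), ("SHIFT",), ("XOR",)]
first_memory = [("NOP",), ("MACRO_CALL", 1, 1), ("SUB",), ("MACRO_CALL", 3, 1)]

trace = run(first_memory, second_memory)
print(trace)
```

Because each MACRO CALL carries its own starting address and offset, the same second memory can serve several overlapping or adjacent macros without any end-of-macro marker stored in the macro itself.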
According to another aspect of the present invention there is provided a data processing system comprising: a first memory for storing instructions; a first program counter for addressing the first memory; an instruction decoder for decoding instructions, wherein said instruction decoder generates a macro execution signal in response to decoding of a MACRO CALL instruction; an execution unit for executing instructions; a second memory for storing instructions; a second program counter for addressing the second memory; and a control unit for controlling the retrieval of instructions from the second memory, wherein said execution unit executes one or more instructions stored in the second memory, in response to generation of the macro execution signal, wherein said MACRO CALL instruction identifies a macro starting address in the second memory and an offset value indicative of a macro ending address, wherein execution of instructions stored in the second memory is completed when the second program counter reaches the macro ending address.
According to still another aspect of the present invention there is provided a data processing system comprising: first means for storing instructions; first means for addressing the first means for storing instructions; means for decoding instructions, wherein said means for decoding instructions generates a macro execution signal in response to decoding of a MACRO CALL instruction; means for executing instructions; second means for storing instructions; second means for addressing the second means for storing; and control means for controlling fetching of instructions stored in the second means for storing, wherein said means for executing instructions executes one or more instructions stored in the second means for storing, in response to generation of the macro execution signal, wherein said MACRO CALL instruction identifies a macro starting address in the second means for storing and an offset value indicative of a macro ending address, wherein execution of instructions stored in the second means for storing is completed when the second means for addressing reaches the macro ending address.
According to yet another aspect of the present invention there is provided a method of operating a data processing system to execute one or more instructions stored in an on-chip memory device located on-chip with data processing components and an off-chip memory device located remote therefrom, comprising: fetching a MACRO CALL instruction from the off-chip memory device; decoding the MACRO CALL instruction to identify a macro stored in the on-chip memory device; executing the MACRO CALL instruction, wherein execution of the MACRO CALL instruction results in the generation of a macro execution signal, wherein said macro execution signal causes retrieval and execution of one or more instructions from the on-chip memory, and ending upon detection of the end of the macro.
An advantage of the present invention is the provision of an instruction processing system, which allows more frequently executed instructions to be readily available.
Another advantage of the present invention is the provision of an instruction processing system, which allows for a reduction in off-chip instruction fetching.
Another advantage of the present invention is the provision of an instruction processing system, which allows for a deterministic instruction execution speed.
Another advantage of the present invention is the provision of an instruction processing system, which reduces the overall memory space needed to hold a program.
Still another advantage of the present invention is the provision of an instruction processing system, which reduces power consumption.
Yet another advantage of the present invention is the provision of an instruction processing system, which allows for faster instruction execution speed.
Still other advantages of the invention will become apparent to those skilled in the art upon a reading and understanding of the following detailed description, accompanying drawings and appended claims.