1. Field of the Invention
This invention relates to the field of microprocessor architectures. More particularly, the invention relates to reducing overhead associated with register set saving and restoring as is required when invoking functions, exception handlers, and interrupt service routines. The invention further relates to register shadowing and windowing strategies to reduce function calling and task switching times in multi-issue processors, especially superscalar RISC processors and very long instruction word (VLIW) digital signal processors (DSPs).
2. Description of the Prior Art
Studies show that register saving and restoring in response to function calls and returns accounts for between 5% and 40% of the data memory traffic in executing programs written in a high level programming language. Also, the registers must be saved whenever a program switches tasks. In a UNIX operating system, for example, this accounts for approximately 20% of the task switching overhead. In more streamlined real-time operating systems as are common with embedded processors and DSPs, register saving and restoring accounts for a much higher percentage of the task switching time. Even interrupt service routines that do not require a full task switch still require at least some of the registers to be saved and restored. This adds significant overhead in many cases.
Register shadowing and windowing techniques have been introduced in an effort to reduce delays associated with register set storing and loading. Prior art processor architectures that incorporate shadow registers or register windows are discussed in detail in John L. Hennessy and David A. Patterson, "Computer Architecture: A Quantitative Approach," Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1991. These concepts are by now employed on some high performance RISC processors and DSPs. For example, the Analog Devices ADSP21xx series of DSPs uses a shadow register bank. Sun Microsystems' SPARC processors use windowed register banks. These register systems allow the processors to switch register sets in a single cycle.
Register shadowing is a technique whereby a primary register set is shadowed by a mirror image register set. When a register set switch command is issued, the machine context can be switched from the primary register set to the shadow register set. Shadow register sets are useful for fast switching between tasks or between a primary program and an interrupting program. For example, in a DSP, a supervisory task may run in the background while the main signal processing algorithm runs in the foreground. This technique is supported, for example, by the Analog Devices ADSP21xx series of DSPs. In the ADSP21xx processors, there is only one shadow register set. Hence, single-cycle context switching can only occur between one primary task and one secondary task. Also, in the case of the ADSP21xx, the address registers are not saved upon a shadow register switch. Hence, in applications, a long sequence of commands is required to save and restore the address registers, incurring a significant time penalty.
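The bank switch described above may be illustrated by a minimal Python sketch. The class name, register counts, and register values are hypothetical and are chosen only for illustration; the model assumes one primary bank, one shadow bank, and address registers that lie outside both banks, as in the ADSP21xx limitation noted above.

```python
class ShadowedRegisterFile:
    """Toy model: one primary and one shadow bank of data registers.

    Assumption: address registers are shared and are NOT shadowed.
    """

    def __init__(self, num_data_regs=8, num_addr_regs=4):
        self.banks = [[0] * num_data_regs, [0] * num_data_regs]
        self.active = 0                        # index of the bank in use
        self.addr_regs = [0] * num_addr_regs   # shared by both contexts

    @property
    def data_regs(self):
        return self.banks[self.active]

    def switch_bank(self):
        """Model of the single-cycle switch: no register data is copied."""
        self.active ^= 1


rf = ShadowedRegisterFile()
rf.data_regs[0] = 111        # foreground task state
rf.addr_regs[0] = 0x1000

rf.switch_bank()             # enter the background task
rf.data_regs[0] = 222
rf.addr_regs[0] = 0x2000     # overwrites the foreground address register

rf.switch_bank()             # return to the foreground task
```

After the second switch, the foreground data register still holds 111, whereas the address register holds the background task's value. In this model the address registers would therefore have to be saved and restored by explicit instructions.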
As can be seen from the foregoing discussion, a problem with shadow register systems is their inability to provide single-cycle context switching to more than one task. In theory, if N shadow register sets are added, then single-cycle task switching between N+1 tasks is possible. The problem, however, is that if more than N+1 tasks need to be supported, the single-cycle task switching will only be possible between a subset of the total number of tasks. Also, a significant amount of silicon area is needed for each added shadow register set. Finally, the software that manages the tasks becomes difficult and less efficient because it has to manage a first type of task that has its context stored in a shadow register set, and a second type of task that has its context stored in memory. Whenever a task of the second type is invoked, context switch oriented register save and restore operations are required. For these reasons, shadow register sets have not gained widespread popularity.
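The two types of task described above may be illustrated by a simple cost model, given here as a sketch under stated assumptions rather than as a description of any particular processor. It is assumed that tasks numbered 0 to N-1 are pinned to the N shadow register sets, that all remaining tasks share the primary register set, and that every memory-backed task already has a saved context. The function name, the pinning policy, and the figure of 16 registers per set are hypothetical.

```python
def count_memory_transfers(schedule, num_shadow_sets, regs_per_set=16):
    """Count register loads and stores for a sequence of task activations.

    Assumed policy: task ids 0..N-1 are pinned to shadow sets; every other
    task is memory-backed and time-shares the primary register set.
    """
    pinned = set(range(num_shadow_sets))
    in_primary = None          # memory-backed task now held in the primary set
    transfers = 0
    for task in schedule:
        if task in pinned or task == in_primary:
            continue                        # single-cycle switch, no traffic
        if in_primary is not None:
            transfers += regs_per_set       # save the displaced context
        transfers += regs_per_set           # load the incoming context
        in_primary = task
    return transfers


three_tasks = count_memory_transfers([0, 1, 2, 0, 1, 2], num_shadow_sets=2)
four_tasks = count_memory_transfers([0, 1, 2, 3, 2, 3], num_shadow_sets=2)
```

With N = 2, three tasks incur only the initial load of the third task's context, whereas a fourth task forces a save and a restore on every alternation between the two memory-backed tasks.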
Shadow register sets can optionally be used for register save and restore operations related to function calling and returning. For example, if the processor has a single shadow register set, a base level function can make a call to a first level function, and can perform a register bank switch so that the first level routine can save the registers and then restore them in single cycles. The problem is that this capability only exists for a single level of function calling. If the first level function were to call a second level function, both the primary and the shadow register sets would be occupied, requiring multiple cycles for all register save and restore operations related to subsequent function calls. Again, adding more shadow register sets extends the number of levels of function calls that can be supported with single-cycle register store and restore operations, but only at the price of a significant amount of silicon area. Moreover, the software becomes complicated due to the need to keep track of the current level of function nesting. For these reasons, shadow registers find limited use in function calling.
Register window systems are an extension of the shadow register concept and are designed to accelerate function calling. In a register window system, a group of shadow register sets is typically arranged as a circular buffer. When a function call is made, the active register set advances from one set to the next in the circular buffer. When the buffer wraps, an overflow is said to occur. Upon an overflow, a sequence of memory transfers is needed to save the first register file in the circular buffer arrangement to ensure it does not get overwritten. As verified by the analysis of execution patterns of large numbers of benchmark programs, by making the circular buffer deep enough, usually on the order of twelve to sixteen register sets, the overhead associated with register save and restore operations can be made negligible.
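The circular buffer behavior described above, including overflow, may be sketched as follows. This is a simplified illustrative model and not the SPARC mechanism: the windows are assumed not to overlap, each overflow is assumed to spill exactly one register set, and a return to a spilled caller is assumed to reload exactly one register set. The class name, the helper function, and the figure of 16 registers per window are hypothetical.

```python
class RegisterWindows:
    """Toy circular buffer of register sets with spill on overflow."""

    def __init__(self, num_windows, regs_per_window=16):
        self.num_windows = num_windows
        self.regs_per_window = regs_per_window
        self.resident = 1          # live frames held in windows (base function)
        self.spilled = 0           # frames saved out to memory
        self.memory_transfers = 0  # register loads plus stores

    def call(self):
        if self.resident == self.num_windows:
            # Overflow: the oldest window is stored so it can be reused.
            self.memory_transfers += self.regs_per_window
            self.spilled += 1
        else:
            self.resident += 1

    def ret(self):
        if self.resident > 1:
            self.resident -= 1     # the caller's window is still resident
        elif self.spilled > 0:
            # Underflow: the caller's frame must be reloaded from memory.
            self.memory_transfers += self.regs_per_window
            self.spilled -= 1
        else:
            raise RuntimeError("return without a matching call")


def transfers_for_depth(depth, num_windows):
    """Memory transfers for `depth` nested calls followed by their returns."""
    windows = RegisterWindows(num_windows)
    for _ in range(depth):
        windows.call()
    for _ in range(depth):
        windows.ret()
    return windows.memory_transfers


deep_buffer = transfers_for_depth(depth=10, num_windows=16)
shallow_buffer = transfers_for_depth(depth=10, num_windows=4)
```

In this model, a call chain ten levels deep generates no memory traffic with sixteen windows, but generates 224 register transfers with only four windows.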
Prior art register window systems have many drawbacks and are thus not used in most modern high performance processor designs. First of all, a twelve-level to sixteen-level deep register window system requires an excessive amount of silicon area. Secondly, as the total number of registers in the circular buffer of register files increases, the number of register address lines, and hence the amount of time needed for register address decoding increases. Longer register address access times lead to slower system clocks and thus slower overall processors. Thirdly, as the number of register sets in the windowing system increases, the number of registers that must be saved when a task switch occurs increases proportionally. This adds a significant overhead to task switching and adds interrupt latency. Adding multiple copies of shadowed register window systems to provide single-cycle task switching would require an enormous amount of silicon area and would have the same limitations relating to shadow registers as discussed above.
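The second and third drawbacks above may be illustrated numerically. The following sketch assumes non-overlapping windows of equal size and a flat binary register address, so that the number of address lines is the base-2 logarithm of the total register count, rounded up. The function name and the figure of 16 registers per window are hypothetical.

```python
def window_costs(num_windows, regs_per_window=16):
    """Return (total registers, address lines, registers saved per task switch).

    Assumptions: windows do not overlap, and a full task switch must save
    every register held in the circular buffer.
    """
    total = num_windows * regs_per_window
    address_lines = (total - 1).bit_length()   # ceil(log2(total)) for total >= 1
    saved_per_task_switch = total
    return total, address_lines, saved_per_task_switch


single_set = window_costs(1)
sixteen_sets = window_costs(16)
```

Under these assumptions, growing from one register set to sixteen raises the register count from 16 to 256, doubles the address width from 4 to 8 lines, and multiplies the number of registers to be saved on a task switch by sixteen.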
The problems become more severe in DSPs. For example, in machines such as the SPARC, the floating point registers are not included in a register window switch. Rather, floating point registers must be loaded and saved under program control. This would not be acceptable on a floating point DSP. The reason the floating point registers are not added to the register window on the SPARC is that, unlike on floating point DSPs, the floating point registers are not used as widely. DSPs also often contain the ALU core registers as well as address registers and possibly other types of auxiliary registers that would need to be added to the register windowing system. Modern load-store VLIW DSPs have multiple register sets that would need to be windowed multiple times to create an effective register window system. Hence, it can be seen that register windows become prohibitively expensive to implement with most DSP architectures.
In U.S. Pat. No. 3,781,810, a system is disclosed to speed up the storing and the retrieving of registers when the machine context must be switched in a nested fashion. Upon the occurrence of an interrupt, when a "store" command is issued, a selected subset of the register set is transferred to an auxiliary register set simultaneously via parallel data paths. The data in the auxiliary registers are then transferred to the memory in the background by using otherwise unused memory transfer cycles. If another store command is issued prior to the background transfer, the transfer is allowed to complete in the foreground. Register restore operations are processed similarly. The auxiliary registers are restored in the background, and are then transferred simultaneously into the primary register set. This technique has drawbacks that limit performance. For example, in the register store operation, the auxiliary register set must be overwritten with the contents of the currently active register set. This means that the auxiliary register set cannot supply useful information to be used in the task switch. It would be more effective to provide a system whereby the current register set could be transferred out to memory, and context could be switched to a shadow register set in a single cycle. Instead of the auxiliary register set being filled with the data to be transferred out to memory, it would be desirable to allow it to be preloaded with useful information to enable a truly nested single cycle task switching capability. The disclosure in U.S. Pat. No. 3,781,810 only allows for single directional transfers at a given time and needs to be extended to support store-and-load operations, delayed interrupts, and various methods for accelerated task switching disclosed herein.
A shadow register system is disclosed in U.S. Pat. No. 5,327,566. In this system, a SAVE command is issued to cause the processor to latch the register contents into a shadow register set. A RESTORE command is used to cause the processor to latch the previously saved register contents from the shadow register set back to the primary register set used by the processor. Also, in one aspect of the disclosure of U.S. Pat. No. 5,327,566, when an interrupt is detected, the processor automatically latches the register contents into the shadow register set, and, when a return from interrupt instruction is issued, the processor automatically restores the register contents. No interrupt nesting is supported by the system of U.S. Pat. No. 5,327,566. That is, if a program is interrupted by a first interrupt service routine which is then interrupted by a second interrupt service routine, the register context of the original program will be destroyed and will be unrecoverable. The concept of automatic register saving in response to interrupts needs to be expanded to support nested interrupts.
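The loss of context under nested interrupts may be illustrated by a minimal sketch. The model assumes a single shadow register set that is latched automatically on every interrupt and copied back on every return from interrupt; it is an illustration of the limitation described above and is not taken from U.S. Pat. No. 5,327,566. The class name, method names, and register values are hypothetical.

```python
class AutoShadowCPU:
    """Toy processor with one shadow set latched on each interrupt."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]
        self.shadow = [0, 0, 0, 0]

    def interrupt(self):
        self.shadow = list(self.regs)    # automatic save into the shadow set

    def return_from_interrupt(self):
        self.regs = list(self.shadow)    # automatic restore from the shadow set


cpu = AutoShadowCPU()
cpu.regs = [1, 2, 3, 4]          # context of the original program

cpu.interrupt()                  # first interrupt: shadow holds [1, 2, 3, 4]
cpu.regs = [10, 20, 30, 40]      # first service routine uses the registers

cpu.interrupt()                  # nested interrupt overwrites the shadow set
cpu.regs = [7, 7, 7, 7]          # second service routine uses the registers

cpu.return_from_interrupt()      # back in the first service routine
regs_in_first_isr = list(cpu.regs)

cpu.return_from_interrupt()      # intended return to the original program
regs_in_program = list(cpu.regs)
```

The first service routine regains its own context, but the second return delivers that same context to the original program, whose register values [1, 2, 3, 4] are no longer held anywhere in the model.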
Therefore, it is a primary object of this invention to provide improved systems for register shadowing and register windowing. It is desired to implement a minimal number of register sets in a circularly buffered configuration to provide higher performance register shadowing and windowing systems at a fraction of the cost of prior art systems.
Another objective is to provide an architecture to allow data to transfer between the register set and the memory so that register set store and load operations can proceed concurrently with normal processing in advance of being needed.
Another objective of the invention is to provide improved methods for task switching in processors employing the inventive register shadowing and windowing systems. Another objective is to provide new interrupt modes that perform register set store and load operations automatically without incurring program cycle overhead.
Another objective of the invention is to provide a register shadowing system for VLIW and superscalar processors that include multiple register sets.
Another objective is to provide a register windowing system with a much lower silicon area requirement and to provide a method to accelerate task switching with this system.