1. Field of the Invention
The present invention relates, in general, to data processors, and, more particularly, to a method, apparatus and system for implementing context switching in processors with large register files.
2. Relevant Background
Computer systems include central processing units (CPUs), microcontroller units (MCUs), and the like coupled with memory. Programs that run on such computer systems operate on data that may be stored and retrieved by the program or supplied at run-time. Programs include a plurality of saved instructions that define particular operations that are to be performed on the data.
Most processor architectures generally define a plurality of registers for holding the data to be operated on by the program instructions. These registers may be implemented as hardware registers or as register files in general purpose memory. The registers store both the instructions and the data that are being or may be used by the processor. The registers are usually implemented in memory devices that are closely coupled with the processor to provide low-latency access to data required by the processor. The registers are typically defined in the processor architecture specification and so are usually considered part of the processor architecture even where they are physically implemented in another device.
High speed processors may have tens or hundreds of registers in a general register file. The large number of registers can enable the processor to process a large amount of data concurrently and to load or store data from longer latency storage before it is needed. Very long instruction word (VLIW) processors tend to have higher register number requirements because of the inherent parallelism of VLIW that results in more concurrent operations. The higher register number requirements place correspondingly higher bandwidth and response time requirements on the memory bus that transfers data between memory and the registers. It can take multiple memory bus clock cycles to transfer data into or remove data from all the available registers in a processor.
As a more specific example, multi-tasking and multi-threading processor architectures enhance data processing efficiency in many applications. In such architectures, software programs executing on the processor are segmented into atomic "threads" that execute on the processor. To ensure architectural integrity, each thread is normally guaranteed access to the entire register set defined by the processor architecture even if the thread only uses a fraction of that register set.
A particular thread's instructions and data, together with the architectural registers that store that data, are referred to as a "context". A "context switch" occurs when the architectural resources are switched from one thread to another thread. A context switch occurs, for example, when one thread becomes inactive or is terminated and the processor resources are applied to another active thread. A context switch also occurs, for example, when an executing thread accesses a resource that has a long latency, or when a thread with higher priority than the current thread is imposed on the processor. When a context switch occurs, the data in the registers is moved out of the registers and saved to persistent storage (or some other memory location). Data for the new context is then transferred into the registers.
One way to organize registers within a processor is to use a register windowing technique to access a plurality of registers in a register file. With register windowing, a register window has a predetermined number of contiguous registers, and the window can be moved linearly within the register file. At any one time, the register window permits program access to a subset of the total number of registers in the register file. Control registers are also associated with the register windows so that a program can manipulate the position of the window within the register file and monitor the status of the window.
For example, in the specification for a scalable processor architecture, SPARC-V9, the general purpose registers for storing and manipulating data are arranged in register sets accessible through register windows, each register window having 32 registers. A particular processor can have multiple register sets ranging from three register sets to 32 register sets. Individual registers are addressable using a five-bit address in conjunction with a current window pointer (CWP). The register window is movable within the register sets such that a program can logically address multiple physical registers in the register sets by simply tracking a logical register name or specifier and the current window pointer.
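The addressing scheme described above can be illustrated with a minimal sketch. It combines a five-bit logical register number with a current window pointer to pick a slot in a flat register file. The function name `physical_index`, the constants, and the flat layout (eight globals followed by a circular buffer of windowed slots) are assumptions made for illustration; an actual implementation is free to lay out its physical registers differently. The only convention relied on is that the "out" registers of one window alias the "in" registers of the next.

```python
NUM_GLOBALS = 8        # r0..r7 are shared by every window
WINDOW_STRIDE = 16     # each window contributes 8 "in" + 8 "local" slots


def physical_index(logical: int, cwp: int, nwindows: int) -> int:
    """Map a 5-bit logical register number to a slot in a flat register file.

    Assumed (hypothetical) layout: slots 0..7 hold the globals, followed by
    nwindows * 16 windowed slots that are treated as a circular buffer.
    """
    if not 0 <= logical < 32:
        raise ValueError("logical register must fit in five bits")
    if not 0 <= cwp < nwindows:
        raise ValueError("current window pointer out of range")

    if logical < NUM_GLOBALS:
        return logical                       # globals ignore the window pointer

    windowed_slots = nwindows * WINDOW_STRIDE
    if logical < 16:                         # outs: alias the next window's ins
        offset = (cwp + 1) * WINDOW_STRIDE + (logical - 8)
    elif logical < 24:                       # locals: private to this window
        offset = cwp * WINDOW_STRIDE + 8 + (logical - 16)
    else:                                    # ins
        offset = cwp * WINDOW_STRIDE + (logical - 24)
    return NUM_GLOBALS + offset % windowed_slots
```

With this layout, `physical_index(8, 2, 8)` and `physical_index(24, 3, 8)` resolve to the same slot, which is how a program reaches many physical registers while only ever naming 32 logical ones.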
In prior implementations, the entire register file is purged in response to a context switch, and the register file is initialized for a new process. If the new process is itself a saved process, the register values are restored from storage before the context takes effect. Because of this, two memory operations, one to write the old context to storage and a second to read the new context from storage, may be required for each context switch. For VLIW architectures, this situation creates an undesirable number of memory transactions that constitute overhead to the fundamental data processing performance of the processor. For example, in an architecture providing 256 registers, up to 512 memory transactions may be required to implement a context switch. This setup may be required in prior systems even where only a few of the 256 registers were actually used by the current process and where only a few of the registers will be used by the new process. A need exists for a processor architecture that provides low overhead manipulation of a large register file.
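The overhead figure above follows from simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes one memory transaction per register transferred, which is the assumption implicit in the 256-register example.

```python
def full_switch_transactions(num_registers: int) -> int:
    """Worst case for a prior-style switch: write every register of the old
    context to storage, then read every register of the new context back."""
    return 2 * num_registers


def touched_only_transactions(used_by_old: int, used_by_new: int) -> int:
    """Lower bound if only registers actually used were transferred
    (hypothetical comparison, one transaction per register)."""
    return used_by_old + used_by_new


print(full_switch_transactions(256))     # 512, the figure cited in the text
print(touched_only_transactions(4, 6))   # 10, for a pair of small threads
```

The gap between the two numbers is the overhead attributable to registers that are architecturally present but unused by either thread.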
Another limitation of existing processors is that during a context switch, all of the processor's resources are dedicated to completing the context switch. Other operations are blocked until the new context is in place. This type of operation decreases the efficiency of the processor because every operation is stalled until all of the old context's registers are saved (including registers that were not used) and all of the new context's registers are initialized or restored (including registers that will not be used).
Briefly stated, the present invention solves these and other limitations by saving, in response to a context switch, only those registers that have been modified. Further, during a context switch, each of the new context's registers is dynamically loaded from the context record when that register is used. In this manner, no overhead penalty is incurred for registers that are architecturally specified, but not used by the thread. Also, the context saving process is performed in accordance with the present invention in parallel with other operations in the new context, minimizing the impact of context switching on processor performance.
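The two ideas in this summary, saving only modified registers and loading registers on use, can be modeled in software as follows. This is a minimal sketch and not the disclosed hardware: the class `LazyRegisterFile`, the per-register `dirty` and `valid` flags, and the dictionary used as a context record are all illustrative assumptions. The sketch also performs the save sequentially, whereas the text describes it proceeding in parallel with work in the new context.

```python
class LazyRegisterFile:
    """Toy model: save only dirty registers, load registers on first use."""

    def __init__(self, num_registers: int, context_record: dict):
        self.n = num_registers
        self.values = [0] * num_registers
        self.valid = [False] * num_registers   # value present in the register
        self.dirty = [False] * num_registers   # modified since it was loaded
        self.record = context_record           # register number -> saved value
        self.loads = 0                         # transfers from the record
        self.stores = 0                        # transfers to the record

    def read(self, reg: int) -> int:
        if not self.valid[reg]:                # fetch from the record on use
            self.values[reg] = self.record.get(reg, 0)
            self.valid[reg] = True
            self.loads += 1
        return self.values[reg]

    def write(self, reg: int, value: int) -> None:
        self.values[reg] = value
        self.valid[reg] = True
        self.dirty[reg] = True

    def switch_to(self, new_record: dict) -> None:
        for reg in range(self.n):              # save only what was modified
            if self.dirty[reg]:
                self.record[reg] = self.values[reg]
                self.stores += 1
        self.record = new_record
        self.valid = [False] * self.n          # nothing is loaded up front
        self.dirty = [False] * self.n
```

A thread that writes one register and reads another in a 256-register file costs one store and one load across a switch in this model, rather than a transfer for every architectural register.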
In another aspect, the present invention involves a method for operating a processor including the steps of establishing a first register save area and a second register save area in a memory, where each register save area holds data values that define a context. The first context is loaded in the processor by loading at least some of the data values from the first register save area into the plurality of registers. A first pointer value to the first register save area is stored in a current RFSA register. A context switch is indicated by storing a second pointer to the second register save area in the current RFSA register. The first pointer is transferred from the current RFSA register to a previous RFSA register. All of the data values that define the first context are transferred from the registers to a shadow register file. The second context is established in the processor by loading selected data values from the second register file save area into the plurality of registers.
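The sequence of steps in this aspect can be traced with a short sketch. The class `Processor`, its attribute names, the dictionary standing in for memory, and the addresses used in the example are hypothetical. What becomes of the shadow copy after the switch is not described in this passage, so the sketch stops at that point.

```python
class Processor:
    """Toy model of the current/previous RFSA registers and a shadow file."""

    def __init__(self, num_registers: int):
        self.registers = [0] * num_registers
        self.shadow = [0] * num_registers
        self.current_rfsa = None     # pointer to the active save area
        self.previous_rfsa = None    # pointer to the outgoing save area

    def load_context(self, memory: dict, rfsa: int, selected: list) -> None:
        # Load at least some of the data values from the save area.
        for reg in selected:
            self.registers[reg] = memory[rfsa][reg]
        self.current_rfsa = rfsa

    def context_switch(self, memory: dict, new_rfsa: int, selected: list) -> None:
        # Preserve the old pointer before the new one overwrites it.
        self.previous_rfsa = self.current_rfsa
        self.current_rfsa = new_rfsa
        # Move all of the first context's values to the shadow register file.
        self.shadow = list(self.registers)
        # Establish the second context from selected values only.
        for reg in selected:
            self.registers[reg] = memory[new_rfsa][reg]


# Two register save areas in memory, keyed by (made-up) pointer values.
memory = {0x100: [1, 2, 3, 4], 0x200: [9, 8, 7, 6]}
cpu = Processor(4)
cpu.load_context(memory, 0x100, selected=[0, 1])
cpu.registers[1] = 20                      # the first context modifies r1
cpu.context_switch(memory, 0x200, selected=[2])
```

After the switch, `cpu.previous_rfsa` holds `0x100`, `cpu.current_rfsa` holds `0x200`, and `cpu.shadow` holds the first context's register values, so the old context remains recoverable while the new one runs.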