1. Field of the Invention
This invention relates to microprocessors, and more particularly, to the handling of general-purpose and extended register files.
2. Description of the Relevant Art
Since the introduction of the 8086 microprocessor, several successive generations of the X86 architecture have been developed, with further developments occurring on a continuous basis. With each new generation of the X86 architecture, microprocessor manufacturers have attempted to maintain backward compatibility in order to allow software developed for previous generations of the architecture to run on the most current generation. Maintaining this compatibility has forced a number of compromises in successive generations of the architecture.
An X86 microprocessor is referred to as a CISC (Complex Instruction Set Computing) machine, due to the type of instruction set employed. The instruction set of the X86 includes a relatively large number of variable length instructions. A generic X86 instruction can include one to five prefix bytes, an operation code (opcode) field of one to two bytes, an addressing mode (Mod R/M) byte, a scale-index-base (SIB) byte, a displacement field and an optional immediate data field. The shortest X86 instructions are only one byte in length, and consist of a single opcode byte. These instructions can access standard, or general-purpose, registers, to be discussed below, when executed by an X86 processor.
Nearly all microprocessor architectures, including the X86, feature a small, fast memory known as a register file, which is separate from system and cache memory. A register file is made up of a number of individual registers that are used for temporary storage during program execution. One of many typical uses of registers is the temporary storage of operands during arithmetic operations. Registers can be classified as general-purpose or dedicated. General-purpose, or standard, registers can store a number of different types of data, while dedicated registers have specific uses, and thus can only store certain, pre-designated data types.
Since the register file is located in the core of the microprocessor, accesses to it are typically much faster than accesses to main memory. A program that is register intensive usually runs significantly faster than an otherwise equivalent program that is main memory intensive. For this reason, it is advantageous to have a sufficiently large number of general-purpose registers. A significant weakness of the X86 architecture is the small number of general-purpose registers. While the X86 architecture includes many registers, a majority of these are dedicated to a particular use. The X86 architecture has only eight general-purpose registers that can be accessed by X86 instructions. By comparison, typical RISC (Reduced Instruction Set Computing) microprocessors have thirty-two or more general-purpose registers. It would be desirable to add an extended file of general-purpose registers to the base X86 architecture in order to increase processor performance.
A primary consideration when adding registers to any microprocessor architecture, X86 included, is the interaction between the processor and the operating system. If an interrupt or exception occurs during program execution, the process employing the registers must be suspended. The register state must be saved to main memory where it can be retrieved once the process is allowed to resume. Control of the suspension, state save, and resumption of a process is typically performed by the operating system. Operating systems are programmed with a specific number of general-purpose registers in mind. Simply adding extra registers to the base architecture of a processor may not allow the operating system to save the register state of a suspended process using them. Reconfiguring the operating system to take advantage of additional registers can be very expensive and very time consuming. It would be desirable to add registers to the base architecture without changing the operating system. Such a method of adding registers should allow for the state of the additional registers to be saved whenever a process using them is suspended. Usage of the additional registers, including state saves, should be transparent to the operating system.
The problems outlined above are in large part solved by a system and method for transparent handling of an extended register context in accordance with the present invention. In one embodiment, a microprocessor includes an extended register file (ERF), which augments a general-purpose register file containing a limited number of registers. The ERF is mapped to a main memory region for context swaps, with the physical base address of the region stored in a base address register. The ERF also includes a status vector register for storing status bits. These status bits provide information about the state of the ERF. Additional instructions are added to the processor's instruction set for operations involving the ERF, although the extended registers can be used with arithmetic and logical instructions that are already present in the processor's instruction set. All operations involving the ERF are transparent to the operating system. ERF operations are instead handled by application software that is designed with the extended registers in mind.
In one particular embodiment, a general-purpose register receives and stores a virtual base address for the memory region to which the ERF is to be mapped. This virtual address is issued by the application software. The virtual address is then translated into a physical address and stored in the base address register of the ERF. The ERF also contains a status vector register, which contains at least three status bits. One of these status bits, the active bit, is used to indicate whether the ERF is active. If the active bit is not set, then the ERF is available for use by any process of the application software. A second status bit, referred to as the state change status bit, when set, indicates that an interrupt or an exception has occurred. A third status bit, the snoop enable bit, when set, enables the base address register to be snooped during L1 cache snoops. In effect, the base address register behaves as one additional cache entry when the snoop enable bit is set. This behavior is important for memory coherency and context swaps, as will be detailed further below.
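The control state described above can be pictured with a minimal C sketch. The bit positions, the type and function names, and the fixed-offset translate() stand-in are all assumptions made for illustration; the text names the three status bits and the base address register but does not specify an encoding or a translation mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions: the text names the bits, not their encoding. */
enum {
    ERF_ACTIVE       = 1 << 0,  /* the ERF is in use by a process */
    ERF_STATE_CHANGE = 1 << 1,  /* an interrupt or exception has occurred */
    ERF_SNOOP_ENABLE = 1 << 2   /* base address register joins L1 cache snoops */
};

typedef struct {
    uint64_t base_phys;  /* physical base address of the mapped memory region */
    unsigned status;     /* status vector register */
} erf_control_t;

/* Stand-in for address translation: a fixed offset, purely illustrative. */
static uint64_t translate(uint64_t virt) {
    return virt + 0x100000u;
}

/* Translate the virtual base address supplied by the application and
   store the resulting physical address in the base address register. */
static void erf_map(erf_control_t *c, uint64_t virt_base) {
    assert(c != NULL);
    c->base_phys = translate(virt_base);
}

/* The ERF is available to any process while the active bit is clear. */
static bool erf_is_available(const erf_control_t *c) {
    return (c->status & ERF_ACTIVE) == 0;
}
```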
When the ERF is accessed for the first time, the active bit is set, while the other two status bits remain in their reset state. The process that is accessing the registers will own that register space. Accesses to the ERF will be private, and thus the contents of the ERF will not be coherent with the memory region to which it is mapped. If an interrupt or exception occurs, both the state change and snoop enable bits will be set in the status vector register. However, the register state will remain in the ERF, and a context swap will occur only if, subsequent to an interrupt or exception, a new process requests access to the ERF. If such a context swap occurs, the ERF context for the original process is copied back to the main memory region to which it is mapped. The new process will then have access to the ERF, which is mapped to a different memory region for that process. If, after an interrupt occurs, no other process accesses the ERF, the original process can resume use of these registers without having to reload.
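The deferred context swap described in this paragraph can be modeled in software as follows. This is a sketch under stated assumptions, not the hardware mechanism: the register count of eight, the use of a pointer to stand in for the base address register, identifying the owning process by its mapped region, loading the registers when a free ERF is claimed, and clearing the state change and snoop enable bits when the original process resumes are all illustrative choices that the text does not specify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ERF_REGS 8  /* assumed register count; the text does not give one */
enum { ACTIVE = 1, STATE_CHANGE = 2, SNOOP_ENABLE = 4 };  /* hypothetical encoding */

typedef struct {
    uint64_t  regs[ERF_REGS];
    uint64_t *region;  /* stands in for the base address register */
    unsigned  status;  /* status vector register */
} erf_t;

/* An interrupt or exception sets both bits; the register state stays put. */
static void erf_interrupt(erf_t *e) {
    e->status |= STATE_CHANGE | SNOOP_ENABLE;
}

/* A process touches the ERF, naming the memory region it is mapped to. */
static void erf_access(erf_t *e, uint64_t *region) {
    assert(region != NULL);
    if ((e->status & ACTIVE) && e->region == region) {
        /* Owner resumes: nothing to reload. Clearing the two bits here
           is an assumption; the text does not say when they are reset. */
        e->status = ACTIVE;
        return;
    }
    if (e->status & ACTIVE) {
        /* Context swap: copy the old owner's registers back to its region. */
        memcpy(e->region, e->regs, sizeof e->regs);
    }
    /* Map the ERF to the new region and load that process's context. */
    e->region = region;
    memcpy(e->regs, region, sizeof e->regs);
    e->status = ACTIVE;
}
```

In this model an interrupt by itself moves no data. The copy-back happens only inside erf_access(), when a caller with a different region shows up, which mirrors the statement that the original process can resume without reloading if no other process used the ERF.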
When the snoop enable bit of the status vector is set, the ERF can be snooped during L1 cache snoops. Since the ERF is mapped to main memory on an even block boundary, a single snoop of the base address register will cover the entire ERF. If a hit occurs during a snoop, the contents of the ERF will immediately be copied back to the mapped main memory region, and the active bit will be reset. This ensures coherency between the mapped memory region and the ERF. At this point, the ERF may be used by another process, although the resetting of the active bit does not imply that it actually will be used by another process. If a new process needs register access, the ERF will be loaded from the memory region to which the new process is mapped.
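The snoop behavior above can be sketched as a single address comparison. The 64-byte block size (eight 64-bit registers), the byte-array model of main memory, and all names are assumptions for illustration. The sketch clears only the active bit on a hit, as the text states, and leaves the other status bits untouched because the text does not say what happens to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ERF_REGS  8                               /* assumed register count */
#define ERF_BYTES (ERF_REGS * sizeof(uint64_t))   /* assumed 64-byte block */
enum { ACTIVE = 1, STATE_CHANGE = 2, SNOOP_ENABLE = 4 };  /* hypothetical encoding */

typedef struct {
    uint64_t regs[ERF_REGS];
    uint64_t base_phys;  /* block-aligned physical base address */
    unsigned status;     /* status vector register */
} erf_t;

/* Handle one snooped physical address; mem models main memory as bytes.
   Returns true on a hit. */
static bool erf_snoop(erf_t *e, uint8_t *mem, uint64_t addr) {
    /* Block alignment is what lets one comparison cover the whole ERF. */
    assert(e->base_phys % ERF_BYTES == 0);
    if (!(e->status & SNOOP_ENABLE))
        return false;
    if ((addr & ~(uint64_t)(ERF_BYTES - 1)) != e->base_phys)
        return false;
    /* Hit: copy the ERF back to its mapped region, then reset the active bit. */
    memcpy(mem + e->base_phys, e->regs, ERF_BYTES);
    e->status &= ~(unsigned)ACTIVE;
    return true;
}
```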
When a process has finished using the extended registers, the ERF may be deactivated by one of two special instructions. One of these instructions merely resets the active bit, while the other instruction copies the ERF contents back to the mapped main memory region prior to resetting the active bit. It should be noted, however, that the use of these instructions is optional. If the instructions are not used, the register file will continue to be snooped, unnecessarily, during L1 cache snoops.
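The difference between the two deactivating instructions can be illustrated with a pair of C functions. The function names, the register count, and the pointer used in place of the base address register are hypothetical; the text does not name the instructions or define their encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ERF_REGS 8    /* assumed register count */
enum { ACTIVE = 1 };  /* hypothetical encoding of the active bit */

typedef struct {
    uint64_t  regs[ERF_REGS];
    uint64_t *region;  /* stands in for the base address register */
    unsigned  status;  /* status vector register */
} erf_t;

/* First variant: merely reset the active bit; the register contents
   are not written back to the mapped region. */
static void erf_release(erf_t *e) {
    e->status &= ~(unsigned)ACTIVE;
}

/* Second variant: copy the ERF back to its mapped region, then reset
   the active bit. */
static void erf_save_and_release(erf_t *e) {
    assert(e->region != NULL);
    memcpy(e->region, e->regs, sizeof e->regs);
    e->status &= ~(unsigned)ACTIVE;
}
```

A process whose extended register values are scratch data could use the first form and skip the memory traffic, while a process that needs the values later would use the second.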
Thus, in various embodiments, the system and method for transparent handling of extended register states provides the advantages of a microprocessor having extra registers. Application software that takes advantage of the ERF can be written to be more register-intensive, which may result in significantly greater execution speed. The context save of a process using the ERF is simplified by mapping the ERF to a block in main memory. Since use of the ERF is controlled by application software, there is no need to modify the operating system to make the extended registers architecturally visible.