1. Field of the Invention
The present invention relates in general to microprocessors and, more particularly, to a system, method, and mechanism providing low latency access to control registers in a pipeline processor.
2. Relevant Background
Computer programs comprise a series of instructions that direct a data processing mechanism to perform specific operations on data. These operations include loading data from memory, storing data to memory, adding, multiplying, and the like. Data processors, including microprocessors, microcontrollers, and the like, include a central processing unit (CPU) comprising one or more functional units that perform various tasks. Typical functional units include a decoder, an instruction cache, a data cache, an integer execution unit, a floating point execution unit, a load/store unit, and the like. A given program may run on a variety of data processing hardware.
Early data processors executed only one instruction at a time. Each instruction was executed to completion before execution of a subsequent instruction was begun. Each instruction typically requires a number of data processing operations and involves multiple functional units within the processor. Hence, an instruction may consume several clock cycles to complete. In serially executed processors each functional unit may be busy during only one step, and idle during the other steps. The serial execution of instructions results in the completion of less than one instruction per clock cycle.
As used herein the term "data processor" includes complex instruction set computers (CISC), reduced instruction set computers (RISC) and hybrids. A data processor may be a stand-alone central processing unit (CPU) or an embedded system comprising a processor core integrated with other components to form a special purpose data processing machine. The term "data" refers to digital or binary information that may represent memory addresses, data, instructions, or the like.
In response to the need for improved performance several techniques have been used to extend the capabilities of these early processors including pipelining, superpipelining, and superscaling. Pipelined architectures attempt to keep all the functional units of a processor busy at all times by overlapping execution of several instructions. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction is finished executing. A simple pipeline may have only five stages whereas an extended pipeline may have ten or more stages. In this manner, the pipeline hides the latency associated with the execution of any particular instruction.
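The throughput benefit of overlapping execution can be illustrated with a simple cycle count. The following is a minimal sketch, assuming an idealized pipeline in which every stage takes exactly one clock cycle and no hazards occur; the function names are hypothetical and chosen only for illustration.

```python
def serial_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction completes every stage
    before the next instruction is allowed to begin."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal pipeline (assumption: one cycle per
    stage, no stalls). The first instruction needs num_stages cycles
    to drain; each later instruction completes one cycle after the
    one before it."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1
```

Under these assumptions, 100 instructions on a five-stage machine take 500 cycles when executed serially and 104 cycles when pipelined, which approaches one instruction per cycle as the instruction count grows.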
The goal of pipeline processors is to execute multiple instructions per cycle (IPC). Due to pipeline hazards, actual throughput is reduced. Pipeline hazards include structural hazards, data hazards, and control hazards. Structural hazards arise when more than one instruction in the pipeline requires a particular hardware resource at the same time (e.g., two execution units requiring access to a single ALU resource in the same clock cycle). Data hazards arise when an instruction needs as input the output of an instruction that has not yet produced that output. Control hazards arise when an instruction changes control information, such as the program counter (PC), because execution cannot continue until the target instruction from the new PC is fetched.
When hazards occur, the processor must stall or place "bubbles" (e.g., NOPs) in the pipeline until the hazard condition is resolved. This increases latency and decreases instruction throughput. As pipelines become longer, the likelihood of hazards increases and the latency penalty paid to handle hazards increases. Hence, an effective mechanism for handling hazard conditions is important to achieving the benefits of deeper pipelines.
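The cost of a data hazard can be made concrete with a small scheduling model. The sketch below is an illustration only, assuming an in-order pipeline that issues at most one instruction per cycle, has no result forwarding, and makes each result readable a fixed number of cycles after its producer issues. The instruction encoding, the `result_latency` parameter, and the function name are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (destination register or None, source registers)
Instr = Tuple[Optional[str], Tuple[str, ...]]


def schedule_with_bubbles(program: List[Instr], result_latency: int) -> List[int]:
    """Return the issue cycle of each instruction.

    An instruction cannot issue until every source register it reads
    has become readable; the idle cycles this forces are the bubbles.
    """
    ready_at: Dict[str, int] = {}  # register -> first cycle its value is readable
    issue_cycles: List[int] = []
    next_cycle = 0
    for dest, sources in program:
        cycle = next_cycle
        for src in sources:
            # Stall until the producer of this source has delivered its result.
            cycle = max(cycle, ready_at.get(src, 0))
        issue_cycles.append(cycle)
        if dest is not None:
            ready_at[dest] = cycle + result_latency
        next_cycle = cycle + 1  # in-order: later instructions issue later
    return issue_cycles


program = [
    ("r1", ("r2", "r3")),  # r1 <- r2 op r3
    ("r4", ("r1", "r5")),  # needs r1: data hazard on the previous instruction
    ("r6", ("r7", "r8")),  # independent, but issues in order behind the stall
]
cycles = schedule_with_bubbles(program, result_latency=3)
bubbles = cycles[-1] - (len(program) - 1)
```

With a result latency of three cycles the dependent instruction waits until cycle 3, so two bubbles enter the pipeline; with a latency of one cycle the same program issues back to back. A longer pipeline generally corresponds to a larger latency in this model, which is why the penalty grows with pipeline depth.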
Control registers are used to hold scalar state information that controls the execution of instructions in the processor. They are accessed through a control register file. The resources committed to storing this information are sometimes referred to as the "control space". The size and number of control registers in a processor are largely implementation independent. Because control space is accessed at every context switch, there must be a uniform, low latency mechanism for accessing (e.g., reading from and writing to) the control registers.
A control space access instruction includes a first field containing a source register specifier and a second field containing a destination register specifier. In the case of a read instruction, the source register specifier addresses a control register while the destination register specifier addresses a general purpose register. In the case of a write operation, the source register specifier addresses a general-purpose register while the destination register specifier addresses a control register. A "global operand address bus" is used to index into a register file to address a particular register. A "global operand data bus" is used to communicate data values with the execution stages of a pipeline processor. The decode stage of the pipeline processor is configured to decode the first and second fields and place the decoded contents on the global operand address bus. While the instruction remains in decode the source operand addressed by the source register specifier of the first instruction field is read. The source operand is placed on the global operand data bus. In the meantime, the destination register specifier along with the instruction type is stored into a snapshot file to be used during a writeback stage. During the execution stage, the source operand passes through the execution unit, is optionally sign-extended, and is placed on a result bus. It is then latched into a pipefile at the end of the execution stage and written into the destination register specified by the second instruction field during the writeback stage using the saved destination register specifier in the snapshot file.
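The flow just described (read the source operand in decode, save the destination specifier in a snapshot file, pass the operand through execution into a pipefile, and complete the move at writeback) can be sketched behaviorally as follows. This is a software model under stated assumptions, not a description of the hardware: the class and function names, the use of a per-instruction `tag` to index the snapshot file and pipefile, and the dictionary-based register files are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


@dataclass
class Machine:
    general: Dict[int, int] = field(default_factory=dict)   # general purpose registers
    control: Dict[int, int] = field(default_factory=dict)   # control registers
    snapshot: Dict[int, Tuple[str, int]] = field(default_factory=dict)  # tag -> (type, dest)
    pipefile: Dict[int, int] = field(default_factory=dict)  # tag -> latched result


def decode(m: Machine, tag: int, kind: str, src: int, dst: int) -> int:
    """Decode stage: read the source operand and save the destination
    specifier and instruction type for use at writeback."""
    source_file = m.control if kind == "read" else m.general
    operand = source_file[src]        # value driven onto the operand data bus
    m.snapshot[tag] = (kind, dst)
    return operand


def execute(m: Machine, tag: int, operand: int, extend_bits: Optional[int] = None) -> None:
    """Execution stage: pass the operand through, optionally sign-extend
    it, and latch the result into the pipefile."""
    result = operand if extend_bits is None else sign_extend(operand, extend_bits)
    m.pipefile[tag] = result


def writeback(m: Machine, tag: int) -> None:
    """Writeback stage: use the saved specifier to pick the destination."""
    kind, dst = m.snapshot.pop(tag)
    dest_file = m.general if kind == "read" else m.control
    dest_file[dst] = m.pipefile.pop(tag)


m = Machine(general={3: 42}, control={7: 0xFF})
# Read: control register 7 -> general purpose register 1, sign-extended from 8 bits.
execute(m, 0, decode(m, 0, "read", 7, 1), extend_bits=8)
writeback(m, 0)
# Write: general purpose register 3 -> control register 7.
execute(m, 1, decode(m, 1, "write", 3, 7))
writeback(m, 1)
```

Keeping the destination specifier in a side structure means the execution stage in this model carries only data, so reads and writes of control space follow the same path as any other register-to-register move.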
Briefly stated, the present invention involves a method for low latency access to the control space. A pipeline processor executes instructions in multiple stages including a decode stage, one or more execution stages, and a writeback stage. A control space access instruction includes a first field containing a control register specifier and a second field containing a general purpose register specifier. The decode stage is configured to decode the first and second fields and place the decoded contents on a global operand bus. The specified control register is addressed from the global operand bus while the access instruction is in decode. In the case of a read instruction, the addressed control register places its contents on the global operand bus while the instruction remains in decode. In the case of a write instruction, the general purpose register is addressed during the execution stage and its contents placed on the global operand bus during the writeback stage such that the contents of the addressed general purpose register are moved to the addressed control register during the writeback stage.
The present invention also involves a data processor having a plurality of execution pipeline stages where each stage accepts a plurality of operand inputs selected from a global operand bus and generates a result. A results bus distributes the generated results from each of the execution pipeline stages throughout the plurality of pipeline stages. A multiplexor associated with each execution pipeline stage operates to selectively couple the results bus to an operand input of the associated execution pipeline stage. A control space access instruction is executed by identifying a first field containing a control register specifier and a second field containing a general purpose register specifier. The first and second fields are decoded and the decoded contents are placed on a global operand bus. The specified control register is addressed from the global operand bus while the access instruction is in decode. In the case of a read instruction, the addressed control register places its contents on the global operand bus while the instruction remains in decode. In the case of a write instruction, the general purpose register is addressed during the execution stage and its contents placed on the global operand bus during the writeback stage such that the contents of the addressed general purpose register are moved to the addressed control register during the writeback stage.
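The role of the multiplexor at each operand input can be pictured with a short behavioral sketch. The text above does not state how the multiplexor select is derived, so the sketch assumes a simple rule: if a result for the requested register is present on the results bus, that result is chosen; otherwise the value from the global operand bus is used. The function name and the dictionary representation of the results bus are hypothetical.

```python
from typing import Dict


def select_operand(register: int, operand_bus_value: int,
                   results_bus: Dict[int, int]) -> int:
    """Model one operand-input multiplexor.

    results_bus maps a destination register number to a result that an
    execution stage has generated but that may not yet be written back
    (assumed selection rule: a matching in-flight result wins).
    """
    if register in results_bus:
        return results_bus[register]   # couple the results bus to the input
    return operand_bus_value           # otherwise take the operand bus value


in_flight = {5: 99}                              # register 5 has a pending result
forwarded = select_operand(5, 10, in_flight)     # takes the pending result
unchanged = select_operand(4, 10, in_flight)     # takes the operand bus value
```

Selecting an in-flight result in this way lets a dependent instruction proceed without waiting for the producer to reach writeback.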
The foregoing and other features, utilities and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention as illustrated in the accompanying drawings.