Processors (e.g., microprocessors) are well known and used in a wide variety of products and applications, from desktop computers to portable electronic devices, such as cellular phones and PDAs (personal digital assistants). As is known, some processors are extremely powerful (e.g., processors in high-end computer workstations), while other processors have a simpler design, for lower-end, less expensive applications and products.
Further still, most processor architectures are register based. A register-based processor architecture utilizes a set of registers for carrying out various operations. In this regard, data values are moved from memory into registers and operations are performed on registers. For example, the following (commented) sequence of instructions may be performed to add the value 8 to the data stored at memory location A238.
        LOAD A, 8; (Comment: load register A with the value 8)
        LOAD B, [A238]; (Comment: load register B with the data value at address A238)
        ADD C, A, B; (Comment: add the contents of registers A and B and store the result in register C)
The structure and operation of such register-based processors are well known, and need not be described herein.
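The three-instruction sequence above can be mimicked in a few lines of Python. This is only an illustrative sketch: the dictionary-based register and memory model, the function name `run_add_example`, and the treatment of A238 as a hexadecimal address are assumptions made for the example, not features of any particular processor.

```python
# Minimal sketch of the LOAD/LOAD/ADD sequence.
# Assumptions: registers and memory are modeled as plain dicts, and
# "A238" is taken to be the hexadecimal address 0xA238.

def run_add_example(memory):
    """Execute LOAD A, 8; LOAD B, [A238]; ADD C, A, B on a toy register set."""
    regs = {"A": 0, "B": 0, "C": 0}
    regs["A"] = 8                      # LOAD A, 8
    regs["B"] = memory[0xA238]         # LOAD B, [A238]
    regs["C"] = regs["A"] + regs["B"]  # ADD C, A, B
    return regs


if __name__ == "__main__":
    # With the value 34 stored at address A238, register C ends up holding 42.
    print(run_add_example({0xA238: 34}))
```

Note that the result is left in register C; the sequence as given does not store it back to memory.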
As is also known, many processor architectures maintain their registers in a structure referred to as a register file. Further, many register files contain more individual registers than are available at any given time (or to any given instruction). That is, in many processor architectures, only a subset of the registers is available (or visible) to any given instruction. This availability may depend on a variety of factors, such as the current mode of operation. Thus, in some processors, individual registers may be identified through a combination of register-select bits and mode-identifying bits.
As is known, many processors have pipelined architectures to increase instruction throughput. In theory, scalar pipelined processors can execute one instruction per machine cycle (and more in super-scalar architectures) when executing a well-ordered, sequential instruction stream. This is accomplished even though an instruction itself may implicate or require a number of separate micro-instructions to be effectuated. Pipelined processors operate by breaking up the execution of an instruction into several stages that each require one machine cycle to complete. For example, in a typical system, an instruction could require many machine cycles to complete (fetch, decode, ALU operations, etc.).
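The throughput benefit described above can be illustrated with a simple cycle count. The sketch below assumes an idealized scalar pipeline with no stalls, hazards, or branches, in which every stage takes exactly one machine cycle; the function names and the six-stage example are assumptions chosen for illustration.

```python
# Idealized cycle counts for executing a sequential instruction stream.
# Assumptions: one cycle per stage, no stalls or hazards, scalar issue.

def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """The first instruction fills the pipeline; each later one finishes a cycle apart."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 6  # hypothetical: 100 instructions, six pipeline stages
    print(unpipelined_cycles(n, k))  # 600
    print(pipelined_cycles(n, k))    # 105
```

As the instruction count grows, the pipelined figure approaches one instruction per machine cycle, which is the theoretical limit mentioned above for a scalar pipeline.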
Reference is made to FIG. 1, which is a block diagram illustrating the organization and flow of information in a pipelined processor capable of operating in a plurality of operating modes. In the pipelined architecture of FIG. 1, there is illustrated a decode stage 10, a read stage 20, an execute stage 30, a memory access stage 40, a retire stage 50, and a register write stage 60. The structure and operation of these various pipeline stages are known and understood by persons skilled in the art, and therefore need not be described herein. As is known, in the decode stage 10, circuitry from a decoder or decode logic decodes an encoded instruction and generates control signals for the circuitry of the processor to execute (or carry out) the decoded instruction. In the illustrated figure, there are six sets of signal lines 11 illustrated as passing from the decode stage 10 to downstream stages. Each group of signal lines uniquely identifies a register of the processor. As is known, although not specifically illustrated in FIG. 1, a processor includes a plurality of registers into which data from memory may be imported (or into which calculated data or results may be written). Frequently, a processor contains a number of physical registers, with only a subset of the registers accessible at any given time. In this regard, registers may be banked, such that registers from a given bank are available at a particular time. Alternatively, processors may be configured to operate in a plurality of modes, such that only a subset of the physical registers are available during a particular mode of operation.
As illustrated in FIG. 1, the decode stage 10 may include circuitry (in the form of a register or otherwise) having a plurality of bits that identify a register, and a second plurality of bits that identify a processor mode of operation. Alternatively, although not specifically illustrated, a second grouping of bits may be provided to identify a register bank (instead of a processor mode), for processors that have registers organized in groups or banks. In the illustrated embodiment, there are four signal lines 14 that operate as register-select signal lines, and five signal lines 15 that identify a mode of operation. Collectively, these make up nine signal lines (e.g., 12) that uniquely identify a processor register. The embodiment of FIG. 1 illustrates a processor having thirty-two physical registers, with only sixteen registers available at a given time.
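The nine-line register identifier described above can be modeled as a nine-bit integer. The sketch below is illustrative only: the placement of the five mode bits above the four register-select bits, and the function names, are assumptions, since the text does not specify a bit ordering.

```python
# Sketch of a nine-bit register identifier built from four register-select
# bits and five mode bits.
# Assumption: mode bits occupy the upper five bits, select bits the lower four.

REG_SELECT_BITS = 4
MODE_BITS = 5


def make_register_id(reg_select, mode):
    """Pack a register-select value and a mode value into one nine-bit identifier."""
    if not 0 <= reg_select < (1 << REG_SELECT_BITS):
        raise ValueError("register select must fit in four bits")
    if not 0 <= mode < (1 << MODE_BITS):
        raise ValueError("mode must fit in five bits")
    return (mode << REG_SELECT_BITS) | reg_select


def split_register_id(reg_id):
    """Recover (reg_select, mode) from a nine-bit identifier."""
    reg_select = reg_id & ((1 << REG_SELECT_BITS) - 1)
    mode = (reg_id >> REG_SELECT_BITS) & ((1 << MODE_BITS) - 1)
    return reg_select, mode


if __name__ == "__main__":
    rid = make_register_id(3, 0b10001)
    print(rid, split_register_id(rid))  # 275 (3, 17)
```

Four select bits distinguish the sixteen registers visible at a given time. Nine bits can encode 512 distinct values, considerably more than the thirty-two physical registers of the illustrated embodiment.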
In the embodiment illustrated in FIG. 1, there are six groups of signal lines 11 generated by the decode stage 10. Of course, there may be more or fewer such groups. Each such group (e.g., 12 and 13) uniquely identifies a particular register. In the illustrated embodiment, four groups of these signal lines (e.g., 12) identify source registers, while two groups (e.g., 13) identify destination or target registers. The number of such groups is, of course, processor dependent. Thus, in the embodiment illustrated in FIG. 1, certain instructions may implicate up to four source registers and two destination registers. Of course, for any given instruction, fewer than six registers may be implicated. For instructions implicating fewer than six registers, certain groups of the signal lines will simply be ignored by downstream stages of the processor pipeline.
Thus, the decode stage 10 operates to decode an instruction. In connection with this decode operation, source and destination registers are uniquely identified by a plurality of signal lines. In the embodiment of FIG. 1, nine signal lines are used to uniquely identify each register. These, or similar signal lines, are passed between each stage of the processor pipeline to identify processor registers as needed at each stage of the pipeline. Similar signal lines 70 are fed back from each downstream stage of the pipeline to the read stage 20, to accommodate data forwarding. As is known, data forwarding is a technique used to ensure that an instruction operates on the current value of a register. For example, if a processor instruction calls for the storage of the value of a given register to a certain memory location, and the value of that register has been changed by an immediately preceding instruction (but not yet written back to the register file), then the read stage 20 of the pipeline should obtain the value for the identified register from the downstream pipeline stage having the current value of that register, as opposed to reading the value of that register from the register file. In this regard, and as is known, the register file (not specifically illustrated in FIG. 1) is not updated until the register write stage 60. Prior to this time, if the read stage 20 of the pipeline requires a register value that exists in one of the intermediate pipeline stages 30, 40, 50, or 60, then the value should be read from that intermediate pipeline stage, and is so read through data forwarding lines 70.
In this regard, and as illustrated, the read stage 20 includes compare logic 22 for comparing the nine signal lines for identifying a register output from the decode stage 10 with comparable signal lines within the data forwarding path 70. If there is a match (indicating that the same register has been implicated and its current value is in a downstream stage of the pipeline), then data is read into that register (at the read stage 20) by a data forwarding path 70. If, however, no such register match is identified, then the value associated with the identified register is read in from the register file.
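The compare-and-forward behavior of the read stage can be sketched as follows. This is a simplified software model, not a description of the circuitry of FIG. 1: the function name `read_operand`, the representation of each downstream stage as a `(register_id, value)` pair, and the choice to search the nearest stage first (so that the most recently produced value wins) are all assumptions made for illustration.

```python
# Sketch of operand read with data forwarding.
# Assumptions: each downstream stage exposes at most one pending result as a
# (register_id, value) pair, or None if it holds no pending register write;
# stages are listed nearest-first so the newest value takes priority.

def read_operand(reg_id, downstream_stages, register_file):
    """Return the current value of reg_id, preferring in-flight results."""
    for pending in downstream_stages:
        if pending is not None and pending[0] == reg_id:
            return pending[1]          # match: forward from the pipeline stage
    return register_file[reg_id]       # no match: read from the register file


if __name__ == "__main__":
    register_file = {275: 1, 5: 7}
    # Hypothetical pending results in four downstream stages, nearest first.
    stages = [None, (275, 99), (275, 50), None]
    print(read_operand(275, stages, register_file))  # 99 (forwarded)
    print(read_operand(5, stages, register_file))    # 7 (from register file)
```

In hardware, the equality test inside the loop corresponds to a comparison of the full register identifier for every source operand against every fed-back identifier.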
Unfortunately, the pipelined architecture illustrated in FIG. 1 is, in some respects, extremely complex and logic intensive. Specifically, the comparisons required to accommodate data forwarding are cumbersome and complex. In a processor, for example, having only sixteen registers available or accessible at a given time, the comparisons required among the nine signal lines that uniquely identify a given register are excessively complex. This comparison 25 may be carried out in any of a variety of manners. One way of carrying out this comparison is simply to do a straight comparison of all nine bits. When there is an exact match, the associated register is identified as having an intermediate value in a downstream pipeline stage. The logic required to perform nine-bit comparisons, however, is significant. Another way in which this comparison may be performed is by first comparing the four bits 14 that identify the register within a given bank or mode of operation. If and when there is a match on these four bits, then a second-level comparison may be made of the five bits 15 that identify the register bank or processor mode of operation. Again, unfortunately, this requires excess logic and adds levels of complication in carrying out the comparison.
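The two comparison strategies described above can be contrasted in a short sketch. The bit layout (mode bits above select bits), the function names, and the comparator count are assumptions for illustration; in particular, the count assumes that each of the four source identifiers is compared against each of the two destination identifiers in each of four downstream stages, which the text does not state explicitly.

```python
# Sketch of the two register-identifier comparison strategies.
# Assumption: identifiers are nine-bit integers with the four register-select
# bits in the low positions and the five mode bits above them.

SELECT_MASK = 0xF    # four register-select bits
MODE_MASK = 0x1F     # five mode bits
SELECT_BITS = 4


def match_straight(a, b):
    """Single-level comparison of all nine bits."""
    return ((a ^ b) & 0x1FF) == 0


def match_two_level(a, b):
    """Compare the four select bits first; compare mode bits only on a match."""
    if (a ^ b) & SELECT_MASK:
        return False
    return (((a ^ b) >> SELECT_BITS) & MODE_MASK) == 0


def forwarding_compare_bits(sources=4, destinations=2, stages=4, id_bits=9):
    """Hypothetical total of bit comparisons per cycle under the stated assumption."""
    return sources * destinations * stages * id_bits


if __name__ == "__main__":
    print(match_straight(275, 275), match_two_level(275, 259))  # True False
    print(forwarding_compare_bits())  # 288
```

Both functions return the same result for every pair of identifiers; they differ only in how the comparison is staged, which is why neither approach reduces the underlying amount of logic.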
Accordingly, it is desired to provide an improved architecture for accessing processor registers and implementing data forwarding.