1. Field of the Invention
The present invention relates to digital computing, and more particularly to address generation in a computer's processor.
2. Description of the Related Art
A typical computer instruction must indicate not only the operation to be performed but also the location of the operand(s). An important function in microprocessors is the generation of operand addresses for load and store instructions. In RISC microprocessors, these addresses are typically determined by adding a pair of general purpose registers together, or by adding a constant displacement value contained in the load or store instruction itself to a general purpose register. Less common is the case where an address is fully indicated either by a register or a displacement and no addition is thus required. This less common case is treated as an addition where one of the addends is zero. In all of these cases, a single addition is performed, and the length of time to determine the address is constant.
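The single-addition scheme described above can be sketched as follows. This is a minimal illustration only; the function name `risc_effective_address`, the 32-bit address width, and the wrap-around on overflow are assumptions made for the example and are not taken from any particular RISC architecture.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address width (illustrative)

def risc_effective_address(base, addend=0):
    """Form an operand address with exactly one addition.

    `addend` is either a second register value or a constant displacement.
    When the address is fully given by one value, the addend is zero,
    so the same single addition is still performed.
    """
    return (base + addend) & MASK32

# register + displacement
print(hex(risc_effective_address(0x1000, 0x20)))
# register only, treated as register + 0
print(hex(risc_effective_address(0x1000)))
```

Because every case reduces to one addition, the time to form the address does not depend on which addressing form the instruction uses.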
This constant latency address generation is a feature of RISC processors but is not a feature of CISC processors. CISC processors support much more complex address generation, where more than two values are summed to obtain the operand address. One popular commercial architecture is the x86 architecture embodied in CPUs manufactured by the Intel Corporation of Santa Clara, California. In the x86 family of microprocessors, it is possible to add the contents of four registers in addition to a constant displacement from the load and/or store instruction. This allows the possibility of adding five 32-bit operands in order to obtain the operand address. Also complicating the process of address generation is the fact that one of the five operands, the index, can be multiplied by 1, 2, 4, or 8. These factors result in a more complex form of address generation than the constant latency approach of RISC processors. This additional complexity results in a slower operand addressing unit. For example, if it is possible to add two 32-bit numbers in three levels of logic, it may require five levels of logic to add five 32-bit numbers.
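Because the permitted index multipliers are all powers of two, the scaling step can be pictured as a left shift of 0, 1, 2, or 3 bit positions. The sketch below illustrates this under stated assumptions: the helper name `scale_index` is hypothetical, and truncation of the result to 32 bits is assumed for the example.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit operand width (illustrative)

def scale_index(index, scale):
    """Multiply the index operand by 1, 2, 4, or 8 using a left shift."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    shift = scale.bit_length() - 1  # 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3
    return (index << shift) & MASK32

print(scale_index(3, 4))  # index 3 scaled by 4
```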
In addition to CISC architecture and instruction sets, another feature available on some processors is pipelining. By placing latches at feed-forward cutsets, pipelining can increase the clock rate of the system. A latch here refers to a storage element such as a buffer or register. A cutset is a minimal subnetwork whose removal divides the original network into two connected pieces. One effect of placing latches within a system to achieve the benefits of pipelining is that the system latency may be increased in certain situations.
The five components that may be added to generate an address within a CISC processor are the four register components plus the displacement value from the load or store instruction. The first register component relates to the paging and segmentation of memory used by typical microprocessor systems. This first register component is the base address value of the segment where the operand lies, called the segment_base. The other register components are the base, index, and bit_offset values. The bit_offset component is typically scaled by a constant value of 1/8th. Accordingly, the bit_offset component is typically shifted to the right by three bit positions to obtain a byte address. This results in the bit_offset component containing only 29 usable bits. The bits shifted out on the right are discarded for the purposes of address generation.
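The five-component sum described above can be sketched as follows. This is an illustrative model only: the function name `cisc_effective_address` is hypothetical, the bit_offset is treated as an unsigned value shifted logically, the result is assumed to wrap at 32 bits, and index scaling is omitted for brevity.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address width (illustrative)

def cisc_effective_address(segment_base, base, index, bit_offset, displacement):
    """Sum the four register components and the displacement.

    The bit_offset is scaled by 1/8: it is shifted right by three bit
    positions to give a byte offset, leaving 29 usable bits. The three
    bits shifted out are discarded.
    """
    byte_offset = (bit_offset & MASK32) >> 3
    total = segment_base + base + index + byte_offset + displacement
    return total & MASK32

# bit_offset 0x47 (71 bits) contributes byte offset 8; the low 3 bits are dropped
print(hex(cisc_effective_address(0x10000, 0x200, 0x10, 0x47, 0x4)))
```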
CISC processors, including those of the x86 family, do not always add all five components. In fact, the most frequent operation is the addition of just two or three operands. Past designs of the x86 architecture have used fixed-latency operand address generators, which pay a time penalty in cases where the full generation complexity is not required. Such generators do not derive a benefit from the increased sample rate possible within pipelined systems. What is desired is an address generation unit that can take advantage of the nature of pipelined processing to allow address generation to take one cycle of latency in the simple cases, and two cycles of latency in the more complex cases. Due to the nature of pipelined microprocessors, the effect of the reduced latency is to allow bypassing of results more quickly to following dependent operations, while the actual number of stages of the address generator is two for both simple and complex address combinations.
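The desired variable-latency behavior can be pictured with the following sketch. It is a simplified model under stated assumptions: the function name `address_latency_cycles` is hypothetical, and the boundary between a "simple" and a "complex" combination (here, at most three nonzero components) is an assumption chosen for illustration, since the text above does not fix it.

```python
def address_latency_cycles(components, simple_limit=3):
    """Return the assumed latency, in cycles, of forming an address.

    `components` holds the five address components; a zero entry means
    the component is not used. `simple_limit` is a hypothetical threshold
    separating simple cases (one cycle) from complex cases (two cycles).
    """
    active = sum(1 for value in components if value != 0)
    return 1 if active <= simple_limit else 2

# base + displacement only: simple case
print(address_latency_cycles([0, 0x1000, 0, 0, 0x20]))
# all five components in use: complex case
print(address_latency_cycles([0x10000, 0x200, 0x10, 0x47, 0x4]))
```

In either case the model leaves the number of pipeline stages unchanged; only the point at which the result is available to a dependent operation differs.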