This invention relates in general to circuitry within a processor for generating a linear address, and more specifically to a parallel design of a linear address generator (LAGEN) that is capable of high-speed linear address generation.
A linear address generator ("LAGEN") is typically implemented in processors of the prior art, such as the Merced processor developed by Intel Corporation in partnership with Hewlett Packard. An exemplary function of a LAGEN is to generate a memory address for a data memory load in a processor to enable support for both IA-64 mode and x86 mode of operation, which are well-known modes of operation in the prior art. Prior art LAGENs typically require at least two clock cycles to complete (i.e., to generate a resulting linear address). An example of a typical LAGEN implementation of the prior art is illustrated in FIG. 1. As shown in FIG. 1, LAGEN circuitry 100 is implemented within a processor, and such LAGEN circuitry 100 typically allows for both 32-bit mode operation and 16-bit mode operation. 32-bit mode operation utilizes a 32-bit memory address, while 16-bit mode operation utilizes a 16-bit memory address. For instance, 16-bit mode operation is commonly utilized to perform x86 instructions that are 16 bits. LAGEN circuitry 100 includes latches 10, 12, and 14, which latch operands IMM[31:0], SRC1[31:0], and SRC2[31:0], respectively. In FIG. 1, IMM[31:0] (i.e., "immediate address"), SRC1[31:0] (i.e., "source 1"), and SRC2[31:0] (i.e., "source 2") represent input operands used to compute the memory address. LAGEN circuitry 100 further includes an adder, such as the 32-bit adder 16, to add the first two operands, SRC1[31:0] and IMM[31:0] of FIG. 1. Adder 16 produces a result, the effective address (EA), whose higher 16 bits, shown as EA[31:16], and lower 16 bits, shown as EA[15:0], are separated.
Thereafter, the higher 16 bits (i.e., bits EA[31:16]) are ANDed with a mode_32 control bit in the AND gate 18 to obtain an intermediate result. When the mode_32 control bit is set to 1 (high), the higher 16 bits EA[31:16] are passed as the output of AND gate 18. That is, when 32-bit mode operation is enabled, the circuitry passes the higher 16 bits EA[31:16] through AND gate 18 where they are combined with the lower 16 bits EA[15:0] resulting in an intermediate result of {EA[31:16], EA[15:0]}. On the other hand, when 32-bit mode operation is disabled (meaning 16-bit mode is enabled), the mode_32 control bit is set to 0. Accordingly, the output of AND gate 18 will be zero for each of the higher 16 bits. That is, AND gate 18 zeroes out all of the higher 16 bits and leaves the lower 16 bits EA[15:0] undisturbed, resulting in an intermediate result of {0_31, 0_30, 0_29, . . . , 0_16, EA[15:0]}, where each subscript denotes the bit position holding the zero. It should be recognized that regardless of whether 32-bit mode or 16-bit mode is enabled, the lower 16 bits resulting from adder 16, i.e., EA[15:0], are utilized for the intermediate result.
This intermediate result is then added with the operand SRC2[31:0], which has been latched in latch 14. As shown in FIG. 1, adder 20 is used to add the intermediate result and SRC2[31:0] to generate a final result, i.e., the final linear address LA[31:0]. Thus, the prior art typically utilizes a 2-series, 32-bit addition implementation as described above in conjunction with FIG. 1. That is, the prior art typically requires first adding the SRC1[31:0] and IMM[31:0] operands to obtain an intermediate result, and then the intermediate result is modified with AND gates for either 32-bit mode or 16-bit mode operation. Thereafter, a second addition is performed, in which the intermediate result is added to the SRC2[31:0] operand to generate the linear address.
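The serial data path of FIG. 1 described above may be illustrated by a minimal behavioral sketch. The function name serial_lagen and the mask constants are hypothetical names chosen for illustration, and the sketch assumes that adders 16 and 20 each discard any carry out of bit 31 (i.e., wrap at 32 bits), which is not stated above. It models only the arithmetic, not the latches or timing.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF

def serial_lagen(imm, src1, src2, mode_32):
    """Behavioral model of the two-adder (serial) LAGEN of FIG. 1."""
    # Adder 16: EA[31:0] = SRC1 + IMM (assumed to wrap at 32 bits).
    ea = (src1 + imm) & MASK32
    # AND gate 18: pass EA[31:16] in 32-bit mode, zero it in 16-bit mode.
    ea_hi = (ea >> 16) & MASK16 if mode_32 else 0
    # The lower 16 bits EA[15:0] are used in either mode.
    intermediate = (ea_hi << 16) | (ea & MASK16)
    # Adder 20: LA[31:0] = intermediate result + SRC2.
    return (intermediate + src2) & MASK32

# Example operands (arbitrary values chosen for illustration).
print(hex(serial_lagen(0x0000F000, 0x00001234, 0x00100000, mode_32=True)))   # 0x110234
print(hex(serial_lagen(0x0000F000, 0x00001234, 0x00100000, mode_32=False)))  # 0x100234
```

In the 16-bit mode example, the carry from bit 15 of SRC1 + IMM is dropped by the masking step, so only EA[15:0] contributes to the second addition.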
Prior art LAGEN implementations are problematic because they require a relatively lengthy time to generate the final linear address LA[31:0]. For example, as shown in FIG. 1, such an implementation may require well over 800 picoseconds to obtain the final linear address LA[31:0]. For instance, in the exemplary prior art implementation illustrated in FIG. 1, latching the operands into latches 10, 12, and 14 requires approximately 150 picoseconds, executing adder 16 requires approximately 350 picoseconds, executing AND gate 18 requires approximately 150 picoseconds, and executing adder 20 requires approximately 350 picoseconds. Therefore, such prior art implementation requires a total time of approximately 1000 picoseconds (or 1 nanosecond) to generate a linear address LA[31:0]. With increasing clock speeds (i.e., processor speeds), it becomes impossible to generate a linear address in a timely manner utilizing a prior art implementation. For example, it becomes impossible to generate a linear address within a single clock cycle utilizing prior art LAGEN implementations. For instance, when operating at a 1 gigahertz (GHz) clock speed (i.e., 10^9 cycles per second), the LAGEN must be capable of generating a linear address within 1000 picoseconds (1 nanosecond) in order to complete within a single clock cycle. Because of network bypassing, as well as other tasks that are typically required, the LAGEN circuitry may be required to complete substantially faster than 1 nanosecond to generate a linear address within 1 clock cycle of a 1 GHz clock. An example of one task that may need to be performed, thereby requiring the LAGEN to complete very fast, is a check on the memory address limit to determine if the linear address is valid. Prior art LAGEN implementations are unable to achieve a result in such a timely manner.
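The timing budget above may be checked with a short calculation. The stage delays are the approximate figures given above for FIG. 1; the variable names are hypothetical, and the sketch assumes the stages operate strictly in series so that their delays simply add.

```python
# Approximate stage delays of the prior art LAGEN of FIG. 1, in picoseconds.
stage_delay_ps = {
    "latches 10, 12, 14": 150,
    "adder 16": 350,
    "AND gate 18": 150,
    "adder 20": 350,
}

# Assumption: the stages are strictly serial, so the delays add.
total_ps = sum(stage_delay_ps.values())

# Clock period at 1 GHz: 10**12 ps per second / 10**9 cycles per second.
clock_hz = 10**9
period_ps = 10**12 // clock_hz

# Time left in one cycle for bypassing, limit checks, and other tasks.
slack_ps = period_ps - total_ps

print(total_ps, period_ps, slack_ps)  # 1000 1000 0
```

With the approximate figures above, the serial path consumes the entire 1000 picosecond period, leaving no margin for the bypass network or the address limit check.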
It is very desirable to provide a single-cycle LAGEN to enable a much simpler design in the first level cache (i.e., L0 cache), as well as a much simpler design for a 16-bit mode translator recorder IVE, which may be implemented within a processor. Generally, IVE is a major block of circuitry that enables 16-bit mode and 8-bit mode x86 instructions to be executed within IA-64 architecture processors, such as the Merced CPU from Hewlett-Packard Company. Prior art processors, such as Merced, typically include a two-cycle LAGEN and an out-of-order IVE. Generally, an out-of-order IVE is much more complex in design, consumes a larger amount of surface area (e.g., silicon), has increased routing congestion, and requires more effort to verify functional correctness and electrical reliability than an in-order IVE. Accordingly, it is desirable to have a single-cycle LAGEN to enable an in-order IVE for a processor. Furthermore, a single-cycle LAGEN would generally allow for faster accesses to the first-level cache. In general, a LAGEN generates linear addresses for all on-chip cache access (e.g., access to level 1 ("L0"), level 2 ("L1"), etc.). A single-cycle LAGEN would not only enable faster cache access, but would also facilitate a simpler cache structure. With a single-cycle LAGEN, an in-order IVE may be implemented, which is generally less complex and much smaller than the out-of-order IVE typically required for prior art, two-cycle LAGENs.
In view of the above, a desire exists for a high-speed LAGEN for generating a linear address within a processor in a timely manner. A further desire exists for a high-speed LAGEN that is capable of generating a linear address in substantially less than 1 nanosecond. For instance, a desire exists for a high-speed LAGEN that is capable of generating a linear address in less than 0.5 nanosecond. Yet a further desire exists for a high-speed LAGEN that includes linear address generation and a bypass network, and is capable of generating a linear address within a single clock cycle of a clock operating at 1 GHz.
These and other objects, features and technical advantages are achieved by a high-speed LAGEN. That is, a high-speed LAGEN is disclosed, which generates a linear address very quickly. In a preferred embodiment, the high-speed LAGEN has a parallel design, as opposed to the serial design of prior art LAGENs. Such a parallel design allows for a much faster generation of a linear address. In a most preferred embodiment, the LAGEN generates a linear address within a single clock cycle when the clock is operating at 1 GHz. Accordingly, in a most preferred embodiment, the LAGEN is a single-cycle LAGEN capable of generating a linear address within a single clock cycle for high-speed processors (e.g., processors operating at 1 GHz). It should be recognized, however, that in alternative embodiments the LAGEN may be implemented in processors having clock speeds so fast that the LAGEN cannot generate a linear address within a single clock cycle (but may still produce a linear address very quickly), and any such embodiment is intended to be encompassed by the present invention.
In a preferred embodiment, the single-cycle LAGEN first compresses three operands (i.e., IMM[31:0], SRC1[31:0], and SRC2[31:0]) into two operands, and the two operands are then added by a 32-bit adder to generate a linear address (i.e., LA[31:0]). In a preferred embodiment, a Carry-Save-Adder (CSA) array is implemented to compress the three operands into two operands, which are then latched. That is, the three operands are fed to a CSA array, which generates two operands (i.e., a sum and a carry). Thereafter, a preferred embodiment utilizes a 32-bit adder to sum the two operands (i.e., the sum and carry) to generate a result, i.e., res[31:0]. It should be understood that the lower 16 bits of such result may be indicated as res[15:0] and the higher 16 bits of such result may be indicated as res[31:16].
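The three-to-two compression described above may be illustrated by a minimal bitwise sketch of a carry-save adder array. The function names csa_compress and add3_via_csa are hypothetical, and the sketch assumes that carries out of bit 31 are discarded; it models the arithmetic only, not the latching of the sum and carry operands.

```python
MASK32 = 0xFFFFFFFF

def csa_compress(a, b, c):
    """Compress three 32-bit operands into a (sum, carry) pair."""
    # Per-bit sum without carry propagation: XOR of the three inputs.
    s = a ^ b ^ c
    # Per-bit carry: majority of the three inputs, shifted to the next bit.
    carry = ((a & b) | (a & c) | (b & c)) << 1
    # Assumption: any carry out of bit 31 is discarded.
    return s & MASK32, carry & MASK32

def add3_via_csa(imm, src1, src2):
    """Compute res[31:0] by one CSA stage followed by one 32-bit add."""
    s, carry = csa_compress(imm, src1, src2)
    return (s + carry) & MASK32

# Example operands (arbitrary values chosen for illustration).
res = add3_via_csa(0x0000F000, 0x00001234, 0x00100000)
print(hex(res))                 # 0x110234
print(hex(res & 0xFFFF))        # res[15:0]  -> 0x234
print(hex(res >> 16))           # res[31:16] -> 0x11
```

Because each bit of the sum and carry depends only on the three input bits at that position, the compression step has no carry chain, so only one carry-propagating addition remains on the path to res[31:0].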
In a preferred embodiment, the LAGEN allows for both 32-bit mode operation and 16-bit mode operation. In either mode of operation the lower 16 bits of the result, res[15:0], (generated by adding the sum and carry as described above) are utilized for the lower 16 bits of the generated linear address, i.e., for LA[15:0]. A preferred embodiment is configured to utilize the higher 16 bits of the result, res[31:16], for the higher 16 bits of the generated linear address, i.e., for LA[31:16], when 32-bit mode operation is enabled.
On the other hand, when 32-bit mode operation is not enabled (meaning that 16-bit mode operation is enabled) a preferred embodiment does not utilize the higher 16 bits of the result, res[31:16], for the higher 16 bits of the generated linear address, LA[31:16]. Rather, a preferred embodiment utilizes the higher 16 bits of the SRC2 operand for the higher 16 bits of the generated linear address, LA[31:16]. More specifically, a preferred embodiment determines whether a carry out bit is carried from bit 15 to bit 16 of the SRC2 operand when EA[15:0] (i.e., the sum of IMM[15:0] and SRC1[15:0]) and SRC2[15:0] are added. That is, a preferred embodiment determines whether a carry out bit is generated when adding the EA[15:0] and SRC2[15:0] operands. In a preferred embodiment, such determination of a carry out is made in parallel with the operation of the above-described adder generating the result res[31:0], such that the determination is made in a timely manner. If such a carry out bit is not generated, then the higher 16 bits of operand SRC2, SRC2[31:16], are utilized for the higher 16 bits of the generated linear address, LA[31:16], for 16-bit mode operation. If, however, such a carry out bit is generated, then the higher 16 bits of operand SRC2, SRC2[31:16], incremented by 1 (i.e., incremented by the carry out bit) are utilized for the higher 16 bits of the generated linear address, LA[31:16], for 16-bit mode operation. In a preferred embodiment, the higher 16 bits of operand SRC2, SRC2[31:16], may be incremented by 1 in parallel with the operation of the above-described adder generating the result res[31:0], such that SRC2[31:16] incremented by 1 is available for use as the higher 16 bits of the generated linear address, LA[31:16], if necessary, in a timely manner.
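The selection of LA[31:16] described above may be illustrated by a minimal behavioral sketch, checked against a model of the serial prior art data path. The function names parallel_lagen and serial_reference are hypothetical. The sketch assumes that all additions wrap at their stated width, and it computes the carry out sequentially for clarity; it does not model how the hardware determines the carry out or the incremented SRC2[31:16] in parallel with the adder.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF

def parallel_lagen(imm, src1, src2, mode_32):
    """Behavioral model of the mode-dependent selection of LA[31:16]."""
    # res[31:0]: sum of all three operands (assumed to wrap at 32 bits).
    res = (imm + src1 + src2) & MASK32
    if mode_32:
        # 32-bit mode: LA[31:0] = res[31:0].
        return res
    # 16-bit mode: EA[15:0] is the wrapped sum of IMM[15:0] and SRC1[15:0].
    ea_lo = (imm + src1) & MASK16
    # Carry out of bit 15 when EA[15:0] and SRC2[15:0] are added (0 or 1).
    carry_out = (ea_lo + (src2 & MASK16)) >> 16
    # LA[31:16] is SRC2[31:16], incremented by 1 if a carry out occurred.
    la_hi = (((src2 >> 16) & MASK16) + carry_out) & MASK16
    # LA[15:0] is res[15:0] in either mode.
    return (la_hi << 16) | (res & MASK16)

def serial_reference(imm, src1, src2, mode_32):
    """Two-adder serial model used only to check the result above."""
    ea = (src1 + imm) & MASK32
    if not mode_32:
        ea &= MASK16  # zero EA[31:16] in 16-bit mode
    return (ea + src2) & MASK32

# Example operands (arbitrary values) that produce a carry out of bit 15.
operands = (0x0000F000, 0x00001234, 0x0010FFFF)
for mode_32 in (True, False):
    assert parallel_lagen(*operands, mode_32) == serial_reference(*operands, mode_32)
print(hex(parallel_lagen(*operands, mode_32=False)))  # 0x110233
```

In this example EA[15:0] is 0x0234 and SRC2[15:0] is 0xFFFF, so a carry out is generated and SRC2[31:16] (0x0010) incremented by 1 (0x0011) supplies LA[31:16].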
Accordingly, in a preferred embodiment, a parallel design is utilized for a LAGEN, which results in a high-speed LAGEN, as opposed to the slower, serial design LAGENs implemented in the prior art. For instance, in a most preferred embodiment, the LAGEN is capable of generating a linear address within a single clock cycle for a clock speed of 1 GHz. It should be appreciated that a technical advantage of one aspect of the present invention is that a high-speed LAGEN is provided that is capable of generating a linear address within a processor in a timely manner. It should be further appreciated that a technical advantage of one aspect of the present invention is that a high-speed LAGEN is provided that is capable of generating a linear address in substantially less than 1 nanosecond. Furthermore, in a most preferred embodiment, a high-speed LAGEN is provided that is capable of generating a linear address in less than 0.5 nanosecond. It should also be appreciated that a technical advantage of one aspect of the present invention is that a high-speed LAGEN is provided that is capable of generating a linear address within 1 clock cycle of a 1 GHz clock. Still a further advantage of one aspect of the present invention is that the high-speed LAGEN allows for fast access to cache and a simple design of cache and enables an in-order IVE implementation within the processor.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.