A typical modern computer system includes a microprocessor, memory, and peripheral computer resources, e.g., monitor, keyboard, software programs, etc. The microprocessor has, among other components, arithmetic, logic, and control circuitry that interpret and execute instructions necessary for the operation of the computer system. FIG. 1 shows a typical computer system. The computer system has a microprocessor (20) that has a central processing unit ("CPU") (22), a memory controller (also known and referred to as a "load/store unit") (24), and on-board, or level 1 ("L1"), cache memory (26). The microprocessor (20) is also connected to main memory (30) and an external, or level 2 ("L2"), cache memory (28), both of which typically reside outside of the microprocessor (20).
In performing the various operations of the computer system, the microprocessor interprets and executes instructions provided by the computer system's users and/or computer programs. The execution of the microprocessor is carried out by the CPU (22). Data needed by the CPU (22) to carry out an instruction are fetched by the memory controller (24) and loaded into internal registers (32) of the CPU (22). Upon command from the CPU (22), the CPU (22) searches for the requested data in the internal registers (32). If the requested data is not available in the internal registers (32), the memory controller (24) searches for the requested data in the on-board cache memory (26). If that search turns out to be unsuccessful, the memory controller (24) then searches for the requested data in the external cache memory (28). If that search also turns out unsuccessful, the memory controller (24) retrieves the requested data from the slowest form of memory, the main memory (30).
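The search order described above can be sketched in a few lines of Python. This is only an illustrative model: the function name `find_data`, the dictionary-based storage, and the sample addresses and values are assumptions made for the example and do not come from the described hardware.

```python
def find_data(address, levels):
    """Search each level in order and return (level name, value) on the first hit."""
    for name, contents in levels:
        if address in contents:
            return name, contents[address]
    raise KeyError(f"address {address:#x} not present in any level")


# Hypothetical contents, ordered from the fastest level to the slowest.
levels = [
    ("internal registers (32)", {0x10: 7}),
    ("on-board cache (26)", {0x20: 8}),
    ("external cache (28)", {0x30: 9}),
    ("main memory (30)", {0x10: 7, 0x20: 8, 0x30: 9, 0x40: 10}),
]

print(find_data(0x30, levels))  # found in the external cache
print(find_data(0x40, levels))  # only main memory holds this address
```

The list order encodes the priority of the search, so a value held at several levels is always returned from the fastest one.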
The internal registers of the CPU are formed by a plurality of register files ("RFs") (not shown in FIG. 1). Register files are an integral part of a microprocessor because they are the most local memory available to the CPU. Typically, requested data that is in the on-board cache (26) or external cache (28) becomes available to the CPU three or more clock cycles after the cycle in which the CPU made the data request. However, requested data that is in the internal registers becomes available to the CPU during the same cycle or the first cycle after the cycle in which the CPU made the data request. Therefore, the speed and performance of the register files are a significant factor in determining the overall speed and performance of the microprocessor, and, in turn, the computer system.
Register files are generally arranged in one or more memory arrays. A memory array is a structure in which a plurality of memory elements are arranged such that data in each memory element can be accessed by selection of a particular pair of bit lines that are used to read data from the memory array. FIG. 2 shows a typical memory array (44). In the memory array (44), data carrying wires that run column-wise through the memory array (44) are called bit lines (38). Data carrying wires that run row-wise through the memory array (44) are called word lines (40). Because a particular memory element (42) is connected to a distinct pair of a word line and a bit line, the size of the memory array (44), i.e., the maximum number of memory elements (42) that can be indexed and stored in the memory array (44), is equal to the number of word lines multiplied by the number of bit lines.
Selecting a particular memory element (42) occurs through a selection process known as "decoding." Decoding takes place through the use of a logical circuit known as a decoder. FIG. 2 shows a row decoder (36) and a column decoder (34). The row decoder (36) is used to select a word line (40) of the memory element (42) to be selected from the memory array (44). The column decoder (34) is used to select the bit line (38) of the memory element (42) to be selected from the memory array (44). As shown in FIG. 2, the memory element (42) to be selected is located at an intersection of the selected bit and word lines (38, 40). Once a particular memory element (42) is selected, a sense amplifier (46) senses, i.e., prospectively reads, the value stored in the particular memory element (42) and outputs the value to a data requesting component (not shown). From the foregoing discussion, it is apparent that as memory array sizes get larger to accommodate increased memory needs, the amount of time it takes to select a particular memory element from a memory array and the complexity of the selection process increase.
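As a rough software analogy, the selection of a memory element at the intersection of one word line and one bit line can be modeled as follows. The helper names (`make_array`, `one_hot`, `read`) and the 4-by-8 array dimensions are assumptions chosen for the example, and the sense amplifier is reduced to a plain read of the stored value.

```python
def make_array(num_word_lines, num_bit_lines):
    """Create an array with one memory element per word line / bit line pair."""
    return [[0] * num_bit_lines for _ in range(num_word_lines)]


def one_hot(index, width):
    """Model a decoder output: exactly one of `width` lines is active."""
    if not 0 <= index < width:
        raise ValueError("address out of range")
    return [1 if i == index else 0 for i in range(width)]


def read(array, row_address, column_address):
    """Select the element at the intersection of the active word and bit lines."""
    word_select = one_hot(row_address, len(array))       # row decoder
    bit_select = one_hot(column_address, len(array[0]))  # column decoder
    row = word_select.index(1)
    column = bit_select.index(1)
    return array[row][column]  # stands in for the sense amplifier output


array = make_array(4, 8)
array[2][5] = 1
capacity = len(array) * len(array[0])  # word lines multiplied by bit lines
print(capacity, read(array, 2, 5), read(array, 2, 4))
```

The capacity computed here follows the rule stated earlier: the number of word lines multiplied by the number of bit lines.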
The row and column decoders (36, 34) shown in FIG. 2 select a word line (40) or bit line (38) based on address inputs applied to inputs of the row and column decoders (36, 34). The respective decoders then, through the decoding process, use the address to logically determine which particular word or bit line to select, i.e., activate. The decoding process within the row and column decoders (36, 34) is typically broken into two stages: a pre-decode stage and a final decode stage.
FIG. 3 shows a typical decoding process of a decoder (50) having a pre-decode stage (52) and a final decode stage (54). The decoder (50) in FIG. 3 uses a 5-bit address, and therefore, the decoder (50) may be used to select among 2^5, or 32, address lines. The pre-decode stage (52) is used to partially decode an address input, after which the final decode stage (54) completes the decoding of the partially decoded value and selects the appropriate address line.
Still referring to FIG. 3, the pre-decode stage (52) and final decode stage (54) are constructed from AND gates operatively wired to an address input (shown in FIG. 3 as a4a3a2a1a0) and an address line (shown in FIG. 3 as 1<31:0>) of a memory array (not shown). As mentioned above, depending on the values of a4, a3, a2, a1, and a0, the address input can represent any one of 2^5, or 32, address values.
Specifically, the pre-decode stage (52) is formed by 3-input AND gates (also referred to as "3-input pre-decoders") (56) and 2-input AND gates (also referred to as "2-input pre-decoders") (58) and the final decode stage (54) is formed by 2-input AND gates (also referred to as "final decoders") (60). As shown in FIG. 3, if the a4a3a2 address bits are combined with 3-input pre-decoders (56), then eight 3-input pre-decoders (56) are needed, one for each of the eight possible bit combinations, e.g., 000, 001, 010, . . . , 111. Similarly, if the a1a0 address bits are combined with 2-input pre-decoders (58), then four 2-input pre-decoders (58) are needed, one for each of the four possible bit combinations, e.g., 00, 01, . . . , 11. In the case where a pre-decode stage (52) uses two sets of gates, the final decode stage (54) ANDs the two sets of gates in the pre-decode stage (52). Hence, the final decode stage (54) uses 32 (2^5) 2-input final decoders (60), one for each of the 32 possible bit combinations, e.g., 00000, 00001, 00010, . . . , 11110, 11111.
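The two-stage arrangement can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit of FIG. 3: the function names `predecode` and `decode` are invented for the example, and each AND gate is assumed to receive either an address bit or its complement, which the text implies but does not spell out.

```python
def predecode(bits):
    """One AND gate per bit combination; `bits` is listed most significant first.

    Assumes each gate receives either an address bit or its complement.
    """
    n = len(bits)
    outputs = []
    for combination in range(2 ** n):
        out = 1
        for position, bit in enumerate(bits):
            wanted = (combination >> (n - 1 - position)) & 1
            out &= bit if wanted else 1 - bit
        outputs.append(out)
    return outputs


def decode(address):
    """Decode a 5-bit address into 32 address lines, exactly one of them active."""
    if not 0 <= address < 32:
        raise ValueError("address must fit in 5 bits")
    a = [(address >> i) & 1 for i in range(5)]  # a[0] is a0, a[4] is a4
    high = predecode([a[4], a[3], a[2]])  # eight 3-input pre-decoders
    low = predecode([a[1], a[0]])         # four 2-input pre-decoders
    # 32 final decoders, each a 2-input AND of one output from each set
    return [high[line >> 2] & low[line & 3] for line in range(32)]


lines = decode(0b10110)
print(lines.index(1))  # index of the single active address line
```

Each final decoder combines exactly one output from the 3-input set with one output from the 2-input set, which is why 8 x 4 = 32 final gates cover every address.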
Each of the eight 3-input pre-decoders (56) in the pre-decode stage (52) drives four final decoders (60) in the final decode stage (54). Accordingly, each of these eight 3-input pre-decoders (56) has a load of 4X, where X represents the load of one final decoder (60) in the final decode stage (54). Each of the four 2-input pre-decoders (58) in the pre-decode stage (52) drives eight final decoders (60) in the final decode stage (54). Accordingly, each of these four 2-input pre-decoders (58) has a load of 8X, where X represents the load of one final decoder (60).
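The 4X and 8X figures can be checked by counting, for every address line, which pre-decoder outputs feed its final decoder. The sketch below assumes the a4a3a2 / a1a0 split described above; the names `predecoder_loads`, `high_load`, and `low_load` are hypothetical, and loads are expressed in units of X, the load of one final decoder.

```python
from collections import Counter

ADDRESS_BITS = 5
LOW_BITS = 2  # a1a0 feed the 2-input pre-decoders; a4a3a2 feed the 3-input ones


def predecoder_loads():
    """Count the final decoders driven by each pre-decoder output."""
    high_load = Counter()  # keyed by the a4a3a2 combination
    low_load = Counter()   # keyed by the a1a0 combination
    for line in range(2 ** ADDRESS_BITS):
        high_load[line >> LOW_BITS] += 1
        low_load[line & (2 ** LOW_BITS - 1)] += 1
    return high_load, low_load


high_load, low_load = predecoder_loads()
print(dict(high_load))  # load per 3-input pre-decoder, in units of X
print(dict(low_load))   # load per 2-input pre-decoder, in units of X
```

In general, a pre-decoder that covers k of the n address bits drives 2^(n-k) final decoders, so covering fewer bits means a heavier gate load.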
FIG. 4 shows a representation of how the logic circuitry of the pre-decode stage (52) shown in FIG. 3 is connected to the logic circuitry of the final decode stage (54) shown in FIG. 3. As mentioned above, the 3-input pre-decoders (56) each drive four final decoders (60), and hence, each 3-input pre-decoder (56) has a gate load of 4X. Each 3-input pre-decoder (56) drives a lumped group of four final decoders along ⅛ of the height of a memory array (62). Further, as mentioned above, the 2-input pre-decoders (58) each drive eight final decoders (60), and hence, each 2-input pre-decoder (58) has a gate load of 8X. Each 2-input pre-decoder (58) drives the wire along the height of the memory array (62) because a value of each 2-input pre-decoder (58) repeats at every fourth address (as shown in FIG. 4 as pattern a1a0=00). Hence, each 2-input pre-decoder (58) has a gate load of 8X in addition to the load of wire needed to travel the height of the memory array (62). Table 1 shows the final driver load, distribution, use of pre-decode wire, and allowed pre-decode placement in relation with the address combination discussed above.
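The difference in wire length between the two kinds of pre-decoder can be made concrete with a short calculation. The sketch assumes that the 32 final decoders are stacked in address order, one row per address line, which is an assumption about the layout of FIG. 4 and not something the text states exactly; the function names are likewise hypothetical.

```python
NUM_LINES = 32


def rows_driven_by_3_input(k):
    """Rows whose a4a3a2 bits equal k: four adjacent rows."""
    return [line for line in range(NUM_LINES) if line >> 2 == k]


def rows_driven_by_2_input(j):
    """Rows whose a1a0 bits equal j: every fourth row."""
    return [line for line in range(NUM_LINES) if line & 3 == j]


def span(rows):
    """Number of rows an output wire must cross to reach all of its loads."""
    return max(rows) - min(rows) + 1


print(rows_driven_by_3_input(5), span(rows_driven_by_3_input(5)))
print(rows_driven_by_2_input(0), span(rows_driven_by_2_input(0)))
print(span(rows_driven_by_3_input(5)) / NUM_LINES)  # fraction of the array height
```

Under this assumption a 3-input pre-decoder output crosses 4 of the 32 rows, while a 2-input pre-decoder output crosses 29 of them, nearly the full height of the array, in addition to carrying the larger gate load.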
As evidenced by Table 1 and the preceding discussion, there is a need for a pre-decode stage that places less load on particular gates and wires in the pre-decode stage. Meeting such a need will help increase computer system efficiency and performance.
According to one aspect of the present invention, an address decoder for a memory array comprises a pre-decode stage comprising logic circuitry adapted to partially decode an address input and a final decode stage comprising additional logic circuitry adapted to further decode the partially decoded address input and select an address line within the memory array, where the logic circuitry of the pre-decode stage is disposed in between ends of the final decode stage.
According to another aspect, a method for positioning circuitry of a decoder comprises determining a first combination of bits of an address input to an address decoder of a memory array and configuring pre-decode logic circuitry of the address decoder such that the first combination of bits corresponds to a first set of logic, wherein the determination of the first combination of bits is made so as to allow the first set of logic to be centrally positioned with respect to a length of the memory array.
According to another aspect, a method for decoding an address in order to select a memory element of a memory array comprises inputting a first combination of one or more address bits to a first set of logic gates, inputting at least one other combination of one or more address bits to another set of logic gates, and outputting signals from the first set of logic gates and the another set of logic gates to logic gates in a final decode stage, where the first set of logic gates and the another set of logic gates are positioned in between the logic gates in the final decode stage.
According to another aspect, an address decoder comprises pre-decode logic comprising a first logic gate and another logic gate, and final decode logic comprising a set of logic gates operatively connected to the pre-decode logic, where the first logic gate is positioned in the pre-decode logic such that signals outputted from the first logic gate are driven over up to half of the set of logic gates in the final decode logic.
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.