All narrative material in the appendices as filed on Feb. 11, 2000, was incorporated into pages 31-68 of the specification. All graphical illustrations in the appendices as filed on Feb. 11, 2000, were incorporated into the drawings in FIGS. 8, 9, 10A, 10B, 10C, 10D, 10E, 11A, 11B, 11C, 11D, 12A, 12B, 12C, 13, 14, 15, 16A, 16B, 16C, 17, 18, and 19.
The present invention relates to cross-bar circuits, and particularly relates to a cross-bar circuit implementing a switch of a broadband processor.
In digital processing systems, one of the basic operations is rearranging or copying portions of an operand. Prior art systems, such as current general purpose microprocessors (Pentium, MIPS, ARM, SPARC, PowerPC, etc.), implement this basic operation in various special circuits, such as shifters, rotators, field extractors, or byte permuters.
Digital processing is now being extended to new applications such as broadband processing, in which very general rearrangements of bits are required to accomplish the sophisticated mathematical algorithms needed for encryption and error correction. Such digital processing requires switching circuits, such as cross-bar circuits, to help accomplish these algorithms. General switching requires a cross-bar circuit, which conventionally contains more transistors than the specialized shifters, rotators, etc. used in the prior art.
Prior art cross-bar circuits also require far more control bits than these specialized circuits to specify the mapping between the location of each output operand bit and the input operand bit from which it is generated.
The present invention provides a cross-bar circuit that implements a switch of a broadband processor. The present invention describes a system and method for implementing switching operations that perform completely general permutations and copies at the individual bit level within large operands.
The present invention describes a system and method for reducing the transistor count, wire count, and area of a general cross-bar network down to levels comparable to what is achieved in more specialized circuits. This invention involves a novel combination of circuit techniques, including precharged dynamic multiplexors, ground selection, and pseudo-differential sensing against dummy bit lines, which have individually been practiced in digital logic and memory devices.
The present invention implements the system and method for utilizing wide operands further described in commonly-assigned U.S. patent application Ser. No. 09/382,402 to provide the cross-bar control bits from a small cache memory physically located close to the cross-bar circuit.
The present invention also describes a system and method for utilizing the cross-bar circuits to perform conventional specialized operations. These operations have a great deal of redundancy in their control state. The present invention describes a system and method for generating the large number of control bits of the cross-bar circuit (typically more than 1,000 bits) from a much smaller number of control bits for conventional operations (typically fewer than 100 bits). This avoids the resources and delays that would otherwise be incurred in using the wide operand memory referred to in the previous paragraph.
The present invention describes a system and method to implement a cross-bar circuit, memory, and control of a switch of a broadband processor, that (1) requires (a) a small area of silicon, (b) very low power, (c) a small number of transistors, (d) a small number of wires, and (e) only two metal layers, (2) operates at a high speed, and (3) has high functionality.
In an exemplary embodiment, the present invention provides a cross-bar circuit that, in response to partially-decoded instruction information and in response to datapath information, (1) allows any bit from a 2^n-bit (e.g. 256-bit) input source word to be switched into any bit position of a 2^m-bit (e.g. 128-bit) output destination word and (2) provides the ability to set-to-zero any bit in the 2^m-bit output destination word. The cross-bar circuit includes: (1) a switch circuit which includes 2^m 2^n:1 multiplexor circuits, where each of the 2^n:1 multiplexor circuits (a) has a unique n-bit (e.g. 8-bit) index input, one disable input, and a 2^n-bit wide source input, (b) receives (i) an n-bit index at the n-bit index input, (ii) a disable bit at the disable input, and (iii) the 2^n-bit input source word at the 2^n-bit wide source input, and (c) decodes the n-bit index either (i) to select and output, as an output destination bit, one bit from the 2^n-bit input source word if the disable bit has a logic low value or (ii) to output a logic low as the output destination bit if the disable bit has a logic high value; (2) a cache memory that (a) has 2^m cache datapath inputs and 2^m cache index inputs, (b) receives (i) the datapath information on the 2^m cache datapath inputs and (ii) 2^m n-bit indexes on the 2^m cache index inputs, (c) provides a first set of the n-bit indexes for the switch circuit, and (d) includes a small tightly coupled memory array that stores p (e.g. eight) entries of 2^m n-bit indexes for the switch circuit, where the cache memory is logically coupled to the switch circuit; and (3) a control circuit that (a) has a plurality (e.g. 100) of control inputs, (b) receives the partially-decoded instruction information on the plurality of control inputs, (c) provides a second set of the n-bit indexes for the switch circuit, and (d) provides the disable bits for the switch circuit, where the control circuit is logically coupled to the switch circuit and to the cache memory.
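The input/output behavior of the switch circuit and its index cache, as described above, can be sketched as a short behavioral model. This is an illustrative Python sketch under stated assumptions, not the circuit itself: it fixes the example sizes n = 8 and m = 7 (a 256-bit source word and a 128-bit destination word), represents each word as a list of bits with bit 0 first, and the names `crossbar` and `IndexCache` and the cache's `load`/`read` interface are hypothetical.

```python
from typing import List, Sequence

N = 8  # index width n: the source word has 2**N = 256 bits
M = 7  # the destination word has 2**M = 128 bits


def crossbar(source: Sequence[int], indexes: Sequence[int],
             disables: Sequence[int]) -> List[int]:
    """Model 2**M independent 2**N:1 multiplexors, one per destination bit."""
    if len(source) != 2 ** N:
        raise ValueError("source word must have 2**N bits")
    if len(indexes) != 2 ** M or len(disables) != 2 ** M:
        raise ValueError("need one index and one disable bit per destination bit")
    out = []
    for index, disable in zip(indexes, disables):
        if not 0 <= index < 2 ** N:
            raise ValueError("index must fit in N bits")
        # A high disable bit forces logic low (set-to-zero); otherwise the
        # n-bit index selects one bit of the source word.
        out.append(0 if disable else source[index])
    return out


class IndexCache:
    """Hypothetical model of the small index memory: p entries of 2**M indexes."""

    def __init__(self, p: int = 8) -> None:
        self.entries = [[0] * (2 ** M) for _ in range(p)]

    def load(self, slot: int, indexes: Sequence[int]) -> None:
        if len(indexes) != 2 ** M:
            raise ValueError("an entry holds exactly 2**M indexes")
        self.entries[slot] = list(indexes)

    def read(self, slot: int) -> List[int]:
        return list(self.entries[slot])


# Usage: copy source bits 255 down to 128 into destination bits 0 up to 127,
# and set every fourth destination bit to zero via its disable bit.
source = [(i // 3) % 2 for i in range(2 ** N)]
cache = IndexCache()
cache.load(0, [255 - i for i in range(2 ** M)])
disables = [1 if i % 4 == 0 else 0 for i in range(2 ** M)]
dest = crossbar(source, cache.read(0), disables)
```

Because every destination bit has its own index, any permutation, copy, or broadcast of source bits is expressible; the cost is the 2^m n-bit indexes that must come from the cache or the control circuit.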
In an exemplary embodiment, the present invention provides a switch circuit that allows any bit from a 2^n-bit (e.g. 256-bit) input source word to be switched into any bit position of a 2^m-bit (e.g. 128-bit) output destination word and that provides the ability to set-to-zero any bit in the 2^m-bit output destination word. The switch circuit includes 2^m 2^n:1 multiplexor circuits, where each of the 2^n:1 multiplexor circuits (a) has a unique n-bit (e.g. 8-bit) index input and one disable input, (b) decodes an n-bit index received at the n-bit index input to select one bit from the 2^n-bit input source word if a disable bit received at the disable input has a logic low value, and (c) outputs a logic low if the disable bit has a logic high value. Also, each of the 2^n:1 multiplexor circuits includes: (1) a 2^q:1 pass gate selector (e.g. a 4:1 pass gate selector), where the 2^q:1 pass gate selector has 2^q precharge/discharge wire-OR bitline inputs; (2) a sense amplifier logically coupled to the 2^q:1 pass gate selector, where the sense amplifier (a) receives the output of the 2^q:1 pass gate selector and (b) receives a dummy bitline input to allow differential sensing of small-swing signals on the wire-OR bitline inputs; and (3) 2^(n-q) unit switch cells (e.g. 64 unit switch cells) per precharge/discharge wire-OR bitline input, where each of the unit switch cells (a) is logically coupled to a wire-OR bitline input of the 2^q:1 pass gate selector, (b) is logically coupled to one of 2^r (e.g. one of 8) active-LOW "SELA" select wires, and (c) is logically coupled to one of 2^(n-q-r) (e.g. one of 8) active-HIGH "SELC" select wires.
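The hierarchical selection inside one 2^n:1 multiplexor (pass gate selector, wire-OR bitlines, SELA/SELC unit cells, and a sense amplifier referenced to a dummy bitline) can be sketched behaviorally with the example sizes n = 8, q = 2, r = 3. This Python model is illustrative only. Several details are assumptions made for the sketch and are not taken from the text: which index bits drive the pass gate, SELA, and SELC fields; that a unit cell pulls its bitline down only when its SELA wire is low, its SELC wire is high, and its source bit is 1; and the normalized voltage levels. The names `split_index` and `mux_bit` are hypothetical.

```python
from typing import List, Sequence, Tuple

N, Q, R = 8, 2, 3  # 2**Q = 4 bitlines, 2**R = 8 SELA wires, 2**(N-Q-R) = 8 SELC wires

# Hypothetical normalized analog levels; only their ordering matters here.
V_PRECHARGE = 1.0
V_SWING = 0.2                         # small discharge swing on a bitline
V_DUMMY = V_PRECHARGE - V_SWING / 2   # dummy bitline reference sits midway


def split_index(index: int) -> Tuple[int, int, int]:
    """Assumed layout: low Q bits pick the bitline, next R bits SELA, top bits SELC."""
    bitline = index & ((1 << Q) - 1)
    sela = (index >> Q) & ((1 << R) - 1)
    selc = index >> (Q + R)
    return bitline, sela, selc


def mux_bit(source: Sequence[int], index: int, disable: int) -> int:
    if disable:
        return 0  # set-to-zero
    line_sel, a_sel, c_sel = split_index(index)
    # One-hot decode: SELA wires are active-LOW, SELC wires are active-HIGH.
    sela = [0 if a == a_sel else 1 for a in range(1 << R)]
    selc = [1 if c == c_sel else 0 for c in range(1 << (N - Q - R))]
    # Precharge every wire-OR bitline, then let each unit cell discharge its
    # bitline when its SELA wire is low, its SELC wire is high, and its
    # source bit is 1 (assumed polarity).
    volts: List[float] = [V_PRECHARGE] * (1 << Q)
    for s, bit in enumerate(source):
        line, a, c = split_index(s)
        if sela[a] == 0 and selc[c] == 1 and bit == 1:
            volts[line] = V_PRECHARGE - V_SWING
    # The pass gate selector forwards one bitline; the sense amplifier compares
    # it with the dummy bitline and reports a discharged bitline as 1.
    return 1 if volts[line_sel] < V_DUMMY else 0


print(mux_bit([1] + [0] * 255, 0, 0))  # prints 1
```

Only one unit cell on the forwarded bitline sees both its SELA and SELC wires asserted, so the sensed value equals the indexed source bit; cells on the other three bitlines may also discharge, but the pass gate selector ignores them.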
In addition, an exemplary implementation of the present invention provides a control circuit that provides indexes and disable bits. The control circuit includes: (1) an arithmetic logic unit (ALU), where the ALU includes a plurality of ALU modules; (2) a plurality of multiplexors logically coupled to the ALU, where each of the multiplexors includes a plurality of multiplexor modules; and (3) a plurality of decoders logically coupled to the ALU and to the plurality of multiplexors, where each of the decoders includes an s-stage (e.g. 5-stage) chain of NAND/NOR gates.
Some of the advantages of the cross-bar circuit are as follows:
1. It requires a uniquely small area of silicon, as provided by the arrangement of the switch circuit, the cache memory, and the control circuit;
2. It requires uniquely low power, as provided by the use of a minimum number of logic gates to implement the switch circuit and the control circuit;
3. It requires a small number of transistors, as provided by the use of a minimum number of logic gates to implement the switch circuit and the control circuit;
4. It requires a small number of wires, as provided by the use of a minimum number of logic gates to implement the switch circuit and the control circuit and by the arrangement of the switch circuit, the cache memory, and the control circuit; and
5. It requires only two layers of metal to implement, as provided by the arrangement of the switch circuit, the cache memory, and the control circuit.
Since the switch circuit requires no more silicon area than a barrel shifter and since the switch circuit can perform the functions of a barrel shifter, the switch circuit can be used in place of a barrel shifter in a circuit design, thus saving space in the circuit design.
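Because each destination bit has its own index and disable bit, conventional barrel-shifter operations reduce to particular index patterns, with zero fill supplied by the disable bits. The Python sketch below illustrates this under assumptions not stated in the text: a 128-bit operand held in the low half of the 256-bit source word, bits numbered from 0 at the least significant end, and hypothetical helper names (`shift_left`, `shift_right`, `rotate_left`, `funnel`). The funnel shift uses the full 256-bit source to extract any 128-bit window.

```python
from typing import List, Sequence, Tuple

W = 128       # destination width 2**m
SRC = 2 * W   # source width 2**n = 256

Controls = Tuple[List[int], List[int]]  # (indexes, disable bits)


def to_bits(x: int, width: int) -> List[int]:
    return [(x >> i) & 1 for i in range(width)]  # bit 0 first


def from_bits(bits: Sequence[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))


def crossbar(source: Sequence[int], controls: Controls) -> List[int]:
    indexes, disables = controls
    return [0 if d else source[i] for i, d in zip(indexes, disables)]


def shift_left(k: int) -> Controls:
    # Destination bit i takes source bit i-k; vacated low bits are disabled
    # (their index value is irrelevant, so 0 is used).
    return ([i - k if i >= k else 0 for i in range(W)],
            [0 if i >= k else 1 for i in range(W)])


def shift_right(k: int) -> Controls:
    # Logical right shift: destination bit i takes source bit i+k.
    return ([i + k if i + k < W else 0 for i in range(W)],
            [0 if i + k < W else 1 for i in range(W)])


def rotate_left(k: int) -> Controls:
    return ([(i - k) % W for i in range(W)], [0] * W)


def funnel(k: int) -> Controls:
    # 128-bit window starting at bit k of the 256-bit source (0 <= k <= 128).
    return ([i + k for i in range(W)], [0] * W)


x = 0x0123456789ABCDEFFEDCBA9876543210
src = to_bits(x, SRC)
print(hex(from_bits(crossbar(src, rotate_left(4)))))
# prints 0x123456789abcdeffedcba98765432100
```

In this sketch a single crossbar model covers shifts, rotates, and funnel shifts; only the generated control pattern differs.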
Also, the control circuit generates a relatively large number of bits (e.g. 1024 bits), which form a pattern of bits. This pattern is often repetitive. Taking advantage of this repetitiveness, the control circuit can use a relatively small number of bits (e.g. 64 bits) to generate the much larger number of bits (e.g. 1024 bits). In an exemplary embodiment, the control circuit generates n*2^m bits (e.g. 8*128 = 1024 bits) as a pattern of bits by using only 2^(n-2) input bits (e.g. 64 input bits).
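The bit counts above can be checked directly for n = 8 and m = 7, and a small example shows why the control pattern is redundant. In the Python sketch below, `affine_indexes` is a hypothetical stand-in for one family of repetitive patterns (each index is base + stride*i modulo 2^n). This family covers identity, bit reversal, and broadcast of a single bit, but it is not the control circuit's actual encoding of its 64 input bits, which this paragraph does not specify.

```python
from typing import List

N, M = 8, 7


def control_bits(n: int = N, m: int = M) -> int:
    return n * 2 ** m      # one n-bit index per destination bit


def compact_bits(n: int = N) -> int:
    return 2 ** (n - 2)    # input bits consumed by the control circuit


def affine_indexes(base: int, stride: int, n: int = N, m: int = M) -> List[int]:
    """Hypothetical generator: index_i = (base + stride * i) mod 2**n."""
    return [(base + stride * i) % 2 ** n for i in range(2 ** m)]


def pack(indexes: List[int], n: int = N) -> int:
    """Concatenate the indexes into one control word, index 0 in the low bits."""
    word = 0
    for i, index in enumerate(indexes):
        word |= index << (n * i)
    return word


print(control_bits(), compact_bits())  # prints 1024 64

# Each 1024-bit control word below is fully determined by two small parameters.
identity = pack(affine_indexes(base=0, stride=1))
reverse = pack(affine_indexes(base=127, stride=-1))   # bit-reverse the low 128 bits
broadcast = pack(affine_indexes(base=5, stride=0))    # copy source bit 5 everywhere
```

The 16:1 ratio between 1024 control bits and 64 input bits is possible only because patterns like these repeat a simple rule across all 128 destination positions.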