1. Field of the Invention
The present invention relates to the design of digital circuits. More specifically, the present invention relates to a method and an apparatus for asynchronously routing data within a circuit between multiple sources and multiple destinations.
2. Related Art
It is often necessary in computing and communication equipment to send data from many sources to many destinations. This need appears in the central processing unit of computer systems where information may flow: from a register file to any one of a number of arithmetic or logical elements or to a memory controller; from one arithmetic element to another; or from an arithmetic element or memory controller to the register file. This need also appears in the input-output systems of computers where information must flow between and among various units including processors, memories and secondary storage devices.
One common means for providing this service is known as a data bus. A bus consists of a number of wires that extend between all communicating units; there is generally one, but sometimes there are two or more wires per bit of information to be sent at any one time. Each unit that wishes to send data places its value on the data bus so that any of the receiving units may receive it. Such bus structures have been widely used both inside central computing elements and in the input-output systems for computers.
There are a number of drawbacks to such a bus structure. First, each destination must attach some transistors to the bus in order to sense its state, and because there are many destinations, these sensing transistors collectively represent a large electrical load. Second, each source must attach driving transistors to the bus to use when that source is to provide data for the bus, and even though all but one such drive transistor per bus wire is shut off when the bus changes state, the many inactive drive transistors connected to the bus also place considerable electrical load on the wires in the bus. Third, the bus wires themselves tend to be physically long and thus intrinsically represent further electrical load. The combined load on the bus wires from drivers, receivers and the wires themselves results in communication paths that are generally slow in comparison with other logical structures. Furthermore, only a single piece of information can flow per bus cycle, which limits the achievable communication rate.
One alternative to the bus structure is the cross-bar switch. For each bit of communication, a cross-bar switch provides a grid of conductors that may be thought of as "horizontal" and "vertical," wherein each source drives a horizontal conductor and each destination senses the state of a vertical conductor. At each intersection of the conductors in the cross-bar, a transistor or other switching element can connect the horizontal and vertical wires that meet there. This grid structure is repeated for as many bits as are to be transmitted at any one time.
The cross-bar switch has several advantages over the bus structure. First, each source drives only the capacitive load on the horizontal wire, which amounts to one receiving switch mechanism per destination. The many drivers that would have to be connected to each wire in a bus structure are here replaced by a single driver on the source wire. Because this driver drives only the source wire and its switches, it can be as large as desired, and can thus drive its load very quickly. Moreover, the wire for each destination has a load of only one sensing transistor, though it may be connected to many inactive intersection switches. Thus, the cross-bar switch divides the inherent loading in a simple bus into two parts, the horizontal wire pathway, and the vertical wire pathway, thereby speeding up the flow of information.
A further advantage of the cross-bar switch is that it can deliver several pieces of information concurrently. Several different sources can each deliver information to several different destinations at the same time provided no two sources and no two destinations are the same, because each such communication uses a different switch to connect its horizontal source wire to its vertical destination wire. That is, two or more switches may be active at any one time provided that no two switches in the same row or in the same column are active.
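The concurrency rule stated above can be expressed as a check on the set of active switch points. The following is a minimal illustrative sketch; the function name `crossbar_conflict_free` and the encoding of each switch point as a (source row, destination column) pair are assumptions made for illustration, not part of the described apparatus.

```python
def crossbar_conflict_free(active):
    """Return True if all the given switch points may be on at the same time.

    Each switch point is a (source_row, destination_column) pair. A cross-bar
    permits concurrent transfers only when no two active switches share a row
    (the same source) or a column (the same destination).
    """
    rows = [src for src, _ in active]
    cols = [dst for _, dst in active]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)


print(crossbar_conflict_free([(0, 3), (1, 0), (2, 2)]))  # True
print(crossbar_conflict_free([(0, 3), (1, 3)]))          # False: column 3 used twice
```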
The disadvantage of the cross-bar switch lies in its large number of switching transistors. While each bit of the bus structure has only one drive element per source and one receiving element per destination, the number of switch points in a cross-bar switch is the product of the number of sources and the number of destinations. Not only do these many switch points require chip area and consume power, but also they require control information. The difficulty of controlling so many switches turns out to be a disadvantage in implementation.
A second alternative to the bus structure is to use point-to-point wiring between each source and each destination. Point-to-point wiring is returning to more common use in modern systems because it simplifies the electrical properties of the transmission lines used. In a point-to-point system, each destination must be prepared to receive signals along transmission lines that begin at each source, so that the number of receivers at each destination equals the number of sources. Similarly, each source must be able to send information to each destination. Thus, the number of sending and receiving mechanisms required is the same as the number of switch points in the cross-bar switch. The point-to-point mechanism is merely a physical rearrangement of the cross-bar switch, wherein the horizontal and vertical wires in the cross-bar have become very short, and each switch at an intersection is replaced by a transmission line running from one source to one destination.
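The per-bit element counts compared in the two preceding paragraphs can be tabulated directly. This sketch is illustrative only: the function name `per_bit_element_counts` is hypothetical, and it counts only drivers, receivers and switch points, ignoring wire length and transistor sizing. The figures use the 64-source, 32-destination example that this description adopts for its multi-level structure.

```python
def per_bit_element_counts(sources, destinations):
    """Per-bit counts of driving/sensing/switching elements for each scheme."""
    return {
        # one driver per source plus one sensing element per destination
        "bus": sources + destinations,
        # one switch at every row/column intersection of the grid
        "crossbar": sources * destinations,
        # one transmission line (sender/receiver pair) per source-destination pair
        "point_to_point": sources * destinations,
    }


print(per_bit_element_counts(64, 32))
# {'bus': 96, 'crossbar': 2048, 'point_to_point': 2048}
```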
The point-to-point mechanism can be very fast. However, like the cross-bar it suffers from the need for a great deal of control information. Moreover, it is generally hard to find space for the large number of transmission lines required.
A third alternative to simple busses is to use some kind of network interconnection scheme. Ethernet, for example, is essentially a bus structure that uses itself for control, and transmits data serially. Other networks, including those with complex computer-controlled switches, are well-known and widely used. Such switches appear, for example, in the Internet. Generally, however, their control is very complex and their throughput is much less than that of an equivalent bus structure.
The present invention provides high throughput through a tree-structured multiplexing-and-amplifying system. Because the stray capacitance of any wire in commonly used circuitry (such as CMOS) can store data, it is possible to store many values in a multiplexer tree structure and additional values in an amplification tree structure. The present invention uses this storage to permit several communications to proceed concurrently in different parts of the structure. A new communication can be launched as soon as the wires it requires are no longer needed for the previous communication.
Instead of using a single-level bus structure, one embodiment of the present invention uses a multiple-level structure. Consider, for example, a single-level bus structure for 64 sources and 32 destinations. Each of the 64 sources must have suitable drive transistors that can put data onto the bus. Thus, the drive structure to the bus is, in effect, a multiplexer with 64 inputs. Similarly, each of the 32 destinations must have a sensing transistor connected to the bus so that any of them can accept data values from the bus. Thus, the output structure is, in effect, a 32-way fan-out from the bus to the 32 destinations.
In CMOS technology, multiplexers with many inputs can be broken into tree structures of multiplexers with fewer inputs. Although such tree structures of multiplexers contain more levels of logic than a single multiplexer, they can nevertheless be faster because each level of logic is simpler. In fact, in the book "Theory of Logical Effort," by Ivan Sutherland, Bob Sproull and David Harris, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1999, chapter 11.4.1 teaches that in CMOS circuits the fastest multiplexing structure is a tree in which each level joins approximately four inputs. Thus, the 64-input multiplexer of our example might better be replaced with a three-level tree. The first level gathers groups of four sources together onto several short "level-1" busses; in our example there would be 64/4=16 such level-1 busses. The second level of 4-input multiplexers gathers together groups of four such level-1 busses into somewhat longer "level-2" busses; our example requires 16/4=4 such level-2 busses. Finally, a third level of 4-input multiplexers gathers these level-2 busses together into a single "level-3" bus, which need be only long enough to reach all of the inputs from the nearest part of the level-2 busses.
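The level-by-level arithmetic above (64/4=16, 16/4=4, 4/4=1) generalizes to any number of sources and any multiplexer width. This sketch assumes every multiplexer has the same number of inputs (`radix`) and rounds up when the inputs do not divide evenly; the function name `mux_tree_levels` is hypothetical.

```python
def mux_tree_levels(num_sources, radix=4):
    """Return the number of busses at each level of a many-to-one multiplexer tree.

    Level 1 is nearest the sources; the last entry is the single top-level bus.
    """
    levels = []
    width = num_sources
    while width > 1:
        width = -(-width // radix)  # ceiling division: one bus per group of `radix` inputs
        levels.append(width)
    return levels


print(mux_tree_levels(64))  # [16, 4, 1] -> level-1, level-2 and level-3 busses
```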
Furthermore, a series of amplifiers can be used to deliver a particular signal to many destinations. Such a set of amplifiers can easily be arranged into a tree structure, much like the multiplexer tree but in reverse. In our example of 32 destinations, the information on the level-3 bus might be amplified and sent to two level-4 busses. Four amplifiers on each such level-4 bus might amplify the signal again, delivering it to a total of eight level-5 busses. Again, four amplifiers on each level-5 bus might be used to amplify the signal, each driving a level-6 wire that leads to a single destination, so that the eight level-5 busses together serve all 32 destinations. In spite of the fact that more stages of amplification are involved, such structures are faster than a single stage of amplification can be.
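The width of each amplification level is the running product of the per-level fan-out factors. This sketch uses the fan-out factors 2, 4 and 4 from the example and assumes that each output of the last level drives exactly one destination; the function name `fanout_tree_widths` is hypothetical.

```python
from itertools import accumulate
from operator import mul


def fanout_tree_widths(fanouts):
    """Number of wires at each amplification level below the trunk.

    fanouts[i] is how many amplifiers each wire at the previous level drives.
    """
    return list(accumulate(fanouts, mul))


print(fanout_tree_widths([2, 4, 4]))  # [2, 8, 32] -> level-4, level-5, level-6 wires
```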
These multi-level structures have an advantage of speed, but they require extra wires to accommodate the different bus levels. Thus, the design of such a structure is always a compromise between the desired speed and the space cost of extra wiring.
A further point must be made here: it requires energy to change the value on any wire in a CMOS system. Thus, if we always deliver information to all destinations, we will consume more power than would be required to deliver the same information only to its intended destination, leaving static the state of wires that do not participate in that particular communication. As we shall shortly see, the present invention takes advantage of this potential saving in power.
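The potential saving can be estimated by counting the wires that a single data item may drive below the trunk. This is a rough upper-bound sketch: it assumes every driven wire changes state and that all wires cost the same energy, ignoring their differing lengths and capacitances. The function name `wires_driven` and the fan-out factors 2, 4, 4 (taken from the amplification example) are illustrative assumptions.

```python
def wires_driven(fanouts, broadcast):
    """Wires that may change value when one item passes down a fan-out tree.

    broadcast=True drives every wire at every level; broadcast=False steers the
    item so that only the one wire per level on the path to its target is driven.
    """
    if broadcast:
        total, width = 0, 1
        for f in fanouts:
            width *= f       # wires at this level
            total += width
        return total
    return len(fanouts)      # one wire per level along the selected path


print(wires_driven([2, 4, 4], broadcast=True))   # 42
print(wires_driven([2, 4, 4], broadcast=False))  # 3
```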
Returning to our example of 64 sources, at the same time that the level-2 bus delivers information to the level-3 bus, a new source can deliver information to the level-1 bus, provided the new information is kept from overwriting the previous data item. By overlapping in time the actions of different levels, the structure can achieve higher data throughput rates. In fact, the throughput of such a structure is limited mainly by its ability to turn the multiplexers on and off quickly enough.
Furthermore, consecutive communications from the same source to the same destination can overlap in time. For example, as soon as the first has cleared the level-1 bus, the second may use that bus. Naturally, a small time gap between communications is required; in the limit, however, there may be as many communications underway as there are levels in the tree-structures.
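The benefit of overlapping can be seen in an idealized timing model. This sketch assumes every level takes the same time (`stage_delay`) to pass an item on and ignores the small gap required between consecutive communications; the function name `completion_time` is hypothetical. Under these assumptions at most `levels` items are in flight at once, matching the limit stated above.

```python
def completion_time(n_items, levels, stage_delay, overlapped):
    """Time to move n_items through a path with `levels` storage levels."""
    if n_items <= 0:
        return 0
    if overlapped:
        # the first item needs `levels` stages; each later one follows one stage behind
        return (levels + n_items - 1) * stage_delay
    # without overlap, each item must clear the whole path before the next starts
    return n_items * levels * stage_delay


print(completion_time(10, 3, 1, overlapped=False))  # 30
print(completion_time(10, 3, 1, overlapped=True))   # 12
```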
Similarly, one can store information in the structure that amplifies and delivers data from the main bus to the destinations. Such an amplification structure consists of several levels of amplification, each fanning out to a next set of amplifiers and finally to the destinations themselves. Each such level can also serve as a place to store information. Thus, for example, one can overlap in time the delivery of a data item from the level-3 bus to the first level of amplification, the level-4 bus, while delivering the previously transmitted data item from the level-6 bus to its final destination.
A further advantage of the present invention is that it can operate asynchronously in time. For example, a data element launched from a particular source to a particular destination can flow along a certain path through the multiplexing structure, through the highest-level bus, also known as the "trunk," and thence through the amplifying structure to its destination. While it is in flight, some other data element launched from a different source and at an unrelated time may take its own route to its own particular destination. Two such communications will not interfere with each other except where they require a common communication path. The present invention permits each to proceed as far as it can without interfering with others, dealing with such potential interference by controlling only the sequence in which the conflicting communication actions may use the common path.
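Where two communications share a path through the multiplexing structure can be computed from their source numbers alone. This sketch assumes the balanced 64-source, radix-4 tree of the example, in which source `s` reaches level-k bus number `s // 4**k`; the function name `shared_funnel_busses` is hypothetical. The amplifying structure could be treated the same way using destination numbers.

```python
def shared_funnel_busses(src_a, src_b, radix=4, levels=3):
    """Return the funnel levels (1 = nearest the sources) whose bus both sources use.

    Two communications contend at level k only if they use the same level-k bus;
    in a full tree the top level is the trunk, which every communication shares.
    """
    return [k for k in range(1, levels + 1)
            if src_a // radix**k == src_b // radix**k]


print(shared_funnel_busses(5, 6))   # [1, 2, 3]: same level-1 group, so they share every level
print(shared_funnel_busses(5, 40))  # [3]: they meet only at the trunk
```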
Yet a further aspect of the present invention involves automatically stalling the communication mechanism when a source is not ready to provide information or a destination is not ready to receive it. Because the interconnection structure contains storage at every level, actions already underway may proceed without waiting for a stalled source or destination irrelevant to their action. Delay in one source need not retard the communications emanating from a different source, nor need delay in accepting previous data at a destination retard delivery to other destinations, except, of course, as such other communications require the use of pathways common to the stalled communication.
Naturally, the control of such a switching structure with internal storage presents its own set of challenges. One part of the invention described herein involves a simple set of control structures which, also configured hierarchically, asynchronously control the concurrent flow of data through the switching structure from source to destination. The "switching directive" for each communication action includes a "source address," indicating the particular source for this communication and a "destination address," indicating the particular destination that is to receive this data item. A stream of such address pairs thus controls the dynamic operation of the data switching network of the present invention.
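One plausible way to turn a switching directive into per-level switch settings is to split each address into one digit per tree level. This sketch assumes the balanced trees of the running example (three levels of 4-input multiplexers for 64 sources; fan-outs of 2, 4 and 4 for 32 destinations) and a digit-per-level encoding; the function names `mixed_radix_digits` and `decode_directive` are hypothetical and are not taken from the described control structure.

```python
def mixed_radix_digits(value, radices):
    """Split value into digits, most significant first, one digit per radix."""
    if value < 0:
        raise ValueError("address must be non-negative")
    digits = []
    for r in reversed(radices):
        value, d = divmod(value, r)
        digits.append(d)
    if value:
        raise ValueError("address out of range for this tree")
    return digits[::-1]


def decode_directive(source, destination,
                     funnel_radices=(4, 4, 4), horn_fanouts=(2, 4, 4)):
    """Map a (source address, destination address) pair to per-level selects.

    Both lists are ordered trunk-side first: the funnel list gives the level-3,
    level-2 and level-1 multiplexer selects; the horn list picks the level-4 bus,
    then the amplifier on it, then the amplifier on the level-5 bus.
    """
    return {
        "funnel": mixed_radix_digits(source, funnel_radices),
        "horn": mixed_radix_digits(destination, horn_fanouts),
    }


print(decode_directive(27, 21))  # {'funnel': [1, 2, 3], 'horn': [1, 1, 1]}
```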
One embodiment of the present invention provides a system that facilitates asynchronously routing data within a circuit. This system includes a data destination horn, for routing data from a trunk line to a plurality of destinations. This data destination horn includes a plurality of one-to-many switching elements organized into a tree of at least one level that fans out from the trunk line to the plurality of destinations. It also includes a plurality of memory elements for storing data in transit between the plurality of one-to-many switching elements. An asynchronous control structure is coupled to the data destination horn, and is configured to control the propagation of data through the data destination horn, so that when a given data item appears at an input of a memory element, the given data item is asynchronously latched into the memory element as soon as space becomes available in the memory element without having to wait for a clock signal.
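The "latch as soon as space becomes available" behavior, together with the stalling described earlier, can be modeled as a chain of single-slot memory elements. In this sketch the step counter only orders events for the simulation; it is not a clock in the modeled circuit. The function name `simulate_latch_chain`, the one-item-per-slot assumption and the `sink_ready` callback are all illustrative assumptions.

```python
def simulate_latch_chain(items, stages, sink_ready, max_steps=1000):
    """Pass items through `stages` single-slot memory elements toward one destination.

    An item moves forward whenever the next slot is empty; the destination
    accepts from the last slot only when sink_ready(step) is True.
    Returns a list of (step, item) pairs in delivery order.
    """
    slots = [None] * stages
    pending = list(items)
    delivered = []
    step = 0
    while pending or any(s is not None for s in slots):
        if step >= max_steps:
            raise RuntimeError("destination never drained the chain")
        # the destination takes the item in the last slot if it is ready
        if slots[-1] is not None and sink_ready(step):
            delivered.append((step, slots[-1]))
            slots[-1] = None
        # scan from the destination end so each item advances at most one slot per step
        for i in range(stages - 1, 0, -1):
            if slots[i] is None and slots[i - 1] is not None:
                slots[i], slots[i - 1] = slots[i - 1], None
        # the source launches its next item as soon as the first slot is free
        if pending and slots[0] is None:
            slots[0] = pending.pop(0)
        step += 1
    return delivered


print(simulate_latch_chain("abc", 3, lambda s: True))    # [(3, 'a'), (4, 'b'), (5, 'c')]
print(simulate_latch_chain("abc", 3, lambda s: s >= 6))  # [(6, 'a'), (7, 'b'), (8, 'c')]
```

In the second run the destination stalls until step 6, yet the source still launches all three items because each level of the chain provides storage; delivery resumes as soon as the destination is ready.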
One embodiment of the present invention additionally includes a data source funnel, for routing data from a plurality of sources into the trunk line. This data source funnel includes a plurality of many-to-one switching elements organized into a tree of at least one level that fans in from the plurality of sources into the trunk line. It also includes a plurality of funnel memory elements for storing data in transit between the plurality of many-to-one switching elements. Moreover, the asynchronous control structure is additionally configured to control propagation of data through the data source funnel, so that when a given data item appears at an input of a funnel memory element, the given data item is asynchronously latched into the funnel memory element as soon as space becomes available in the funnel memory element without having to wait for a clock signal.
In one embodiment of the present invention, the asynchronous control structure includes a control destination horn, including a plurality of control memory elements coupled to control inputs of the plurality of one-to-many switching elements, that contain control information to control the plurality of one-to-many switching elements. This control destination horn includes a plurality of one-to-many control switching elements organized into a tree structure that mirrors the structure of the data destination horn, thereby allowing the control information to follow associated data through the data destination horn.
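Because the control tree mirrors the data tree, the control information for an item can travel with it and be consumed one level at a time. This sketch assumes one select value per level of a balanced horn (fan-outs 2, 4 and 4 as in the running example); the function name `route_through_horn` and the list-of-selects representation are hypothetical.

```python
def route_through_horn(data, selects, fanouts):
    """Follow one data item down a fan-out tree, consuming one select per level.

    Returns (destination_index, data). At each level the first remaining select
    chooses the output branch and is then discarded, so each level sees only
    the control information it needs.
    """
    if len(selects) != len(fanouts):
        raise ValueError("need exactly one select per level")
    remaining = list(selects)
    index = 0
    for fanout in fanouts:
        branch = remaining.pop(0)        # control for this level travels with the data
        if not 0 <= branch < fanout:
            raise ValueError("select out of range for this level")
        index = index * fanout + branch  # position among the wires at this level
    return index, data


print(route_through_horn("x", [1, 1, 1], (2, 4, 4)))  # (21, 'x')
```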
In one embodiment of the present invention, the asynchronous control structure includes a control source funnel, including a plurality of source control memory elements coupled to control inputs of the plurality of many-to-one switching elements, that contain control information to control the many-to-one switching elements. This control source funnel includes a plurality of many-to-one control switching elements organized into a tree structure that mirrors the structure of the data source funnel, thereby allowing the control information to follow associated data through the data source funnel.
In one embodiment of the present invention, the tree within the data destination horn is a balanced tree.
In one embodiment of the present invention, the tree within the data destination horn is an unbalanced tree.
In one embodiment of the present invention, the trunk line and the data destination horn form a first switching module for routing data from the plurality of sources to the plurality of destinations.
In one embodiment of the present invention, the system additionally includes a second switching module coupled in series with the first switching module, so that outputs of the first switching module feed into inputs of the second switching module.
In one embodiment of the present invention, the data source funnel, the trunk line and the data destination horn form a first switching module for routing data from the plurality of sources to the plurality of destinations. In this embodiment, the system further comprises a third switching module coupled in parallel with the first switching module so that each of the plurality of sources can route data to each of the plurality of destinations through either the first switching module or the third switching module.
In one embodiment of the present invention, the order in which data elements pass through the trunk line is pre-determined by the control information within the asynchronous control structure.
In one embodiment of the present invention, the order in which data elements pass through the trunk line is determined by demand for delivery of data from the plurality of sources.
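The two ordering embodiments can be contrasted in a small sketch. It assumes each request is a (source, ready time) pair listed in the order given by the control information, and it models "demand" as earliest-ready-first with ties kept in program order; this arbitration rule and the function name `trunk_order` are illustrative assumptions, not the claimed mechanism.

```python
def trunk_order(requests, policy):
    """Return source ids in the order they use the trunk line.

    requests: list of (source_id, ready_time) pairs in control-program order.
    """
    if policy == "predetermined":
        # the control information fixes the order; a late source delays those behind it
        return [src for src, _ in requests]
    if policy == "demand":
        # earliest-ready source goes first; sorted() is stable, so ties keep program order
        return [src for src, _ in sorted(requests, key=lambda r: r[1])]
    raise ValueError(f"unknown policy: {policy}")


reqs = [(7, 5.0), (2, 1.0), (9, 3.0)]
print(trunk_order(reqs, "predetermined"))  # [7, 2, 9]
print(trunk_order(reqs, "demand"))         # [2, 9, 7]
```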
In one embodiment of the present invention, at least one of the plurality of memory elements is a state conductor that carries a voltage that indicates a state of the circuit. A variation on this embodiment additionally includes a keeper circuit coupled to the state conductor that is configured to hold the voltage on the state conductor at a stable value, unless the voltage is changed by a drive circuit.