1. Field of the Invention
This invention relates generally to communication methods and topologies. More specifically the invention relates to a circuit and protocol useful in propagating information through a mesh topology in a reliable manner for real-time applications.
2. Description of the Related Art
In order to communicate electronic information from one point to another, that information must be passed from the transmitting point to the receiving point along some connecting medium. In a telegraph, for example, an electric circuit is opened and closed in a predictable and understandable manner so that signals generated on one end of the line are received and understood at the other. This works extremely well when there is only one transmitting point and one receiving point. Problems begin to occur if both points are capable of transmitting along the same wire. If both points happen to generate signals at the same time, indiscernible noise is produced and neither side can understand the other's message. Consequently, a probability problem exists in that the system will work fine when only one end is transmitting and will collapse when both sides happen to transmit at the same time.
To increase the usefulness of the system, it is helpful to have multiple stations along an interconnected single line, spanning some great distance. In this configuration, there would be many stations having access to the communication network. Despite the obvious benefit of multiple stations accessing the network, the probability of any two stations transmitting at the same time is greatly increased. As discussed above, this creates the risk of any particular message not being received.
While telegraphs certainly are not relied on to communicate information today, many modern electronic devices suffer from the same problems as described above. For instance, different components in a personal computer must communicate with each other as well as with a CPU. On a larger scale, a plurality of computers may wish to communicate with each other on an intranet or even the Internet. The problem is the same in each instance: how can interconnected electronic components communicate over a shared medium?
Many types of solutions have been used in the past to solve this communication problem. FIG. 2 illustrates one previous solution well known as a “bus” topology. Each Node N in network 100 is connected to the bus 10. When any Node N sends a message, it is rapidly received by all of the other nodes N by traveling along bus 10. Whichever Node N the message was intended for will likewise receive the message and process the information. There are three distinct problems with this type of topology. First, as in the telegraph example, multiple nodes N may wish to transmit at the same time. Second, if the link between any pair of nodes N is severed, the entire system is at least severely impaired and possibly totally disabled. Third, in real world applications the nodes N are not likely to be positioned linearly as schematically illustrated.
To deal with the first problem, a media access control (MAC) protocol is needed which allows only one node to transmit at any given time. One such protocol is known as Table Driven Proportional Access (TDPA), as shown in FIG. 11. With this protocol, each Node N has an identical table 20. The tables 20 designate which particular Node N will be able to transmit at any given time. In the example presented, there are four nodes, numbered 1-4. Each Node N has a node indication pointer 30 which steps through the table sequentially and indicates to all of the nodes N concurrently which Node N is designated to transmit. In FIG. 11, the pointer 30 indicates that node 3 is able to transmit. At this point, all other nodes N will “listen” for a message which may or may not be transmitted by node 3. After the message has been transmitted, or after the time that would have been taken to transmit a message if no message is transmitted, the pointer 30 advances to the next index identifier, which happens to indicate that node 1 will be free to transmit. The pointer advances through the table, and when the end is reached, it is reset to the beginning. In this manner, each Node N knows when to “speak” and when to “listen”, thus avoiding the problem of two nodes N attempting to transmit at the same time.
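The table-and-pointer mechanism described above can be sketched in a few lines. This is only an illustrative model: the table contents, the function name `tdpa_schedule`, and the idea of giving node 3 two slots are assumptions made for the example and are not taken from FIG. 11.

```python
from itertools import islice


def tdpa_schedule(table):
    """Yield the node allowed to transmit in each successive slot.

    Every node is assumed to hold an identical copy of `table` and to
    advance its pointer in step with the others.
    """
    pointer = 0
    while True:
        yield table[pointer]
        # Advance the pointer; wrap to the beginning at the end of the table.
        pointer = (pointer + 1) % len(table)


# Hypothetical table: node 3 owns two of the five slots (proportional access).
table = [3, 1, 2, 3, 4]
slots = list(islice(tdpa_schedule(table), 7))
```

Because each node derives the same sequence from the same table, no two nodes are designated to speak in the same slot, provided their pointers stay synchronized.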
TDPA is a commercially accepted and known protocol; however, there are a wide variety of other protocols which may be used with a standard bus. In some of those protocols, such as CSMA, multiple nodes N could transmit at the same time. When this occurs, the two signals will “crash” into one another. The system will then recognize this collision and each Node N will attempt to resend its respective message. To avoid colliding again, the two nodes independently select random delay times (which are highly likely to be different from one another) and wait for that period of time before resending. Probability suggests that the two messages will most likely eventually each be sent, though subsequent collisions are possible (thus causing each node to again select a random delay time and restart the process). There is a finite probability, however, that repeated collisions could continue to occur, prohibiting the transmission of the data. When using this protocol, the collision could occur at any given point along the bus. Thus, one or more nodes N may have received one of the messages prior to the collision and would therefore not recognize that there was a collision. Ultimately, when the message is resent, those nodes N would interpret the message as a new one, not a repeat of the old. To avoid this and to prevent collision “debris” from being misinterpreted as a valid message, when a collision occurs, each node detecting the collision immediately transmits a jam signal to the remaining nodes to ensure they detect that a collision has occurred. This protocol has a probability (but not certainty) that it will eventually get information to its proper destination, but it is inherently slow and easy to bog down.
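The random-delay retry described above can be illustrated with a small simulation. It is a simplified sketch, not a model of any particular CSMA variant: the fixed window of eight delay slots and the function name `resolve_collision` are assumptions made for the example.

```python
import random


def resolve_collision(rng, max_delay_slots=8):
    """Return (attempts, delay_a, delay_b) once two colliding nodes
    have independently picked different random delays.

    Assumption: both nodes draw from the same fixed window of slots.
    """
    attempts = 0
    while True:
        attempts += 1
        delay_a = rng.randrange(max_delay_slots)
        delay_b = rng.randrange(max_delay_slots)
        if delay_a != delay_b:
            # Different delays: the nodes resend at different times.
            return attempts, delay_a, delay_b
        # Equal delays: the resent messages collide again, so retry.


attempts, delay_a, delay_b = resolve_collision(random.Random(0))
```

The loop has no upper bound on the number of attempts, which mirrors the point made in the text: delivery is probable but never guaranteed within a fixed time.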
The second major problem with the use of a data bus occurs when a link 150 between a pair of nodes N is severed or a particular node malfunctions (emitting spurious information). In this condition, the entire system is impaired. This malfunction could occur in either the connection between Node N and the bus or along the bus between the individual nodes N. FIG. 5 shows a bus 10 having four nodes N1-N4. As illustrated by the X through the bus 10, the link between N3 and N4 has been severed. This could completely shut down the system. Signal reflections from the severed ends can cause even the intact connections between nodes to not function correctly.
To prevent a catastrophic failure caused by the severing of a connection, a redundant bus line 10′ may be added. In summary, 100% of the existing bus line is duplicated to achieve one level of redundancy (the system can survive one detected failure). FIG. 8 shows how three buses may be used to achieve two levels of redundancy. Obviously, this method of protection requires an excessive amount of cabling, thus increasing the cost and complexity of the system.
Returning to FIG. 5, a second potential problem is depicted in which there is a problem with the node itself (see N2 crossed out). The node may be generating random or spurious signals, thus producing noise on both bus 10 and bus 10′. Such a malfunctioning node can also cause a change in the impedance of the connective media. When this occurs, the node is known as a babbling node. Thus, no matter the level of redundancy achieved, a single babbling node could shut down the entire system.
The third major problem with the use of a data bus is the physical parameters of the interconnecting cable. As shown in FIG. 13, the various nodes N are seldom linearly located, hence interconnecting cables, or links 150, of different lengths must be utilized. Due to the nature of the propagation of signals, a maximum length of interconnecting cable cannot be exceeded, thus limiting the physical configuration of the data bus.
Another commonly used topology is the ring, shown in FIG. 3. Here, the network 100 will form a serially connected closed loop of nodes N. The ring will suffer many of the same problems as the data bus described above. FIGS. 6 and 9 show the additional cabling required to create single and double redundancy. Similarly, FIG. 14 shows a ring which is configured asymmetrically. As can be seen, this configuration requires many cables of differing lengths.
Yet another known topology is a mesh network 100 as shown in FIGS. 1 and 7. In a mesh, each Node N is linked to a plurality of other nodes N in a grid-like manner. Therefore, there are a plurality of paths between a given pair of nodes N. It is important to remember that the “grid” may be configured in virtually any pattern or arrangement. That is, it need not be symmetrical or otherwise systematically patterned.
The great advantage to using a mesh is the inherent reliability of the structure. A mesh is the only inherently fault-tolerant topology. Turning to FIG. 4, if any particular link 150 is severed, an alternate route is still available. In large networks, this is a tremendous cost savings in that superior reliability is achieved while using less cable than in other topologies.
There are two commonly known networks that use a mesh topology, namely the worldwide telephone system and the Internet. When a telephone number is dialed, the first few digits dialed allow a connection to a nearby node; the next few allow connection to a more distant node and so on until the connection is established to the recipient. Once the connection is established, that single pathway is maintained for the duration of the call. All information is transmitted over that single path. No information is transferred until that path is fully established. Referring to FIG. 10, if Node S is the caller and Node D is the recipient, the connection is shown by the solid arrows. This protocol is called circuit switching.
The Internet works on a slightly different principle. The Internet protocol causes the entire message (packet) to be transmitted to a neighboring node before an end-to-end path from the source to the destination is established. Information in the header of the message will define the ultimate destination and is used by each node seeing the message to route the message along a path which will lead it to the destination. Referring to FIG. 10, Node S transmits an entire message to one of its connected nodes. The connected node waits until the entire message is received and then retransmits the entire message. As described above, this will eventually reach Node D, and the message will be properly received. This protocol is called packet switching.
One problem with this “Internet” protocol is that each Node N is only capable of holding a finite amount of information at any given time. Therefore, if Node S transmits to a connected Node N but that connected node has insufficient remaining memory to store the message (the memory may be full with other messages being transmitted simultaneously), the message is lost or must wait. The mesh arrangement will allow the information to eventually reach a distant node, such as Node D. However, if the intended recipient or any of the nodes in the path to the destination were memory deficient, the message would never be received.
Another problem with the Internet protocol is that it is rather slow in propagating the message from the source to the destination. Since the entire message must be received as a whole each time it passes from one node to another before the receiving node can relay it along the path to the destination, the result is a relatively slow transmission. This makes such an arrangement undesirable for many real-time uses. Furthermore, more delay is added by the memory deficiency problems addressed above.
Turning to FIG. 10, a transmission protocol for a meshed network 100 will be described. Propagation of a signal is achieved by flooding the network 100. That is, a signal is generated by a source and transmitted to all directly connected nodes. Each receiving node then retransmits the message. The nodes N receiving that signal again retransmit the signal until eventually every node on the network has received a copy of the transmission. In FIG. 4, Node S represents the source of the transmission and Node D is the intended destination. Node S transmits a signal to its four connected nodes (three of which are shown). Each of those nodes N then transmits until eventually Node D is reached. Now assume that the path shown by the solid arrows is the first path by which data reaches Node D. Once achieved, this becomes the selected path and the entire message is transmitted along the path shown by the solid arrows from Node S to Node D. Therefore, this flooding protocol is only used to determine a path and once so established, the redundant transmissions by the other nodes N are ignored. The next time a source wishes to transmit, this flooding protocol is again performed to establish a path. That way, if a link 150 has been severed between transmissions, a working connective path can still be established. It is worthwhile to note that the path shown by the solid arrows is only one of many which could occur. Using this mesh configuration, multiple links could be severed while still allowing 100% use of the network.
Another problem with the use of a mesh topology is the likelihood of any particular Node N receiving messages from two different nodes N at exactly the same time. The Node N is usually designed to select the message it receives first and to process that message. However, there will always be cases where messages arrive so close together that it is impossible to tell which arrived first. In these cases, the electronics used to detect the message order can become metastable. When this occurs, the Node N will malfunction, either oscillating back and forth between the two possible inputs without producing a usable output or selecting neither message, in which case the messages again are lost.
There are many applications where a properly working mesh would be advantageous, but they are not being used because the current mesh protocols are inadequate. For instance, real-time control systems on aircraft and large-scale vehicles, such as buses, require fast, reliable and accurate data transmission. The mesh topology provides the desired reliability but the existing protocols do not have the real-time properties needed for these applications. Therefore, there exists a need to have a highly reliable mesh network capable of rapid transmission to facilitate real-time applications.
The present invention relates to the propagation of data signals over a meshed network of nodes. A mesh is simply a plurality of nodes, connected to one another along a plurality of different paths. That way, if any particular link is severed or damaged, alternate routes are available for the data to travel.
Current mesh protocols are not feasible for use in systems that require reliable, real-time data transfer. The present invention modifies the concept of flooding to provide a protocol which is both fast and reliable.
In the present invention, a single node is allowed to transmit at any given time over a given region (subset) of the mesh. The region can be, and often is, the entirety of the mesh. To control the times when any particular node will transmit, the nodes utilize a variant of the flooding mechanism in conjunction with any media access control (MAC) protocol which is applicable to the bus topology. An example of such a MAC protocol is the TDPA protocol, in which the nodes each have a corresponding table that indicates to all of the nodes when and which node may transmit. The table is time based. Therefore, each node has a particular amount of time within which to transmit. After that time has expired, the tables indicate that a different node is now capable of transmitting.
Once designated, the node begins to transmit a message bit by bit. The first bit is sent out on all links of the node and is received by other nodes so connected. Immediately upon receipt of the first data bit, the receiving nodes retransmit that bit to all of the links to which it is connected. In this manner, the entire network will soon be flooded by the first data bit.
Immediately thereafter, the transmitting node will send the second bit, then the third and so on, until the entire message is sent or its time limit expires. In this fashion, the entire message floods the network, bit by bit.
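The way a single bit floods the mesh, and the way the flood routes around a severed link, can be sketched as a breadth-first traversal. The sketch assumes that every link and node adds the same delay, so arrival time is measured in hops; the five-node mesh and the names `build` and `flood` are hypothetical and do not correspond to any figure.

```python
from collections import deque


def build(edges):
    """Turn a list of bidirectional links into an adjacency map."""
    links = {}
    for a, b in edges:
        links.setdefault(a, []).append(b)
        links.setdefault(b, []).append(a)
    return links


def flood(links, source):
    """Return {node: hops} for the first arrival of a bit sent by `source`."""
    arrival = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            # A node retransmits only the first copy it receives;
            # later copies arriving on other links are ignored.
            if neighbor not in arrival:
                arrival[neighbor] = arrival[node] + 1
                queue.append(neighbor)
    return arrival


# Hypothetical mesh: S-A, S-B, A-C, B-C, C-D.
edges = [("S", "A"), ("S", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
arrival = flood(build(edges), "S")

# Sever link S-A: the flood still reaches every node by another route.
arrival_cut = flood(build([e for e in edges if e != ("S", "A")]), "S")
```

In this model each subsequent bit of the message follows the same pattern one bit time later, so the whole message arrives at the destination only a few hop delays behind the source.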
For very high speed communication links, it may be impractical to retransmit the messages strictly bit by bit. In these cases, a small number of bits can be used as the unit of data to be accumulated before being retransmitted. For example, with links that use 4B5B encoding, units of 5 bits can be used. However, it is desirable to use the smallest number of bits practical given the characteristics of the communication links.
In order to accomplish this flooding, the receiving nodes must have a mechanism to lock out the links that have not received data. The nodes must also have a mechanism by which they can distinguish between two messages that arrive on different links at or about the same time. Without such mechanisms, the message could be received for a second time by a node. In effect, this would cause the message to be sent to nodes that have already passed the message along. If this occurred, a single message could propagate through the network indefinitely. Furthermore, if two messages arrive at or about the same time, it is possible to cause current node electronics to become metastable, thereby causing the message to be corrupted or lost.
The present invention employs a novel class of arbitration and lockout circuits to solve these problems. Once the first bit of a message is received, the circuitry locks the node into a particular configuration for the remainder of the message. In that configuration, only the originally receiving link will be allowed to affect the node's outputs. In this manner, if data is sent back to an earlier node, it will be ignored. The circuitry also has an arbitration function. This allows the link which first receives the signal to control the outputs. However, if two links receive the message at exactly the same time, they are both allowed to affect the output. Since MAC protocols for a bus topology allow only one message to be sent at any one time, simultaneously arriving messages would mean that both links are receiving the exact same data at the exact same time. The two messages are just copies of the same message, which have taken different routes through the mesh during its flood. Therefore, even if two links affect the output, the net result is the same because of the circuit's ability to combine the signals. Finally, if two signals arrive very close together, but not exactly simultaneously, the first signal is allowed to pass and becomes the output. As in any time-of-arrival arbitration circuit, it is possible for inputs to have specific arrival timing such that one or more flip-flops in the arbitration circuit become metastable. Using the correct class of arbitration circuitry allows for this portion of the node to become metastable while still allowing it to correctly output data from the node. The needed characteristic for an arbitration circuit to be usable for this invention is that it must allow at least one of the arbitrating inputs to affect the outputs at all times, even during periods of metastability. Metastability is allowed to cause more than one input to affect the output, either simultaneously and/or sequentially.
Some bus topology MAC protocols require time synchronization. The TDPA protocol is an example. The retransmission delay as a message passes through nodes in its path from a source to a destination can accumulate to the point where it can degrade the synchronization. A compensation for the accumulated delay can be created knowing how many, or even specifically which, nodes the message traversed. A destination could use a table of values to compensate for this accumulated delay, given a priori knowledge of the mesh's specific topology, link and node delays, the current mesh fault status, and the identification of the source. Alternatively, delay information can be added as a field in the messages. As a message traverses each node, the delay field can be updated “on the fly” with the node adding in its part of the delay and the delay of the incoming link. If the delay field is sent least significant bit first, a serial adder can be used which does not cause any additional message delay. If all nodes have nearly the same input-to-output delay and all links have nearly the same propagation delay, the message's delay field is just a “hop count” of the number of links and nodes through which the message passed.
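The lockout and arbitration behavior described above can be approximated by a discrete-time software model. This is a sketch only: it cannot represent metastability, and combining simultaneous copies with a logical OR is an assumption, since the text states only that the circuit combines the signals. The class name `LockoutNode` and the link numbers are hypothetical.

```python
class LockoutNode:
    """Discrete-time model of one node's lockout and arbitration."""

    def __init__(self):
        # Links allowed to drive the outputs for the current message.
        self.selected = None

    def step(self, inputs):
        """`inputs` maps link number -> bit for links carrying data this tick.

        Returns the bit to retransmit, or None if nothing is forwarded.
        """
        if self.selected is None:
            if not inputs:
                return None
            # First arrival wins; links arriving in the same tick are
            # all admitted, and every other link is locked out.
            self.selected = set(inputs)
        bits = [inputs[link] for link in self.selected if link in inputs]
        if not bits:
            return None
        # Assumption: identical copies are combined with a logical OR.
        return int(any(bits))

    def end_of_message(self):
        # Release the lock so any link may win the next message.
        self.selected = None


node = LockoutNode()
first = node.step({1: 1, 3: 1})         # two copies arrive together
second = node.step({1: 0, 3: 0, 4: 1})  # link 4 is locked out and ignored
node.end_of_message()
third = node.step({4: 1})               # link 4 may win the next message
```

Because the admitted links carry copies of the same message, the combined output matches what either link alone would have produced.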
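The on-the-fly update of the delay field can be illustrated with a bit-serial adder. Because the field arrives least significant bit first, each output bit depends only on the current input bit, the corresponding bit of the node's own delay, and a single stored carry, so no bit has to wait for a later one. The 8-bit field width and the helper names are assumptions made for the example.

```python
def to_bits(value, width):
    """Encode `value` as a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Decode a least-significant-bit-first list of bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(field_bits, addend):
    """Add `addend` to the delay field as its bits stream through a node."""
    carry = 0
    out = []
    for i, bit in enumerate(field_bits):
        total = bit + ((addend >> i) & 1) + carry
        out.append(total & 1)  # this bit can be retransmitted immediately
        carry = total >> 1     # the only state kept between bits
    return out                 # overflow past the field width is dropped


# With equal delays everywhere, each node adds 1 and the field is a hop count.
field = to_bits(0, 8)
for _ in range(3):
    field = serial_add(field, 1)
hop_count = from_bits(field)

summed = from_bits(serial_add(to_bits(5, 8), 3))
```

Sending the field most significant bit first would not work this way, since a carry out of the low bits could change bits that had already been retransmitted.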
Another aspect of the present invention is the concurrent propagation of multiple messages over a single meshed network. As discussed above, bus topology MAC will only allow a single node to transmit at any given time. However, by dividing the mesh into a plurality of regions or submeshes, a single message could propagate over each submesh, thus increasing the effective bandwidth of the entire system.
Each node, in a preferred embodiment, has four links and a local connection to some resource such as a data processor or an input or output device. Therefore, each node is capable of handling two different messages simultaneously. Within a defined submesh itself, messages propagate by flooding just as in a single unified mesh. However, along a border dividing the various submeshes, the nodes will behave differently to contain the separate floods. That is, a message generated in a given submesh must remain in that submesh and not be allowed to enter a different submesh. The nodes along the border will generally have two links in one submesh, and two links in a different submesh. A protocol is established which temporarily connects the two links within a submesh while not having a connection between the links in different submeshes. Assuming the node has links numbered 1, 2, 3 and 4, a first message will only be allowed to be input and output over links 1 and 2, while a second message will only be allowed to be input and output over links 3 and 4. The particular connectivity of the links may be static or varied as often as each message. Variable connectivity information can be stored in tables that define the connectivity for specific periods of time. These tables can be adjuncts to a TDPA table, or associated with time slots in a time division multiple access (TDMA) scheme, or the connectivity can be switched using various other protocols.
To further facilitate the integrity of the network, self checking pairs may be employed. That is, every original node is replaced with a pair of nodes, and a pair of cables (rather than just one) forms each link. Each message sent across a link is replicated in the two cables, one replicant coming from each of the nodes in the node pair transmitting into that link. Each pair of nodes compares the data it receives along the two cables of a link. If the two are identical, the node pair determines the message to be accurate. If, however, there is a difference in the message replicants, the node pair determines that the message has become corrupted, and disregards the message from that link.
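The border-node behavior described above amounts to a connectivity table that groups a node's links, with a message retransmitted only on links in the same group as the link on which it arrived. The following is a minimal sketch under that reading; the table layout, the slot numbers, and the name `forward_links` are assumptions, and the node's local connection is left out for brevity.

```python
def forward_links(incoming, groups):
    """Return the links a message arriving on `incoming` is retransmitted on.

    `groups` is a list of sets; links in the same set are connected.
    """
    for group in groups:
        if incoming in group:
            return sorted(group - {incoming})
    return []  # the link is not connected in this configuration


# Interior node: all four links belong to one submesh.
INTERIOR = [{1, 2, 3, 4}]
# Border node: links 1 and 2 in one submesh, links 3 and 4 in another.
BORDER = [{1, 2}, {3, 4}]

# Hypothetical time-based table: connectivity may change from slot to slot.
connectivity_by_slot = {0: INTERIOR, 1: BORDER}

interior_out = forward_links(1, connectivity_by_slot[0])
border_out_a = forward_links(1, connectivity_by_slot[1])
border_out_b = forward_links(3, connectivity_by_slot[1])
```

In the border configuration a message entering on link 1 leaves only on link 2, so the two floods never cross the boundary even though they pass through the same node.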
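The comparison performed by a self checking pair can be sketched as follows. The function name `accept_from_link` and the use of whole byte strings are assumptions made for illustration; an actual node pair would presumably compare the two replicants as the data arrives.

```python
def accept_from_link(replicant_a, replicant_b):
    """Return the message if both cables of a link agree, else None.

    `replicant_a` and `replicant_b` are the copies received on the
    two cables of one link.
    """
    if replicant_a == replicant_b:
        return replicant_a
    # A mismatch means at least one copy was corrupted, so the
    # message from this link is disregarded.
    return None


good = accept_from_link(b"\x5a\x01", b"\x5a\x01")
bad = accept_from_link(b"\x5a\x01", b"\x5a\x00")
```

A disregarded link does not by itself lose the message in a mesh, since the same message may still arrive intact over another link.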
It is an object of the present invention to provide a protocol for a meshed network which allows for reliable real-time use of a system.
It is another object of the present invention to provide a meshed network having nodes capable of arbitrating between concurrent messages and also to remain unaffected when various electronic components become metastable.
It is yet another object of the present invention to provide a meshed network with the ability to send multiple messages along various submeshes simultaneously.
It is yet still another object of the present invention to provide a minimal data signal to trigger the linking connectivity protocol in the various links of a node lying on the border of a submesh.
It is a further object of the present invention to provide a synchronizing component to a message stream in order to correlate the tables and time based pointers within each node of a network.
It is still another object of the present invention to provide a meshed network having self checking pairs, by which the integrity of the data transmitted may be verified.
It is yet still another object of the present invention to provide a mechanism by which newly added or rejoined nodes to a mesh can self synchronize with the remainder of the nodes.