The present invention relates to digital communications networks and, more particularly, to a self-reconfiguring digital mesh communications network communicating between user devices (such as servers or other computers) and storage or other subordinate devices. Conventional computer networks or systems are generally hierarchically connected and communicate through a HUB or switch architecture. A drawback of HUB or switch architectures is that they suffer bandwidth degradation due to the back plane speed of the HUB or switch.
In IP networks, routers provide, inter alia, the functionality of forwarding packets. More often than not, they are responsible for forwarding the packets to the next hop router only. They do this independently of other routers. Network topology information is exchanged between routers, which facilitates the construction of routing tables. The table in each router specifies the next hop to reach a particular destination.
The way in which the elements of a network are connected is called network topology. All routers have to reconstruct their routing tables for forwarding to work correctly if there are any changes in the network topology. Routing tables are constructed taking the network topology as the input. The shortest paths to all destinations are calculated and the next hop for each destination is stored in the routing tables. This requires that all routers be apprised of every change in the topology. Both link failure and re-establishment are such changes in the network topology. If a link goes down, the routers connected to that link stop using that link to forward packets. Other routers, which are not aware that the link is not functional, could continue sending packets to the routers connected to the failed link. Hence, every router in the network has to be informed about the link having gone down. A link state advertisement (LSA) sent to every router in the network advises of the new topology. This is called “flooding.” The period during which this information exchange takes place, from the time the link failed until the time when all routers have rebuilt their routing tables, is called the convergence period. During this period there may be transient routing loops. Also, for its portion of the convergence period, each router ceases communication. Packets are buffered by each router until it perceives that the convergence period is over. For efficient performance this interval must be as short as possible. As the network size increases, the frequency of link state changes increases, and the convergence period increases since more routers have to be informed and all of them need to recalculate their routing tables. In terms of bandwidth and processing cost, the overhead associated with flooding is greatly increased.
Also, very frequent changes in the link state degrade the overall network performance and sometimes make it impossible for routing to take place correctly because of instability and route flapping—the condition in which numerous routes are withdrawn and re-advertised in rapid succession. Still, a quick response to link failures, lower convergence periods and less control traffic are desirable for good networks and to enable providing Quality of Service (QoS).
In today's Internet, routing is dominated by link state routing protocols, one of which is Open Shortest Path First (OSPF). It is based on the Shortest Path First algorithm known as Dijkstra's algorithm. The protocol calls for sending the LSA packets to all routers in the same hierarchical area. The LSA contains information about a router's attached interfaces, their metrics and other variables such as the other end of each interface. All routers collect these LSAs and use them to form a table called the Link State Database. This link state database is the input to the shortest path algorithm, which calculates the shortest path to all other nodes in the network. With this calculation, the routing tables are generated and are used for routing packets in the network.
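The calculation described above can be illustrated with a minimal sketch of Dijkstra's algorithm that turns a link state database into a next-hop table. The dictionary layout of the database, the function name build_routing_table, the node names and the link metrics are all assumptions made for illustration; they are not taken from OSPF or from any particular router implementation.

```python
import heapq

def build_routing_table(lsdb, source):
    """Return {destination: (next_hop, cost)} computed with Dijkstra's algorithm.

    lsdb maps each router to {neighbor: link_metric}. This layout is a
    hypothetical stand-in for a real Link State Database.
    """
    best = {source: 0}
    table = {}
    visited = set()
    heap = [(0, source, None)]  # (cost so far, node, first hop out of source)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if first_hop is not None:
            table[node] = (first_hop, cost)
        for neighbor, metric in lsdb.get(node, {}).items():
            new_cost = cost + metric
            if neighbor not in visited and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                # Directly attached neighbors are their own next hop.
                hop = neighbor if node == source else first_hop
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return table

# Illustrative four-router mesh with made-up metrics.
LSDB = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "E": 5},
    "C": {"A": 4, "B": 1, "E": 1},
    "E": {"B": 5, "C": 1},
}
table = build_routing_table(LSDB, "A")

# The same network after link (B, C) has failed and been removed from the database.
LSDB_AFTER_FAILURE = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "E": 5},
    "C": {"A": 4, "E": 1},
    "E": {"B": 5, "C": 1},
}
table_after = build_routing_table(LSDB_AFTER_FAILURE, "A")
```

Comparing table with table_after shows why every router must learn of a failure: router A reaches C and E through B before the failure and through C afterward, so a router still holding the old database would forward along a path that no longer exists.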
There are several advantages of link state protocols over distance vector protocols. However, the deficiencies include slow response to topological changes, large update packets that interfere with data traffic, and a tendency to form loops, which could sometimes persist for seconds or even minutes.
A topology in which there are at least two nodes with two or more paths between them is called mesh topology. In a mesh topology, the nodes of the network are connected by point-to-point links. FIG. 3 is an illustration of a mesh network with multiple links (A, B), (A, C), (A, D), (B, E), etc. connecting the multiple nodes A, B, C, D, E, etc. A ring network as illustrated in FIG. 10 is a special case of a mesh network. In the networks described here the nodes are routers. The routers may be communication devices dedicated to the routing and rerouting of communications, or they may be other devices having this functionality, e.g., PCs appropriately programmed.
Many companies are developing products to create wireless mesh broadband networks. The Nokia Rooftop Wireless routing mesh network solution [10] is one such product. Products like this enable delivery of broadband services to residential and small business customers. They do not depend on the existing copper lines or fiber optics. The intention is to provide coverage to an entire neighborhood which could be laden with obstacles like trees, poles, hills etc.
Each node in such a network is both subscriber equipment and a router. The Nokia product uses a proprietary operating system called the Nokia AIR OS. It also includes a specialized set of networking protocols for adaptive multi-hop wireless networks. The network planning has to be done in such a manner that multiple paths exist between nodes in the network. The routing protocols have to be made robust enough to handle link failures and switch to alternate paths whenever required. Added to this, the paths chosen by a router at any point in time must be the best available paths. The objective is to achieve a self-forming self-healing scalable network. The routers in the network have to make smart routing decisions, routing around obstacles and taking care of link failures to provide seamless service.
In the case of wireless networks operating on link state routing protocols, another serious problem is encountered. Because the wireless links are fragile, the use of link state protocols in their current form gives very poor performance. For this reason, wireless networks are not often used to communicate among computers and storage facilities. Frequent link failures require too much control traffic, which degrades performance. For example, a passing object in the line of sight will be mistaken for a link failure and will trigger multiple link state updates in the area in which the object is moving. The coming up of the link will also have to be advertised by flooding. The amount of traffic generated by this flooding can become tremendous and severely degrade the network performance. The principle behind link state protocols is that for routing to work correctly, all routers in the network must have identical topological information at all times. Even if there is a single change in the link state of the network, the information is propagated immediately to the entire network. This is because of the distributed replicated database model [1] followed. In sum, then, flooding the entire area because of a single change can be expensive in terms of time, bandwidth, cost and computational overhead, and the problem gets worse when the links go up and down rapidly. Also, routing instability that comes about in the form of routing loops can become unacceptable.
In the specific case of wireless links, there is an even more serious problem. Every router has a parameter called “MinLSInterval” [1] [4] [11] [12]. The value of this parameter sets the interval, in seconds, between distinct originations of any particular link state advertisement packet. If the value is set to 5 seconds, a router cannot send two consecutive updates about the same interface more frequently than 5 seconds apart. If a link goes down and the router has advertised the failure, it cannot advertise the coming up of the link for the next 5 seconds even if the link comes back up within a few milliseconds. Therefore the link is rendered unusable for some time even if the link is up. The reverse of this occurrence is worse. If a link is advertised to be up, but goes down in a few milliseconds from the advertisement and does not come back up for some time, then routing loops can occur. This is because all the routers in the network do their calculations based on the fact that the link is up, which is not true. All this coupled with the standard problem of huge amounts of flooding traffic makes a serious problem.
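The effect of MinLSInterval on a briefly interrupted link can be sketched as follows. This is a simplified model under stated assumptions: the class LsaOriginator and its method link_changed are hypothetical names, time is passed in explicitly rather than read from a clock, and a suppressed advertisement is simply reported as not sent, whereas a real router would be expected to defer it until the interval expires.

```python
MIN_LS_INTERVAL = 5.0  # seconds; the example value used in the text

class LsaOriginator:
    """Hypothetical model of one router advertising the state of one link."""

    def __init__(self, min_interval=MIN_LS_INTERVAL):
        self.min_interval = min_interval
        self.last_sent = None       # time of the most recent origination
        self.advertised_up = True   # state the rest of the network believes

    def link_changed(self, now, is_up):
        """Try to advertise a state change; return True if an LSA was sent."""
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False  # held back, so other routers keep the stale state
        self.last_sent = now
        self.advertised_up = is_up
        return True

originator = LsaOriginator()
sent_down = originator.link_changed(now=100.000, is_up=False)  # failure is advertised
sent_up = originator.link_changed(now=100.050, is_up=True)     # link is back 50 ms later

# Time for which the link is physically up but still advertised as down.
unusable_for = originator.last_sent + originator.min_interval - 100.050
```

In this run the failure is advertised, the recovery 50 ms later is held back, and the link stays unusable for nearly the full five seconds although it is physically up.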
One approach to this problem is to ignore changes in a link's metric and not propagate the information [6]. Routing then takes place along old routes in a sub-optimal manner. Once every few seconds, the router can send a summary of all the changes that have taken place. This reduces the amount of control traffic in the network, but the approach fails when the change in the link state is a link failure. This information cannot be withheld because withholding it will cause routing loops and packet loss. For protocol correctness, therefore, information about link failures must always be flooded.
Another related issue is link failure detection. There are two possible approaches. One is for the routing protocol itself to monitor the link and detect link failures. The other is to let the lower layers (such as the data link layer) do the job and inform the IP layer of link state changes. The latter method is better because it leads to quicker detection than the former. When OSPF is used, a “hello protocol” [1] [4] [11] [12] is used to determine the link's state. The protocol ensures that communication between two neighbor routers is bi-directional. Hello packets are sent out periodically on all router interfaces. When a router sees itself listed in its neighbor's hello packet, bi-directional communication is ensured. A parameter called the “RouterDeadInterval” [1] [4] [11] [12], with a typical value of 40 seconds, is configured for every interface. When a hello packet is not received on an interface for four consecutive “HelloIntervals” (typically 10 seconds each), the router declares the link to be down. After the first interval elapses without the reception of a hello packet on a link, a timer is started for that link to keep track until four such intervals elapse. During this time, the router has not yet declared the link down, so it assumes that the link is fine and uses the link to route packets. If the link is actually down, then the packets transmitted on that link during this time are lost. They have to be retransmitted by higher layers, or they cannot be recovered. But if the lower layers quickly detect the link to be down, this loss can be avoided. The link can be declared down immediately and the usage of the link can be stopped. The routing protocol should be the last resort for link state detection, when all else fails. If it is used as the primary means, it could lead to many packets being lost or retransmitted.
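The cost of relying on the hello protocol alone can be put in rough numbers. The sketch below uses the typical timer values quoted above. The helper names, the traffic rate of 100 packets per second and the 50 ms lower-layer detection time are assumptions chosen only for illustration.

```python
HELLO_INTERVAL = 10.0                        # seconds, typical value from the text
ROUTER_DEAD_INTERVAL = 4 * HELLO_INTERVAL    # 40 seconds

def hello_declares_down(last_hello, now, dead_interval=ROUTER_DEAD_INTERVAL):
    """True once no hello has been heard for a full RouterDeadInterval."""
    return now - last_hello >= dead_interval

def packets_lost(fail_time, detect_time, packets_per_second):
    """Packets forwarded onto a dead link between failure and detection."""
    return round((detect_time - fail_time) * packets_per_second)

# Last hello arrives at t = 0 s and the link fails at t = 1 s (illustrative values).
last_hello, fail_time, rate = 0.0, 1.0, 100

still_up_at_39s = not hello_declares_down(last_hello, now=39.0)
down_at_40s = hello_declares_down(last_hello, now=40.0)

lost_with_hello = packets_lost(fail_time, last_hello + ROUTER_DEAD_INTERVAL, rate)
lost_with_lower_layer = packets_lost(fail_time, fail_time + 0.05, rate)  # assumed 50 ms
```

Under these assumed figures the router keeps forwarding onto the dead link for 39 seconds when it waits for the RouterDeadInterval, against a few packets when the lower layer reports the failure.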
Link state restoration algorithms have been proposed in the past [6] [8] [16] [17] [18]. Various schemes are employed to restore failed links in networks running link state routing protocols. Some of the schemes concentrate on fast flooding of link state information for faster convergence [8]. A common approach is to route around the failed link to reach its other end [16] [18]. The process involves finding an alternate path to the other end of the link and tunneling the packets to the node at the other end, using the alternate path. This scheme has proven unsatisfactory because it causes transient routing loops and is quite inefficient.
A few schemes propose the use of special packets in the network, to inform only a few routers along the restoration path about the link failure [6]. These are called branch update algorithms and give a better performance compared to the ones discussed above. The overhead associated with this approach is the definition of new packet types, their structures, functions and protocols for their use. Vector metric algorithms [6] are also in use, wherein the link metrics are defined as vectors instead of the standard scalar link weights.
The procedure by which Link State Advertisements are sent out to the entire routing domain is called Reliable Flooding [1]. When a router's local state changes, it creates a Link State Update packet within which is a Link State Advertisement. The router then sends out this packet on all of its attached interfaces. This is the beginning of reliable flooding. The router's neighbor on receiving the Link State Update packet does the following:
The Link State Advertisement (LSA) within the Link State Update packet is examined for the correct type and as being the most recent one.
The LSA is installed in the Link State Database 91 in FIG. 1a.
An acknowledgement is sent to the router that sent the packet.
The LSA is re-packaged into a new Link State Update packet and sent out on all the interfaces except for the one on which the packet was received.
The process continues until all the routers have the new updated LSA.
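The flooding steps listed above can be traced with a small simulation. It is a sketch under simplifying assumptions: the function name flood, the adjacency-list topology and the LSA identifier are hypothetical, the type check is omitted, acknowledgements are noted but not modeled, and packets are delivered in order with no loss or retransmission.

```python
from collections import deque

def flood(topology, origin, lsa_id, seq):
    """Flood one LSA from origin; return (per-router databases, packets sent)."""
    lsdb = {router: {} for router in topology}
    lsdb[origin][lsa_id] = seq
    # The originator sends the update on all of its attached interfaces.
    queue = deque((origin, neighbor) for neighbor in topology[origin])
    sent = len(queue)
    while queue:
        sender, receiver = queue.popleft()
        # Examine the LSA: ignore it unless it is newer than the stored copy.
        if lsdb[receiver].get(lsa_id, -1) >= seq:
            continue
        lsdb[receiver][lsa_id] = seq  # install in the Link State Database
        # An acknowledgement to the sender would be sent here (not modeled).
        # Re-flood on every interface except the one the packet arrived on.
        for neighbor in topology[receiver]:
            if neighbor != sender:
                queue.append((receiver, neighbor))
                sent += 1
    return lsdb, sent

# Illustrative mesh; node names and links are made up for this example.
MESH = {
    "A": ["B", "C"],
    "B": ["A", "C", "E"],
    "C": ["A", "B"],
    "E": ["B"],
}
databases, packets = flood(MESH, origin="A", lsa_id="A-links", seq=1)
```

Even in this four-router mesh a single change generates more update packets than there are routers to inform, because redundant copies cross the (B, C) link in both directions. That redundancy is the control overhead that grows with network size.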
All initiators of LSAs need to refresh their LSAs every 30 minutes. This period is called “LSRefreshTime” [4] [11] [12] and is a constant. At each refresh, the sequence number of the LSA is incremented and the LSA is re-flooded throughout the routing domain. A checksum is used to detect errors in an LSA. OSPF restricts the frequency at which an LSA can be generated. An interval called MinLSInterval, set to 5 seconds, is the minimum time that a router must wait before sending out an updated LSA. A rapidly changing network element, for example a link between two routers going up and down continuously, can cause a lot of control traffic and degrade the performance of the network. The MinLSInterval is configured to guard against this problem.
A second level of protection is used against routers that refuse to respect the MinLSInterval and send out Link State Update packets very rapidly. An OSPF router will not accept an updated LSA if the current copy that it has in its Link State Database table is less than 1 second old. A neighbor receiving such Link State Update packets will discard them, and they are never sent into the network. Assuming that all the routers in the network respect the MinLSInterval, it becomes apparent that there will be routing problems in terms of dropped packets when a link goes down less than five seconds after coming up. When the transport protocol is UDP, these packets are simply lost. On the other hand, TCP will handle the re-transmission of dropped packets. Also, the flooding procedure, though efficient, takes time, which increases with the size of the network. The convergence period is the duration from the time the flooding procedure begins to the time when all the routers have the updated information and have completed the re-calculation of their routing tables. Although this period is short, there will be dropped packets and transient routing loops during it. Routers in the network that have not yet been informed that a link has gone down continue to use their previously calculated routes. OSPF is one of the better protocols in terms of having a short convergence period and a very efficient flooding mechanism. However, with fragile links, the protocol in its current form has limitations.
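The receiver-side check described above can be sketched as a small acceptance rule. The class LinkStateDatabase, the method receive, the returned status strings and the constant name MIN_LS_ARRIVAL are assumptions for this illustration; only the 1 second threshold comes from the text.

```python
MIN_LS_ARRIVAL = 1.0  # seconds; threshold from the text, constant name assumed

class LinkStateDatabase:
    """Hypothetical receiver-side store: lsa_id -> (sequence number, install time)."""

    def __init__(self):
        self.entries = {}

    def receive(self, lsa_id, seq, now):
        """Apply the acceptance rule and return what happened to the LSA."""
        current = self.entries.get(lsa_id)
        if current is not None:
            current_seq, installed_at = current
            if seq <= current_seq:
                return "duplicate-or-old"
            if now - installed_at < MIN_LS_ARRIVAL:
                # Stored copy is under 1 second old: discard, do not re-flood.
                return "discarded-too-soon"
        self.entries[lsa_id] = (seq, now)
        return "installed"

db = LinkStateDatabase()
first = db.receive("R1-links", seq=1, now=0.0)
too_fast = db.receive("R1-links", seq=2, now=0.4)  # arrives 0.4 s after the first
later = db.receive("R1-links", seq=2, now=1.5)     # same update retried after 1 s
```

In this model a discarded update leaves the database unchanged, so the receiving router continues to route on the older copy until an acceptable update arrives.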