This invention relates generally to communication protocols that are particularly suitable for self-reconfigurable multi-purpose communication networks, such as ad-hoc networks. More particularly, the protocols use learning-based strategies to achieve routing objectives.
Various routing mechanisms have been proposed for ad-hoc wireless networks. In general, an ad-hoc wireless sensor network has the following properties: (1) the structure of the network is unknown and may change dynamically, (2) each node has limited computation resources and lifetime, and (3) each node can obtain pieces of information from local sensors and communicate with others within a limited range. The power of such sensor networks is derived from communication, since each node can sense only local information and has few computational resources. The routing mechanisms proposed for such networks fall into two basic categories: table-driven and source-initiated. Table-driven protocols rely on an underlying global routing table update mechanism for all nodes in the network, a mechanism that would not be energy efficient for ad-hoc dynamic networks. Source-initiated protocols, on the other hand, discover a route every time one is needed.
Existing routing protocols differ mainly in routing metrics, but all use a fixed routing objective. In most cases, routing objectives are implicitly embedded in strategies. Examples of these routing metrics include use of the shortest path, degree of association stability, signal stability or strength combined with shortest path, and information gain. Protocols also differ by destination specifications. The majority of early protocols are address-based or geographical location-based.
All existing routing protocols for wireless networks are implicitly associated with their routing strategies, which generally fall into two classes: flooding-based and search-based. Flooding-based methods begin with a route discovery phase (flooding the network), followed by a route maintenance phase for repairing disconnected routes. Flooding-based strategies are more suitable for relatively stable networks, since maintaining and repairing routes can be costly for dynamic networks. Search-based methods normally discover routes by selecting the next “best” hop at every node on the route. Routes may differ from message to message, even to the same destination node, and there is no route maintenance.
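The two strategy classes described above can be contrasted in a minimal sketch. It is illustrative only and is not drawn from any particular protocol: the four-node topology, the node coordinates, the function names `flood_discover` and `greedy_route`, and the use of straight-line distance to the destination as the "best"-hop metric are all assumptions made for the example.

```python
import math
from collections import deque

# Hypothetical four-node topology and node coordinates (assumed for illustration).
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
pos = {"A": (0.0, 0.0), "B": (1.0, 0.5), "C": (1.0, -1.0), "D": (2.0, 0.0)}


def flood_discover(graph, source, dest):
    """Flooding-style discovery: each node forwards the request once (breadth-first)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == dest:
            break
        for nbr in graph[node]:
            if nbr not in parent:  # a node rebroadcasts only the first copy it hears
                parent[nbr] = node
                queue.append(nbr)
    if dest not in parent:
        return None  # destination unreachable
    route, node = [], dest
    while node is not None:  # walk back from the destination to the source
        route.append(node)
        node = parent[node]
    return route[::-1]


def greedy_route(graph, pos, source, dest, max_hops=20):
    """Search-style routing: at each node, pick the neighbor closest to the destination."""
    route, node = [source], source
    while node != dest and len(route) <= max_hops:  # hop limit guards against loops
        node = min(graph[node], key=lambda n: math.dist(pos[n], pos[dest]))
        route.append(node)
    return route


flooded = flood_discover(graph, "A", "D")
greedy = greedy_route(graph, pos, "A", "D")
```

In this sketch the flooding routine touches every reachable node before a route is known, whereas the greedy routine makes only local decisions and keeps no route state afterward, which mirrors the absence of a maintenance phase in search-based methods.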
However, existing ad-hoc wireless protocols offer no theoretical results on delivery or route optimality. Distributed quality-of-service routing for ad-hoc networks has been proposed, in which a set of probes is used to find an optimal route before actual messages are sent, followed by a route maintenance phase to repair broken routes. This approach, however, is not suitable for dynamic networks in which there is no fixed optimal route over time.
Existing routing mechanisms for ad-hoc wireless networks have two limitations: routing objectives are fixed and embedded in strategies, and quality-of-service routing does not work well for dynamic networks. It would be useful to have a framework of distributed routing strategies based on real-time reinforcement learning, so that messages discover and learn their routes on the way to their destinations. The separation of routing objectives and routing strategies would also make it possible for network systems to change routing objectives from time to time, given different task characterizations and requirements.
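One way to picture the separation of routing objectives from routing strategies is the following minimal sketch. It assumes a learning real-time search rule in the style of LRTA*, chosen here only for illustration and not stated in the text above; the topology, the `residual` energy values, and the names `route_and_learn`, `hop_count`, and `low_energy_penalty` are likewise hypothetical. The strategy is a single function, and the objective is a cost function passed to it.

```python
from typing import Callable, Dict, List

# Hypothetical topology and residual node energies (assumed for illustration).
graph: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"],
}
residual = {"A": 1.0, "B": 0.2, "C": 0.9, "D": 1.0}


def hop_count(u: str, v: str) -> float:
    """Objective 1: every hop costs the same."""
    return 1.0


def low_energy_penalty(u: str, v: str) -> float:
    """Objective 2: hops into nodes with little remaining energy cost more."""
    return 1.0 / residual[v]


def route_and_learn(graph: Dict[str, List[str]], source: str, dest: str,
                    cost: Callable[[str, str], float],
                    values: Dict[str, float], max_hops: int = 50) -> List[str]:
    """Forward a message hop by hop, updating cost-to-destination estimates on the way."""
    path, node = [source], source
    while node != dest and len(path) <= max_hops:
        # One-step lookahead: hop cost plus the neighbor's current estimate (0 if unseen).
        best = min(graph[node], key=lambda n: cost(node, n) + values.get(n, 0.0))
        # Learning step: the current node's estimate becomes the best lookahead value.
        values[node] = cost(node, best) + values.get(best, 0.0)
        node = best
        path.append(node)
    return path


# Same strategy, two objectives, each with its own table of learned estimates.
path_hops = route_and_learn(graph, "A", "D", hop_count, {})
path_energy = route_and_learn(graph, "A", "D", low_energy_penalty, {})
```

Under these assumptions, swapping the cost function changes the route that messages learn (the energy-aware objective steers around the depleted node B) without any change to the forwarding routine itself.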