Fault resilience and synchronization are important factors in network performance. Fault resilience refers to the ability of a network to continue operating when portions of the network (for example, servers) may not be operating properly, and/or to recover when the previously non-operating portions are again operating within the network. For example, if a network is not fault resilient, then a single fault may make the entire network unavailable to many different network entities (clients, servers, etc.). Synchronization refers to portions of the network (for example, servers, caches, DNS servers) having the most current information related to a function. For example, if a network is not synchronized with respect to DNS entries, then parties may be directed to non-existent resources (such as servers). If a network is not synchronized with respect to content, then a server may send a client outdated information. If a network is not synchronized with respect to the best route for obtaining information, then the delivery of such information may be delayed (resulting, for example, in a slower response time).
A conventional approach to fault resilience may store communication routes in a persistent store, such as a disk drive, which requires disk space, disk I/O, and disk data management. Additionally, conventional approaches may not use self-healing techniques, and thus, if a resource is returned to service after a disruption, it may have lost all existing route information.
Conventional approaches to synchronization may use explicit communications between global resource manager (GRM) servers. This may require a number of server-to-server communications on the order of n*(n−1), where n is the number of GRM servers in a system.
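To illustrate how quickly the n*(n−1) figure grows, the following minimal sketch enumerates the directed server-to-server messages needed if each GRM server sends one explicit update to every other GRM server. The function names `sync_pairs` and `message_count` are hypothetical and chosen only for this illustration, and the one-message-per-ordered-pair model is an assumption rather than a description of any particular conventional system.

```python
from itertools import permutations


def sync_pairs(n):
    """Return every ordered (sender, receiver) pair among n servers.

    Assumption: each server sends one explicit update to each other server.
    """
    return list(permutations(range(n), 2))


def message_count(n):
    """Closed-form count of ordered pairs among n servers: n * (n - 1)."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        # The enumeration and the closed form agree for every n.
        assert len(sync_pairs(n)) == message_count(n)
        print(f"{n} servers -> {message_count(n)} messages")
```

Under this assumption, doubling the number of GRM servers from 8 to 16 raises the message count from 56 to 240, which is roughly a fourfold increase.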
As communication speeds increase, and as content that is sensitive to disruption, delay, and/or latency (such as streaming video) is communicated, networks that lack fault resilience and synchronization may present problems.