By way of background, many modern systems are implemented by integrating several network elements, such as a frontend web server that interacts with a backend database server. When these systems provide critical services, they are often replicated across multiple sites to maximize service availability, particularly after failures of networking equipment or facilities, or other externally attributable events that render the equipment hosted at a site unavailable or inaccessible. Failures (e.g., profound unavailability or non-responsiveness) of the frontend machines facing client devices (e.g., web browsers) may be automatically detected by the client, which may then automatically recover service to an alternate site. Failures of backend servers, however, typically do not trigger client-initiated recovery. For example, if the database server supporting an e-commerce site is unavailable, the typical implementation simply returns a webpage telling the client that the site is temporarily unavailable and to try again later. Thus, standard practice today is for complex, multi-element solutions to return descriptive errors to clients when a backend element that does not communicate directly with clients fails.
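By way of illustration only, the standard practice described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an actual implementation: the names `BackendUnavailable`, `query_database`, and `handle_request`, the returned data, and the use of HTTP status 503 are all hypothetical and do not come from any particular system.

```python
from typing import Tuple


class BackendUnavailable(Exception):
    """Raised when the (hypothetical) backend database cannot be reached."""


def query_database(available: bool) -> str:
    # Stand-in for a real database call; 'available' simulates backend health.
    if not available:
        raise BackendUnavailable("database server is not responding")
    return "order history"


def handle_request(db_available: bool) -> Tuple[int, str]:
    """Return the (status, body) pair the frontend sends to the client."""
    try:
        data = query_database(db_available)
    except BackendUnavailable:
        # The frontend itself is still responsive, so the client receives a
        # descriptive error page rather than detecting a failed site.
        # Status 503 is an assumption made for this sketch.
        return 503, "The site is temporarily unavailable. Please try again later."
    return 200, data
```

In this sketch the client always receives a well-formed reply from the frontend, which is why a backend failure of this kind typically does not cause the client to recover service to an alternate site.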
If a backend server (such as a database) fails, a traditional strategy is to leverage geographically distributed redundant systems. In this regard, the frontend server (e.g., a web server) recovers service onto the redundant database server at a geographically remote site. However, this causes messages to be sent between two geographically remote sites. If the sites are far apart and many messages must be exchanged between the web server and the database, this can significantly increase the response time of the web server and consume significant bandwidth between the sites. Thus, when the redundant element is located at a remote site, this solution may increase both delay and inter-site network traffic.
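The delay penalty of this traditional strategy may be illustrated with a simple model. The following is a sketch only: the round-trip times, the number of queries per page, and the names `RTT_MS`, `choose_database`, and `serve_page` are assumed values and hypothetical helpers chosen for illustration, and the model assumes the web server issues its database queries one after another.

```python
# Assumed round-trip times in milliseconds (illustrative values only).
RTT_MS = {"local": 0.5, "remote": 80.0}


def choose_database(local_up: bool) -> str:
    # Traditional strategy: use the co-located database while it is up,
    # otherwise recover onto the redundant database at the remote site.
    return "local" if local_up else "remote"


def response_time_ms(round_trips: int, rtt_ms: float) -> float:
    # Sequential queries: each one costs a full round trip to the database.
    return round_trips * rtt_ms


def serve_page(local_up: bool, queries_per_page: int = 20):
    """Return the database used and the time spent on database round trips."""
    db = choose_database(local_up)
    return db, response_time_ms(queries_per_page, RTT_MS[db])


print(serve_page(True))   # ('local', 10.0)
print(serve_page(False))  # ('remote', 1600.0)
```

Under these assumed numbers, a page that needs 20 sequential queries spends 10 ms on database round trips when the local database is used, but 1600 ms after recovery onto the remote database, and every one of those messages also crosses the inter-site link.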