The present disclosure relates generally to information handling systems, and more particularly to fault detection and recovery of broker and server processes provided on information handling system(s) that are included in an inter-process communication system.
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
In an Operating System (OS) provided on an information handling system, or between information handling systems on a network, application processes may communicate using Inter-Process Communication (IPC) such as, for example, Remote Procedure Calls (RPC). Typically, applications or application processes categorized as clients and servers may use IPC, with the client requesting data or functions and the server responding to the client requests. An IPC system may also include a broker that provides a routing process that routes calls from the clients to the servers. Typically, servers in the IPC system will “register” themselves with the broker, and the broker then establishes a communication channel with each registered server. Such IPC system operations introduce a plurality of directional channels, each of which may experience faults. For example, faults may occur with a request from the client to the broker, the request from the broker to the server, a response from the server to the broker, and the response from the broker to the client. Furthermore, the servers may utilize a sideband channel to communicate and register with the broker and/or perform other administrative functions, which provides two additional directional channels that may experience faults: a sideband channel from the server to the broker, and a sideband channel from the broker to the server. When any of these channels has a fault, the entire IPC system may be affected, resulting in the blocking, delaying, or dropping of messages from many clients to many servers. For instance, if a server's connection to the broker has an issue, then many clients may time out while trying to request services from that server. Further still, an issue with a channel between the broker and a server may need to be detected at any of several layers, and several different fault conditions can impact the ability of the IPC system to provide service.
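Purely as an illustration of why a single faulty channel can affect many clients, the following Python sketch enumerates the six directional channels described above and reports which clients would be affected by a given set of channel faults. All names (`Channel`, `affected_clients`, the client and server identifiers) are hypothetical and are not part of the disclosure. The sketch assumes that each client's call traverses the four request/response channels, that the broker-to-server and server-to-broker channels are shared by every client of that server, and it does not model the effect of sideband channel faults.

```python
from enum import Enum


class Channel(Enum):
    """Hypothetical labels for the directional channels in a brokered IPC system."""
    CLIENT_TO_BROKER = "request: client -> broker"
    BROKER_TO_SERVER = "request: broker -> server"
    SERVER_TO_BROKER = "response: server -> broker"
    BROKER_TO_CLIENT = "response: broker -> client"
    SIDEBAND_SERVER_TO_BROKER = "sideband: server -> broker"
    SIDEBAND_BROKER_TO_SERVER = "sideband: broker -> server"


# Channels between the broker and a server; shared by all clients of that server.
SERVER_SIDE = {
    Channel.BROKER_TO_SERVER,
    Channel.SERVER_TO_BROKER,
    Channel.SIDEBAND_SERVER_TO_BROKER,
    Channel.SIDEBAND_BROKER_TO_SERVER,
}

# Channels on the data path of one request/response round trip (sideband excluded).
ROUND_TRIP = [
    Channel.CLIENT_TO_BROKER,
    Channel.BROKER_TO_SERVER,
    Channel.SERVER_TO_BROKER,
    Channel.BROKER_TO_CLIENT,
]


def affected_clients(routes, faults):
    """Return the clients whose round trip crosses a faulted channel.

    routes: dict mapping each client to the server it calls.
    faults: set of (endpoint, Channel) pairs, where endpoint is the server for
            server-side channels and the client for client-side channels.
    """
    hit = set()
    for client, server in routes.items():
        for ch in ROUND_TRIP:
            endpoint = server if ch in SERVER_SIDE else client
            if (endpoint, ch) in faults:
                hit.add(client)
                break  # one faulted channel is enough to affect this client
    return hit


if __name__ == "__main__":
    routes = {"c1": "s1", "c2": "s1", "c3": "s2"}
    # A single fault on s1's response channel affects every client of s1.
    print(affected_clients(routes, {("s1", Channel.SERVER_TO_BROKER)}))
```

In this sketch, one fault on a server-side channel marks every client routed to that server as affected, whereas a fault on a client-side channel affects only that client, which mirrors the observation above that a problem with a server's connection to the broker may cause many clients to time out.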
Accordingly, it would be desirable to provide an improved IPC fault detection and recovery system.