The present invention relates to a method and system for implementing computing systems that perform work in a networked environment. To enable communications between two networked computing nodes, a physical pathway must exist between the two nodes. This physical pathway may exist, for example, as a networking infrastructure comprising either wired or wireless networking components. In addition, some type of logical pathway may be created over the physical pathway to carry messages between the two nodes. The logical pathway, often termed a “connection”, is a communications pathway between an entity on a first node and an entity on a second node that may be established using available inter-process communications mechanisms.
In one approach to implementing a networked computing environment, an entity on the first node that seeks to communicate with a second node will initiate and maintain a dedicated connection between the two nodes. In this approach, the connection is a dedicated resource that can only be used by the entity that is associated with it. Other resources opened at the remote node because of this connection may also be configured as dedicated resources. One advantage of this approach is that, since the dedicated resources are closely tied to the specific entity that initiates them, the shutdown or failure of that entity allows easy, automatic identification and cleanup of those resources. This reliably frees up those resources to be used by other entities on the system after the shutdown or failure of the entity that initiated the dedicated resources.
A disadvantage of this approach is that, since the resources are dedicated to a single entity, other entities that wish to communicate between the same two nodes must initiate their own dedicated connections. This could be inefficient since the connections and resources are not always in active use by their associated entities but, being dedicated, cannot be shared with other entities even when idle. Moreover, on most systems, there is a limit on the number of connections that may be open simultaneously. If every entity must open its own dedicated connections and resources, then the limited supply of available connections and resources may become a bottleneck that restricts the amount of work performed at the computing system.
To address this problem, a computing system could allow non-dedicated connections that are de-coupled from the entities that make use of these connections. In this approach, multiple entities may share the same set of connections between a first node and a second node. By de-coupling the calling entity from the connection, any idle connection, even one initiated by another entity, may be used by any entity having the requisite authority to share the non-dedicated connection.
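The sharing of non-dedicated connections described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the names `Connection`, `SharedConnectionPool`, `acquire`, and `release`, as well as the use of a simple set of authorized entity names, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Connection:
    """Hypothetical non-dedicated connection to a remote node."""
    conn_id: int
    remote_node: str
    in_use_by: Optional[str] = None  # entity currently using it, if any


@dataclass
class SharedConnectionPool:
    """Hypothetical pool of connections shared by authorized entities."""
    remote_node: str
    max_connections: int
    authorized: Set[str] = field(default_factory=set)
    connections: List[Connection] = field(default_factory=list)

    def acquire(self, entity: str) -> Connection:
        if entity not in self.authorized:
            raise PermissionError(f"{entity} may not share these connections")
        # Reuse any idle connection, regardless of which entity opened it.
        for conn in self.connections:
            if conn.in_use_by is None:
                conn.in_use_by = entity
                return conn
        # No idle connection: open a new one only if the limit allows.
        if len(self.connections) >= self.max_connections:
            raise RuntimeError("connection limit reached")
        conn = Connection(len(self.connections) + 1, self.remote_node, entity)
        self.connections.append(conn)
        return conn

    def release(self, conn: Connection) -> None:
        # The connection stays open and becomes available to other entities.
        conn.in_use_by = None


pool = SharedConnectionPool("node_b", max_connections=1,
                            authorized={"entity_1", "entity_2"})
first = pool.acquire("entity_1")
pool.release(first)
second = pool.acquire("entity_2")  # reuses the connection opened by entity_1
```

In this sketch a limit of one connection still serves two entities, because the second entity picks up the idle connection opened by the first instead of opening its own.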
However, because resources are now de-coupled from their calling entity, it is possible that the failure or shutdown of the calling entity may not result in the automatic release of some or all of those resources. These “zombie” resources may continue to exist long after the shutdown or failure of the specific entity that initiated them. Each such resource consumes a quantity of system resources that is therefore not available to others for useful purposes. Over time, the number of extraneous resources that exist in the computing system could cause a significant decrease in system efficiency.
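The following minimal sketch illustrates how such “zombie” resources can persist at a remote node. The class `RemoteNode`, its methods, and the helper `zombie_sessions` are hypothetical names introduced here for illustration, and the sketch assumes that the remote node tracks open resources by a session identifier.

```python
from typing import Dict, List, Set


class RemoteNode:
    """Hypothetical remote node that tracks open resources per session."""

    def __init__(self) -> None:
        self.resources: Dict[str, List[str]] = {}

    def open_resource(self, session_id: str, name: str) -> None:
        self.resources.setdefault(session_id, []).append(name)

    def release_session(self, session_id: str) -> List[str]:
        # Resources are freed only when the node is explicitly told to.
        return self.resources.pop(session_id, [])


def zombie_sessions(node: RemoteNode, live_sessions: Set[str]) -> List[str]:
    """Sessions that still hold resources although their entity is gone."""
    return sorted(s for s in node.resources if s not in live_sessions)


node = RemoteNode()
node.open_resource("s1", "cursor")
node.open_resource("s2", "lock")

# The entity behind session s1 fails; nothing at the remote node changes.
zombies_before = zombie_sessions(node, live_sessions={"s2"})
released = node.release_session("s1")
zombies_after = zombie_sessions(node, live_sessions={"s2"})
```

Because the shared connection outlives the failed entity, the remote node in this sketch has no signal that session `s1` has ended until a release request reaches it.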
Accordingly, a system and method are disclosed that allow resources to be shared among multiple entities, but which can appropriately release system resources after the failure or shutdown of the calling entity. In one embodiment, a monitoring entity is available to check for session failures. If a session failure is detected, the session is identified in a shared list that is accessible to other related entities. The related entities can be configured to piggyback a message to an appropriate node to kill, shut down, or release resources associated with the failed session. Alternatively, a related entity can be specifically initiated to send a message to the appropriate node to kill, shut down, or release the resources.
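The embodiment summarized above may be sketched as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the names `Message`, `FailedSessionList`, `monitor`, `send`, and `send_cleanup_only`, the per-session `alive` flag, and the keying of the shared list by node are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Message:
    """Hypothetical outgoing message with optional piggybacked cleanup."""
    dest_node: str
    payload: str
    cleanup_sessions: List[str] = field(default_factory=list)


class FailedSessionList:
    """Shared list of failed sessions, keyed by the node holding resources."""

    def __init__(self) -> None:
        self._by_node: Dict[str, Set[str]] = {}

    def record(self, node: str, session_id: str) -> None:
        self._by_node.setdefault(node, set()).add(session_id)

    def drain(self, node: str) -> List[str]:
        # Remove and return the failed sessions pending for this node.
        return sorted(self._by_node.pop(node, set()))


def monitor(sessions: Dict[str, dict], failed: FailedSessionList) -> None:
    """Monitoring entity: move failed sessions into the shared list."""
    for session_id, info in list(sessions.items()):
        if not info["alive"]:
            failed.record(info["node"], session_id)
            del sessions[session_id]


def send(dest_node: str, payload: str, failed: FailedSessionList) -> Message:
    """Related entity: piggyback pending cleanup requests on normal traffic."""
    return Message(dest_node, payload, failed.drain(dest_node))


def send_cleanup_only(dest_node: str, failed: FailedSessionList) -> Message:
    """Alternative: an entity initiated solely to deliver cleanup requests."""
    return send(dest_node, "", failed)


sessions = {
    "s1": {"node": "node_b", "alive": False},
    "s2": {"node": "node_b", "alive": True},
}
failed = FailedSessionList()
monitor(sessions, failed)
msg = send("node_b", "query", failed)   # carries the cleanup request for s1
later = send("node_b", "query", failed)  # nothing left to piggyback
```

Draining the shared list when a message is built means each cleanup request is piggybacked once; a fuller design would likely also need acknowledgement from the receiving node, which this sketch omits.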
Further details of aspects, objects, and advantages of the invention are described in the detailed description, drawings, and claims.