1. Field of the Invention
This invention relates to the field of multi-threaded, object-oriented computer environments.
2. Background
In multi-threaded, object-oriented computer environments of the prior art, problems arise when, as an object server is shutting down, a new object server is being started as a response to an invocation by a client. The daemon process responsible for starting a new server assumes that the old server shuts down instantaneously. In actuality certain cleanup processes, such as the release of database locks, may still be in progress when a new object server is started. These cleanup processes can cause the new object server to abort startup. Thus, an undesired race condition exists between the shutdown of old object servers and the startup of new object servers.
Networked Object Environment
In the networked object environment, an object server, also referred to as a server process, is a multi-threaded process that controls access to, instantiation of, and deletion of, the methods, data, etc., embodied in the objects within its domain. Objects are self-contained, clearly defined, software modules that encapsulate procedures, such as business practices, and their data. Communicating with each other through carefully defined interfaces, objects allow complex solutions to be constructed similar to the manner in which computers are manufactured from standardized electronic components. Multiple levels of re-use and standardization are made possible, enabling engineers to produce modules, applications, and entire systems that are highly reusable and leveragable.
Networked object technology allows applications access to objects and their shared services anywhere in the network, substantially independent of where the application or object resides. Networked objects also permit individual objects to be updated without the risk of disrupting the application or business process they model, facilitating the graceful, incremental evolution of complex systems. For example, if a new component fails, the component's predecessor can be re-instated quickly and transparently.
An example of the networked object environment is the CORBA-compliant NEO product family from SunSoft™, which provides for sharing of objects across networks and differing computing platforms.
Consisting of over 600 software vendors, developers, and end user organizations, the Object Management Group (OMG) has developed and continues to develop standards for a common architecture supporting heterogeneous, distributed, object-oriented applications. The OMG Common Object Request Broker Architecture (CORBA) is designed to allow different object systems from multiple vendors to interact with each other on a network. The CORBA specification comprises the following four components:
i) An Object Request Broker (ORB) to manage objects in a networked environment;
ii) Interoperability for ORB-to-ORB communications;
iii) Common Object Services (CORBAservices); and
iv) Mappings for commonly used programming languages.
The Network Object Request Broker (ORB) is a CORBA-compliant network-based software system providing for the location and execution of objects through a standard interface protocol, enabling objects and programs to interact with each other across the network. The NEO ORB is implemented as one or more multi-threaded UNIX processes, providing scalable performance and availability as needed.
To promote heterogeneous object interoperability, the OMG has provided a portable source code reference implementation of the CORBA 2.0 Internet Inter-ORB Protocol to assist software vendors in testing and delivering OMG-compliant products. The Internet Inter-ORB Protocol (Internet IOP) provides a standardized way of connecting ORBs from different CORBA 2.0 compliant vendors, enabling them to communicate with each other. The current Internet IOP is based on TCP/IP protocols.
The OMG CORBAservices definition describes the basic operations required for building distributed systems with objects, such as naming, events, properties, lifecycle and relationship services.
For different object systems to interact, language independence is a concern. The Interface Definition Language (IDL) enables the separation of interface and implementation, allowing object implementation details to change without compromising the plug-and-play qualities of the object. The OMG IDL is a neutral interface definition of an object's operations, allowing the behavior of the object to be defined in IDL, but accommodating the automated transformation of the interface to the C, C++, Objective C, or Smalltalk languages.
A multi-threaded environment, such as that provided by UNIX, is typically used for supporting networked objects. Threads are subprocesses spawned off of larger processes for performing a certain function, e.g. performing a printing process, acting on a database object, etc. By supporting multiple threads, the system can serve many clients and processes simultaneously. This enables the sharing of objects and services on the network.
In the CORBA environment, an Object Request Broker Daemon process (ORBD) receives object requests, also referred to as method invocations, from the client processes registered to it. The ORB daemon then locates the object on the network, and acts as the interface between the client process and the networked object. In the NEO environment, the ORB daemon may activate a NEO object server to act as a further interface for the object which may be a standard NEO object or, in some instances, a legacy process encapsulated in a NEO shell to perform as a NEO object. The NEO object server acts to instantiate the object as is necessary to respond to the requests forwarded by the ORB daemon.
System Block Diagram
FIG. 1 is a block diagram of a CORBA-compliant networked object system. Multiple threads are represented by elements 100-103, where threads 100-101 are threads spawned from a first client process, Client Process 1, and threads 102-103 are threads spawned from a second client process, Client Process N. As indicated in FIG. 1, a single client process can spawn any number of threads. Each of threads 100-103 is linked to Object Request Broker Daemon (ORBD) process 104. ORBD process 104 is in turn linked to a plurality of object servers represented by object server 105 and object server 107. A second ORBD process, ORBD process 110, is further linked to ORBD process 104. ORBD process 110 could also be coupled to further object servers and/or client processes (not shown). Object server 105 is linked to object 106. Object server 107 is linked to objects 108 and 109.
ORBD process 104 receives object requests, such as method invocations in the form of locate requests, from client process threads 100-103, and determines which object server is supporting the appropriate object. If the necessary server is not currently running, the server is activated and the object is instantiated. Information on the location of the object is returned in response to the locate request, and further requests between the thread and the object are directed by the location information. The same object can be similarly invoked by locate requests from other threads to establish interaction between the object and all applicable threads concurrently.
ORB daemon 110 may provide a gateway for the networked object environment over a large network such as the Internet and/or it may provide cross-platform interaction by providing a platform dependent interface to clients and object servers in its own domain, while providing a standardized interface to ORBD 104.
Object servers 105 and 107 provide access to objects or object libraries, such as shown by objects 106 and 108-109. Legacy objects, that is those objects comprising stand-alone applications and other objects not originally designed for the networked object environment, are provided with an IDL shell that forms an interface through which the object server can access the functions of the legacy object. A Persistent Store Manager process running in tandem with the ORB daemon keeps track of locks the object server may have on objects, e.g., database objects, to maintain server-exclusive access.
As the network is substantially independent of hardware boundaries, the objects and object servers may reside on the same computer as the client processes and the ORB daemon, or they may reside on separate computers within the network. Similarly, the networked object environment is substantially independent of the base level implementation of the network.
Shutdown Protocol
A prior art implementation of the shutdown protocol for object servers is as follows. An object server decides to shut down, for instance, due to idle time or possibly in response to a client's invocation. The object server then begins to shut down all active objects, waiting for all method invocations to finish. When all of the objects associated with the object server are shut down, the object server sets its server state to "in shutdown," and signals to the ORB daemon that it is shutting down. When the ORB daemon is successfully notified that the object server is shutting down, the server sets its server state to "finished," and terminates the connection to the ORB daemon. Finally, the object server signals the main thread that shutdown is complete, and the main thread proceeds to perform the last cleanup, such as releasing any locks the object server might hold in the Persistent Store Manager.
The object server finite state machine running in the object server is illustrated in the state diagram of FIG. 2. The server state machine consists of four states: "not running," "running," "in shutdown," and "finished." When the server starts, the server is in state 200, "not running," and any invocations of methods are made to wait, as indicated by arrow 204. Once the object server has registered with the ORB daemon and a run indication is received by the object server, as shown by arrow 205, the object methods are enabled and the server state advances to state 201, "running."
While in state 201, new invocations increment the active methods counter, as shown by arrow 206, and ending method invocations decrement the active methods counter, as shown by arrow 207. When the object server is to be shut down due to excessive idle time, an invocation from a client, etc., the object server waits until all method invocations clear, as shown by arrow 208, then signals the ORB daemon that it is shutting down, forces new invocations from clients to wait, and sets its server state to state 202, "in shutdown."
As shown by arrow 209, further invocations during state 202 are forced to wait. After the ORB daemon has been successfully notified that the server is shutting down, then, as shown by arrow 210, the object server returns an error to all waiting clients and forces clients to rebind, i.e., to locate a new object server. The server state is then advanced to state 203, "finished," wherein the last cleanup operations, such as removal of locks, are performed.
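The four-state machine of FIG. 2 can be sketched in code as follows. This is an illustrative model only, not the NEO implementation: the class name, method names, and return values are hypothetical, and the handling of invocations that were waiting in the "not running" state (they are admitted as active when the server starts running) is an assumption, since the description states only that object methods become enabled.

```python
from enum import Enum


class ServerState(Enum):
    # Values mirror the reference numerals of FIG. 2.
    NOT_RUNNING = 200
    RUNNING = 201
    IN_SHUTDOWN = 202
    FINISHED = 203


class ObjectServerFsm:
    """Toy model of the object-server state machine (hypothetical names)."""

    def __init__(self):
        self.state = ServerState.NOT_RUNNING
        self.active_methods = 0
        self.waiting = []  # invocations that are being made to wait

    def invoke(self, client):
        if self.state is ServerState.RUNNING:
            self.active_methods += 1          # arrow 206
            return "active"
        if self.state in (ServerState.NOT_RUNNING, ServerState.IN_SHUTDOWN):
            self.waiting.append(client)       # arrows 204 and 209
            return "wait"
        return "error"  # assumption: a finished server rejects invocations

    def method_done(self):
        if self.state is not ServerState.RUNNING or self.active_methods == 0:
            raise RuntimeError("no active method invocation to end")
        self.active_methods -= 1              # arrow 207

    def run(self):
        # Arrow 205: registered with the ORB daemon, run indication received.
        if self.state is not ServerState.NOT_RUNNING:
            raise RuntimeError("server already started")
        self.state = ServerState.RUNNING
        # Assumption: waiting invocations are admitted once methods are enabled.
        self.active_methods += len(self.waiting)
        self.waiting.clear()

    def begin_shutdown(self):
        # Arrow 208: shutdown proceeds only after all invocations have cleared.
        if self.state is not ServerState.RUNNING or self.active_methods > 0:
            return False
        self.state = ServerState.IN_SHUTDOWN
        return True

    def daemon_notified(self):
        # Arrow 210: waiting clients receive an error and must rebind.
        if self.state is not ServerState.IN_SHUTDOWN:
            raise RuntimeError("server is not in shutdown")
        rejected, self.waiting = self.waiting, []
        self.state = ServerState.FINISHED
        return rejected
```

In this sketch, the last cleanup (for example, removal of locks) would take place after `daemon_notified` returns, that is, after the ORB daemon has already been told that the server is shutting down.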
A second server finite state machine operates inside the ORB daemon, and determines the activation/deactivation control exhibited upon the server by the ORB daemon. This second finite state machine has three states: "start," "starting," and "running." A state diagram of this three-state finite state machine is shown in FIG. 3. When a locate request targeting a server in "start" state 300 arrives at the ORB daemon, as indicated by arrow 303, the server is forked off as a new process, the requesting method invocation is blocked, and the server state enters "starting" state 301.
While in state 301, all locate requests are blocked and forced to wait for registration of the server, as shown by arrow 305. If the server PID (process ID) dies, as represented by arrow 304, then any waiting method invocations are unblocked and the server returns to "start" state 300, where a waiting method invocation will retry to start the server. If, while in "starting" state 301, the server registers with the ORB daemon, as shown by arrow 306, all waiting method invocations are unblocked and the server state enters "running" state 302.
As indicated by arrow 307, all subsequent locate requests received while in "running" state 302 return the address information that the server provided as part of its registration. As shown by arrow 308, when the server signals, as part of its shutdown protocol, that it is shutting down, the ORB daemon cleans up and the server state returns to "start" state 300.
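The three-state machine of FIG. 3 can be modeled in the same way. The sketch below is illustrative only: the class, its method names, and the `fork_server` callback are hypothetical stand-ins, and the re-issuing of unblocked requests after a server death is left to the caller.

```python
from enum import Enum


class DaemonState(Enum):
    # Values mirror the reference numerals of FIG. 3.
    START = 300
    STARTING = 301
    RUNNING = 302


class DaemonServerFsm:
    """Toy model of the per-server state machine inside the ORB daemon."""

    def __init__(self, fork_server):
        self.state = DaemonState.START
        self.fork_server = fork_server  # hypothetical callback that forks a server
        self.blocked = []               # locate requests waiting for registration
        self.address = None

    def locate(self, request):
        if self.state is DaemonState.RUNNING:
            return self.address             # arrow 307
        if self.state is DaemonState.START:
            self.fork_server()              # arrow 303: fork a new server process
            self.state = DaemonState.STARTING
        self.blocked.append(request)        # arrows 303 and 305: block the request
        return None

    def server_died(self):
        # Arrow 304: unblock the waiters; one of them will retry the startup.
        if self.state is not DaemonState.STARTING:
            raise RuntimeError("no server startup in progress")
        self.state = DaemonState.START
        retry, self.blocked = self.blocked, []
        return retry

    def server_registered(self, address):
        # Arrow 306: registration unblocks all waiting invocations.
        if self.state is not DaemonState.STARTING:
            raise RuntimeError("no server startup in progress")
        self.address = address
        self.state = DaemonState.RUNNING
        unblocked, self.blocked = self.blocked, []
        return unblocked

    def server_shutting_down(self):
        # Arrow 308: the daemon cleans up and returns straight to "start".
        if self.state is not DaemonState.RUNNING:
            raise RuntimeError("server is not running")
        self.address = None
        self.state = DaemonState.START
```

Note that `server_shutting_down` moves directly to the "start" state, so the very next call to `locate` forks a new server regardless of whether the old server process has finished its cleanup.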
The primary problem with the server activation/deactivation protocol of FIGS. 2 and 3 is that race conditions occur while shutting a server down. Shutdown procedures, such as the removal of locks, occur in the server after the server has signaled to the ORB daemon that it has shut down. However, the ORB daemon operates as if the server has completely shut down at the time the shutdown signal is received from the server. This implies to the ORB daemon that a new server can start immediately as a result of a locate request.
The conflict arises when a new server tries to access resources that are still locked to the old server. If the old server has not yet removed the locks, the new server is denied access to the locked resources, and the new server aborts startup. A race condition thus exists between the release of all locks on resources held by the old server and the accessing of those same resources by the new server. If the locks are released first, then, barring any other problems, the new server will complete startup successfully. If the new server tries to access the resources first, then the new server will be aborted. Forking off a new server process, only to have the new server process abort in the midst of startup, is a waste of CPU processing time that is better spent on other processes, such as the shutdown of the old server.
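The two possible orderings can be replayed deterministically in a short sketch. The code below is a simplified illustration under stated assumptions: `LockTable` is a hypothetical stand-in for the locks tracked by the Persistent Store Manager, the resource name "db" is invented for the example, and a real system would interleave these steps across separate processes rather than in a fixed list.

```python
class LockTable:
    """Hypothetical stand-in for locks tracked by the Persistent Store Manager."""

    def __init__(self):
        self.owner = {}  # resource name -> name of the server holding the lock

    def acquire(self, resource, server):
        # Grants the lock if it is free; succeeds only if `server` now holds it.
        return self.owner.setdefault(resource, server) == server

    def release_all(self, server):
        for resource in [r for r, s in self.owner.items() if s == server]:
            del self.owner[resource]


def start_server(locks, name, resources):
    """Return True if startup completes, False if it aborts on a held lock."""
    for resource in resources:
        if not locks.acquire(resource, name):
            locks.release_all(name)  # abort: give back anything acquired so far
            return False
    return True


def replay(order):
    """Replay the steps following the old server's shutdown signal, in `order`."""
    locks = LockTable()
    if not start_server(locks, "old", ["db"]):
        raise RuntimeError("old server failed to start")
    outcome = None
    for step in order:
        if step == "old_releases_locks":
            locks.release_all("old")
        elif step == "new_starts":
            outcome = start_server(locks, "new", ["db"])
    return outcome


# Locks released first: the new server completes startup.
released_first = replay(["old_releases_locks", "new_starts"])
# New server started first: it finds "db" locked and aborts startup.
started_first = replay(["new_starts", "old_releases_locks"])
```

Here `released_first` is `True` and `started_first` is `False`; nothing in the prior art protocol constrains which of the two orderings actually occurs.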
The problem lies in the server finite state machine within the ORB daemon (i.e., FIG. 3). This state machine does not account for the shutdown process (i.e., the time during which a server is moving from "running" state 302 to "start" state 300). It is legal to immediately start a new server even though the "shutting down" server may not have fully shut down. This causes the race condition between the old server shutting down and the new server starting up.
The existing protocol also does not handle servers that start without registering or take too long to register, and servers that shut down too slowly. If a server is too slow to register with the ORB daemon, e.g., because the server is hanging, then action should be taken. Similarly, if a server is too slow to shut down, e.g., because the server is hanging, then action should be taken to allow startup of a new server. Currently, no mechanism exists for handling these problems.
Further, there is currently no mechanism for handling a thrashing condition. A thrashing condition occurs when a server undergoes a series of aborted startups and restarts. This can happen when a server attempts to restart too rapidly. For instance, daemonic servers, which are restarted automatically by the ORB daemon whenever they exit, can be seriously impaired by thrashing behavior. Thrashing may also indicate an unrecoverable error in the startup process of the server. If there is no mechanism for handling a thrashing condition, this problem cannot be prevented from occurring repeatedly in the future.
Finally, the ORB daemon is not currently equipped to handle "self started servers" (also called "user servers") in the state machine. Self started servers are servers that just register and deregister themselves with the ORB daemon, but are not spawned by the ORB daemon as a result of an invocation.