The present invention relates generally to distributed object environments and, in particular, to a transport abstraction layer providing the capability of using multiple communications protocols in an application-transparent fashion.
Distributed and parallel systems form a very important segment of modern computing environments. Experience with such systems has exposed several requirements of system and component design which have historically been recognized only after a system has been deployed. A critical requirement (especially for systems with any longevity) is the need for the system and system components to be able to evolve over time.
By definition, a distributed system is one which contains components which need to communicate with one another. In most practical systems, however, many of these components will not be created "from scratch". Components tend to have long lifetimes, be shared across systems, and be written by different developers, at different times, in different programming languages, with different tools. In addition, systems are not static: any large-scale system will have components that must be updated, and new components and capabilities will be added to the system at different stages in its lifetime. The choice of platform, the level of available technology, and the current fashion in the programming community all conspire to create what is typically an integration and evolution nightmare.
The most common solution to this problem is to attempt to avoid it by declaring that all components in the system will be designed to a single distributed programming model and will use its underlying communication protocol. This approach tends not to work well for several reasons. First, by the time the decision has been made to use one model or protocol (which may be quite early in the life cycle of a system) there may already be existing components which there is a desire to use, but which do not support the selected model or protocol. Second, the choice of model and protocol may severely restrict other choices (e.g., the language in which a component is to be written or the platform on which it is to be implemented) due to the availability of support for the model.
Finally, such choices tend to be made in the belief that the ultimate model and protocol have finally been found, or at least that the current choice is sufficiently flexible to incorporate any future changes. That belief has, historically, been discovered to be unfounded, a situation which is not likely to change. Invariably, a small number of years down the road (and often well within the life of an existing system), a new "latest and greatest" model is invented, and the owner of the system is faced with the choice of adhering to the old model (which may leave the system unable to communicate with other systems and restrict the capabilities of new components) or upgrading the entire system to the new model. Upgrading is always an expensive option, and may in fact be intractable (for instance, it is not unheard of for systems to contain an investment of hundreds of man-years in "legacy" source code) or even impossible (as, for example, when the source code for a component is simply not available).
An alternative solution accepts the fact that a component or set of components may not speak the "common" protocol, and provides "proxy services" (or "protocol wrappers" or "gateways") between the communication protocols. Under this scheme, a communication is first sent to the gateway, which translates it into the non-standard protocol and forwards it on to the component. This technique typically gives rise to performance issues (due to message forwarding), resource issues (due to multiple in-memory message representations), reliability issues (due to the introduction of new messages and failure conditions), as well as security, location, configuration, and consistency problems (due to the disjoint mechanisms used by different communication protocols).
It is tempting to think that this problem is merely a temporary condition caused by the recent explosion in the number of protocols (and that things will stabilize soon) or that the problem is just an artifact of poor design in legacy components (and won't be so bad next time). However, the problem of protocol evolution is intrinsic to building practical distributed systems. There will always be "better" protocols, domain-specific motivations to use them, and "legacy" components and protocols that must be supported. Indeed, nearly any real distributed system will have at least three models: those of "legacy" components, the current standard, and the emerging "latest and greatest". The contents of these categories shift with time: today's applications and standard protocols will be tomorrow's legacy. Systems and components evolve along multiple dimensions:
Evolution of Component Interface
A component's interface may evolve to support new features. The danger is that this evolution will require all clients of the component to be updated. For reasons cited in the previous section, there must be a mechanism whereby old clients can continue to use the old interface, yet new clients can take advantage of the new features.
Evolution of Component Implementation
A component's implementation may evolve independently of the rest of the system. This may include the relocation of a component to a new hardware platform or the reimplementation of a component in a new programming language. There must be a mechanism which insulates other components from these changes in the implementation yet maintains the semantic guarantees promised by the interface.
Evolution of Inter-Communication Protocol
It is generally intractable to choose a single communication protocol for all components in the system, as new protocols are attractive due to their performance, availability, security, and suitability to the application's needs. Each communication protocol has its own model of component location, component binding, and often a model of data/parameter representation. It must be possible to change or add communication protocols without rendering existing components inaccessible.
Evolution of Inter-Component Communication model/API
The programming models used to perform inter-component communication continue to evolve. Existing models change over time to support new data types which can be communicated and new communication semantics. At the same time, new programming models are frequently developed which are attractive due to their applicability to a particular application, their familiarity to programmers on a particular platform, or merely current fashion or corporate favor. It must be possible to implement components to a new model or a new version of an existing model without limiting the choice of protocols to be used underneath and without sacrificing interoperability with existing components written to other models or other versions of the same model (even when those components will reside in the same address space).
Distributed Object Systems such as CORBA and OLE, like the Remote Procedure Call models which preceded them, address the issue of protocol evolution to a degree by separating the programming model from the details of the underlying protocol which is used to implement the communication. These systems do so by introducing a declarative Interface Definition Language (IDL) and a compiler which generates code that transforms (or allows the transformation of) a protocol-neutral API to the particular protocol supported by the model. As the protocol changes (or new protocols become available), the compiler can be updated to generate new protocol adapters to track the protocol evolution.
A side benefit of IDL is that it forces each component's interface to be documented and decouples a component's interface from its implementation. This allows an implementation to be updated without affecting the programming API of clients and simplifies the parallel development of multiple components.
In CORBA and OLE, interfaces are reflective: a client can ask an implementation object whether it supports a particular interface. Using this dynamic mechanism, a client can be insulated from interface (as well as implementation) changes, as clients familiar with a new interface (or a new version of an interface) ask about it, while old clients restrict themselves to using the old interface.
While such systems abstract the choice of communication protocol, none addresses the situation in which a system needs to be composed of components that cannot all share a single protocol or a single version of a protocol. CORBA and OLE have each defined a protocol that all components "will eventually adopt". For reasons cited above, that solution is merely the addition of yet another (incompatible) protocol to the mix, a protocol which will evolve, and in fact is already evolving.
It would be desirable to have a communications framework that provides for the evolution of communications models and protocols and provides a mechanism for accessing legacy applications and for overcoming related problems.
The communications infrastructure of the present invention provides a mechanism that supports multiple simultaneous communication protocols. The novel mechanism of the invention allows an application program executing in one process to make method calls on objects located in other processes and yet be entirely oblivious to the communication protocol used to deliver data between the two processes. Furthermore, the mechanism allows a transport (or protocol) to be independent of the in-memory representation chosen for abstract data types transferred between distributed processes.
The present invention provides a communications framework that presents application code with an abstraction layer including a distributed apply function. The apply abstraction allows application programs to be written without any direct reference to the communication protocol selected to implement the distributed apply. The abstraction layer further includes mechanisms for causing self-marshaling and demarshaling of arguments provided to remote procedures. The marshaling of arguments is accomplished in a manner that does not require knowledge of the memory layout chosen by the application. The marshaling of arguments permits in-memory representations of abstract data types to be independent of the underlying communication protocol.
In the communications framework, an application program invokes a method on a target object. This method invocation is converted into an invocation of the distributed apply method. The apply invocation is passed three arguments: an ObjectReference referring to the target object, an identifier for the method to invoke, and a self-marshaling argument list. The apply invocation is made on a Remote Procedure Call Transport, which operates to establish a communications link to the process in which the target object resides.
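The conversion of an ordinary method call into a distributed apply can be illustrated with a minimal Python sketch. It is not the implementation described here: the `Proxy`, `LoopbackTransport`, and `Counter` names are hypothetical, the argument list is a plain tuple rather than a self-marshaling one, and the "transport" merely looks the target up in a local table instead of opening a communications link.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectReference:
    """Hypothetical stand-in: just enough information to locate a target."""
    process: str
    object_id: str


class RPCTransport(ABC):
    """One subclass per communication protocol; callers only see apply()."""

    @abstractmethod
    def apply(self, target, method_id, args):
        ...


class LoopbackTransport(RPCTransport):
    """Toy transport (assumption): delivers calls to objects in a local table."""

    def __init__(self, objects):
        self._objects = objects  # maps object_id -> live object

    def apply(self, target, method_id, args):
        receiver = self._objects[target.object_id]
        return getattr(receiver, method_id)(*args)


class Proxy:
    """Turns ordinary method calls into apply() calls on a transport."""

    def __init__(self, ref, transport):
        self._ref = ref
        self._transport = transport

    def __getattr__(self, method_id):
        # Only reached for names not found on the proxy itself.
        def stub(*args):
            return self._transport.apply(self._ref, method_id, args)
        return stub


class Counter:
    """Example target object."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


ref = ObjectReference(process="server-1", object_id="counter")
transport = LoopbackTransport({"counter": Counter()})
counter = Proxy(ref, transport)
result = counter.add(5)  # application code never names a protocol
```

Because the application only ever touches `counter`, swapping `LoopbackTransport` for another `RPCTransport` subclass would not change the calling code, which is the property the apply abstraction is meant to provide.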
The marshaling and demarshaling of arguments passed to remote methods is accomplished according to the invention by defining an OutStream class. The OutStream class defines an interface for at least one primitive marshaler and for a composite data type marshaler, wherein each remote procedure call transport derives an OutStream object from the OutStream class for marshaling arguments onto the communications link. The communications framework also includes a composite data type class and at least one transport independent marshaler. The OutStream object recognizes any argument that is of a composite data type. The RPC_Transport invokes a transport independent marshaler to marshal any composite data type argument objects.
The transport independent marshalers invoke the primitive marshalers to marshal any non-composite components of a composite argument object. To marshal composite components, the transport independent marshalers invoke marshaling methods of such composite components. The marshaling of composite data types is accomplished in a recursive fashion.
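The recursive scheme can be sketched as follows, under several assumptions: the method names (`put_int`, `put_string`, `put_composite`, `marshal`), the two concrete streams, and the `Point` and `Label` types are all hypothetical and chosen only for illustration. Each transport supplies its own primitive marshalers, while a composite type describes itself once, in terms of those primitives, and so works with any stream.

```python
import struct
from abc import ABC, abstractmethod


class OutStream(ABC):
    """Interface: primitive marshalers plus one composite marshaler."""

    @abstractmethod
    def put_int(self, value): ...

    @abstractmethod
    def put_string(self, value): ...

    def put_composite(self, obj):
        # Transport independent: the composite marshals itself onto the stream.
        obj.marshal(self)


class BinaryOutStream(OutStream):
    """Hypothetical transport-specific stream using a packed binary format."""

    def __init__(self):
        self.buffer = bytearray()

    def put_int(self, value):
        self.buffer += struct.pack("!i", value)  # 4-byte big-endian integer

    def put_string(self, value):
        data = value.encode("utf-8")
        self.buffer += struct.pack("!I", len(data)) + data  # length prefix


class TextOutStream(OutStream):
    """Second hypothetical stream, emitting readable tokens instead."""

    def __init__(self):
        self.tokens = []

    def put_int(self, value):
        self.tokens.append(f"i:{value}")

    def put_string(self, value):
        self.tokens.append(f"s:{value}")


class Point:
    """Composite made only of primitives."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def marshal(self, out):
        out.put_int(self.x)
        out.put_int(self.y)


class Label:
    """Composite containing a primitive and another composite."""

    def __init__(self, text, anchor):
        self.text, self.anchor = text, anchor

    def marshal(self, out):
        out.put_string(self.text)
        out.put_composite(self.anchor)  # recursion into the nested composite


label = Label("origin", Point(1, 2))
text_stream = TextOutStream()
text_stream.put_composite(label)
binary_stream = BinaryOutStream()
binary_stream.put_composite(label)
```

Neither `Point` nor `Label` knows which stream it is writing to, so the same `marshal` methods serve both formats.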
The communications framework also contains an InStream class for defining an interface for primitive demarshalers and for composite data type demarshalers. The RPC_Transport derives an InStream object from the InStream class for demarshaling arguments received on a communications link.
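A receiving-side counterpart might look like the sketch below. The method names (`get_int`, `get_string`, `get_composite`, `demarshal`), the length-prefixed binary layout, and the `Point` and `Label` types are assumptions made for the example, not details taken from the description above.

```python
import struct
from abc import ABC, abstractmethod


class InStream(ABC):
    """Interface: primitive demarshalers plus one composite demarshaler."""

    @abstractmethod
    def get_int(self): ...

    @abstractmethod
    def get_string(self): ...

    def get_composite(self, cls):
        # Transport independent: the composite type rebuilds itself.
        return cls.demarshal(self)


class BinaryInStream(InStream):
    """Hypothetical transport-specific stream reading a packed binary format."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0

    def _take(self, count):
        chunk = self._data[self._pos:self._pos + count]
        if len(chunk) != count:
            raise EOFError("truncated message")
        self._pos += count
        return chunk

    def get_int(self):
        return struct.unpack("!i", self._take(4))[0]

    def get_string(self):
        (length,) = struct.unpack("!I", self._take(4))
        return self._take(length).decode("utf-8")


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def demarshal(cls, stream):
        x = stream.get_int()
        y = stream.get_int()
        return cls(x, y)


class Label:
    def __init__(self, text, anchor):
        self.text, self.anchor = text, anchor

    @classmethod
    def demarshal(cls, stream):
        text = stream.get_string()
        anchor = stream.get_composite(Point)  # recursion into nested composite
        return cls(text, anchor)


# Length-prefixed string "origin" followed by two 4-byte integers, 1 and 2.
wire = b"\x00\x00\x00\x06origin\x00\x00\x00\x01\x00\x00\x00\x02"
label = BinaryInStream(wire).get_composite(Label)
```

The fields are read in the same order in which a matching marshaler would have written them; the stream carries no type tags in this sketch, so that ordering is the only contract between the two sides.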
The communications framework of the invention defines ObjectReferences for target objects on which remote methods are invoked. An ObjectReference is an object that defines information necessary or useful in locating the target object. An ObjectReference for the target object is one of the elements of an invocation of the distributed apply method.
ObjectReferences may be grouped with other ObjectReferences that are generally co-located or co-migrated. Such a grouping is referred to as a VirtualProcess. Each ObjectReference or VirtualProcess contains (or refers to other objects that contain) one or more protocol-specific profiles. These profiles are hints of how a process may connect to a target object using a given protocol. The profiles are grouped into those that have been successfully used (verified) and those that have not been used (unverified). The communications framework provides a mechanism for merging the profiles received with an ObjectReference. This merging mechanism gives priority to verified profiles.
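One way to picture the merging of profiles is the sketch below. The `Profile` fields, the protocol labels, and the specific merge policy are assumptions: here a profile verified in either copy of a reference is treated as verified, and verified profiles are always tried before unverified ones. The text above states only that priority goes to verified profiles.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    """Hint for reaching a target over one protocol (fields are hypothetical)."""
    protocol: str
    address: str


@dataclass
class ObjectReference:
    object_id: str
    verified: list = field(default_factory=list)    # used successfully before
    unverified: list = field(default_factory=list)  # not yet tried

    def merge(self, incoming):
        """Fold in the profiles carried by another copy of this reference."""
        for profile in incoming.verified:
            if profile not in self.verified:
                self.verified.append(profile)
        # A profile that is now verified must not also be listed as unverified.
        self.unverified = [p for p in self.unverified if p not in self.verified]
        for profile in incoming.unverified:
            if profile not in self.verified and profile not in self.unverified:
                self.unverified.append(profile)

    def candidates(self):
        """Connection attempts in priority order: verified profiles first."""
        return self.verified + self.unverified


local = ObjectReference(
    "acct-7",
    verified=[Profile("iiop", "hostA:9000")],
    unverified=[Profile("dce", "hostA:135")],
)
incoming = ObjectReference(
    "acct-7",
    verified=[Profile("dce", "hostA:135")],
    unverified=[Profile("http", "hostA:80"), Profile("iiop", "hostA:9000")],
)
local.merge(incoming)
order = [p.protocol for p in local.candidates()]
```

After the merge, the `dce` profile has been promoted out of the unverified group, and the incoming unverified `iiop` entry is ignored because the local copy already holds it as verified.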
The communications framework of the invention further defines a query_op method. The interface definition for query_op specifies that the method accepts a reference to a specifier for an operation to be performed and returns a reference to a dispatch function to be invoked to perform that operation. The dispatch function accepts as arguments a reference to a target object on which to perform the operation and a reference to an object containing a list of values passed to and returned from the operation. The query_op interface is declared in a base class; at least one target object derives from that base class and provides an implementation of the query_op method.
The query_op method may provide a reference to an argument list. The caller (of query_op) then provides values for that argument list by demarshaling arguments received over the communications link between the calling process and the process of the target object.
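The interplay between query_op, the dispatch function, and the argument list can be sketched as follows. Everything beyond the name `query_op` is hypothetical: the `ArgList`, `Dispatchable`, and `Account` classes, the choice to return the dispatch function and argument list together as a tuple, and the use of a plain iterator in place of a real InStream.

```python
class ArgList:
    """Hypothetical holder for values passed to and returned from an operation."""

    def __init__(self, names):
        self.names = names
        self.values = {}
        self.result = None


class Dispatchable:
    """Base class: targets answer query_op with a dispatch function."""

    def query_op(self, op_name):
        raise NotImplementedError


class Account(Dispatchable):
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    @staticmethod
    def _dispatch_deposit(target, args):
        # Unpacks the argument list, calls the real method, stores the result.
        args.result = target.deposit(args.values["amount"])

    def query_op(self, op_name):
        if op_name == "deposit":
            return Account._dispatch_deposit, ArgList(["amount"])
        raise LookupError(f"unsupported operation: {op_name}")


def serve_request(target, op_name, incoming):
    """Caller of query_op; `incoming` stands in for a demarshaling stream."""
    dispatch, args = target.query_op(op_name)
    for name in args.names:
        args.values[name] = next(incoming)  # demarshal one argument
    dispatch(target, args)
    return args.result


account = Account(100)
new_balance = serve_request(account, "deposit", iter([25]))
```

The caller never needs static knowledge of `deposit`: it learns from `query_op` both what to demarshal and which function to invoke, so the receiving side can stay generic across target types.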
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.