In distributed computing environments where procedure calls are being made between remote sites, software compatibility problems can arise when the software at one or more of the sites is changed without a corresponding change in the software at the other sites. This typically occurs when procedures are changed and recompiled at one site but applications at other sites which depend upon those procedures are not recompiled. When this occurs, the application which depends upon the now updated procedure will often not work correctly or, under some circumstances, not at all. Throughout this application, the term "procedure" is used generically to refer to any form of software, such as units, modules, functions, subroutines or any other type of software, and is not meant to be limiting.
For example, consider FIG. 1 which illustrates a simple distributed environment with two remote sites, SITE1 and SITE2. CALLER1100 is a procedure at SITE1 which calls TARGET1102, a procedure at SITE2. Since this is a distributed environment, CALLER1100 or TARGET1102 may change at any time without a corresponding change to the other procedure, creating software incompatibilities. For example, TARGET1102 may have been written to receive two integer values and CALLER1100 written to pass two integer values. If TARGET1102 is later changed to receive two character strings and then recompiled, the next time CALLER1100 calls TARGET1102, TARGET1102 will most likely execute incorrectly or not at all.
Although the present problem has been presented in the context of software tasks running on different or remote physical sites, software incompatibilities also occur when remote procedure calls are made between different tasks running at the same site and even on the same processor.
Historically, several approaches have been used to manage dependencies in distributed computing environments to ensure compatibility during remote procedure calls. Four of these include (1) synchronized installation; (2) time stamps; (3) self-describing data; and (4) data type encoding.
The synchronized installation approach involves simultaneously installing software at all sites to ensure that all software dependencies between sites match. Typically, this is achieved through the implementation of strict software configuration management and, among the four approaches, it is the most likely to ensure software compatibility between the sites. Moreover, synchronized installation does not adversely affect software performance since it adds no overhead to the remote procedure calls themselves. However, as the number of sites increases, it becomes significantly more difficult to maintain up-to-date object level compatibility between software at multiple sites. Consequently, this approach is only suited for a distributed environment with a limited number of sites.
The use of time stamps involves recording an actual creation time or "actual time stamp" of each top-level compilation unit and then checking that actual time stamp against an expected time stamp maintained by the calling procedure. Each time a compilation unit is compiled, the new actual creation time is stored as an actual time stamp. Similarly, each time a calling procedure is recompiled, the creation time of each compilation unit it calls is recorded by the calling procedure as an expected time stamp. Upon execution of a remote procedure call, the expected time stamp is then passed, usually in the parameter list, from the calling procedure to the target procedure and compared by the target procedure to the actual time stamp previously stored by the target procedure.
For example, referring again to FIG. 1, CALLER1100 maintains an expected time stamp and passes it to TARGET1102 each time it calls TARGET1102. The expected time stamp is compared to the actual time stamp previously stored by TARGET1102. If the expected time stamp passed by CALLER1100 does not match the actual time stamp maintained by TARGET1102, the call fails and CALLER1100 is marked as needing to be recompiled. The next time an execution of CALLER1100 is attempted, CALLER1100 will automatically be recompiled if a compiler is available at SITE1. If a compiler is not available, then the execution of CALLER1100 will be prohibited.
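The time stamp check described above can be sketched roughly as follows. This is only an illustrative sketch in Python, not the implementation of any particular system: the names `Caller`, `Target` and `StaleCallerError`, the integer stamp values, and the choice to pass the expected stamp as the first argument are all assumptions made for the example, and the sketch models only the case in which no compiler is available at the calling site.

```python
from dataclasses import dataclass


class StaleCallerError(Exception):
    """Hypothetical error raised when the expected and actual stamps differ."""


@dataclass
class Target:
    # Actual time stamp, recorded when the target's compilation unit was compiled.
    actual_stamp: int

    def call(self, expected_stamp: int, a: int, b: int) -> int:
        # The expected stamp travels with the call, here as an extra parameter.
        if expected_stamp != self.actual_stamp:
            raise StaleCallerError("expected time stamp does not match actual")
        return a + b


@dataclass
class Caller:
    # Expected time stamp, recorded when the caller was last compiled.
    expected_stamp: int
    needs_recompile: bool = False

    def invoke(self, target: Target, a: int, b: int):
        # Assumption: no compiler is available, so a marked caller may not run.
        if self.needs_recompile:
            raise RuntimeError("caller must be recompiled before it can run")
        try:
            return target.call(self.expected_stamp, a, b)
        except StaleCallerError:
            # The call fails and the caller is marked as needing recompilation.
            self.needs_recompile = True
            return None
```

Note that the comparison is a strict equality test: any recompilation of the target changes its stamp and invalidates the caller, whether or not the interface actually changed.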
The time stamp approach has a high probability of detecting software incompatibilities because of the very low probability that two incompatible versions of a particular compilation unit would be compiled at exactly the same time. Moreover, the time stamp approach has little adverse effect on performance since the time stamp only adds a few bytes (typically 8) to each remote procedure call.
However, the use of time stamps has several disadvantages. First, the time stamp approach requires that a compiler be available at each site. Second, the time stamp approach is very strict and inflexible and does not readily provide for evolutionary software development. This is because any change to a compilation unit upon which a calling procedure depends will require the recompilation of all calling procedures which depend on that compilation unit. This is true even if the change occurred in target procedures in the compilation unit that are not called by the calling procedure. These problems are exacerbated as the number of sites increases, making the use of this approach frustrating and cumbersome. Consequently, as with the synchronized installation approach, the time stamp approach is only practical for distributed environments having a limited number of sites.
Perhaps the most widely used approach for ensuring the safety of remote procedure calls in distributed computing environments is the use of self-describing data. With this approach, additional data is included in each remote procedure call which fully describes each parameter. This data typically includes type, mode, constraints, and any other meta-data required to fully describe the parameter and ensure correctness.
During execution, the self-describing data is compared to the target procedure data types. If either the calling procedure or the target procedure has changed, the call will only be completed if the change is compatible or if a valid conversion can be applied by the target procedure. This inherent flexibility is particularly helpful if the change is limited in scope and relatively benign. For example, a parameter in a target procedure may have changed from an integer value to a double precision real value. Although the calling procedure continues to pass an integer, the target procedure can easily convert the integer to a double precision real and then continue executing. In addition to providing the flexibility of data conversion, the self-describing data approach does not require a compiler at each site and is well suited for distributed environments with a large number of sites.
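As a rough illustration of the integer-to-real example, the following Python sketch shows a target comparing the type descriptors that accompany each argument against its own formal parameter types and applying a conversion where one is permitted. The type names, the `CONVERSIONS` table and the function `check_and_convert` are hypothetical and chosen only for this example; a real system would also carry mode, constraints and other meta-data.

```python
# Hypothetical table of conversions the target is willing to apply:
# (type sent by caller, type expected by target) -> conversion function.
CONVERSIONS = {("int", "float"): float}


def check_and_convert(described_args, formal_types):
    """Return the argument values, converted where allowed.

    described_args: list of (type_name, value) pairs sent by the caller.
    formal_types:   list of type names the target currently expects.
    Raises TypeError if any parameter is incompatible.
    """
    if len(described_args) != len(formal_types):
        raise TypeError("parameter count mismatch")
    converted = []
    # Descriptors arrive with their parameters, so each one is checked in turn.
    for (sent_type, value), wanted in zip(described_args, formal_types):
        if sent_type == wanted:
            converted.append(value)
        elif (sent_type, wanted) in CONVERSIONS:
            converted.append(CONVERSIONS[(sent_type, wanted)](value))
        else:
            raise TypeError(f"cannot convert {sent_type} to {wanted}")
    return converted
```

Because the loop walks the parameters in order, a mismatch in the final parameter is found only after every earlier parameter has been examined.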
However, the self-describing data approach adversely affects performance in two ways. First, the self-describing data greatly increases the amount of data being passed in each remote procedure call. Second, the data type information is typically interleaved with the parameters, requiring that all of the parameters be checked before compatibility can be confirmed. Consequently, a difference in the last parameter will not be detected until all of the other parameters have been checked.
The last approach involves encoding the data type information of the formal parameters into a number which is then included in each remote procedure call as an additional parameter. The effectiveness of this approach depends upon the encoding scheme selected and how many bytes are used for the encoding. The encoding approach does not add much remote procedure call overhead and provides for data type conversions between compatible data types if the encoding scheme is sophisticated enough.
One common encoding method, "hashing", has the disadvantage that it lacks fine-grained control, yielding only a "compatible" or "not compatible" answer when comparing the expected hash number with the actual hash number. For example, assume a hash function "Hash", applied to a procedure "X" with two parameters of types "A" and "B", yields a single number "N". Now suppose the prototype of "X" is changed to have types "C" and "D" as its parameters. A good hash would assure with a high probability that, given N=Hash(X(A,B)) and M=Hash(X(C,D)), N does not equal M.
Using this functionality, one could quickly and easily determine whether a remote call is compatible with the current state of the target. However, it would be very difficult, if not impossible, to apply type-conversions, then redetermine compatibility.
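A minimal Python sketch of such a hash comparison is given below. The use of SHA-256 truncated to 8 bytes, the textual form of the signature, and the names `signature_hash` and `is_compatible` are assumptions made purely for illustration; the approach described here does not depend on any particular hash function.

```python
import hashlib


def signature_hash(name, param_types, size=8):
    """Encode a procedure name and its parameter types as a single number.

    Assumption: the signature is rendered as text such as "X(A,B)" and the
    first `size` bytes of its SHA-256 digest are used as the hash number.
    """
    text = name + "(" + ",".join(param_types) + ")"
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:size], "big")


def is_compatible(expected_hash, actual_hash):
    # The only possible answers are "compatible" and "not compatible".
    return expected_hash == actual_hash


# The caller records N at compile time; the target computes M after its change.
n = signature_hash("X", ["A", "B"])
m = signature_hash("X", ["C", "D"])
```

Here `is_compatible(n, m)` reports a mismatch, but the two numbers carry no recoverable information about which parameter changed or whether a conversion such as integer to real would have been acceptable.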
Finally, hashes and other encoding schemes tend to be less extensible, since it is much easier to create an efficient, correct encoding when the entire data-set to be encoded is known. Adding a new element to the data-set can invalidate the entire hash.
In view of the problems created by software incompatibilities and the inherent limitations in each of the solutions described above, it is highly desirable to have a method of managing dependencies in a distributed computing environment that ensures the safety of remote procedure calls, is easy to maintain, allows compatible changes with a minimal impact on overhead, and works well with many sites.