1. Statement of the Technical Field
The present invention relates to the field of distributed computing and Web services, and more particularly to the field of checkpointing in a distributed system.
2. Description of the Related Art
Web services represent the leading edge of distributed computing and are viewed as the foundation for developing a truly universal model for supporting the rapid development of component-based applications over the World Wide Web. Web services are known in the art to include a stack of emerging standards that describe a service-oriented, component-based application architecture. Specifically, Web services are loosely coupled, reusable software components that semantically encapsulate discrete functionality and are distributed and programmatically accessible over standard Internet protocols.
Conceptually, Web services represent a model in which discrete tasks within processes are distributed widely throughout a value net. Notably, many industry experts consider the service-oriented Web services initiative to be the next evolutionary phase of the Internet. Typically, Web services can be defined by an interface such as the Web services definition language (WSDL), and can be implemented according to the interface, though the implementation details matter little so long as the implementation conforms to the Web services interface. Once a Web service has been implemented according to a corresponding interface, the implementation can be registered with a Web services registry, such as Universal Description, Discovery and Integration (UDDI), as is well known in the art. Upon registration, the Web service can be accessed by a service requestor through the use of any supporting messaging protocol, including for example, the simple object access protocol (SOAP).
In a service-oriented application environment supporting Web services, locating reliable services and integrating those reliable services dynamically in realtime to meet the objectives of an application has proven problematic. While registries, directories and discovery protocols provide a base structure for implementing service detection and service-to-service interconnection logic, registries, directories, and discovery protocols alone are not suitable for distributed interoperability. Rather, a more structured, formalized mechanism can be necessary to facilitate the distribution of Web services in the formation of a unified application.
Notably, the physiology of a grid mechanism through the Open Grid Services Architecture (OGSA) can provide protocols both in discovery and also in binding of Web services, hereinafter referred to as “grid services”, across distributed systems in a manner which would otherwise not be possible through the exclusive use of registries, directories and discovery protocols. As described both in Ian Foster, Carl Kesselman, and Steven Tuecke, The Anatomy of the Grid, Intl J. Supercomputer Applications (2001), and also in Ian Foster, Carl Kesselman, Jeffrey M. Nick and Steven Tuecke, The Physiology of the Grid, Globus.org (Jun. 22, 2002), a grid mechanism can provide distributed computing infrastructure through which grid services instances can be created, named and discovered by requesting clients.
Grid services extend mere Web services by providing enhanced resource sharing and scheduling support, support for long-lived state commonly required by sophisticated distributed applications, as well as support for inter-enterprise collaborations. Moreover, while Web services alone address discovery and invocation of persistent services, grid services support transient service instances which can be created and destroyed dynamically. Notable benefits of using grid services can include a reduced cost of ownership of information technology due to the more efficient utilization of computing resources, and an improvement in the ease of integrating various computing components. Thus, the grid mechanism, and in particular, a grid mechanism which conforms to the OGSA, can implement a service-oriented architecture through which a basis for distributed system integration can be provided—even across organizational domains.
Importantly, Grid services in many cases need not be short lived, transient Web services, but long running Web services. For instance, a long running Web service can result from computing for long intervals, or from ensuring access to a running service for an extended duration. Typically, however, Web services are implemented according to known servlet techniques which can be disposed in an application server. Yet, servlets traditionally have been used mostly to perform short lived operations over the course only of a minute or two minutes.
In this regard, when an application server intends upon destroying a servlet, for example when a Web application is to be restarted, the application server generally will assume that a servlet responding to a request only will require a minute or so to complete a pending request before the application can destroy the servlet. While the foregoing assumption may be appropriate for a typical servlet implementing a transient Web service, the same will not hold true a long running Web service. In particular, it is known that Web services can be suspended and resumed in an ad hoc fashion in the ordinary course of distributed computing-particularly in the context of a business-to-business (B2B) transaction. Furthermore, in the event of a crash of an application server hosting a long running Web service, recovery has proven problematic.