1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer program product for application-independent, automatic synchronization of data between a replicated version and a back-end data store version which may or may not have the same format. Queues are used for scheduling refreshes of read-access objects and updates of write-access objects, where the actual processing will occur when the client device connects to the back-end data source.
2. Description of the Related Art
Business and consumer use of distributed computing, also commonly referred to as network computing, has gained tremendous popularity in recent years. In this computing model, the data and/or programs to be used to perform a particular computing task typically reside on (i.e. are "distributed" among) more than one computer, where these multiple computers are connected by a network of some type. The Internet, and the part of the Internet known as the World Wide Web (hereinafter, "Web"), are well-known examples of this type of environment wherein the multiple computers are connected using a public network. Other types of network environments in which distributed computing may be used include intranets, which are typically private networks accessible to a restricted set of users (such as employees of a corporation), and extranets (e.g., a corporate network which is accessible to other users than just the employees of the company which owns and/or manages the network, such as the company's business partners).
The client/server model is the most commonly-used computing model in today's distributed computing environments. In this model, a computing device operating in a client role requests some service, such as delivery of stored information, from another computing device operating in a server role. The software operating at one particular client device may make use of applications and data that are stored on one or more server computers which are located literally anywhere in the world. Similarly, an application program operating at a server may provide services to clients which are located throughout the world. A common example of client/server computing is use of a Web browser, which functions in the client role, to send requests for Web pages or Web documents to a Web server. Another popular model for network computing is known as the "peer-to-peer" model, where the requester of information and the provider of that information operate as peers.
Whereas the HyperText Transfer Protocol (HTTP) is the communications protocol typically used for communication between a client and a server in the client/server model used in the Web, a protocol such as Advanced Program-to-Program Communication (APPC) developed by IBM is typically used for communication in a peer-to-peer model.
Application integration middleware technology has been developed for use in these distributed computing environments to enable application programs to efficiently and conveniently interact with legacy host applications and/or legacy host data stored in a back-end data store (such as a database, directory, or other data storage repository). For the legacy host environment, for example, software components written as objects are being developed to access legacy host data, where these objects enable replacing procedural language software developed for prior computing architectures (such as the 3270 data stream architecture). These objects are then executed by the middleware. Examples of middleware technology include the Host Integration product (and its Host On-Demand and Host Publisher components) and the WebSphere™ product, both from IBM, which can be used to access back-end data sources including CICS® (Customer Information Control System) host applications and JDBC (Java™ Database Connectivity) databases. ("CICS" is a registered trademark of IBM, "WebSphere" is a trademark of IBM, and "Java" is a trademark of Sun Microsystems, Inc.) Application middleware of this type serves as a surrogate for the back-end data source, and provides a consistent interface to its callers. It maintains connections to one or more of the back-end data sources, enabling quick and efficient access to data when needed by an executing application. That is, when a client application (or requesting application, in a peer-to-peer model) requests information or processing, the middleware starts a process to interact with the back-end data source across a network connection to get the information needed by the caller. In this interaction with the back-end data source, the middleware typically functions in the client role, as the surrogate of the requesting client which initiated the request.
(Note: the term "back-end data source", as used herein, refers to data stores as well as to applications which create and/or return data to a requester. The term "back-end" as used herein refers to legacy host systems as well as to database systems.)
Many examples of this computing approach exist. As one example, WebSphere applications developed using the Enterprise Access Builder ("EAB") component of IBM's VisualAge® for Java product include back-end data source connector objects which are used to get back-end source information from EAB-created JavaBeans™. ("VisualAge" is a registered trademark of IBM, and "JavaBeans" is a trademark of Sun Microsystems, Inc.) As another example, Host Publisher applications may operate to get back-end source information from the "Integration Objects" which are created using its Design Studio component. (Integration Objects are application-specific encapsulations of legacy host access code, or database access code, specified as reusable JavaBeans. These Integration Objects are designed for enabling remote client access to the back-end data source.) In a more general sense, any middleware application can use a Host Access Session bean with a Macro bean to get back-end source information which is described using a Host Access macro script. (A "Host Access Session bean" is a bean created for establishing a session that will be used for accessing a back-end data source. A "Macro bean" is a bean which, when executed, plays out the commands of a macro. Instances of these Host Access Session and Macro beans may be created using classes provided by IBM's Host On-Demand product. A "Host Access macro script" is a recording of macro commands that may be used to access data via a host session. For example, a macro may be used to record the log-on sequence used to log on to a host application. This sequence typically includes actions such as establishing a network connection to a host application; prompting the user for his or her identification and password; and then transmitting the information entered by the user to the host application over the network connection. The macro transforms the sequence into commands. When using a Macro bean, the log-on process occurs as the macro commands are executed by the bean. The Macro bean insulates the legacy host code from the object-oriented environment of the requesting client: the legacy code interacts with the macro commands as if it were interacting directly with a user whose device is using, for example, a 3270 protocol for which the legacy code was originally designed. The client never sees the legacy code. Additional host access macro scripts may be created to perform other host interaction sequences.)
Use of application middleware in a distributed computing environment provides a number of advantages, as has been described briefly above and as will be understood by one familiar with the art. However, there are several shortcomings in this approach as it exists in the prior art. One problem of the prior art is in the area of system performance; another is in programming complexity. The performance concern is due to the requirement that the middleware needs to be connected to the back-end system, and to interact in real time for the information requested by its callers. This requires a considerable amount of computing and networking resources.
Furthermore, there may be repeated requests for retrieval of the same information. If repetitively requested information tends to be somewhat static in nature, it is a waste of system resources to interact with the back-end system each time it is requested, only to retrieve the same result that was obtained with a prior request. In addition, an application program may generate updates to a back-end data store which are not time-critical. An example of this type of application is one that generates low-priority processing requests such as daily purchase orders, where it might not be necessary to process the orders immediately: rather, delayed execution could process the orders and send confirmation messages to the initiators. Many other examples of applications which generate updates that do not require immediate, real-time processing exist. For such applications, it may be preferable for the updates to be accumulated over time and processed when the receiving computing system is lightly loaded, enabling the system's scarce resources to yield to higher-priority tasks in the interim. The prior art does not provide general solutions for optimizing resource utilization in this manner. Instead, a developer must manually code logic to optimize resource usage, in view of the needs of a particular application, leading to complex (and therefore error-prone) programming requirements. The related U.S. application Ser. No. 09/518,474 entitled "Caching Dynamic Content" (referred to hereinafter as the "first related invention") defines a technique for caching objects (which may be JavaBeans) to avoid the system overhead of repetitive retrieval of information which has not changed. While the technique disclosed therein provides an efficient way to deal with read access to objects, it does not address write access.
An additional problem of the prior art occurs when applications execute in a disconnected mode. "Disconnected mode", as used herein, refers to an execution mode where a client device on which an application is executing might not be currently connected to the code which performs the actual update of the affected back-end data store, and where data from the back-end system has been replicated such that the application on the client device can access this replicated copy.
This execution model is common in distributed "branch office" environments, where the computing devices within a branch office (or some analogous subset) of an enterprise may be connected together using a local area network (LAN) or similar network, but real-time transactions do not typically occur between those computing devices and the back-end enterprise system. Instead, a branch office network typically has a replicated copy of the data which is stored at the back-end system (where this replicated copy may be stored, e.g., at a branch office server), so that the local operations which occur within the branch operate against this local copy. At a designated processing time (for example, at some point following the end of the business day), the local copy is then brought into synchronization with the back-end system. This synchronization process of the prior art is application-specific, requiring either (1) copying of data from the local store to the back-end store, where each store has an identical format, or (2) writing application-specific code to perform a synchronization process between data stores having a dissimilar format.
The disconnected execution model may also be used where the client device is an occasionally-connected mobile computing device (also referred to as a "pervasive computing" device), such as a handheld computer. This type of computing device may store a local replicated copy of the data upon which its applications operate. At some point, the mobile device must connect to the back-end store so that the local copy can be synchronized with the copy from which it was replicated, similar to the approach described above for a branch office server.
The inventors know of no technique with which an arbitrary replicated data source can be automatically synchronized with a back-end data source which does not share a common file format. Client software which is developed to interact with legacy host or database access software at a back-end system is unlikely to use a storage format which is identical to that used at the back-end, thus necessitating creation of application-specific code for the synchronization process of the prior art. In particular, modern object-oriented client front-end software is one example where the file formats used for data storage will be different from that of the back-end.
Accordingly, there is a need for solving the above-described problems of inefficient, complex update access to a back-end data store and application-specific synchronization approaches for synchronizing replicated data with a back-end store.
An object of the present invention is to provide a technique whereby one data store can be automatically synchronized with another data store, even though the two stores may not share a common format.
Yet another object of the present invention is to provide this technique wherein one of the data stores is a replicated version of data used in disconnected operations, and the other data store is a back-end data store.
A further object of the present invention is to provide this technique wherein the replicated version uses object-oriented data objects for its storage format and the back-end data store uses legacy host data or database storage formats.
Another object of the present invention is to provide this technique in a generic manner such that a developer is not required to write application-specific synchronization code.
Still another object of the present invention is to provide this technique such that the synchronization process can be offloaded to a device other than the one which stored the replicated version.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a computer program product, a system, and a method for automatically synchronizing data between a replicated version and a back-end data store version which may or may not have the same format. This technique comprises: storing one or more first objects as replicated read-access objects in a first cache for responding to read requests against the first objects, wherein (1) a set of input properties and values thereof is stored with or associated with each replicated read-access object and (2) refresh logic specifying how to refresh each of the replicated read-access objects is stored with or associated with the replicated read-access object or a group of replicated read-access objects; storing one or more second objects as replicated write-access objects in a second cache for responding to update requests against the second objects, wherein (1) a set of input properties is stored with or associated with each replicated write-access object and (2) update logic specifying how to update each of the replicated write-access objects is stored with or associated with the replicated write-access object or a group of replicated write-access objects; receiving read requests against one or more of the first objects; receiving update requests against one or more of the second objects; responding to the read requests using the replicated read-access objects; queuing the update requests, along with the input properties and values thereof which are to be used for performing each update request, as queued update requests on an update queue; scheduling a refresh of a selected replicated read-access object by queuing the selected replicated read-access object or a reference thereto as a queued refresh request on a refresh queue; determining that a replication is to be performed; and performing the replication to refresh the replicated read-access objects and the replicated write-access objects by processing the queued refresh requests on the refresh queue and the queued update requests on the update queue.
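By way of illustration only, the steps recited above may be sketched as follows. This is a minimal sketch in Python rather than a definitive implementation: the names `ReplicatedObject`, `Replica`, `schedule_refresh`, and `replicate` are hypothetical, the back-end data source is modeled as a plain dictionary, and the order in which the two queues are drained is a choice made for the sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ReplicatedObject:
    """Hypothetical cached object: input properties plus attached logic."""
    name: str
    inputs: Dict[str, Any] = field(default_factory=dict)
    value: Any = None
    logic: Optional[Callable] = None  # refresh logic or update logic


class Replica:
    """Holds the two caches and the two queues described in the text."""

    def __init__(self):
        self.read_cache = {}          # first cache: read-access objects
        self.write_cache = {}         # second cache: write-access objects
        self.refresh_queue = deque()  # references to objects needing refresh
        self.update_queue = deque()   # (object name, input values) pairs

    def read(self, name):
        # Read requests are answered from the replicated copy only.
        return self.read_cache[name].value

    def update(self, name, **inputs):
        # Update requests are queued together with their input values.
        self.update_queue.append((name, dict(inputs)))

    def schedule_refresh(self, name):
        self.refresh_queue.append(name)

    def replicate(self, backend):
        # Drain both queues against the back-end (assumed to be connected).
        while self.update_queue:
            name, inputs = self.update_queue.popleft()
            obj = self.write_cache[name]
            obj.inputs = inputs
            obj.logic(backend, obj.inputs)
        while self.refresh_queue:
            obj = self.read_cache[self.refresh_queue.popleft()]
            obj.value = obj.logic(backend, obj.inputs)


# Usage: the back-end changes while the client is disconnected.
backend = {"price": 10, "orders": []}
replica = Replica()
replica.read_cache["price"] = ReplicatedObject(
    "price", {"key": "price"}, 10, lambda b, i: b[i["key"]])
replica.write_cache["order"] = ReplicatedObject(
    "order", logic=lambda b, i: b["orders"].append(i))

replica.update("order", item="widget", qty=2)
replica.schedule_refresh("price")
backend["price"] = 12
stale = replica.read("price")   # still served from the replicated copy
replica.replicate(backend)
fresh = replica.read("price")   # reflects the back-end after replication
```

In this sketch neither the read nor the update request touches the back-end; all back-end interaction is deferred to `replicate`.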
A separate refresh queue and a separate update queue may be created for each of one or more back-end data sources to be accessed during the replication.
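One possible arrangement of such per-source queues is sketched below, assuming each back-end data source is identified by a name. The class `QueueRegistry` and its methods are hypothetical and are used for illustration only.

```python
from collections import defaultdict, deque


class QueueRegistry:
    """Hypothetical registry keeping one refresh queue and one update
    queue per back-end data source, keyed by the source's name."""

    def __init__(self):
        self.refresh = defaultdict(deque)
        self.update = defaultdict(deque)

    def queue_refresh(self, source, object_name):
        self.refresh[source].append(object_name)

    def queue_update(self, source, object_name, inputs):
        self.update[source].append((object_name, dict(inputs)))

    def pending(self, source):
        # Returns (number of queued updates, number of queued refreshes)
        # without creating empty queues for unknown sources.
        updates = self.update.get(source, ())
        refreshes = self.refresh.get(source, ())
        return len(updates), len(refreshes)


registry = QueueRegistry()
registry.queue_update("cics_host", "purchase_order", {"qty": 3})
registry.queue_refresh("cics_host", "inventory")
registry.queue_refresh("jdbc_db", "customer_list")
```

Keeping the queues separate allows replication against one source to proceed when only that source is reachable.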
A first caching policy of the refresh queue may be set to refresh the replicated read-access objects upon making a connection to a first back-end data source and a second caching policy of the update queue may be set to perform the queued update requests on the update queue upon making the connection to the first back-end data source or upon making a connection to a second back-end data source. In this case, performing the replication is preferably triggered according to one or both of the first caching policy or the second caching policy.
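The triggering described above may be sketched as follows, under the assumption that a caching policy can be reduced to the set of back-end data sources whose connection triggers processing of a queue. The names `PolicyQueue` and `queues_triggered_by` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PolicyQueue:
    """A queue together with a hypothetical connect-triggered policy."""
    kind: str             # "refresh" or "update"
    trigger_sources: set  # sources whose connection triggers processing
    items: deque = field(default_factory=deque)


def queues_triggered_by(connected_source, queues):
    # A queue is processed when its policy names the source that was just
    # connected and the queue actually holds pending requests.
    return [q for q in queues
            if connected_source in q.trigger_sources and q.items]


refresh_q = PolicyQueue("refresh", {"first_source"}, deque(["price"]))
update_q = PolicyQueue("update", {"first_source", "second_source"},
                       deque([("order", {"qty": 2})]))
queues = [refresh_q, update_q]

on_second = [q.kind for q in queues_triggered_by("second_source", queues)]
on_first = [q.kind for q in queues_triggered_by("first_source", queues)]
```

Here a connection to the second source triggers only the update queue, whereas a connection to the first source triggers both.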
The technique may further comprise connecting to one or more back-end data sources prior to performing the replication, and disconnecting from the one or more back-end data sources after performing the replication.
Performing the replication may further comprise executing the refresh logic stored with or associated with selected replicated read-access objects for which the queued refresh requests are queued, and executing the update logic stored with or associated with selected replicated write-access objects for which the queued update requests are queued.
Alternatively, performing the replication may further comprise processing the queued update requests on the update queue, and processing the queued refresh requests on the refresh queue, after processing the queued update requests. Processing the queued update requests further comprises: setting the input properties of a selected replicated write-access object against which the queued update request is to be performed using the queued input property values; and executing the update logic stored with or associated with the selected replicated write-access object using the input properties and values thereof. Processing the queued refresh requests further comprises executing the refresh logic stored with or associated with selected replicated read-access objects for which the queued refresh requests are queued, thereby refreshing the selected replicated read-access objects. Performing the replication may further comprise connecting to one or more back-end data sources prior to processing the queued update requests, and disconnecting from the one or more back-end data sources after processing the queued refresh requests.
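The ordering described in the preceding paragraph (connect, process updates, process refreshes, disconnect) may be sketched as follows. The `Backend` class, the `deposit` and `read_balance` functions, and the dictionary layout of the replicated objects are assumptions made for illustration; the sketch merely shows that refreshing after updating lets the refreshed values reflect the updates just applied.

```python
from collections import deque


class Backend:
    """Hypothetical back-end data source that logs each interaction."""

    def __init__(self):
        self.data = {"balance": 100}
        self.log = []

    def connect(self):
        self.log.append("connect")

    def disconnect(self):
        self.log.append("disconnect")


def deposit(backend, inputs):
    # Hypothetical update logic attached to a write-access object.
    backend.data["balance"] += inputs["amount"]
    backend.log.append("update")


def read_balance(backend, inputs):
    # Hypothetical refresh logic attached to a read-access object.
    backend.log.append("refresh")
    return backend.data["balance"]


def replicate(backend, write_objects, read_objects,
              update_queue, refresh_queue):
    backend.connect()
    try:
        # 1. Process queued updates first.
        while update_queue:
            name, values = update_queue.popleft()
            obj = write_objects[name]
            obj["inputs"] = dict(values)  # set the queued input properties
            obj["update_logic"](backend, obj["inputs"])
        # 2. Then refresh, so that refreshed values include the updates.
        while refresh_queue:
            obj = read_objects[refresh_queue.popleft()]
            obj["value"] = obj["refresh_logic"](backend, obj["inputs"])
    finally:
        backend.disconnect()


write_objects = {"deposit": {"inputs": {}, "update_logic": deposit}}
read_objects = {"balance": {"inputs": {}, "value": 100,
                            "refresh_logic": read_balance}}
update_queue = deque([("deposit", {"amount": 25}),
                      ("deposit", {"amount": 5})])
refresh_queue = deque(["balance"])

backend = Backend()
replicate(backend, write_objects, read_objects, update_queue, refresh_queue)
```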
Determining that the replication is to be performed may further comprise detecting that a connection to a back-end data source has been made.
Performing the replication may further comprise offloading the replication to a remote device. The offloading preferably further comprises: packaging the refresh queue and update queue for sending to the remote device; sending the packaged queues to the remote device; receiving a response from the remote device which indicates that the offloaded replication has been performed; refreshing the replicated read-access objects, responsive to receiving the response; and purging the refresh queue and update queue, responsive to the refreshing. The packaging may further comprise creating an Extensible Markup Language (XML) representation of the refresh queue and the update queue. The received response may comprise information to use during the refreshing, and this information may be in an XML representation. The offloading may further comprise connecting to the remote device prior to sending the packaged queues, and disconnecting from the remote device after receiving the response.
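The offloading steps above may be sketched as follows, using the `xml.etree.ElementTree` module of the Python standard library. The element and attribute names (`replication`, `refreshQueue`, `updateQueue`, `replicationResponse`, and so on) are invented for this sketch and are not taken from any defined schema; the remote device is simulated by a local function, and the transport between the devices is omitted.

```python
import xml.etree.ElementTree as ET


def package_queues(refresh_queue, update_queue):
    """Create an XML representation of both queues (hypothetical schema)."""
    root = ET.Element("replication")
    refreshes = ET.SubElement(root, "refreshQueue")
    for name in refresh_queue:
        ET.SubElement(refreshes, "refresh", object=name)
    updates = ET.SubElement(root, "updateQueue")
    for name, inputs in update_queue:
        request = ET.SubElement(updates, "update", object=name)
        for prop, value in inputs.items():
            ET.SubElement(request, "input", name=prop).text = str(value)
    return ET.tostring(root, encoding="unicode")


def remote_replicate(package_xml, backend):
    """Stand-in for the remote device: applies updates, then refreshes."""
    root = ET.fromstring(package_xml)
    for request in root.find("updateQueue"):
        inputs = {i.get("name"): i.text for i in request}
        backend.setdefault(request.get("object"), []).append(inputs)
    response = ET.Element("replicationResponse", status="done")
    for request in root.find("refreshQueue"):
        name = request.get("object")
        ET.SubElement(response, "refreshed", object=name).text = str(backend[name])
    return ET.tostring(response, encoding="unicode")


def apply_response(response_xml, read_cache, refresh_queue, update_queue):
    """Refresh the local read cache, then purge both queues."""
    root = ET.fromstring(response_xml)
    if root.get("status") != "done":
        return False
    for item in root:
        read_cache[item.get("object")] = item.text
    refresh_queue.clear()
    update_queue.clear()
    return True


backend = {"price": 12}
read_cache = {"price": "10"}
refresh_queue = ["price"]
update_queue = [("orders", {"item": "widget", "qty": 2})]

package = package_queues(refresh_queue, update_queue)
response = remote_replicate(package, backend)
ok = apply_response(response, read_cache, refresh_queue, update_queue)
```

Values pass through the XML representation as text in this sketch, so a real encoding would need to carry type information if the original types matter.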