An essential function in many of today's data processing systems has been the dissemination of information from servers to clients via a computer network. In one class of such systems, information is continually sent from servers to a large number of clients. One example of such a system is object pushing on the World Wide Web (WWW or web). Another example is data replication in a distributed database system such as Lotus Notes.
Traditionally, object retrieval on the web is based on pull technology. In this approach, a web user retrieves a web object by clicking an icon or a hyperlink through a web browser, which then establishes a network connection to a web content provider and proceeds to download and display the requested object. If the requested information is retrieved through a slow network, a noticeable latency may occur at the user end. To avoid the long wait for pulling the requested documents, an alternative is to have the server push the information to the users based on pre-specified user preferences or profiles as soon as relevant information becomes available. The users therefore receive the requested information without having to wait. Currently, most push technologies are based on background pull where a software application, executing on behalf of the user, periodically pulls the requested objects in the background.
In a distributed database system such as Lotus Notes, server databases are used to store the complete original data, whereas each client database can maintain a duplicate subset of the server data. It is important that the contents of the client databases reflect their corresponding subsets in the server databases as accurately as possible. To achieve this, a client database periodically invokes a data replication process which connects to the server and retrieves any new information from the server databases.
In both applications (object pushing on the WWW and data replication in distributed systems), as well as in other systems that require data to be continually sent from the servers to the clients, an important consideration is when and how often the client contents are updated. Ideally, one would like the client contents to be updated whenever their corresponding server data changes. However, this is impractical, because frequent updates from a large number of clients may demand more network bandwidth than is available in most organizations that run such systems. In practice, most of these systems adopt a default periodic update mechanism in which each client sets a fixed update schedule in advance, one for each server to which it subscribes. In addition, many of these systems also provide a demand-driven update mechanism so that a client can immediately request an update from a certain server if an urgent need arises.
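The two mechanisms described above can be illustrated with a minimal sketch. It assumes a client that keeps one fixed interval per subscribed server and records the time of each update; the class name `ClientUpdateSchedule`, its methods, and the server names in the usage note are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClientUpdateSchedule:
    """Hypothetical per-client schedule: one fixed interval per subscribed server."""

    # Server name -> seconds between scheduled updates (set in advance).
    intervals: Dict[str, float]
    # Server name -> time of the most recent update, in seconds.
    last_update: Dict[str, float] = field(default_factory=dict)

    def due_servers(self, now: float) -> List[str]:
        """Return the servers whose scheduled update is due at time `now`."""
        due = []
        for server, interval in self.intervals.items():
            last = self.last_update.get(server)
            # A server never contacted before is treated as due immediately.
            if last is None or now - last >= interval:
                due.append(server)
        return sorted(due)

    def record_update(self, server: str, now: float) -> None:
        """Note that an update from `server` completed at time `now`."""
        self.last_update[server] = now

    def request_now(self, server: str, now: float) -> str:
        """Demand-driven update: bypass the schedule for one subscribed server."""
        if server not in self.intervals:
            raise KeyError(f"not subscribed to {server!r}")
        # The actual transfer is out of scope; only the bookkeeping is shown.
        self.record_update(server, now)
        return server
```

For example, a client constructed with `intervals={"news": 600.0, "stocks": 60.0}` would poll the hypothetical `stocks` server ten times as often as `news`, and a call to `request_now` restarts the interval for the server concerned.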
While a purely demand-driven update scheme can be too costly in terms of bandwidth usage, regularly scheduled updates provide flexibility in conserving bandwidth. It is nonetheless important for clients to set an appropriate update frequency for each server to which they are subscribed. If the frequency is too high, the network may be flooded with update traffic; if the frequency is too low, the information maintained by the clients may become too outdated. In the case of object push on the WWW, it has been found that users tend to specify unnecessarily high update frequencies in their preferences, with the result that many corporate gateways are often flooded with push traffic.
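The dependence of update traffic on the chosen frequency can be made concrete with a rough estimate. The sketch below assumes that every client pulls one object of the same size from each subscribed server once per interval; the function name, its parameters, and the simplifying assumptions are illustrative only.

```python
def update_traffic_bits_per_second(
    num_clients: int,
    servers_per_client: int,
    bytes_per_update: int,
    interval_seconds: float,
) -> float:
    """Rough average gateway load from scheduled updates (illustrative model).

    Assumes uniform object sizes and evenly spread update times, which real
    deployments will not satisfy exactly.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Each client issues one update per subscribed server per interval.
    updates_per_second = num_clients * servers_per_client / interval_seconds
    # Convert bytes to bits.
    return updates_per_second * bytes_per_update * 8
```

Under this model the load is inversely proportional to the update interval, so halving the interval doubles the average traffic across the gateway.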
To alleviate this push overflow problem, push product vendors have developed proprietary proxy server software. In general, these proxy servers cache recently retrieved push objects. For each push request, the proxy server first searches its cache for the requested object. If the object is found in the cache, it is sent back to the user who made the request. If the object is not found in the cache, or if the cached copy is considered too old, the proxy server relays a background pull to the original content provider to retrieve the requested object via the corporate gateway. This approach can reduce gateway traffic because some client requests are satisfied entirely from the proxy server's cache, and the number of cross-gateway update requests decreases as a result.
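The cache lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the behavior of any vendor's proxy: the class name `PushProxyCache`, the fixed `max_age_seconds` staleness rule, and the `fetch_from_origin` callback standing in for the cross-gateway pull are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PushProxyCache:
    """Hypothetical proxy cache: serve fresh copies locally, otherwise pull."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin      # stands in for the cross-gateway pull
        self._max_age = max_age_seconds      # assumed staleness threshold
        self._clock = clock                  # injectable for testing
        self._entries: Dict[str, Tuple[bytes, float]] = {}
        self.gateway_fetches = 0             # counts requests that crossed the gateway

    def get(self, object_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(object_id)
        if entry is not None:
            body, fetched_at = entry
            # Cache hit that is still fresh: no gateway traffic is generated.
            if now - fetched_at <= self._max_age:
                return body
        # Miss, or the cached copy is too old: pull from the content provider.
        body = self._fetch(object_id)
        self.gateway_fetches += 1
        self._entries[object_id] = (body, now)
        return body
```

The `gateway_fetches` counter makes the saving visible: repeated requests for the same object within `max_age_seconds` increase it only once.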
In this proxy approach, proxy updates replace client updates as the point of direct contact with the servers, and it is each proxy's responsibility to keep the contents of its cache up to date so that they reflect new changes on the servers. As more corporate users subscribe to an increasing number of channels that publish push objects (as is the current trend), proxy-based update traffic can still flood the gateways if the proxy does not take gateway traffic conditions into consideration. The same problem arises in periodic data replication beyond local gateways in a distributed database system.
Another problem with the current scheduled update approach is that once a schedule is set, the updates follow the same frequency pattern until a new schedule is manually set at a later time. On many occasions, the interest in accessing the latest information from a channel changes over time, and the current approach cannot adapt to these dynamic changes. For example, a sudden stock plunge could generate a tremendous amount of immediate interest within a finance-related organization. Many opportunities may be lost if the organization's proxy server is not adaptive enough to provide more up-to-date information than previously scheduled.