1. Field of the Invention
The present invention relates to computer networks, and more specifically to a framework for scalable resource discovery and dynamic reconfiguration in distributed computer networks.
2. Description of Related Art
Computer networks such as the Internet allow users to share resources such as files and hardware. The expansion of the Internet and the adoption of standards for the World Wide Web have made the viewing and downloading of files by a user almost effortless. The user need not know any programming languages. By simply running an Internet browser, the user only needs to point and click to view and download desired files. The availability of such programs allows for easy collaboration and file sharing among like-minded individuals separated by great distances over a distributed computer network, which can literally span the entire globe.
Conventionally, a distributed computer network is set up to have a client/server framework. In particular, each user is a client that can access a server node over the network and, with the proper authorization, publish files to the server node. Once a file is published to the server node, other clients on the network can access the server node to view or download the file. Additionally, the server node can allow a client to automatically send a file to another client that is reachable over the network. The client simply sends the file to the server node along with information identifying the desired recipient, and the server node sends the file on to the corresponding client. The server node can also be used to allow the clients to share hardware resources such as a printer.
With such a client/server framework, the server node is charged with providing security. For example, the server node must ensure that only authorized clients can use the network resources (e.g., download files), and that only proper files are published. Additionally, the server node represents a single point of failure. Thus, in any client/server environment in which reliability is required, the server node must be of industrial strength and have redundant systems to prevent system shutdowns and data loss. Further, because all client-to-client resource transfers pass through the server node, adding another client to the network puts an additional burden on the server node and degrades network performance.
In such a client/server framework, the clients have little privacy. Typically, the server node requires authentication before allowing a client to access network resources. Once the client has provided authentication credentials, the server node can easily log all of the network activity of the client. For example, the server node could keep a log of all files uploaded and downloaded by the client. Even if access by unauthenticated clients is allowed, the server node can use any of various unique identification techniques to track client activity over time. For example, the server node can place a unique cookie on the client and later use the cookie to identify the client each time it accesses the server node.
One solution to some of the drawbacks of the conventional client/server framework is provided by a “viral” network. In such a network, a user node connects to one or more known hosts that are participating in a highly interconnected virtual network. Then, the user node itself becomes a host node that can respond to requests for resources and available hosts. Each user in the network forwards resource requests to all known neighboring nodes, so as to potentially propagate each request throughout the entire network. For example, the Gnutella system employs such a viral network framework. Gnutella has a published network protocol and provides users a client/server application (available at gnutella.wego.com) that allows each user to act as a host node in a file sharing network. The Gnutella system can be used to securely distribute commercial content that is protected by encryption and licensing.
Viral networks are based on peer-to-peer communication. Peer-to-peer is a communications model in which each party has similar capabilities and either party may initiate a communication session. For example, the Gnutella application employs peer-to-peer communication to allow users to exchange files with one another over the Internet. The peer-to-peer model used in a viral network relies on each peer (i.e., user node) having knowledge of at least one of the other peers in the network. When searching for a resource such as a file, a peer sends a resource request to other known peers, which in turn pass it on to their known peers and so on to propagate the request throughout the network. A peer that has the resource and receives the request can send the resource (or a message indicating its availability) back to the requesting peer. Because such a framework offers independence from a centralized network authority (e.g., server node), users in a viral network have enhanced privacy and the single point of failure is eliminated.
FIG. 1 shows an exemplary viral network. Each node in the network represents a user that acts both as a client and as a host, and is connected with one or more other nodes. When a first node 210 desires a particular resource (e.g., a file), the first node 210 issues a request to all known nodes 202, 204, 206, and 208, which in turn do the same. For example, the request reaches a second node 212 by being passed in succession through nodes 208, 216, and 218. If the second node 212 has the requested resource, it responds by sending an appropriate message to the first node 210 (e.g., back along the same path that the request traversed). Because a node having the requested resource has been identified, the first node 210 can initiate a direct peer-to-peer connection with the second node 212 in order to download the resource. Throughout the viral network, any number of such resource requests, acknowledgments, and transfers can occur simultaneously.
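By way of illustration only, the request propagation described for FIG. 1 can be sketched as a breadth-first flood over a neighbor table. The table below contains only the links stated in the text (node 210 to nodes 202, 204, 206, and 208, and the chain 208, 216, 218, 212); any other links in the figure are omitted, and the names `flood_search` and `NEIGHBORS` are hypothetical and not part of any Gnutella protocol.

```python
from collections import deque

def flood_search(neighbors, start, has_resource):
    """Flood a request outward from `start`.

    Returns the path from `start` to the first node found that holds the
    resource, or None if no reachable node holds it.
    """
    parent = {start: None}          # also serves as the "already seen" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and has_resource(node):
            # Walk the parent links back to the requester to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for peer in neighbors.get(node, ()):
            if peer not in parent:  # each node handles a request only once
                parent[peer] = node
                queue.append(peer)
    return None

# Assumed topology: only the links named in the description of FIG. 1.
NEIGHBORS = {
    210: [202, 204, 206, 208],
    208: [210, 216],
    216: [208, 218],
    218: [216, 212],
    212: [218],
}

path = flood_search(NEIGHBORS, 210, lambda n: n == 212)
print(path)  # [210, 208, 216, 218, 212]
```

The reply from node 212 could retrace this path in reverse, after which node 210 would contact node 212 directly for the transfer.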
While viral networks offer enhanced privacy and eliminate a single point of failure, the framework has drawbacks related to scalability. In a large, decentralized viral network, efficient resource discovery breaks down as the number of participating nodes increases. More specifically, a resource request can only propagate from node to node, and each node only propagates the request to a relatively small number of other nodes. To control network traffic and prevent unreasonable response times, a practical system must employ a “time-to-live” or some limit on the number of times a request can be forwarded (i.e., a maximum number of peer hops). This effectively disconnects any two nodes or groups of nodes that are separated by a path that would require a request to propagate through an unreasonably large number of intermediary nodes. Further, any such limit on request propagation makes it impossible to perform an exhaustive search for a resource, because such a search would require the request to be propagated to all of the nodes in the network.
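The effect of a hop limit can be illustrated with a short sketch. It assumes a neighbor table containing only the links named in the description of FIG. 1, and the function name `reached_with_ttl` and the specific limits of three and four hops are chosen for illustration only.

```python
def reached_with_ttl(neighbors, start, ttl):
    """Return the set of nodes a flooded request reaches within `ttl` hops."""
    reached = {start}
    frontier = [start]
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for peer in neighbors.get(node, ()):
                if peer not in reached:
                    reached.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
        ttl -= 1                    # one hop of the budget is consumed
    return reached

# Assumed topology: only the links named in the description of FIG. 1.
NEIGHBORS = {
    210: [202, 204, 206, 208],
    208: [210, 216],
    216: [208, 218],
    218: [216, 212],
    212: [218],
}

print(212 in reached_with_ttl(NEIGHBORS, 210, ttl=3))  # False
print(212 in reached_with_ttl(NEIGHBORS, 210, ttl=4))  # True
```

In this assumed topology node 212 is four hops from node 210, so a limit of three hops makes it invisible to the search even though the two nodes are connected.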
Additionally, a content-based publish-subscribe messaging infrastructure that utilizes an information flow graph has recently been proposed. For example, the Gryphon system (described at www.research.ibm.com/gryphon) has been developed by the assignee of the present invention. This system provides a content-based subscription service and performs message brokering by merging the features of distributed publish/subscribe communications and database technology. At the core of the Gryphon system is an information flow graph that specifies the selective delivery of events, the transformation of events, and the generation of new events.
FIG. 2 shows an exemplary content-based publish-subscribe messaging infrastructure that utilizes information flow graphs. In this system, stock trades derived from two information sources, NYSE and NASDAQ, are combined, transformed, filtered, and delivered to subscribing clients. For example, one user 312 may subscribe to the message-brokering server 302 and request to receive all stock trades on both the NYSE and NASDAQ that have a value of over one million dollars. The message broker 302 receives raw stock trade information such as price and volume from the NYSE 324 and NASDAQ 326.
Based on the information request of the user 312, the server 302 merges the stock trade information from the two sources, transforms the raw price and volume information into value information for each trade, and then filters the derived values to produce the subset of trades that are valued at over one million dollars. In a similar manner, each subscribing user (e.g., nodes 304, 306, and 308) specifies its own criteria, and the message-brokering server 302 performs information selection, transformation, filtering, and delivery in order to provide each user with the requested information.
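The merge, transform, and filter stages described above can be sketched as a small pipeline. This is an illustrative sketch only and is not the Gryphon implementation; the function name `broker`, the record fields, and all trade figures are invented for the example, and value is assumed to be price multiplied by volume.

```python
from itertools import chain

def broker(sources, min_value):
    """Merge trade feeds, derive each trade's value, and keep the large ones."""
    merged = chain.from_iterable(sources)                     # merge
    valued = ({**trade, "value": trade["price"] * trade["volume"]}
              for trade in merged)                            # transform
    return [t for t in valued if t["value"] > min_value]      # filter

# Made-up sample feeds standing in for the two information sources.
nyse = [
    {"symbol": "AAA", "price": 50.0, "volume": 30000},   # value 1,500,000
    {"symbol": "BBB", "price": 10.0, "volume": 1000},    # value 10,000
]
nasdaq = [
    {"symbol": "CCC", "price": 200.0, "volume": 4000},   # value 800,000
    {"symbol": "DDD", "price": 125.0, "volume": 10000},  # value 1,250,000
]

large_trades = broker([nyse, nasdaq], min_value=1_000_000)
print([t["symbol"] for t in large_trades])  # ['AAA', 'DDD']
```

Each subscriber would supply its own criteria (here, the `min_value` threshold), and the broker would run a corresponding pipeline on that subscriber's behalf.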
While the publish-subscribe messaging infrastructure of FIG. 2 provides good scalability for a messaging system with a large number of users, the users have little privacy, as in the conventional client/server framework. All users must identify themselves when subscribing to the system, and all information is delivered to the user through the centralized server. Thus, the centralized server can easily maintain a log of all users of the system and the exact information that each desires and receives. The centralized message-brokering server also represents a single point of failure for the system.