The present invention relates to the transmission of data in a network environment. More specifically, the present invention relates to methods and apparatus for redirecting network traffic. Still more specifically, techniques are described herein for redirecting packet flows from a device that does not own those flows.
Generally speaking, when a client platform communicates with some remote server, whether via the Internet or an intranet, it crafts a data packet which defines a TCP connection between the two hosts, i.e., the client platform and the destination server. More specifically, the data packet has headers which include the destination IP address, the destination port, the source IP address, the source port, and the protocol type. The destination IP address might be the address of a well known World Wide Web (WWW) search engine such as, for example, Yahoo, in which case, the protocol would be TCP and the destination port would be port 80, a well known port for http and the WWW. The source IP address would, of course, be the IP address for the client platform and the source port would be one of the TCP ports selected by the client. These five pieces of information define the TCP connection.
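As a minimal illustration of the five header fields described above, the connection identifier can be modeled as an immutable, hashable record so that it can later serve as a lookup key. The class name `FlowKey`, the method `reverse_direction`, and the documentation-range addresses are assumptions chosen for illustration, not part of the original description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical record of the five header fields that identify one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "TCP"

    def reverse_direction(self) -> "FlowKey":
        """Key for the return direction (server to client) of the same connection."""
        return FlowKey(self.dst_ip, self.dst_port, self.src_ip, self.src_port, self.protocol)


# A client using source port 51515 to reach a web server on the well-known HTTP port.
request = FlowKey("192.0.2.10", 51515, "198.51.100.7", 80, "TCP")
print(request.dst_port)                     # 80
print(request.reverse_direction().src_ip)   # 198.51.100.7
```

Because the record is frozen, two packets carrying the same five values produce equal, identically hashed keys, which is what a per-flow table needs.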
Given the increase of traffic on the World Wide Web and the growing bandwidth demands of ever more sophisticated multimedia content, there has been constant pressure to find more efficient ways to service data requests than opening direct TCP connections between a requesting client and the primary repository for the desired data. Interestingly, one technique for increasing the efficiency with which data requests are serviced came about as the result of the development of network firewalls in response to security concerns. In the early development of such security measures, proxy servers were employed as firewalls to protect networks and their client machines from corruption by undesirable content and from unauthorized access by the outside world. Proxy servers were originally based on Unix machines because that was the prevalent technology at the time. This model was generalized with the advent of SOCKS, which was essentially a daemon on a Unix machine. Software on a client platform on the network protected by the firewall was specially configured to communicate with the resident daemon, which then made the connection to a destination platform at the client's request. The daemon then passed information back and forth between the client and destination platforms, acting as an intermediary or "proxy".
Not only did this model provide the desired protection for the client's network, it also gave the entire network the IP address of the proxy server, thereby simplifying the addressing of data packets for an increasing number of users. Moreover, because of the storage capability of the proxy server, information retrieved from remote servers could be stored rather than simply passed through to the requesting platform. This storage capability was quickly recognized as a means by which access to the World Wide Web could be accelerated. That is, by storing frequently requested data, subsequent requests for the same data could be serviced without having to retrieve the requested data from its original remote source. Currently, most Internet service providers (ISPs) accelerate access to their web sites using proxy servers.
Unfortunately, interaction with such proxy servers is not transparent, requiring each end user to select the appropriate proxy configuration in his or her browser to allow the browser to communicate with the proxy server. For the large ISPs with millions of customers there is significant overhead associated with handling tech support calls from customers who have no idea what a proxy configuration is. Additional overhead is associated with the fact that different proxy configurations must be provided for different customer operating systems. The considerable economic expense represented by this overhead offsets the benefits derived from providing accelerated access to the World Wide Web. Another problem arises as the number of WWW users increases. That is, as the number of customers for each ISP increases, the number of proxy servers required to service the growing customer base also increases. This, in turn, presents the problem of allocating packet traffic among multiple proxy servers.
Network caching represents an improvement over the proxy server model: it is transparent to end users, offers high performance, and is fault tolerant. By altering the operating system code of an existing router, the router is enabled to recognize data traffic having particular characteristics, such as a particular protocol intended for a specified port (e.g., TCP with port 80), and to redirect that traffic to one or more network caches connected to the router via an interface having sufficient bandwidth. If multiple caches are connected to the cache-enabled router, the router selects among the available caches for a particular request based on a load-balancing mechanism.
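The router-side behavior described above can be sketched as a match rule followed by a cache selection step. This is a sketch under stated assumptions: the text does not say which load-balancing mechanism the router uses, so hashing the destination address with `zlib.crc32` is a stand-in, and the function names `should_redirect`, `select_cache`, and `route` are hypothetical.

```python
import zlib
from typing import Optional, Sequence


def should_redirect(protocol: str, dst_port: int) -> bool:
    # Hypothetical redirect criterion taken from the example in the text: TCP to port 80.
    return protocol == "TCP" and dst_port == 80


def select_cache(dst_ip: str, caches: Sequence[str]) -> Optional[str]:
    """Pick one of the attached caches, or None if no cache is available.

    Assumption: a deterministic hash of the destination address stands in for the
    unspecified load-balancing mechanism, so one destination always maps to one cache.
    """
    if not caches:
        return None
    return caches[zlib.crc32(dst_ip.encode()) % len(caches)]


def route(protocol: str, dst_ip: str, dst_port: int, caches: Sequence[str]) -> str:
    """Return the next hop for a packet: a cache if redirected, else the original destination."""
    if should_redirect(protocol, dst_port):
        cache = select_cache(dst_ip, caches)
        if cache is not None:
            return cache
    return dst_ip


caches = ["cache-a", "cache-b"]
print(route("TCP", "198.51.100.7", 80, caches) in caches)   # True
print(route("UDP", "198.51.100.7", 53, caches))             # 198.51.100.7
```

A deterministic hash (rather than Python's per-process randomized `hash` for strings) keeps every packet of a given destination going to the same cache, which matters once caches start tracking which flows they own.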
The network cache to which the request is re-routed "spoofs" the requested destination platform and accepts the request on its behalf via a standard TCP connection established by the cache-enabled router. If the requested information is already stored in the cache, it is transmitted to the requesting platform with a header indicating its source as the destination platform. If the requested information is not in the cache, the cache opens a direct TCP connection with the destination platform, downloads the information, stores it for future use, and transmits it to the requesting platform. All of this is transparent to the user at the requesting platform, which operates exactly as if it were communicating with the destination platform. Thus, the need for configuring the requesting platform to suit a particular proxy configuration is eliminated along with the associated overhead. An example of such a network caching technique is embodied in the Web Content Caching Protocol (WCCP) provided by Cisco Systems, Inc., a specific embodiment of which is described in copending, commonly assigned, U.S. patent application Ser. No. 08/946,867 for METHOD AND APPARATUS FOR FACILITATING NETWORK DATA TRANSMISSIONS filed Oct. 8, 1997, the entirety of which is incorporated herein by reference for all purposes.
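The hit and miss paths of a spoofing cache can be sketched as follows. This is an illustrative model only: `TransparentCache` and its `fetch_from_origin` hook are hypothetical names, the hook stands in for the direct TCP connection to the destination platform, and a response is reduced to a pair of (apparent source address, body).

```python
from typing import Callable, Dict, List, Tuple

Response = Tuple[str, bytes]  # (source address stamped on the reply, body)


class TransparentCache:
    """Sketch of a cache that answers on behalf of ("spoofs") the destination platform."""

    def __init__(self, fetch_from_origin: Callable[[str, str], bytes]) -> None:
        # Hypothetical hook standing in for a direct TCP connection to the origin.
        self._fetch = fetch_from_origin
        self._store: Dict[Tuple[str, str], bytes] = {}

    def handle(self, dst_ip: str, path: str) -> Response:
        key = (dst_ip, path)
        if key not in self._store:
            # Miss: download from the destination platform and keep a copy.
            self._store[key] = self._fetch(dst_ip, path)
        # Hit (or freshly filled): reply with the destination as the apparent source.
        return dst_ip, self._store[key]


origin_calls: List[Tuple[str, str]] = []


def fake_origin(dst_ip: str, path: str) -> bytes:
    origin_calls.append((dst_ip, path))
    return b"<html>index</html>"


cache = TransparentCache(fake_origin)
cache.handle("198.51.100.7", "/index.html")   # miss: fetched from the origin
cache.handle("198.51.100.7", "/index.html")   # hit: served from the cache
print(len(origin_calls))  # 1
```

The requesting client sees the same apparent source address in both cases, which is what makes the cache transparent.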
As a cache system starts up, traffic that is redirected to the cache system may be disrupted under certain conditions. For example, if a new flow is established while the cache system is shut down, the cache system will not recognize this new flow when it reconnects or starts up. (A flow is generally defined as a stream of packets or traffic that originates from the same source and is directed to the same destination.) In other words, the cache system receives the packets in mid-flow, after the flow has already been established with some other destination. Since the flow was not established with the cache system, it is not recognized as belonging to the cache system, and under current TCP procedures the cache system resets this unrecognized flow. Thus, any flows that were established outside of the cache system, i.e., with the intended destination, are automatically reset when the cache system starts up and such flows are redirected to it. Any flow disconnection is, of course, undesirable. As the number of clients that access a given cache system at one time increases, the incidence of traffic disruptions caused by the cache system starting up mid-flow also increases, so a single start-up may affect a significantly large amount of traffic. Therefore, there is a need to improve a cache system's start-up procedures so that traffic is not disrupted by the cache system.
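The disruption described above follows from ordinary TCP behavior: a host answers a mid-flow segment (one carrying an ACK) for a connection it has no record of with a reset. The sketch below models a cache that follows only that rule; the function name `naive_cache_response` and the four-field tuple key (TCP assumed) are hypothetical simplifications.

```python
from typing import Set, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port); TCP assumed


def naive_cache_response(established: Set[FlowKey], flow: FlowKey, syn: bool) -> str:
    """What a cache that follows only ordinary TCP rules does with an arriving packet."""
    if syn:
        established.add(flow)   # a new connection starting at the cache is accepted
        return "accept"
    if flow in established:
        return "process"        # packet of a connection the cache already knows
    return "reset"              # mid-flow packet of a connection made elsewhere


# After a restart the cache's connection table is empty, so a client's existing
# flow (established with the origin while the cache was down) gets reset.
table: Set[FlowKey] = set()
existing_flow = ("192.0.2.10", 51515, "198.51.100.7", 80)
print(naive_cache_response(table, existing_flow, syn=False))  # reset
```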
Accordingly, the present invention provides an apparatus and method for intelligently determining whether a cache system is going to process an incoming packet flow or redirect it to its originally intended destination. The originally intended destination is the destination that is "spoofed" by the cache system if it decides to process a flow. If a cache system is shut down and then restarted, a client may have established a flow with another destination (i.e., the originally intended destination) in the meantime. Thus, a flow may be received by the cache system mid-flow. Rather than accept such a packet flow, which was not established with it, the cache system determines that it does not own the packet flow and redirects it to the originally intended destination. In one embodiment, the cache system simply checks whether the packet flow is listed within a monitor flow table. If the packet flow was initially established with the cache system, the flow will be identified within the monitor flow table. In one embodiment, a new packet flow is recorded in the table when the new packet flow is established at the cache system. Similarly, the cache system may check the monitor flow table prior to shutting down. Prior to shutting down, the cache system may respond to flow completion requests from flows that it owns, whereas flow completion requests from a flow that it does not own are redirected to that flow's originally intended destination.
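A monitor flow table of the kind described above can be sketched as a set of owned flow keys, including the shutdown behavior for flow completion requests. The class and method names (`MonitorFlowTable`, `record_new_flow`, `handle_completion_request`, `ready_to_shut_down`) are hypothetical, and the "ready to shut down once no owned flows remain" check is an illustrative assumption rather than something the text specifies.

```python
from typing import Set, Tuple

FlowKey = Tuple[str, int, str, int, str]  # (src_ip, src_port, dst_ip, dst_port, protocol)


class MonitorFlowTable:
    """Hypothetical sketch of the monitor flow table kept by a cache system."""

    def __init__(self) -> None:
        self._owned: Set[FlowKey] = set()

    def record_new_flow(self, flow: FlowKey) -> None:
        # Called when a new flow is established at the cache (e.g. on a TCP SYN).
        self._owned.add(flow)

    def owns(self, flow: FlowKey) -> bool:
        return flow in self._owned

    def handle_completion_request(self, flow: FlowKey) -> str:
        """Before shutdown: complete owned flows; redirect all others."""
        if self.owns(flow):
            self._owned.discard(flow)
            return "respond"    # the cache completes the flow itself
        return "redirect"       # send to the flow's originally intended destination

    def ready_to_shut_down(self) -> bool:
        # Assumption: shutdown proceeds once no owned flows remain.
        return not self._owned


table = MonitorFlowTable()
mine = ("192.0.2.10", 51515, "198.51.100.7", 80, "TCP")
other = ("192.0.2.11", 40000, "198.51.100.7", 80, "TCP")
table.record_new_flow(mine)
print(table.handle_completion_request(other))  # redirect
print(table.handle_completion_request(mine))   # respond
print(table.ready_to_shut_down())              # True
```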
In one embodiment, a method for controlling packet flow to a cache system is disclosed. A packet flow intended for a first destination is received into the cache system. When the packet flow indicates the start of the packet flow or when the packet flow is identified as being owned by the cache system, the packet flow is processed within the cache system. When the packet flow does not indicate the start of the packet flow and the packet flow is not identified as being owned by the cache system, the packet flow is directed back to the first destination.
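The decision rule of this method can be written as a small function. In this sketch, `is_flow_start` is assumed to correspond to a TCP SYN, and `classify`, `Action`, and the owned-flow set are hypothetical names; recording a starting flow as owned follows the embodiment in which a new flow is entered into the monitor flow table when it is established at the cache.

```python
from enum import Enum
from typing import Set, Tuple

FlowKey = Tuple[str, int, str, int, str]


class Action(Enum):
    PROCESS = "process within the cache system"
    REDIRECT = "direct back to the first destination"


def classify(flow: FlowKey, is_flow_start: bool, owned: Set[FlowKey]) -> Action:
    """Process a flow if it starts here or is already owned; otherwise send it back."""
    if is_flow_start:
        owned.add(flow)          # a flow that starts at the cache becomes owned by it
        return Action.PROCESS
    if flow in owned:
        return Action.PROCESS
    return Action.REDIRECT       # mid-flow and not owned: never reset, just redirect


owned_flows: Set[FlowKey] = set()
old = ("192.0.2.10", 51515, "198.51.100.7", 80, "TCP")   # established before restart
new = ("192.0.2.12", 52000, "198.51.100.7", 80, "TCP")
print(classify(old, False, owned_flows).name)  # REDIRECT
print(classify(new, True, owned_flows).name)   # PROCESS
print(classify(new, False, owned_flows).name)  # PROCESS
```

Compared with plain TCP behavior, the only change is the last branch: an unrecognized mid-flow packet is redirected rather than reset.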
In another aspect, the invention pertains to a computer system operable to control a packet flow directed to the computer system. The computer system includes a memory and a processor coupled to the memory. The memory and the processor are adapted to provide the methods described above. In another embodiment, the invention pertains to a computer program product for controlling packet flow to a cache system. The computer program product includes at least one computer readable medium and computer program instructions, stored within the computer readable medium, that are configured to cause a processing device to perform the methods described above.
In another embodiment, flows are redirected for cache nodes that are being assigned new or different buckets. Bucket assignments generally indicate which flows go to which cache node of a particular cache cluster. For example, when a new cache node joins a particular cache cluster, buckets may be reassigned to accommodate the new node. Flows that are assigned to buckets that are candidates for displacement are redirected to their originally intended destination. Thus, the number of flows handled by a candidate node for those buckets eventually falls to zero, at which point the buckets may be moved or reassigned.
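Bucket draining can be sketched per cache node. This is an illustrative model under several assumptions: a fixed count of 256 buckets, a CRC-32 hash of the flow key as the flow-to-bucket mapping, and the hypothetical names `bucket_of`, `CacheNode`, `mark_for_displacement`, and `releasable_buckets`. Flows the node already owns keep being processed; new and unowned flows in a bucket marked for displacement are redirected, so the bucket's owned-flow count can only fall.

```python
import zlib
from typing import Dict, Set, Tuple

NUM_BUCKETS = 256  # assumption: the text does not give a bucket count

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def bucket_of(flow: FlowKey) -> int:
    # Hypothetical deterministic mapping of a flow onto a bucket index.
    return zlib.crc32(repr(flow).encode()) % NUM_BUCKETS


class CacheNode:
    def __init__(self, buckets: Set[int]) -> None:
        self.buckets = set(buckets)                   # buckets assigned to this node
        self.draining: Set[int] = set()               # candidates for displacement
        self.active: Dict[int, Set[FlowKey]] = {}     # owned flows, per bucket

    def mark_for_displacement(self, bucket: int) -> None:
        self.draining.add(bucket)

    def on_packet(self, flow: FlowKey, is_flow_start: bool) -> str:
        b = bucket_of(flow)
        owned = self.active.setdefault(b, set())
        if flow in owned:
            return "process"                          # keep serving flows we own
        if is_flow_start and b in self.buckets and b not in self.draining:
            owned.add(flow)
            return "process"
        return "redirect"                             # to the original destination

    def on_flow_end(self, flow: FlowKey) -> None:
        self.active.get(bucket_of(flow), set()).discard(flow)

    def releasable_buckets(self) -> Set[int]:
        """Draining buckets with no owned flows left may be moved or reassigned."""
        return {b for b in self.draining if not self.active.get(b)}
```

A usage pattern: mark a bucket for displacement when a new node joins, call `on_flow_end` as owned flows close, and hand over each bucket once it appears in `releasable_buckets()`.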
The present invention provides a number of advantages. For example, since a cache system is configured to process only packet flows that it owns, flow disruption during start-up of the cache system may be significantly decreased. Because the cache system does not process flows that it does not own, it is unlikely that a flow processed by the cache system will go unrecognized by the cache system and subsequently be terminated. As a result, after start-up the cache system intercepts traffic gradually.
These and other features and advantages of the present invention will be presented in more detail in the following specification of the invention and the accompanying figures which illustrate by way of example the principles of the invention.