Various methods and systems for synchronization of a cluster of devices (sometimes referred to as a cluster network or cluster system) and in particular for predictive synchronization of a cluster of multiple interconnected firewall devices are possible.
If a process performed on each data packet requires more time than the interval between packets in a data stream, then a bottleneck may occur, backing up data traffic and slowing data communication. One solution to such a bottleneck is to perform the process using a cluster of multiple devices, sometimes referred to as a clustered network. In the cluster, a plurality of member devices simultaneously performs the process on a plurality of packets. As a packet arrives, a distributor directs the packet to an individual member device for processing (a single system may include multiple distributors). In sum, using parallel processing, the cluster of devices processes a large number of packets without slowing the data stream.
Firewall devices are often deployed in a clustered system. A firewall device inspects communication flows entering or leaving a trusted network and filters out unauthorized packets of data. For example, one popular firewall policy allows “solicited” Transmission Control Protocol (TCP) connections initiated from the protected network, but denies “unsolicited” TCP connections initiated from outside (e.g., the Internet). Another popular protocol is User Datagram Protocol (UDP), for which a policy may similarly allow data to enter the protected network only when solicited by a request sent from inside the protected network. A firewall handles both UDP and TCP traffic statefully: determining whether a data packet is a legitimate reply to a request from a member of the trusted network depends on state information. State information about a session or connection may be established in a firewall device when the first data packet sent from the trusted network initiating the connection is processed. Often state information used to identify a legitimate packet is associated with the packet header (for example, all or part of the full TCP state, including source and destination Internet Protocol (IP) addresses, ports, and sequence numbers). Different firewall implementations may use different header information.
When all communication is handled by a single firewall device, state information derived from the request packet may be stored locally to the firewall device for the lifetime of the session.
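The single-device case can be pictured with a minimal sketch. The class and field names below (FlowKey, SingleFirewall) are hypothetical, and the session key is assumed to be only the address and port fields; a real firewall would typically track more of the TCP state, such as sequence numbers.

```python
from typing import NamedTuple, Set


class FlowKey(NamedTuple):
    """Header fields identifying a session (assumed minimal set)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SingleFirewall:
    """Toy stateful filter: only replies to recorded requests are accepted."""

    def __init__(self) -> None:
        # State is stored locally to this one device.
        self._sessions: Set[FlowKey] = set()

    def outbound(self, key: FlowKey) -> None:
        # The first packet sent from the trusted network establishes state.
        self._sessions.add(key)

    def inbound_allowed(self, key: FlowKey) -> bool:
        # A reply is legitimate if it mirrors a recorded request.
        mirrored = FlowKey(key.dst_ip, key.dst_port, key.src_ip, key.src_port)
        return mirrored in self._sessions

    def close(self, key: FlowKey) -> None:
        # State is kept only for the lifetime of the session.
        self._sessions.discard(key)
```

Because every packet passes through the same device, the lookup in inbound_allowed never needs information held elsewhere.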
Often a single device cannot keep up with all the communication, causing a communication bottleneck. One solution is to use a cluster of multiple firewall devices. Each device handles a portion of the communication traffic. In the firewall cluster, it is possible that a request packet may be handled by a first member firewall device and an associated response packet may be handled by a second member device. In order for the second member device to handle the response packet, it must have access to state information stored in the first member firewall device, which handled the request packet. Thus, some firewall clusters share state information globally, for example by multicast broadcasting of state information to many or all of the member firewall devices. Global sharing of state information is complicated and does not scale well as the number of firewall devices in a cluster rises. Because many network connections are “short-lived,” much of the processing power that firewall devices spend synchronizing state information is wasted.
A solution is to have a distributor that sends all packets of an established connection to the “home” firewall device in which the state information is kept locally. A conventional distributor sends a packet to a cluster member designated by a deterministic hash function of the IP header information. This methodology allows a simple stateless distributor to consistently send packets of a single session having similar IP headers to a single cluster member. If all packets of each session are always sent to the same member firewall device, then there may be no need to share state information.
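A stateless distributor of this kind might look like the following minimal sketch. The function name designate_member, the choice of SHA-256, and the decision to sort the two addresses (so that a request and its mirrored reply map to the same member) are all assumptions made for illustration; the text does not specify the hash.

```python
import hashlib


def designate_member(addr_a: str, addr_b: str, n_members: int) -> int:
    """Map a pair of IP addresses to a member index in [0, n_members).

    Assumption: the addresses are sorted before hashing, so swapping
    source and destination does not change the designated member.
    """
    first, second = sorted((addr_a, addr_b))
    digest = hashlib.sha256(f"{first}|{second}".encode("ascii")).digest()
    # Reduce the first 8 bytes of the digest to a member index.
    return int.from_bytes(digest[:8], "big") % n_members
```

The distributor keeps no per-session table: the designated member is recomputed from the header of every packet, which is what makes the distributor simple and also what makes it sensitive to any change in the header fields.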
There are situations (for example an asymmetric session as illustrated herein below) where a stateless distributor based on a simple hash function may fail to direct all packets of a session to the same cluster member. Thus, a conventional firewall cluster with a stateless distributor may require sharing of state information in order to handle asymmetric sessions.
FIG. 1 illustrates an example of a cluster 10 of firewall devices handling an asymmetric session. A distributor 16a is employed to load-balance traffic from the trusted network and a distributor 16b is employed to load-balance traffic from the Internet.
Distributors 16a and 16b evenly distribute packets among three members 18a, 18b, and 18c of cluster 10 for security processing. For example, a request packet is sent 20a by a client 12 on a trusted network through distributor 16a. The packet is intended for a server 14 on the Internet.
Distributor 16a performs a deterministic hash of the relevant packet fields (e.g., the source address src-IP and the destination address dst-IP) to designate member 18a to receive the request packet, and distributor 16a sends 20b the request packet to member 18a. Due to security considerations, member 18a changes the packet header (for example, by performing NAT [network address translation] or VPN encryption). Thus, the packet fields are now changed to a translated source address and the original destination address (trans-IP, dst-IP). Member 18a sends 20c the amended request packet to distributor 16b and distributor 16b sends 20d the amended request packet to server 14.
Server 14 sends 20e a reply packet back to the translated source address in the modified header of the request packet (i.e., the reply packet has dst-IP and trans-IP as its source and destination addresses, respectively). The reply packet arrives at distributor 16b, and distributor 16b performs a hash using the same deterministic algorithm as distributor 16a. Because the IP addresses of the reply packet differ from the IP addresses of the original request packet, distributor 16b designates member 18c to receive the reply packet (and not member 18a, which received the original request packet) and sends 20f the reply packet to member 18c.
Thus, an asymmetric session has been created through firewall cluster 10 (the reply packet is handled by member 18c, which did not handle the request packet). Member 18c requires state information (for example member 18c needs to know the IP header information of the request packet) in order to determine validity of the reply packet.
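The effect of the header change on the hash can be reproduced with a short sketch. It assumes, purely for illustration, a SHA-256 hash over the sorted address pair and a three-member cluster; designate_member and first_asymmetric_translation are hypothetical names, and the addresses are taken from documentation ranges. The helper searches a list of candidate translated addresses for one that sends the reply to a member other than the one that handled the request.

```python
import hashlib
from typing import Iterable, Optional


def designate_member(addr_a: str, addr_b: str, n_members: int) -> int:
    """Assumed stateless hash: order-independent over the address pair."""
    first, second = sorted((addr_a, addr_b))
    digest = hashlib.sha256(f"{first}|{second}".encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n_members


def first_asymmetric_translation(
    src_ip: str, dst_ip: str, candidates: Iterable[str], n_members: int
) -> Optional[str]:
    """Return the first translated address that makes the session asymmetric."""
    # Member that receives the request (src-IP, dst-IP).
    home = designate_member(src_ip, dst_ip, n_members)
    for trans_ip in candidates:
        # The reply carries (dst-IP, trans-IP), not (dst-IP, src-IP).
        if designate_member(dst_ip, trans_ip, n_members) != home:
            return trans_ip
    return None


candidates = [f"198.51.100.{i}" for i in range(1, 255)]
trans = first_asymmetric_translation("10.0.0.5", "203.0.113.7", candidates, 3)
```

With three members, roughly two out of three translated addresses would be expected to land the reply on a different member than the request, so asymmetric sessions are the common case rather than a rare one under these assumptions.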
In general, in order to synchronize asymmetric flows between members, it is necessary to provide the required information for security processing (TCP state, TCP sequencing). In the conservative synchronization approach, information on all sessions through member 18a is relayed 25 by a multicast broadcast to all other members 18b-c so that every member 18a-c can handle the reply stream (it is necessary to send the information to all members 18b-c because it is not known, a priori, to which member 18a-c the reply stream will be directed by distributor 16b). Based on information from the request packet, member 18c verifies and sends 20g the reply packet to distributor 16a, and distributor 16a sends 20h the reply packet to client 12.
As the cluster grows (having a large number of members to handle a large quantity of traffic quickly), communicating state information among a large number of cluster members and storing duplicate copies of it will take up significant system resources. With increasing cluster size, the increasing need for communication and data storage among members will significantly hurt the performance scalability of the clustered system.
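A rough count illustrates the growth. The sketch below assumes, for simplicity, that each session produces exactly one state update that every other member must receive and store; real sessions would usually generate further updates as the TCP state changes. The function name conservative_sync_cost is hypothetical.

```python
from typing import Tuple


def conservative_sync_cost(n_members: int, n_sessions: int) -> Tuple[int, int]:
    """Return (updates received by other members, stored state copies).

    Assumption: one state update per session, relayed to all other members.
    """
    # Each session's state is received and processed by every other member.
    updates_received = n_sessions * (n_members - 1)
    # Every member, including the home member, keeps a copy.
    stored_copies = n_sessions * n_members
    return updates_received, stored_copies
```

For 1000 sessions, a three-member cluster handles 2000 received updates and 3000 stored copies, while a thirty-member cluster handles 29000 and 30000: the overhead per session grows linearly with the number of members, even though each added member is meant to add capacity.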
Various solutions have been proposed to solve the scalability problem of clustered systems handling asymmetric sessions. U.S. Pat. No. 7,107,609 to Cheng et al. teaches use of a multicast query to find the home member that owns a connection. The need for a query and reply every time a cluster member receives an unknown packet from the Internet may slow system performance and make the system prone to denial-of-service (DoS) attacks, in which a large number of unknown packets overwhelms the system's ability to respond.
U.S. Pat. No. 7,401,355 to Supnik et al. teaches use of a single, stateful smart distributor for flow both from the Internet to the trusted network and from the trusted network to the Internet. Since the single distributor handles all flows, the distributor is aware of changes in IP address and always directs return traffic to the home cluster member that owns a session. As scale increases, the requirement of a single distributor that can statefully handle all flows may significantly increase the cost of the clustered system. Furthermore, the distributor may itself become a bottleneck, limiting system scalability and performance.
U.S. Pat. No. 7,613,822 to Joy et al. describes a system where members of the cluster communicate with the distributor in order to send traffic to the home member of a session. The requirement of communication between the distributor and members necessitates a sophisticated custom distributor, and the requirement that all cluster members communicate with the distributor may limit performance as the cluster scale increases.
Even in prior art clustered systems having a stateful distributor, which directs all packets of a particular session consistently to the same member, multiple copies of session state information are often sent to other members of the cluster in order to provide for backup. For example, if a home member of the cluster fails during a session, the distributor will send the continuation of the session to a backup member. In the prior art, the home member may not know which cluster member will be used as a backup for a given session. Therefore, the home member may relay state information of the session to many other members so that each member is prepared to receive the session continuation in case of failure of the home member. The need for each member of a cluster to constantly update many other members on the state of one or more sessions, so that each of those members can serve as a backup for any of the sessions, results in major scaling problems.
There is thus a widely recognized need for, and it would be highly advantageous to have, a cluster of stateful member devices that is scalable (can be enlarged without undue increase in communication between members) and that can handle backup and asymmetric sessions.