Field
Embodiments of the present invention generally relate to the field of computer networks. In particular, various embodiments relate to methods and systems for an improved cluster-based network architecture that enables efficient load balancing and failover protection between units of a cluster.
Description of the Related Art
In the field of distributed computing, two or more computing resources are generally used to perform various tasks such as traffic management, synchronization, load balancing, and failover, among other such tasks. These grouped resources generally form a computing environment and are often referred to as “computing clusters” or simply “clusters”. A cluster typically includes computers or processors, networks or communication links for data transfer, databases, and other devices such as routers or gateways that are configured to allow load balancing, high availability (HA), better connectivity, high performance, and failover procedures to ensure smooth flow of traffic across the network.
An HA cluster can include two or more units, commonly referred to as cluster units, which are configured to enable high availability operation, protect against a single point of failure, and recognize or detect faults in a system's hardware or software application. Such cluster units can either be of the same model and hardware configuration (for instance, the same AMC modules installed in the same slots, the same number of hard disks, and so on) and run in the same operating mode (NAT/Route mode or Transparent mode), or can have different configurations and device settings.
In a typical operation of a high-availability cluster, on startup, after the cluster units have been configured with the same HA settings, each cluster unit finds the other cluster units for HA operation and the units negotiate with each other to create a cluster. During cluster operation, the cluster units, running a common protocol, share communication and synchronization information among themselves using the network interfaces present in each cluster unit. Common cluster units include standalone gateways, routers, switches, and the like.
A high availability (HA) cluster is generally operated in one of two modes: active-passive HA (failover protection) and active-active HA (load balancing and failover protection). An active-passive HA cluster provides standby failover protection and includes a primary or master cluster unit that processes communication sessions and one or more subordinate or slave cluster units. The subordinate or slave cluster units are connected to the network and to the primary or master cluster unit but do not process communication sessions, and instead run in a standby state. While the subordinate units are in the standby state, the primary or master cluster unit stores configuration data and routing data in each subordinate or slave cluster unit to keep the subordinate units synchronized with it. If the master cluster unit fails in an active-passive HA cluster, one of the subordinate or slave cluster units immediately takes its place. An active-passive HA cluster also provides transparent link failover among cluster units.
An active-active HA cluster, on the other hand, includes a primary or master cluster unit and one or more subordinate or slave cluster units, wherein the master cluster unit receives all communication sessions and load balances them between itself and all subordinate or slave cluster units. In an active-active cluster, the subordinate units are also active since they also process sessions from network devices connected to the HA cluster.
FIG. 1 illustrates an exemplary prior art cluster-based network architecture 100. Network 100 includes a plurality of computing devices 102a, 102b, 102c, . . . 102n, collectively referred to as computing devices 102 hereinafter, where such computing devices 102 can include one or more of a mobile device, smart device, tablet PC, web-enabled device, among other such devices that can be operatively coupled to a network. Computing devices 102 are connected to a local area network (LAN), wherein the LAN can be a wired or wireless network. The LAN can be connected to an internal switch 104 that is configured to handle traffic entering and/or leaving the LAN. In operation, traffic from internal switch 104 is sent to an HA cluster, wherein the HA cluster includes multiple cluster units 106 having at least one master cluster unit 106a and a slave cluster unit 106b. The master cluster unit 106a processes communication sessions received from computing devices 102 and the slave cluster unit 106b stores session information of the master cluster unit 106a. The HA cluster is connected to an external switch 108, which in turn is connected to a router 110 that controls all traffic entering and leaving the HA cluster. Router 110 is connected to Internet 112 or an external network 112 to enable computing devices 102 to access the Internet 112 or the external network 112.
When the HA cluster of FIG. 1 is operated in active-passive mode, master cluster unit 106a processes all communication sessions from computing devices 102 and synchronizes slave cluster unit 106b so that it stores the session information processed by master cluster unit 106a. Master cluster unit 106a receives data or traffic from computing devices 102 through internal switch 104 and, after processing the data, passes it through to external switch 108. Information regarding the processed data is stored in slave cluster unit 106b. When master cluster unit 106a fails, data processing and transmission come to a halt until slave cluster unit 106b is declared the new master cluster unit, after which the newly assigned master cluster unit 106b starts processing traffic to and from the computing devices 102.
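The active-passive behavior described for FIG. 1 can be illustrated with a minimal Python sketch. This is a simplified model under stated assumptions, not an implementation of any actual cluster product: the names `ClusterUnit`, `ActivePassiveCluster`, `process`, and `fail_over` are hypothetical, session state is reduced to a string, and failure detection is represented by a manually set `alive` flag.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterUnit:
    """Hypothetical stand-in for a cluster unit such as 106a or 106b."""
    name: str
    role: str  # "master" or "slave"
    alive: bool = True
    sessions: dict = field(default_factory=dict)  # session id -> state


class ActivePassiveCluster:
    """Illustrative model: one master processes, one standby mirrors."""

    def __init__(self, master, standby):
        self.master = master
        self.standby = standby  # becomes None once it has been promoted

    def process(self, session_id, state):
        """The master processes a session and mirrors it to the standby."""
        if not self.master.alive:
            # Matches the text: processing halts until a new master exists.
            raise RuntimeError("master failed; traffic halts until failover")
        self.master.sessions[session_id] = state
        if self.standby is not None:
            self.standby.sessions[session_id] = state  # synchronization

    def fail_over(self):
        """Promote the standby to master if the current master has failed."""
        if self.master.alive or self.standby is None:
            return False
        self.standby.role = "master"
        self.master, self.standby = self.standby, None
        return True
```

In this model, a call to `process` between the failure of the master and the call to `fail_over` raises an error, which corresponds to the interruption in traffic processing noted above.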
On the other hand, when the HA cluster described above with reference to FIG. 1 is implemented in active-active mode, master cluster unit 106a operates in its normal master mode to process traffic from one or more computing devices 102, assigns one or more of the other computing devices 102 to slave cluster unit 106b, and can further store session information in slave cluster unit 106b. In this mode, therefore, slave cluster unit 106b also processes traffic from the one or more computing devices 102 assigned to it by master cluster unit 106a. In operation, when master cluster unit 106a fails, one of the slave cluster units 106b is selected as a new master cluster unit and starts processing the traffic and other configuration information from the computing devices 102 that were assigned to the earlier master cluster unit 106a.
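The active-active behavior can likewise be sketched in Python. The sketch below is illustrative only: the class `ActiveActiveCluster` and its methods are hypothetical names, round-robin assignment is an assumed load-balancing policy (the description does not specify one), and selecting the first remaining unit as the new master is an assumed selection rule.

```python
from itertools import cycle


class ActiveActiveCluster:
    """Illustrative model: the master assigns devices across all units."""

    def __init__(self, unit_names):
        self.units = list(unit_names)   # first entry acts as the master
        self.master = self.units[0]
        self.assignment = {}            # device -> unit processing it
        self._next_unit = cycle(self.units)  # assumed round-robin policy

    def assign(self, device):
        """The master assigns a device to itself or to a slave unit."""
        unit = next(self._next_unit)
        self.assignment[device] = unit
        return unit

    def fail_master(self):
        """Select a new master and move the failed master's devices to it."""
        failed = self.master
        self.units.remove(failed)
        if not self.units:
            raise RuntimeError("no remaining cluster unit to promote")
        self.master = self.units[0]     # assumed selection rule
        for device, unit in self.assignment.items():
            if unit == failed:
                self.assignment[device] = self.master
        self._next_unit = cycle(self.units)
        return self.master
```

For example, with units `["106a", "106b"]` and four devices, the assumed round-robin policy alternates assignments between the two units, and after `fail_master()` every device is handled by `106b`.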
Existing cluster architectures require a master-slave configuration, which, apart from creating other inefficiencies, requires all network devices to be connected to the master cluster unit, thereby increasing the load on the master unit and preventing the load from being balanced across all cluster units. For instance, in a traditional active-active configuration, all traffic is first sent to the master cluster unit, and the master unit then redirects packets to a slave cluster unit if the corresponding network session resides on that slave cluster unit. Further, in active-passive HA cluster mode, because the computing devices in a network are connected to a single master cluster unit that manages data traffic, no other master cluster unit exists when the master cluster unit fails, and a delay occurs before a slave cluster unit is assigned the role of master cluster unit, leading to a potential loss of data or other relevant configuration information. Active-active HA cluster mode faces problems similar to those of active-passive HA cluster mode, as the process of selecting and assigning a new primary cluster unit takes time and creates load balancing issues in the cluster. Furthermore, existing architectures require slave cluster units to re-learn IP/MAC addresses after every failover, which again creates inefficiencies and delay.
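The extra load placed on the master unit in a traditional active-active configuration can be made concrete with a short Python sketch. The function `packets_handled` and its parameters are hypothetical and serve only to count how many packets each unit touches when every packet must first enter through the master.

```python
from collections import Counter


def packets_handled(session_owner, packets, master="master"):
    """Count packets touched per unit when all traffic enters via the master.

    session_owner maps a session id to the unit holding that session;
    packets is a sequence of session ids, one entry per packet.
    """
    handled = Counter()
    for session_id in packets:
        handled[master] += 1            # every packet reaches the master first
        owner = session_owner[session_id]
        if owner != master:
            handled[owner] += 1         # master redirects it to the slave
    return handled
```

With sessions `s2` and `s3` held by a slave unit and `s1` held by the master, four packets for sessions `s1, s2, s3, s2` result in the master touching all four packets and the slave touching three, so redirecting sessions to the slave does not reduce the number of packets the master must handle.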
In view of the limitations of existing mechanisms for failover protection and load balancing in high-availability clusters having multiple cluster units, there exists a need for methods and systems that can provide more efficient failover protection and load balancing in such clusters.