Link aggregation combines a series of physical ports into a group that presents a single, standard IEEE 802.3 interface to the Media Access Control (MAC) Client; it serves to increase bandwidth beyond that of a single physical port and to provide redundant standby. The Link Aggregation Control Protocol (LACP) is the standard protocol that controls link aggregation. It operates between peer link aggregators, determines through the exchange of protocol messages whether a physical port joins or leaves an aggregation, and determines whether the physical port may receive and send data messages. The standard link aggregation protocol is a centralized state machine comprising five interconnected protocol state machines, each with a particular function, which operate on the physical port.
The current LACP state machine comprises the following state machines, whose functions and interrelations are described below:
A receive (RX) state machine (Receive machine): it receives the Link Aggregation Control Protocol Data Unit (LACPDU) from the Partner (the opposite end), records the information it contains, and expires that information after a short timeout or a long timeout, according to the timeout configured for LACP; it evaluates the information from the Partner and determines whether the Actor (the local end) and the Partner have agreed on the exchanged protocol information to the extent that the port can be aggregated with other ports or become an individual port; if they have not agreed, it sets the Need-To-Transmit (NTT) flag so that new protocol information is sent to the Partner; if the Partner's protocol information has expired, it installs default parameter values for use by the other state machines.
A periodic state machine (Periodic Transmission machine): it determines, from the aggregation modes of the Actor and the Partner, whether LACPDUs are to be exchanged periodically in order to maintain the aggregation (if either end or both ends are configured as Active, LACPDUs are exchanged periodically).
A selection state machine (Selection Logic): it is responsible for selecting the Aggregator to be associated with the port, and decides which Aggregator among a plurality of Aggregators is in an active state.
A MUX state machine (MUX machine): it is responsible for enabling or disabling collecting and distributing on the port, as required by the current protocol information.
A sending (TX) state machine (Transmit machine): it transmits LACPDUs, both on demand from the other state machines and periodically.
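The periodic-exchange rule and the timeout choice described in the list above can be illustrated with a minimal Python sketch. The names `PortConfig`, `periodic_interval` and `receive_timeout` are hypothetical and are not taken from any LACP implementation. The 1-second and 3-second values are the short-timeout figures used in this description; the 30-second and 90-second values are the long-timeout defaults of the IEEE link aggregation standard.

```python
from dataclasses import dataclass

# Short-timeout values (1 s / 3 s) follow the description;
# long-timeout values (30 s / 90 s) are the IEEE standard defaults.
FAST_PERIODIC_S = 1
SLOW_PERIODIC_S = 30
SHORT_TIMEOUT_S = 3
LONG_TIMEOUT_S = 90


@dataclass
class PortConfig:
    """Hypothetical per-end configuration used only for this sketch."""
    active: bool         # True = Active mode, False = Passive mode
    short_timeout: bool  # True = this end asks its peer for fast LACPDUs


def periodic_interval(actor, partner):
    """Interval (seconds) at which the Actor sends LACPDUs, or None.

    Periodic exchange happens only if at least one end is Active.
    The rate is dictated by the timeout the Partner has requested.
    """
    if not (actor.active or partner.active):
        return None
    return FAST_PERIODIC_S if partner.short_timeout else SLOW_PERIODIC_S


def receive_timeout(local):
    """Time (seconds) after which this end expires the peer's information."""
    return SHORT_TIMEOUT_S if local.short_timeout else LONG_TIMEOUT_S
```

With both ends Passive, `periodic_interval` returns `None`, which corresponds to the case in which no regular LACPDU exchange takes place.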
The standard LACP state machine is a centralized state machine. FIG. 1 is a diagram of the operational relationships within the current LACP centralized state machine, in which the state machines run successively in the order of the receive machine, the periodic transmission machine, the selection logic, the MUX machine and the transmit machine.
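The successive operation of the five state machines can be sketched as follows. This is a simplified illustration under stated assumptions, not the standard's state tables: the port is modelled as a plain dictionary, and the field names (`actor_key`, `echoed_actor_key`, `periodic_due`, `ntt`, `forwarding`) and the agreement check are hypothetical.

```python
def receive_machine(port, lacpdu):
    """Record Partner information; set NTT if the two ends disagree."""
    if lacpdu is not None:
        port["partner"] = lacpdu
        # Hypothetical agreement check: the Partner must echo our key.
        if lacpdu.get("echoed_actor_key") != port["actor_key"]:
            port["ntt"] = True


def periodic_machine(port):
    """Request a periodic LACPDU when the periodic timer has expired."""
    if port.get("periodic_due"):
        port["ntt"] = True
        port["periodic_due"] = False


def selection_logic(port):
    """Select an Aggregator; here simply one Aggregator per actor key."""
    port["aggregator"] = port["actor_key"]


def mux_machine(port):
    """Enable collecting/distributing once Partner information is present."""
    port["forwarding"] = "partner" in port


def transmit_machine(port, sent):
    """Send an LACPDU if an earlier machine set NTT, then clear NTT."""
    if port.get("ntt"):
        sent.append({"actor_key": port["actor_key"]})
        port["ntt"] = False


def run_centralized(port, lacpdu, sent):
    """Run the five machines in the fixed order of the centralized design."""
    receive_machine(port, lacpdu)
    periodic_machine(port)
    selection_logic(port)
    mux_machine(port)
    transmit_machine(port, sent)
```

Because every received LACPDU passes through all five steps on one processor, any delay on that processor delays both the evaluation of the Partner's information and the transmission of the reply.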
A system is divided, according to function, into the Management Processor (MP), the Routing Processor (RP), and the Network Processor (NP), wherein the MP mainly performs functions such as command interpretation and network management, the RP mainly implements the protocols, and the NP mainly receives, sends and forwards messages. In a centralized system, the MP, the RP, and the NP are implemented on one CPU; this arrangement is mainly used in low-end devices. In a distributed system, the MP, the RP and the NP are placed on independent CPUs; this arrangement is mainly used in high-end devices. In order to improve message forwarding performance and to increase the stability and reliability of message forwarding, the NP is implemented on a plurality of CPUs. In order to increase the reliability of the RP, an active-standby system is adopted, in which two sets of RPs operate on different CPUs and can serve as standby for each other.
There are two ways of implementing the active-standby system: active-standby cold standby and active-standby hot standby. They are explained hereinafter by taking the active-standby system of the RP as an example. In active-standby cold standby, when the active RP becomes abnormal, the standby RP switches from the standby state to the active state; the newly active RP then re-reads the database, performs initialization, completes its interaction with the NP, and only then enters the operating state. In this process, all dynamic data held on the RP before the switchover are lost and must be learned again, which causes network instability and results in an interruption of forwarding. In active-standby hot standby, after the standby board is inserted and completes its message exchange with the active board, it enters the batch synchronization state, in which the static data and the dynamic data on the active RP are synchronized to the standby board; after batch synchronization finishes, real-time synchronization is performed between the active board and the standby board, and the RP on the active board synchronizes changed data to the standby board promptly. In this way, after an active-standby switchover, most of the data of the active board are already stored on the standby board, and the dynamic data are recovered promptly after the switchover through data smoothing and the Graceful Restart (GR) of the protocols, which reduces network instability and also guarantees that current forwarding is not interrupted during the switchover.
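The difference between the two standby modes can be shown with a small Python sketch. It is only an illustration under simplifying assumptions: the class `RoutingProcessor`, its methods and the function `cold_switchover` are hypothetical, data are modelled as dictionaries, and data smoothing and GR are not modelled.

```python
import copy


class RoutingProcessor:
    """Hypothetical RP holding static (configured) and dynamic (learned) data."""

    def __init__(self, static_config):
        self.static = dict(static_config)
        self.dynamic = {}
        self.standby = None

    def attach_standby(self, standby):
        """Batch synchronization: copy all static and dynamic data once."""
        standby.static = copy.deepcopy(self.static)
        standby.dynamic = copy.deepcopy(self.dynamic)
        self.standby = standby

    def learn(self, key, value):
        """Store dynamic data; real-time sync pushes each change to the standby."""
        self.dynamic[key] = value
        if self.standby is not None:
            self.standby.dynamic[key] = copy.deepcopy(value)


def cold_switchover(static_config):
    """Cold standby: the new active RP re-reads the database only.

    Dynamic data held by the failed RP is not available and must be relearned.
    """
    return RoutingProcessor(static_config)
```

After a hot-standby switchover the standby object already holds the learned entries, whereas the object returned by `cold_switchover` starts with an empty `dynamic` table.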
The current LACP centralized state machine operates on the RP. Active-standby cold standby is easy to implement for it, but active-standby hot standby is much more difficult to implement. With hot standby, a large number of protocols perform the GR function during the active-standby switchover, which leads to problems such as increased utilization of the CPU on which the RP runs, message delivery delay, a higher probability of message loss, and inaccurate timer operation; moreover, the time needed to complete the switchover increases linearly with the volume of service data.
In order to maintain the aggregation, the local-end Actor and the opposite-end Partner of the LACP must exchange LACPDUs periodically. Problems that occur during the active-standby switchover, such as message delivery delay, message loss and inaccurate timer operation, can prevent the Partner from receiving the LACPDU sent by the Actor within the timeout period, which in turn results in disconnection of the physical link and an interruption of traffic forwarding. In particular, when the Partner sets a short timeout (which requires the Actor to send one LACPDU per second, and under which the link is broken if the Partner receives no LACPDU within 3 seconds), link disconnection occurs much more easily.
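The effect of a switchover stall on a Partner using the short timeout can be worked through with a minimal Python sketch. The stall model is an assumption made for illustration: LACPDUs scheduled during the stall are all delivered when the stall ends. The function names are hypothetical; the 1-second interval and the 3-second timeout are the values given above.

```python
def arrivals_with_stall(count, stall_start_s, stall_len_s, interval_s=1.0):
    """Arrival times of `count` LACPDUs scheduled every `interval_s` seconds.

    Assumed model: an LACPDU scheduled during the stall is delayed
    until the stall ends.
    """
    stall_end_s = stall_start_s + stall_len_s
    times = []
    for i in range(count):
        t = i * interval_s
        if stall_start_s <= t < stall_end_s:
            t = stall_end_s
        times.append(t)
    return times


def partner_times_out(arrival_times_s, timeout_s=3.0):
    """True if any gap between consecutive arrivals exceeds the timeout."""
    pairs = zip(arrival_times_s, arrival_times_s[1:])
    return any(later - earlier > timeout_s for earlier, later in pairs)
```

Under this model, a 2-second stall starting at 2.5 s leaves a largest gap of 2.5 s, so the Partner's 3-second timer does not expire; a 4-second stall leaves a gap of 4.5 s, so the timer expires and the link is broken.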
In addition, in the LACPDU message exchange between the Actor and the Partner, the Actor, after receiving a message on a physical port of the line card (NP), delivers the message to the active routing processor (RP) and hands it to the receive machine for processing, and the receive machine, the periodic transmission machine, the selection logic, the MUX machine, and the transmit machine are called in sequence; when a message is to be sent, it must be delivered by the active RP to the corresponding line card (NP) for sending. These two message transfers between the active RP and the line card increase the delay of the LACPDU message exchange, reduce the performance of redundancy switchover of the physical link, and at the same time consume valuable CPU resources.
In sum, operating the LACP centralized state machine on a distributed communication device gives rise to problems such as inefficiency, unreliability, and difficulty in implementing active-standby hot standby.