In transport networks, reliability requirements for network equipment are very strict since a single failure could affect a large volume of network traffic and thus a large number of connected subscribers. Increased reliability is typically achieved through equipment protection, where critical components within network elements are protected by spare components, which can take over operation in case of a failure. A particularly critical component is the switch matrix of large switching nodes such as digital crossconnects.
One option for equipment protection of the switch matrix in a crossconnect system is 1+1 protection. The system is provided with two complete and independent switch matrices, one acting as the working or “live” switch matrix and the other as the standby switch matrix. This has the advantage that both matrices can be configured identically and operate in parallel, so that in case of a failure, the standby matrix can simply be selected to take over operation without requiring prior time-consuming matrix configuration steps. This protection scheme is therefore called hot standby protection. However, it requires a 100% overhead of unused resources and is hence costly.
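The hot standby arrangement described above can be illustrated with a minimal sketch. It is not taken from any particular crossconnect implementation; the class names `SwitchMatrix` and `HotStandbyPair`, the port numbers, and the `failed` flag are hypothetical and serve only to show that every connection is written to both matrix copies, so that protection switching reduces to a selection.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchMatrix:
    """Hypothetical model of one matrix copy: input port -> output port."""
    name: str
    connections: dict = field(default_factory=dict)
    failed: bool = False

    def connect(self, inp: int, out: int) -> None:
        self.connections[inp] = out


class HotStandbyPair:
    """1+1 protection: two matrices kept identically configured."""

    def __init__(self) -> None:
        self.working = SwitchMatrix("working")
        self.standby = SwitchMatrix("standby")

    def connect(self, inp: int, out: int) -> None:
        # Every configuration step is applied to both copies in parallel,
        # so the standby never needs to be configured after a failure.
        self.working.connect(inp, out)
        self.standby.connect(inp, out)

    def select(self) -> SwitchMatrix:
        # Protection switching is a pure selection, no reconfiguration.
        return self.standby if self.working.failed else self.working


pair = HotStandbyPair()
pair.connect(1, 5)
pair.connect(2, 7)
pair.working.failed = True       # simulated failure of the live matrix
active = pair.select()           # standby takes over with the same connections
```

Because the standby copy already holds the full connection map, the switchover time in this model does not depend on the number of configured connections.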
In large switching nodes, the matrix design is typically modular, so that the switch matrix consists of a number of matrix boards. In such an arrangement, it is possible to implement an N+1 protection scheme, where one spare matrix board is provided to take over operation should one of the N working boards fail. However, in case of a failure, the switch matrix must be reconfigured to include the spare matrix board, which is time-consuming. Such a protection scheme is therefore called cold standby protection.
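The cost difference between the two schemes can be made concrete with a small calculation. The sketch below is illustrative only: the function name `spare_overhead` and the example value N = 8 are assumptions, not figures from the text. It expresses the spare capacity as a fraction of the working capacity, which gives 100% for 1+1 protection and 1/N for N+1 protection.

```python
def spare_overhead(working_units: int, spare_units: int = 1) -> float:
    """Spare capacity as a fraction of working capacity."""
    if working_units <= 0:
        raise ValueError("working_units must be positive")
    return spare_units / working_units


# 1+1 protection: one complete spare matrix for one working matrix.
overhead_1_plus_1 = spare_overhead(working_units=1)

# N+1 protection with an assumed N = 8 working matrix boards.
overhead_n_plus_1 = spare_overhead(working_units=8)

print(f"1+1: {overhead_1_plus_1:.1%}")
print(f"8+1: {overhead_n_plus_1:.1%}")
```

The lower overhead of N+1 protection is paid for with the reconfiguration time needed to bring the spare board into service.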
Even though the hot standby method can be considerably faster than the cold standby method, neither is typically hitless, meaning that a short traffic interruption of at least a few frames will occur. It is important, however, that equipment protection switching (EPS) is faster than network-level protection schemes such as line protection or path protection. In case of a failure of the switch matrix, the equipment protection method should switch before armed line protections can react, i.e. in considerably less than 50 ms.
Moreover, the correlation mechanisms that are usually used to determine a failure condition and initiate protection switching are rather slow and not very accurate. Random faults such as “single event upsets” or errors in the matrix chips, for example, cannot be discovered and corrected by EPS mechanisms.