This invention relates to error monitoring of links in digital transmission systems and, more particularly, to error monitoring of signaling links in high speed ATM networks.
In telecommunication networks, two types of information must be transmitted between the nodes: (a) user payload (e.g., voice, video, or data); and (b) signaling information to control (e.g., set-up and tear-down) the logical paths carrying the user payload. In the current telephone network, the signaling information is carried by a separate network known as the common channel signaling (CCS) network. In high speed ATM (asynchronous transfer mode) networks, the signaling information is carried on separate virtual circuits in the same physical network. Thus, while a CCS link is a physical link, an ATM signaling link is only a "virtual circuit". In either case, assuring integrity of signaling links is essential for meeting the stringent performance/reliability constraints of the signaling network. This is accomplished by deploying links in pairs, where each member of the pair is on a separate physical path and carries only one-half of the engineered traffic. The two links are constantly monitored for errors, and if either of them experiences a high error rate, its traffic is switched over to its mate.
Error monitoring algorithms are used in CCS networks. Error monitoring in ATM networks, however, is not currently being performed. This is because ATM networks until now have only sought to provide permanent virtual circuits (PVCs), i.e., virtual circuits that are provisioned and then left in place until the subscriber wants them to be removed. No special signaling protocol is necessary to handle PVCs. The next evolution in ATM networks is the provision of switched virtual circuits (SVCs), where the virtual circuits are created and destroyed dynamically as needed. This requires a protocol for exchanging the messages necessary to set up and tear down SVCs. Such a protocol, known as SSCOP (service specific connection oriented protocol), has been specified in the ATM Adaptation Layer (AAL) in the control plane (also known as signaling AAL or SAAL). Its standardization is currently underway in the study group COM-XI of ITU-T. The issue of error monitoring for the virtual circuit (a PVC or SVC) running the SSCOP protocol must therefore be addressed.
Since error monitoring algorithms already exist for CCS, it is natural to investigate their use in the ATM context as well. Unfortunately, these prior art algorithms have several weaknesses that make them unsuitable in emerging telecommunication networks. Furthermore, the SSCOP protocol differs significantly from the basic (i.e., level-2) communication protocol used in CCS, which makes a direct adoption of CCS error monitoring algorithms unsuitable. The CCS protocol and associated error monitoring algorithm are described hereinbelow to allow comparison with the SSCOP protocol and its error monitoring requirement.
The level-2 CCS protocol is the well-known "go back N" protocol (see e.g., A. Tanenbaum, Computer Networks, 2nd Ed., Prentice Hall, 1988, section 4.4, pp. 228-239). An arriving message goes into a FIFO (first-in, first-out) transmit buffer and waits its turn for transmission. After transmission, the message is saved in a retransmit buffer. The receiver acknowledges each message either with an ack (positive acknowledgement, indicating that the message was received correctly) or a nack (negative acknowledgement, indicating that the message was corrupted). On receiving a nack, the corresponding message is retransmitted, along with all messages following it in the retransmit buffer. This ensures that the messages always arrive in proper order on the receive side. Another important characteristic of this protocol is that it transmits filler messages called FISUs (fill-in signal units) when it has no data to transmit. FISUs facilitate error monitoring by ensuring that the link always carries some traffic that can be monitored.
The error monitoring algorithm for 56 Kb/sec links is called SUERM (signal unit error rate monitor). SUERM is a "leaky bucket" algorithm and involves two parameters, denoted D and T. Each time SUERM receives an erroneous message, it increments an error counter C.sub.s. If C.sub.s crosses the threshold T, the link is taken out of service and its traffic is diverted to an alternate link. The algorithm is tolerant of occasional errors, however. For this, it decrements C.sub.s after receiving a block of D messages (correct and erroneous ones). It should be noted that SUERM counts FISUs as well and thus is not significantly affected by the traffic level on the link. The ITU standards provide for one set of fixed values of D and T parameters for all links.
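By way of illustration only, the leaky bucket behavior described above may be sketched as follows. The class and method names are hypothetical, the values of D and T used in any example are arbitrary rather than the ITU values, and "crossing" the threshold is assumed here to mean that the counter reaches T.

```python
class Suerm:
    """Illustrative leaky-bucket monitor; names are assumptions, not from a standard."""

    def __init__(self, d, t):
        self.d = d                # D: block size (messages per decrement)
        self.t = t                # T: threshold on the error counter
        self.counter = 0          # C_s: error counter
        self.seen = 0             # messages counted in the current block
        self.in_service = True

    def on_message(self, errored):
        """Process one received message (FISUs included).

        Returns True while the link remains in service.
        """
        if not self.in_service:
            return False
        if errored:
            self.counter += 1
            # Assumption: the link is removed when the counter reaches T.
            if self.counter >= self.t:
                self.in_service = False
                return False
        # Every message, correct or erroneous, counts toward the block of D.
        self.seen += 1
        if self.seen == self.d:
            self.seen = 0
            if self.counter > 0:
                self.counter -= 1   # the bucket "leaks" once per block
        return True
```

With D=4 and T=3, for example, a stream with one erroneous message in every four leaves the link in service indefinitely, whereas a stream with every second message in error removes the link on the seventh message.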
It is clear from this description that the SUERM algorithm will tolerate an error rate of up to approximately 1/D (i.e., when less than one out of every D messages is in error), but not significantly higher. A mathematical analysis of SUERM by V. Ramaswami and J. L. Wang in "Analysis of the Link Error Monitoring Protocols in the Common Channel Signalling Network," IEEE Transactions on Networking, Vol. 1, No. 1, 1993, pp. 31-47, shows this behavior more clearly. If X denotes the time to take the link out of service, a plot of E(X) (i.e., average value of X) as a function of bit-error ratio (BER) is a curve having a "knee" when the message error ratio (MER) q.sub.m is 1/D. That is, for q.sub.m <1/D, E(X) increases drastically, and for q.sub.m >1/D, E(X) decreases slowly. This is a very desirable behavior, since it means that the link is taken out of service primarily when the error rate exceeds a threshold. The D parameter determines this threshold. The T parameter determines how sharp the knee is. Ideally, a "square knee" is desired so that the link will never be taken out of service if the error rate stays below the threshold.
In summary, although SUERM is a good algorithm for its application, its D parameter (or the threshold 1/D for the message error ratio) must be chosen properly. Given the message delay requirements, the maximum error rate that one can tolerate can be determined. This is called the sustainable error rate and is denoted as q.sub.b.sup.* (for BER) or q.sub.m.sup.* (for MER). Then D=1/q.sub.m.sup.*. It can be shown that the sustainable error rate depends on a number of parameters such as link speed, link length, message size, etc. Therefore, a single value of D will not work well for all links. This is the root cause of the problem with SUERM, as has been demonstrated by both laboratory tests and analysis.
Recently, there has been considerable interest in using 1.5 Mb/sec CCS links. The error monitoring algorithm for such links is known as EIM (errored interval monitor) (see e.g., D. C. Schmidt, "Safe and Effective Error Rate Monitors for SS7 Signaling Links", IEEE Journal on Selected Areas in Communications, Vol. 12, No. 3, April 1994, pp. 446-455). EIM is also a leaky-bucket algorithm, but it operates on time intervals (or slots) rather than individual messages. That is, a slot acts like a message for the purposes of error monitoring, which means that if any real message within a slot is errored, the entire slot is considered to be errored. EIM can be regarded as a slotted variant of SUERM. Slotted operation is attractive for high-speed links since it makes the sustainable error rate, and hence optimal D, independent of the message size. As with SUERM, optimal parameter selection for EIM still depends upon other network parameters.
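The slotted operation may be sketched as follows, purely as an illustration. The function names, the (time, errored) event representation, and the rule that the threshold is crossed when the counter reaches T are assumptions; the increment and decrement rules simply mirror the SUERM description above and are not taken from the EIM specification.

```python
def errored_slots(events, slot_len, n_slots):
    """Map (time, errored) message events onto fixed-length slots.

    A slot is flagged as errored if any message falling in it is errored.
    """
    flags = [False] * n_slots
    for time, errored in events:
        idx = int(time // slot_len)
        if errored and 0 <= idx < n_slots:
            flags[idx] = True
    return flags


def first_failing_slot(flags, d, t):
    """Run a leaky bucket over slot flags.

    Returns the index of the slot at which the counter reaches T,
    or None if the link stays in service.
    """
    counter = 0
    for count, bad in enumerate(flags, start=1):
        if bad:
            counter += 1
            if counter >= t:
                return count - 1
        # One decrement after every block of D slots, never below zero.
        if count % d == 0 and counter > 0:
            counter -= 1
    return None
```

Note that two errored messages falling in the same slot are counted once, which is why the number of messages per slot does not enter into the monitor's behavior.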
SSCOP was designed specifically for modern high-speed networks, which can be characterized by ample bandwidth and very low error rates, and is thus quite different from the CCS protocol. Basically, SSCOP uses selective retransmission of errored messages along with periodic polling of the receiver by the transmitter. Messages are normally referred to as protocol data units or PDUs in SSCOP terminology. SSCOP is described in detail in the ITU document TD PL/11-20C Rev 1, "BISDN--ATM Adaptation Layer--Service Specific Connection Oriented Protocol", S. Quinn (ed.), 1993, which is incorporated herein by reference.
All user PDUs in SSCOP carry a sequence number (seqno) for detecting missing PDUs and for delivering them in proper order. The transmitter maintains a counter to keep track of the next seqno to send, and another one for the next seqno to acknowledge. The receiver also maintains two counters: one for the next sequence number expected, and the other for the highest sequence number expected. The latter counter will have a higher value than the former only when some PDUs get lost, thereby causing a higher numbered PDU to arrive ahead of a lower numbered one. In such cases, the receiver alerts the transmitter by sending an unsolicited status message (ustat). The ustat identifies only the latest gap in sequence numbers (not the preexisting ones) and is intended to evoke the retransmission of PDUs in this gap.
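The receiver-side bookkeeping described above may be sketched as follows. This is a simplified illustration and not the SSCOP state machine: the class and attribute names are hypothetical, sequence numbers are assumed to grow without wraparound, and a ustat is represented simply as a (first missing, last missing) pair.

```python
class Receiver:
    """Illustrative receiver with the two counters described in the text."""

    def __init__(self):
        self.next_expected = 0      # next seqno to deliver in order
        self.highest_expected = 0   # one past the highest seqno seen so far
        self.buffer = {}            # out-of-order PDUs held for delivery
        self.delivered = []         # payloads delivered in sequence

    def on_user_pdu(self, seqno, payload):
        """Accept an uncorrupted user PDU.

        Returns a (first_missing, last_missing) gap if a ustat is due,
        otherwise None.
        """
        if seqno < self.next_expected or seqno in self.buffer:
            return None             # duplicate; nothing to do
        ustat = None
        if seqno > self.highest_expected:
            # Only the newly opened gap is reported, not preexisting ones.
            ustat = (self.highest_expected, seqno - 1)
        if seqno >= self.highest_expected:
            self.highest_expected = seqno + 1
        self.buffer[seqno] = payload
        # Deliver every PDU that is now in sequence.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        return ustat
```

For instance, if PDU 0 arrives and is followed directly by PDU 3, the second arrival yields the gap (1, 2), and PDU 3 is held until PDUs 1 and 2 have been received.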
The transmitter periodically sends a poll message to the receiver to inquire about its status. In reply, the receiver sends a solicited status (stat) message, which contains a list of all currently existing gaps. The transmitter, in turn, retransmits all missing PDUs. Three buffers are needed on the transmit side to maintain all PDUs. These are a transmit buffer, a retransmit buffer, and a "bag" buffer. The first two are FIFO queues and are used for first-time transmission (user, poll, stat, and ustat PDUs) and user PDU retransmission, respectively. The retransmit queue has a nonpreemptive priority over the transmit queue. The bag contains all unacknowledged PDUs. The purpose of the bag is to retain PDUs so that they will be available for retransmission.
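Whereas a ustat reports only the most recent gap, a stat lists every gap that currently exists. A minimal sketch of how such a list could be built is given below; the function name and arguments are hypothetical, and sequence number wraparound is again ignored.

```python
def stat_gaps(next_expected, highest_expected, held):
    """List all gaps as (first_missing, last_missing) pairs.

    next_expected:    next seqno awaited for in-order delivery
    highest_expected: one past the highest seqno received so far
    held:             set of seqnos received out of order and still buffered
    """
    gaps = []
    start = None
    for seqno in range(next_expected, highest_expected):
        if seqno not in held:
            if start is None:
                start = seqno           # a new gap opens here
        elif start is not None:
            gaps.append((start, seqno - 1))
            start = None                # the gap closed just before seqno
    if start is not None:
        gaps.append((start, highest_expected - 1))
    return gaps
```

For example, with the next expected sequence number equal to 2, the highest expected equal to 10, and PDUs 4, 5 and 9 held, the stat would list the gaps (2, 3) and (6, 8).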
Unlike CCS, there are no FISUs (fill-in signal units) in SSCOP; therefore, no transmission will occur when there is no user traffic. SSCOP is designed to ride on the ATM layer; however, as far as the error monitoring is concerned, this fact is irrelevant.
In a simple 2-node ATM network running SSCOP, two nodes are connected via two unidirectional links (actually, ATM virtual circuits) for forward and backward direction transmission. FIG. 1 shows a pictorial representation of the functional and hardware related activities connected with user PDU transmission in the forward direction. In particular, user PDUs and polls go in the forward direction whereas the corresponding stats and ustats for those PDUs go in the reverse direction. Similar transmissions occur for the other direction as well, but are not shown in order to simplify the figure.
The arriving user PDUs at 101 are placed in the transmit (xmit) buffer 102 and a copy of each (103) is saved in the bag 104 for possible retransmissions. Poll generation is controlled by a programmable poll timer 105. At the end of every polling interval, the poll timer 105 generates a poll and the timer is restarted. Polls (along with stats/ustats for the reverse direction) are also input into the transmit buffer 102. Copies of polls, stats and ustats are not saved since these PDUs are never retransmitted. When a PDU gets to the head of the queue, it is picked up for service by server 106. The PDU is segmented into ATM cells and then transmitted. The transmitted cells suffer propagation delay (108) over the forward ATM link 107. The ATM cells are then received by a receiver 109 on the receiving end of the forward link and assembled into a PDU, which is then checked for errors. If the PDU is uncorrupted, receiver 109 checks its type, which could be user, poll, stat or ustat (the latter two for reverse direction transmissions). An uncorrupted user PDU is placed into the receive buffer 110 for delivery to output 111. Delivery to output 111 may occur immediately if the PDU has the next expected sequence number; otherwise, the PDU is held in the receive buffer 110 until all PDUs with lower sequence number have been received correctly and delivered.
A received uncorrupted poll results in the generation of a stat message by stat generator 112, which lists all the existing sequence number gaps in the receive buffer 110. Finally, uncorrupted stats/ustats (for reverse direction transmissions) result in the retransmission on the reverse link 115 through server 116 of missing PDUs placed in retransmit buffer 114 from a bag (not shown).
All corrupted messages (user, poll, stat, or ustat) are simply discarded at the receiving end of either the forward or reverse links. In the case of a corrupted user PDU, an uncorrupted user PDU will eventually arrive. If this uncorrupted PDU has a sequence number higher than the highest expected sequence number, ustat generator 118 generates a ustat and enters it into the transmit buffer 113 for transmission to the transmitting end of the forward link. The PDUs in the reverse direction also go through the usual process of segmentation (if necessary), transmission on link 115 (having the usual propagation delay 119), and reception, assembly, and error checking by receiver 121. If a stat or ustat PDU is corrupted, it is simply discarded; otherwise, it results in retransmission of missing user PDUs. For this, receiver 121 first locates all desired PDUs in the bag 104, makes a copy (122) of each, and places the copies in the retransmit buffer 123. As stated before, retransmissions get higher priority over transmissions. Thus, server 106 will not serve transmit buffer 102 while there are any PDUs in the retransmit buffer 123 waiting to be transmitted.
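The transmit-side arrangement of transmit buffer, retransmit buffer and bag may be illustrated by the following sketch. The class and method names are hypothetical, PDUs are represented as simple tuples, segmentation into ATM cells is not modeled, and the removal of acknowledged PDUs from the bag is omitted for brevity.

```python
from collections import deque


class Transmitter:
    """Illustrative transmit side: two FIFO queues plus a bag."""

    def __init__(self):
        self.xmit = deque()     # first-time transmissions (user PDUs, polls)
        self.rexmit = deque()   # user PDU retransmissions
        self.bag = {}           # copies of unacknowledged user PDUs by seqno
        self.next_seqno = 0

    def submit(self, payload):
        """Queue a new user PDU and keep a copy in the bag."""
        seqno = self.next_seqno
        self.next_seqno += 1
        pdu = ("user", seqno, payload)
        self.bag[seqno] = pdu
        self.xmit.append(pdu)
        return seqno

    def send_poll(self):
        """Queue a poll; no copy is kept since polls are never retransmitted."""
        self.xmit.append(("poll", None, None))

    def on_status(self, gaps):
        """Handle an uncorrupted stat or ustat listing (first, last) gaps."""
        for first, last in gaps:
            for seqno in range(first, last + 1):
                if seqno in self.bag:
                    self.rexmit.append(self.bag[seqno])

    def next_pdu(self):
        """Pick the next PDU for service; retransmissions go first."""
        if self.rexmit:
            return self.rexmit.popleft()
        if self.xmit:
            return self.xmit.popleft()
        return None
```

Because the choice between the two queues is made only when the server picks up its next PDU, a PDU already in service is never interrupted, which corresponds to the nonpreemptive priority described above.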
The prior art error monitoring algorithms for CCS cannot be readily adapted for SSCOP. Specifically, the optimal choice of parameters for leaky bucket algorithms such as the previously described SUERM and EIM depends on the sustainable error rate, which in turn depends upon network parameters such as link length, link speed, offered load, error characteristics and message characteristics. In broadband applications these network parameters can vary significantly. Thus, link length for terrestrial links may span from zero to about 5000 miles, and satellite links may be up to 15,000 miles long. Currently, the link speeds for SSCOP are envisaged to range from 64 Kb/s to 4 Mb/s. In the future, even higher speed links are possible. Errors may come either singly or in bursts of varying severity and duration, and the message size distribution may vary widely depending on the application. Because of all of these factors, it is difficult to use one or even a few error monitor parameter sets to cover the entire range of network parameters. Furthermore, these prior art algorithms were designed for situations where FISUs are transmitted when there are no regular user messages to transmit. In the absence of FISUs in SSCOP, the message error probability decreases directly as the traffic decreases. Thus, the prior art algorithms may fail to remove a link from service at low load levels.