1. Field of the Invention
The present invention relates generally to data transmission in wide area networks such as, by way of example, asynchronous transfer mode (ATM) networks. More specifically, the invention relates to error monitoring within an ATM network and to a decision process for switching to the redundant portions of the switching fabric and network.
2. Related Art
Developments within the telecommunication industry have significantly improved the ability of people to communicate, exchange data, perform research, and, more generally, to access information resources that were unavailable to the common person even in recent history. The new communication networks are altering the business landscape and are altering the very way individuals work, shop, and keep in touch with each other. Not only, for example, can one use cellular phone service or e-mail to communicate with others, one can also now obtain large documents, graphic images, databases, and other types of information having significant memory footprints through wireless and wireline networks.
The manner in which the communication networks are evolving creates a need for more capable information access tools (computers and transceivers, for example). The new tools, in turn, create a need for new networks having increased data throughput capacity and reliability. New networks and information exchange capabilities that were unimaginable even in recent times are being developed and implemented in a way that impacts businesses and individuals in a significant way. For example, standalone computers may now be integrated with wireless radio telephones to allow the transmission of information from the computer to a destination by way of a wireless communication network and then by way of the Internet.
The recent explosion of the Internet is creating the capability and desire for networks of all types to be integrated and coupled to exchange data signals carrying the varying types of information. In many cases, the same data will also be transported through a local area network (LAN) prior to being delivered to the Internet. Thus, by way of example, a digitized signal can be transported from a source through a LAN and through the Internet, to a final destination. Moreover, within the Internet portion itself, there may be a need to transport the user data through a backbone data transport infrastructure, by way of example, through an ATM network.
Generally speaking, the Internet is, in essence, a collection of many large and small computer networks that are coupled together over high speed backbone data links such as T-1, T-3, OC-1 and OC-3. Stated differently, the Internet is a network of networks. As a result of the creation of the Internet, worldwide access may be achieved. People and their equipment may now communicate from most any civilized point to another in a fast and relatively inexpensive medium.
While it is popular to think of the Internet as one network of networks, there are other such Internets that are in existence and that are under development. For example, the network now commonly known as the Internet was originally a network of institutional networks including university networks. As a result of the commercialization of the Internet and the resultant reduction in quality of service, new generation Internet type networks are under development to better achieve the purposes of the original "Internet". Moreover, new international standards and protocols are being approved to create additional and enhanced Internets. For the sake of simplicity, however, each of the worldwide Internet networks will be referred to collectively as the Internet.
Regarding its physical aspects, the Internet is a packet switched network that is currently based upon a group of protocols known as transmission control protocol/Internet protocol (TCP/IP). TCP is a connection-oriented protocol that first establishes a connection between two computer systems that are to exchange data. TCP then breaks a given digital information signal into packets having a defined format. Headers containing control and address information are then attached to the packets.
For example, in addition to a destination address, a TCP packet typically contains a sequence number that is to be used by the destination in reconstructing a signal that is similar to the original digital information that was broken into packets at the originating end. TCP packets also typically include port IDs, checksum values and other types of control information as is known by those skilled in the art.
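The role of the sequence number in reconstruction can be illustrated with a minimal sketch. It is a simplification for illustration only: real TCP sequence numbers count bytes rather than packets, and the function names `split_into_packets` and `reassemble` are hypothetical.

```python
import random


def split_into_packets(data: bytes, size: int):
    """Break data into (sequence_number, payload) pairs of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]


def reassemble(packets):
    """Rebuild the original data by ordering the packets by sequence number."""
    return b"".join(payload for _, payload in sorted(packets))


message = b"digital information signal"
packets = split_into_packets(message, 8)
random.shuffle(packets)          # simulate out-of-order arrival
restored = reassemble(packets)   # equals the original message
```

Because each packet carries its own sequence number, the destination can restore the original order regardless of the order of arrival.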
IP protocol is used for routing purposes. Thus, the IP protocol includes the destination and originating addresses and default gateway identifiers. IP routers, therefore, are operable to evaluate IP protocol information for routing an IP data packet and to evaluate TCP protocol information for error control and other similar purposes.
In order to make communication devices created by companies throughout the world compatible with each other to create local area networks and worldwide networks such as the Internet, protocols and standards are often defined. These protocols and standards are used to guide the design of the communication devices, and more specifically, to guide the design of the operating logic and software within the devices. While communication devices that are designed in view of these standards do not always follow the suggested models exactly, they are usually compatible with the protocol-defined interfaces (physical and logical). In order to appreciate the construction and operation of many devices, it is important to generally understand the concepts of some of the significant protocol standards and models.
One important model that currently guides development efforts is the International Organization for Standardization (ISO) Open Systems Interconnection (OSI) model. ISO/OSI provides a network framework or model that allows equipment from different vendors to communicate with each other. The OSI model organizes the communication process into seven different categories or layers and places these layers in a sequence based on their relation to the user. Layers 1 through 3 provide actual network access and control. Layers 4 through 7 relate to the point to point communications between the message source and destination.
More specifically, the seven layers in the OSI model work together to transfer communication signals through a network. Layer 1 includes the physical layer meaning the actual hardware that transmits currents having a voltage representing a bit of information. Layer 1 also provides for the functional and procedural characteristics of the hardware to activate, maintain, and deactivate physical data links that transparently pass the bit stream for communication between data link entities. Layer 2 is the data link layer or the technology specific transfer layer that effectuates and controls the actual transmissions between network entities. For example, layer 2 provides for activation, maintenance, and deactivation of data link connections, character and frame synchronization, grouping of bits into characters and frames, error control, media access control and flow control.
Layer 3 is the network layer at which routing, switching and delaying decisions are made to create a path through a network. Such decisions are made in view of the network as a whole and of the available communication paths through the network. For example, decisions as to which nodes should be used to create a signal path are decided at layer 3. As may be seen, layers 1, 2 and 3 control the physical aspects of data transmission.
While the first three layers control the physical aspects of data transmission, the remaining layers relate more to communication functionality. To illustrate, layer 4 is the transport layer that defines the rules for information exchange and manages the point to point delivery of information within and between networks including providing error recovery and flow control. Layer 5 is the session layer that controls the basic communications that occur at layer 4. Layer 6 is the presentation layer that serves as a gateway (a type of "software" interface) between protocols and syntax of dissimilar systems. Layer 7 is the application layer that includes higher level functions for particular application services. Examples of layer 7 functions include file transfer, creation of virtual terminals, and remote file access.
Each of the above layers is as defined by the OSI model. While specific implementations often vary from what is defined above, the general principles are followed so that dissimilar devices may communicate with each other.
With respect to the foregoing discussion regarding the seven OSI layers, IP is a layer three protocol. In contrast, many of the backbone data transport infrastructures utilize a different layer protocol than an Internet router. Many of the common backbone data transport systems utilized include time division multiplexed (TDM) transmission systems. TDM systems are generally known. These TDM systems are usually implemented in a manner that provides full redundancy in order to maintain transmission in the event of a fault on one of the channels or communication links. A protection path is, traditionally, a redundant path for transmitting signals in a failure condition.
In ordinary conditions, either the user traffic (data) is not transmitted in the redundant protection path or, alternatively, it is routed but is not processed by a destination. Given the large amounts of data that are transmitted in a modern wide band network, it is important to monitor network conditions in the primary and the protection paths according to which path is being utilized for transporting the data.
Error conditions that prompt a node to switch to the protection path often are related to hardware (layer 1) problems in which communications are not being successfully transmitted in a communication link. As communication glitches are not uncommon, however, it is an unacceptable design to have a system that switches the instant that a communication glitch occurs. A system must determine that the glitch results from an actual hardware or communication path failure. Usually, however, it is difficult to make such a determination from only one glitch.
Several challenges exist in implementing topologies having full redundancy. For example, it is necessary for the switching from the working path to the protection path to occur quickly in the event of a fault so that a minimal amount of information is lost. Typically, switching occurs at the layer 1 level to minimize the down time. As a result, however, little error protection is provided at the hardware level for failures. Additionally, layer 1 switching results in the switching of entire data transport pipelines. By way of example, a typical pipeline that is switched as a result of a layer 1 switching decision and event may have a throughput capacity in excess of 100 megabits per second (Mbps).
Asynchronous transfer mode networks are advantageous in that they are very high-speed broadband transmission networks that improve network efficiencies by transmitting data, including voice data, in an asynchronous manner. Stated differently, conventional networks carry data in a synchronous manner, which results in the transmission of empty data slots in a TDM network. Thus, network capacity is wasted.
ATM networks, however, only transmit fixed length data packets, in units called cells, as a need to transmit the data presents itself. Thus, ATM is a broadband, low delay, packet type of switching and multiplexing system that allows for flexible transmission bandwidths and is capable of transmitting data in excess of a 600 Mbps data transmission rate. Because ATM is operating at such high bit rates, the cell stream is often continuous and without gaps. Cells produced by differing streams to an ATM multiplexer are stored in queues awaiting cell assignment. The ATM system, by building a queue of cells, produces a continuous stream of data thereby maximizing network efficiencies.
Thus, because large amounts of data are likely to be transported during the time that a failure condition is occurring, there is an increasing need for providing protection path switching in a manner that reduces unnecessary protection path switching. On the other hand, it also is important to provide switching in a manner that minimizes the amount of data that is lost due to the error condition before the switching occurs. Additionally, there is a need to implement systems that accomplish these goals economically in terms of system resources.
In order to achieve reliability and high bandwidth, dual switch fabrics and Tap multiplexers are utilized wherein there exists a Tap multiplexer ("Tap Mux") for every line card. Each Tap Mux interfaces with primary path and redundant (protection) path switch fabric access devices. The Tap Mux is connected to each fabric access device by way of four serial lines. Two of the serial lines are for the primary path and two are for the protection path. Each serial line typically carries a nibble (4 bits) of serial data that is eventually converted to a parallel format by a fabric access device. Within the disclosed ATM network, four fabric access devices are provided for converting the four bit nibbles of data into a parallel form. The invention improves network efficiencies by monitoring each of the many communication links within the switching fabric to determine when a switch should occur to the protection path.
In order to determine economically, in terms of system resources, when to provide protection path switching, the inventive system in a described embodiment monitors each input line for a plurality of fabric access devices in a manner that does not require detected errors to be time stamped. The fabric access devices effectively form an interface between a processing unit and a plurality of Tap Muxes. One function of the Fabric Access Devices (FADs) is to convert the 4-bit nibbles of data received from a plurality of serial buses into a parallel bit stream. Additionally, each FAD selectively switches a source of inputs carrying the nibbles of data to produce an output to the processor unit.
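The nibble-to-parallel conversion and the input source selection may be sketched as follows. The nibble ordering (most significant nibble first), the word width, and the function names `nibbles_to_word` and `select_source` are assumptions made for illustration and are not specified by the text.

```python
def nibbles_to_word(nibbles):
    """Pack 4-bit nibbles into one parallel word, most significant nibble first (assumed order)."""
    word = 0
    for nibble in nibbles:
        if not 0 <= nibble <= 0xF:
            raise ValueError("each nibble must fit in 4 bits")
        word = (word << 4) | nibble
    return word


def select_source(primary, protection, use_protection):
    """Return the input source that is passed on to the processor unit."""
    return protection if use_protection else primary


# Four nibbles from the selected source become one 16-bit parallel word.
active = select_source([0xA, 0xB, 0xC, 0xD], [0x1, 0x2, 0x3, 0x4], use_protection=False)
word = nibbles_to_word(active)  # 0xABCD
```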
Because there are four FADs in the present system for the primary path and four FADs for the protection path, and because each FAD is connected to nine serial data line sources, namely, one each from the eight different Tap Muxes and one from the Tap Mux of the fabric controller, the present invention includes creating 72 state machines for monitoring each of the input data line sources to the FADs and for determining when switching should occur. Accordingly, the switching logic that is defined herein is distributed across 72 state machines in one of the described embodiments.
The defined logic, in the described embodiment of the invention, includes monitoring the input line sources for specified errors and, upon the detection of the occurrence of an error, initiating a fixed length window of time during which occurrences of the specified errors are counted. Once a defined number of errors on a given line is exceeded within the fixed length window of time, switching occurs from the primary path to the protection path. One advantage of utilizing a fixed length window in the described embodiment of the invention is that time stamping of errors is not required. Thus the error-checking algorithm is simplified. Additionally, by defining a number of errors in a communication link for a fixed length period of time, a number can be utilized in which occasional glitches do not result in switching but wherein a true hardware type of communication link problem does lead to the fast switching of the network.
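The count of 72 follows from 2 paths x 4 FADs x 9 line sources. A minimal sketch of enumerating one state record per input line source is given below; the identifiers and the record layout are hypothetical and chosen only to make the arithmetic concrete.

```python
from itertools import product

PATHS = ("primary", "protection")
FADS_PER_PATH = 4
SOURCES_PER_FAD = 9  # eight Tap Muxes plus the fabric controller's Tap Mux

# One (path, fad_index, source_index) key per monitored input line source.
line_ids = list(product(PATHS, range(FADS_PER_PATH), range(SOURCES_PER_FAD)))

# One independent state record per line; the fields are illustrative assumptions.
state_machines = {line_id: {"error_count": 0, "window_open": False}
                  for line_id in line_ids}
```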
Each of the 72 state machines is executed by a health maintenance module formed within a fabric processor. The fabric processor also includes an error-checking module and a fabric control module. Accordingly, the error-checking module continuously checks each of the 72 input line sources to the four FADs for the detected errors. The health maintenance module communicates with the error-checking module to implement the error switching logic defined herein.
Once the health maintenance module determines that it is necessary to switch fabrics, it communicates with the fabric control module to prompt it to initiate and complete switching from the primary fabric to the protection fabric. Typically, switching is provided for the entire fabric even if the error is found to occur in only one communication line of the primary switching fabric.
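The hand-off between the two modules can be sketched as below. The class and method names are hypothetical; the only behavior taken from the text is that a failure on a single line prompts switching of the entire fabric.

```python
class FabricControl:
    """Hypothetical stand-in for the fabric control module."""

    def __init__(self):
        self.active_fabric = "primary"

    def switch_to_protection(self):
        # The whole fabric is switched, not just the failing line.
        self.active_fabric = "protection"


class HealthMaintenance:
    """Hypothetical stand-in for the health maintenance module."""

    def __init__(self, fabric_control):
        self.fabric_control = fabric_control
        self.failed_lines = set()

    def report_line_failure(self, line_id):
        """Called when a line's error threshold has been exceeded."""
        self.failed_lines.add(line_id)
        if self.fabric_control.active_fabric == "primary":
            self.fabric_control.switch_to_protection()


control = FabricControl()
health = HealthMaintenance(control)
health.report_line_failure(("primary", 2, 5))  # one bad line out of many
```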
Each of these modules is logically formed by computer instructions stored within a storage device of the Fabric controller and is executed by an internal processor in communication therewith by way of an internal bus. The processor executes the computer instructions stored within the storage device to perform the functionality represented by the fabric control module, the error-checking module, and the health maintenance module. The storage device includes additional computer instructions that define the Fabric controller's interaction and data processing capabilities in general. The processor of the Fabric controller generates control signals that are to be transmitted externally by way of a parallel bus that is controlled by an internal bus controller.
An inventive method of the described embodiment of the invention generally includes checking for the occurrence of errors of a defined set of errors and, when such an error is found, setting an error counter to one and then starting a fixed length timing window. Each time a further error is received, the error counter is incremented to monitor the total number of errors. If the total number of errors for a given communication link exceeds a specified number within the fixed length timing window, i.e., since receipt of the first error, switching to the protection path is initiated.
If the specified period expires before the specified number of errors is reached, the system is reset and the error counter is reset to zero. This algorithm is performed for all 72 data line sources being received by the plurality of FADs.
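The fixed window method described above can be sketched for a single data line source as follows. This is a minimal sketch under stated assumptions: the window is measured in abstract timer ticks, and the names `FixedWindowMonitor`, `on_error`, `on_tick`, `max_errors`, and `window_ticks` are hypothetical. Note that no per-error time stamp is stored, only a counter and a countdown.

```python
class FixedWindowMonitor:
    """Counts errors inside a fixed length window opened by the first error."""

    def __init__(self, max_errors, window_ticks):
        self.max_errors = max_errors      # the "specified number" of errors
        self.window_ticks = window_ticks  # fixed window length, in ticks
        self.error_count = 0
        self.ticks_left = 0               # 0 means no window is open

    def on_error(self):
        """Record one error; return True when switching should be initiated."""
        if self.ticks_left == 0:
            # First error: set the counter to one and start the window.
            self.error_count = 1
            self.ticks_left = self.window_ticks
        else:
            self.error_count += 1
        return self.error_count > self.max_errors

    def on_tick(self):
        """Advance the timer; reset the counter when the window expires."""
        if self.ticks_left > 0:
            self.ticks_left -= 1
            if self.ticks_left == 0:
                self.error_count = 0


monitor = FixedWindowMonitor(max_errors=3, window_ticks=10)
decisions = [monitor.on_error() for _ in range(4)]  # fourth error exceeds 3
```

In the described embodiment, one such record would exist for each of the 72 data line sources.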
In an alternate embodiment of the invention, a sliding window is used. More specifically, each error is recorded with a time stamp. If a specified number of errors are detected within a defined time period, then protection path switching is initiated. This alternate embodiment is not as desirable because it requires a more complicated algorithm that evaluates the time stamps of the detected errors. On the other hand, it is advantageous in that it will always detect the condition in which a specified number of errors are detected within a specified period of time. In the described embodiment, protection path switching only occurs if the specified number of errors are detected within the initiated fixed length timing window.
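A minimal sketch of the sliding window alternative follows, assuming numeric time stamps and hypothetical names (`SlidingWindowMonitor`, `on_error`). It uses the same "exceeds a specified number" trigger as the fixed window method; that choice is an assumption, since the text does not fix the comparison for this embodiment.

```python
from collections import deque


class SlidingWindowMonitor:
    """Keeps a time stamp per error and counts those inside a sliding window."""

    def __init__(self, max_errors, window_length):
        self.max_errors = max_errors
        self.window_length = window_length
        self.stamps = deque()  # time stamps of recent errors, oldest first

    def on_error(self, now):
        """Record an error at time `now`; return True when switching should be initiated."""
        self.stamps.append(now)
        # Discard errors that have fallen out of the window ending at `now`.
        while now - self.stamps[0] >= self.window_length:
            self.stamps.popleft()
        return len(self.stamps) > self.max_errors


sliding = SlidingWindowMonitor(max_errors=3, window_length=10)
results = [sliding.on_error(t) for t in (0, 8, 9, 11, 12)]
```

With these inputs, the errors at times 8, 9, 11, and 12 fall within one 10-unit span, so the final call reports that switching is needed. A fixed window opened by the error at time 0 would expire at time 10 with only three errors counted and would then restart its count, so it would not switch on this sequence. The cost is the per-error storage and the time stamp comparison.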
In one described embodiment of the invention, four types of error are monitored for the Tap Mux and for the FADs. Those errors are cell parity, idle pattern, clock recovery, and phase lock loop lock. For the switch fabric, however, two errors are monitored. They are the buffers in use error and the free queue head pointer error.
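The two sets of monitored errors might be represented as enumerations, as in the sketch below. The type and member names are hypothetical labels for the errors named in the text.

```python
from enum import Enum, auto


class LinkError(Enum):
    """Errors monitored for the Tap Mux and the FADs."""
    CELL_PARITY = auto()
    IDLE_PATTERN = auto()
    CLOCK_RECOVERY = auto()
    PHASE_LOCK_LOOP_LOCK = auto()


class FabricError(Enum):
    """Errors monitored for the switch fabric."""
    BUFFERS_IN_USE = auto()
    FREE_QUEUE_HEAD_POINTER = auto()


def is_monitored(error):
    """Return True if the error belongs to one of the monitored sets."""
    return isinstance(error, (LinkError, FabricError))
```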