“Voice data” generally refers to data that has been created by sensing and digitizing human-audible sounds, and is a primary source of data for telephone networks. Traditionally, voice data has been transmitted over the Public Switched Telephone Network (PSTN) using 8,000 digital audio samples per second, eight bits/sample. This means that one-way voice communication on the PSTN requires 64 kbits/second of network bandwidth. In the traditional scheme, TDM (Time Division Multiplexing) has been used to carry more than one voice channel on a network link, e.g., by dividing a higher bit-rate channel into frames comprising multiple “time slots”, and assigning each voice channel to a certain time slot in each frame.
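The arithmetic and slot assignment described above can be sketched as follows. This is an illustrative sketch only: the 24-slot frame size, the one-frame-per-sample-period assumption, and all function names are hypothetical and are not taken from the text.

```python
SAMPLE_RATE_HZ = 8000   # samples per second, as stated in the text
BITS_PER_SAMPLE = 8     # bits per sample, as stated in the text
SLOTS_PER_FRAME = 24    # assumption: illustrative frame size only


def channel_rate_bps(sample_rate_hz=SAMPLE_RATE_HZ, bits_per_sample=BITS_PER_SAMPLE):
    """One-way bit rate of a single voice channel."""
    return sample_rate_hz * bits_per_sample


def build_frame(samples_by_slot, slots_per_frame=SLOTS_PER_FRAME):
    """Build one TDM frame holding one 8-bit sample per assigned time slot.

    samples_by_slot maps a slot index to that channel's current sample.
    Unassigned slots are still present in the frame (filled with zero).
    """
    frame = bytearray(slots_per_frame)
    for slot, sample in samples_by_slot.items():
        frame[slot] = sample
    return bytes(frame)


def link_payload_rate_bps(slots_per_frame=SLOTS_PER_FRAME):
    """Aggregate voice rate, assuming one frame is sent per sample period."""
    return slots_per_frame * channel_rate_bps()


print(channel_rate_bps())               # 64000
print(build_frame({0: 0x55, 3: 0xAA}, 4).hex())
print(link_payload_rate_bps())          # 1536000
```

Note that the frame length does not depend on how many slots carry live speech, which is the property the following paragraph identifies as wasteful.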
Circuit-switched TDM as described above uses more network bandwidth than absolutely necessary to transmit voice signals. First, because on a typical TDM link a circuit is dedicated to each call, each voice channel uses all of its allotted time slots, even if the voice data is blank because the speaker is silent. Second, many voice compression algorithms exist that can significantly reduce the number of bits required to transmit a voice signal with acceptable quality, but circuit-switched TDM is not designed to use these algorithms.
Packet voice technology is considered to be the successor to today's TDM voice networks. A packet generally consists of a header, which contains information identifying the packet, and a payload, which in this case carries voice data. An advantage of packet voice is that no dedicated end-to-end path through the carrier network is required. This is because each packet is a self-contained datagram, with the header containing all the information necessary to route the packet through the network to its destination.
With packet voice, the payload of each packet represents a number of consecutive voice samples from the same call (e.g., a 10 msec voice packet is equivalent to 80 TDM samples). Using a codec (compressor/decompressor), the voice data samples can be compressed on a per-packet basis. For instance, using a G.729 codec, 10 msec of voice can be reduced from 640 bits to 80 bits of information prior to transmission of the packet. And during silent periods, packets need not be sent at all. Accordingly, voice data payloads can require much less bandwidth/call in a packet-switched system than in a circuit-switched system.
The sampling interval used in packet voice to assemble a complete packet also comes with a disadvantage. Because no samples from a given sampling interval can be dispatched until the end of the sampling interval, a packet formation delay is introduced in the voice data. Thus the sampling interval cannot be too long (generally, anything more than a few tens of msec is considered too long), or call quality suffers.
To avoid call quality degradation, packets are generally sent at small intervals, e.g., every 10 msec. For a G.729 codec and a 10 msec sampling interval, this translates into a 10-byte packet payload. But as the datagram's header(s) may require 40 bytes or more, each packet then carries at least 50 bytes (roughly 40 kbits/sec per call or more), and a great deal of the bandwidth efficiency gained over TDM is lost because of the relatively large header required per packet and the number of small-payload packets that must be sent. Although methods have been proposed to compress higher-layer headers across a single network link, these methods can place a heavy computational tax on each network element that must deal with the compressed headers, and the efficacy of such methods is highly dependent on the link layer technology employed. Particularly for core network switches, the sheer number of packets handled on each link makes such approaches undesirable.
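The header overhead described above can be checked with a short calculation. The sketch below uses only the figures given in the text (an 8 kbits/sec codec, a 10 msec interval, and a 40-byte header); the function names are hypothetical, and link-layer framing is ignored.

```python
def payload_bytes(codec_bps, interval_ms):
    """Bytes of compressed voice produced in one packet interval."""
    return codec_bps * interval_ms // 8000  # bits/sec * msec / (8 * 1000)


def packet_voice_rate_bps(payload, header, interval_ms):
    """Per-call, one-way bit rate including per-packet headers."""
    packets_per_second = 1000 / interval_ms
    return (payload + header) * 8 * packets_per_second


def header_share(payload, header):
    """Fraction of each packet consumed by the header."""
    return header / (payload + header)


g729_payload = payload_bytes(8000, 10)              # 10 bytes
print(packet_voice_rate_bps(g729_payload, 40, 10))  # 40000.0
print(header_share(g729_payload, 40))               # 0.8
```

Under these assumptions, four of every five transmitted bytes are header, so the 8:1 codec compression yields only a modest saving over the 64 kbits/sec TDM channel.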
As a second alternative to TDM, PSTN providers have begun to deploy a networking technology known as Asynchronous Transfer Mode, or ATM. ATM promises an ability to mix voice, video, and other data on a connection-oriented network, and an ability to take advantage of voice activity detection (VAD) and compression algorithms to decrease bandwidth requirements for circuit-switched calls.
ATM mixes some concepts borrowed from circuit switching with some concepts borrowed from packet switching. An ATM network is tag-switched, and supports only one fixed, relatively short packet size—a 53-byte packet called a cell. Without regard to the type of information contained in a cell, each ATM cell must have a five-byte cell header and a 48-byte payload. The cell header is not a true datagram header, however, as it does not contain enough information to allow end-to-end delivery of a cell.
The cell header contains two identifiers: a Virtual Path Identifier (VPI) and a Virtual Channel Identifier (VCI). VPIs and VCIs are explicitly assigned at each segment of a connection and, as such, have only local significance across a particular link. They are remapped, as appropriate, at each switching point. When used as intended, groups of VCIs following a common route use the same VPI, so that intermediate ATM switches need only switch the group based on VPI and not on VCI.
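The per-link remapping of identifiers can be pictured as a table lookup at each switching point. The following is a minimal sketch under assumed data structures; the class name, table layout, and port numbering are hypothetical and do not represent any particular switch implementation.

```python
class AtmSwitchSketch:
    """Toy model of VPI/VCI remapping at one switching point."""

    def __init__(self):
        # (in_port, in_vpi) -> (out_port, out_vpi): path-level switching
        self.vp_table = {}
        # (in_port, in_vpi, in_vci) -> (out_port, out_vpi, out_vci)
        self.vc_table = {}

    def forward(self, in_port, vpi, vci):
        """Return the (out_port, vpi, vci) labels for an arriving cell."""
        vp_entry = self.vp_table.get((in_port, vpi))
        if vp_entry is not None:
            # Whole group is switched on VPI alone; the VCI is not examined.
            out_port, out_vpi = vp_entry
            return out_port, out_vpi, vci
        # Otherwise both identifiers are remapped for this link.
        return self.vc_table[(in_port, vpi, vci)]


switch = AtmSwitchSketch()
switch.vp_table[(1, 5)] = (2, 9)             # switch VPI 5 as a group
switch.vc_table[(1, 7, 100)] = (3, 4, 42)    # switch one VC individually

print(switch.forward(1, 5, 33))    # (2, 9, 33)
print(switch.forward(1, 7, 100))   # (3, 4, 42)
```

Because the labels returned are valid only on the outgoing link, the next switch in the path performs its own lookup with its own tables.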
Although ATM cells have packet characteristics, ATM is connection oriented. That is, two systems must set up an end-to-end “connection” over the network before they can communicate packets. This connection set-up uses a switch control protocol to establish a Virtual Circuit (VC) on each link in the end-to-end path. Although similar in some ways to a TDM-switched circuit, the VC does not require a dedicated circuit like a TDM connection; instead, the connection is merely a grant of permission to transmit cells at a negotiated data rate, with some guarantees as to quality-of-service (QoS) parameters such as minimum cell rate, average cell rate, and network delay.
When used with a voice codec such as G.729, ATM, like packet-based voice, may also waste bandwidth, not because ATM cell headers are excessively long, but because the cell payload is fixed at 48 bytes. For instance, G.729 requires only 8 kbits/sec of payload (only 10 bytes are required for each G.729 10 msec payload). But when each G.729 10 msec payload is placed into its own ATM cell, a full 53-byte cell must be sent every 10 msec, and the resulting bandwidth is about 42 kbits/sec.
To circumvent this problem, an AAL2 trunking service has been devised to pack multiple compressed voice payloads into one ATM cell. In the ATM protocol scheme, an ATM Adaptation Layer (AAL) interfaces higher-level protocols to the ATM layer (which buffers and forwards cells and provides connection and traffic management). AAL2 provides a service that can be used to multiplex compressed voice data from up to 248 separate calls onto one bearer VC. Data from multiple calls is packed into the payload of each cell, by attaching one of 248 available session Context Identifiers (CID) to each sub-payload. At the receiving end of an AAL2 trunk, the CIDs are used to de-multiplex the sub-payloads and associate them with an outgoing connection. When operating at full efficiency, AAL2 can provide a G.729 effective bandwidth of about 12 kbits/sec per call.
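The two bandwidth figures above (about 42 kbits/sec with one payload per cell, and about 12 kbits/sec with AAL2 packing) can be reproduced approximately as follows. The text does not state the AAL2 overhead; the sketch assumes a 3-byte header on each sub-payload and a 1-byte start field in each cell payload, and assumes sub-payloads may straddle cell boundaries so that cells are completely filled. The function names are hypothetical.

```python
CELL_BYTES = 53           # 5-byte header + 48-byte payload, per the text
CELL_PAYLOAD_BYTES = 48


def one_payload_per_cell_bps(interval_ms):
    """Bandwidth when every voice payload occupies its own ATM cell."""
    return CELL_BYTES * 8 * 1000 / interval_ms


def aal2_per_call_bps(payload, interval_ms,
                      subheader_bytes=3,      # assumption: per-sub-payload header
                      start_field_bytes=1):   # assumption: per-cell start field
    """Per-call bandwidth when cells are fully packed with sub-payloads."""
    usable = CELL_PAYLOAD_BYTES - start_field_bytes
    cells_per_payload = (payload + subheader_bytes) / usable
    return cells_per_payload * CELL_BYTES * 8 * 1000 / interval_ms


print(one_payload_per_cell_bps(10))          # 42400.0
print(round(aal2_per_call_bps(10, 10)))      # 11728
```

The second figure is a best case: it is reached only when enough calls share the trunk to fill every cell without waiting, which is why the text qualifies it as operation at full efficiency.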
FIG. 1 shows an exemplary ATM/AAL2 network (the underlying ATM network 20 will typically handle other types of ATM traffic as well as AAL2). At the edge of network 20, AAL2 voice gateways 22, 24, 26, 28, and 30 route calls from PSTN TDM networks, Private Branch Exchanges (PBXs), and packet data networks (not shown) through the ATM network. For a particular call traversing ATM network 20, call signaling information is interpreted by switch processors 36 and 38, and an AAL2 bearer path through ATM network 20 is devised for the bearer channel data (call signaling for the call is also transported in some manner across the network to the call's egress point). Switch processors 36 and 38 complete the AAL2 bearer path by deciding which AAL2 VC to use on each AAL2 hop in the call's path, and by assigning a CID to the call for each VC that transports it. Switch processors 36 and 38 supply the ingress and egress voice gateways with mapping from their front end ports handling the call to the appropriate AAL2 VC and CID. Switch processors 36 and 38 supply multiplexing switches 32 and 34 with an inbound VC/CID and an outbound VC/CID for the call.
Bearer data for each call is handled as follows. Each call's bearer channel data is translated by its ingress gateway to AAL2 format—the translation includes compression, addition of an AAL2 CID, and multiplexing onto an outbound AAL2 VC as instructed by the switch processor. If the call goes through an AAL2 multiplexing switch, that switch demultiplexes the inbound AAL2 trunk into sub-payloads, examines the inbound CIDs of the sub-payloads, maps the sub-payloads to outbound VCs and CIDs, and multiplexes the sub-payloads sharing a common outbound VC. At the egress gateway, that call's bearer channel data is demultiplexed off of an inbound AAL2 VC, decompressed, and forwarded out of ATM network 20 towards its final destination.
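The demultiplex, remap, and remultiplex steps performed by an AAL2 multiplexing switch can be sketched as a single pass over the inbound sub-payloads. This is an illustrative sketch only: sub-payloads are modeled as (CID, bytes) tuples, VCs as strings, and the mapping table as a dictionary assumed to have been supplied by the switch processor; cell packing and timing are not modeled.

```python
from collections import defaultdict


def switch_subpayloads(inbound_vc, subpayloads, mapping):
    """Regroup inbound sub-payloads by outbound VC with remapped CIDs.

    subpayloads: list of (in_cid, data) demultiplexed from inbound_vc.
    mapping: (in_vc, in_cid) -> (out_vc, out_cid), one entry per call.
    Returns {out_vc: [(out_cid, data), ...]} ready to be multiplexed.
    """
    outbound = defaultdict(list)
    for in_cid, data in subpayloads:
        out_vc, out_cid = mapping[(inbound_vc, in_cid)]
        outbound[out_vc].append((out_cid, data))
    return dict(outbound)


# Hypothetical mapping for three calls arriving on trunk "vcA".
mapping = {
    ("vcA", 8): ("vcX", 20),
    ("vcA", 9): ("vcY", 8),
    ("vcA", 10): ("vcX", 21),
}
inbound = [(8, b"call-1"), (9, b"call-2"), (10, b"call-3")]

print(switch_subpayloads("vcA", inbound, mapping))
```

In this model the switch holds one mapping entry per call per hop, so the table at every multiplexing switch grows with the number of active calls.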
Despite its efficient transport capabilities, the ATM/AAL2 voice network has disadvantages. First, the switch processors 36 and 38 must provide routing and call control (e.g., CID mapping) to the core AAL2 multiplex switches 32 and 34; thus, as the number of AAL2 trunks grows, the demands on the switch processor network increase, since the switch processors must control each ATM hop. The switch processors must comprehend, control, and balance usage of the AAL2 trunks and AAL2 multiplex switches to ensure efficient utilization of network resources. And the switch processors must provide a failover capability that recognizes when an AAL2 trunk or endpoint goes down and re-routes calls around the failure. As the network scales, these tasks become more and more difficult for the switch processors to perform well.