This invention relates generally to an apparatus for establishing communication paths over a circuit switched network, a connectionless packet switched network, and a connection-oriented packet switched network, and more particularly to an apparatus for establishing point-to-point or point-to-multipoint audio or video communication over a telephony network, the Internet, and an Asynchronous Transfer Mode (ATM) or a Frame Relay (FR) network.
Voice traffic transmitted between two or more users over a telephony network is carried over circuit-switched paths that are established between the users. Circuit-switched technology is well-suited for delay-sensitive, real-time applications such as voice transmission since a dedicated path is established. In a circuit-switched network, all of the bandwidth of the established path is allocated to the voice traffic for the duration of the call.
In contrast to the telephony network, the Internet is an example of a connectionless packet-switched network that is based on the Internet Protocol (IP). While the majority of the traffic carried over the telephony network is voice traffic, the Internet is more suitable to delay-insensitive applications such as the transmission of data. The Internet community has been exploring improvements in IP so that voice can be carried over IP packets without significant performance degradation. For example, the resource reservation protocol known as RSVP (see RSVP Version 2 Functional Specifications, R. Braden, L. Zhang, D. Estrin, Internet Draft 06, 1996) provides a technique for reserving resources (i.e. bandwidth) for the transmission of unicast and multicast data with good scaling and robustness properties. The reserved bandwidth is used to effectively simulate the dedicated bandwidth scheme of circuit-switched networks to transmit delay-sensitive traffic. If RSVP is implemented only for those communications having special Quality of Service (QoS) needs such as minimal delay, the transmission of other communications such as non-real time data packets may be provided to other users of the Internet in the usual best-effort, packet-switched manner.
The majority of Internet users currently access the Internet via slow-speed dial/modem lines using protocols such as SLIP (Serial Line IP) and PPP (Point-to-Point Protocol), which run over serial telephone lines (modem and N-ISDN) and carry IP packets. Voice signals are packetized by an audio codec on the user's multimedia PC. The voice packets carry substantial packetization overhead, including the headers of PPP, IP, UDP, and RTP, which can be as large as 40 octets. Transmitting voice packets over low-speed access lines is almost impossible because of the size of the header relative to the size of a typical voice packet (20-160 octets, based on the average acceptable voice delay and amount of voice compression). However, several proposals have emerged to compress the voice packet headers so that greater transmission efficiency and lower latency can be achieved for voice packets transmitted over low-speed, dial access lines.
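The overhead argument above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the usual uncompressed header sizes of 20 octets for IPv4, 8 for UDP, and 12 for RTP (which together give the 40 octets cited) and ignoring PPP framing; the helper name `header_overhead` is hypothetical.

```python
# Illustrative arithmetic only. The header sizes are assumptions:
# uncompressed IPv4 + UDP + RTP without options, PPP framing ignored.
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
HEADER_OCTETS = IP_HEADER + UDP_HEADER + RTP_HEADER  # 40 octets


def header_overhead(payload_octets: int, header_octets: int = HEADER_OCTETS) -> float:
    """Return the fraction of each packet occupied by headers rather than voice."""
    if payload_octets <= 0:
        raise ValueError("payload must be positive")
    return header_octets / (header_octets + payload_octets)


if __name__ == "__main__":
    # The two ends of the 20-160 octet voice payload range given in the text.
    for payload in (20, 160):
        print(f"{payload:3d}-octet payload: {header_overhead(payload):.1%} header overhead")
```

Under these assumptions, headers make up two thirds of a packet carrying a 20-octet voice payload and one fifth of a packet carrying a 160-octet payload, which is why header compression matters most for heavily compressed, low-delay voice.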
A substantial number of users is expected to begin sending voice traffic over the Internet with acceptable voice quality and latency because of the availability of RSVP and packet-header compression technologies. The transmission terminals for sending packetized voice over the Internet are likely to be multimedia personal computers.
In addition to the telephony network and the Internet, other transmission standards such as Frame-Relay and ATM have been emerging as alternative transport technologies for integrated voice and data. ATM/FR networks are similar to the telephony network in that they both employ connection-oriented technology. However, unlike the telephony network, ATM/FR networks employ packet switching. In contrast to the Internet Protocol, which is a network layer protocol (layer three), FR and ATM pertain to the data link layer (layer two) of the seven-layer OSI model.
Frame Relay and ATM can transport voice in two different formats within the FR (or ATM) packets (cells). In the first format, the FR (ATM) packets (cells) carry an IP packet (or some other layer-3 packet), which in turn encapsulates the voice packets. Alternatively, the FR (ATM) packets (cells) directly encapsulate the voice packet, i.e., without using IP encapsulation. The first alternative employs protocols such as LAN Emulation (LANE), Classical IP Over ATM, and Multiprotocol Over ATM (MPOA), all of which are well known in the prior art. The second alternative is referred to as "Voice over FR" and "Voice over ATM", respectively. Note that the first alternative, which includes IP encapsulation, allows voice packets to be routed between IP routers. That is, layer-3 processing is performed by the routers along the voice path to determine the next hop router. The second alternative is a purely FR/ATM switched solution. In other words, switching can be performed only at the data link layer. FIG. 1 depicts the protocol stacks for transport of voice over IP and the two alternatives for voice over FR/ATM.
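The difference between the two alternatives can be illustrated with a toy encapsulation model. This is only a sketch: the layer lists are simplified assumptions that omit the intermediate layers shown in FIG. 1, and the names `encapsulate` and `routable_at_layer3` are hypothetical.

```python
from typing import List

# Simplified, assumed layer lists, innermost layer first.
IP_ENCAPSULATED = ["IP", "ATM"]  # first alternative: voice inside IP inside ATM cells
DIRECT = ["ATM"]                 # second alternative: "Voice over ATM"


def encapsulate(payload: str, layers: List[str]) -> str:
    """Wrap the payload in each layer in turn, innermost layer first."""
    framed = payload
    for layer in layers:
        framed = f"{layer}[{framed}]"
    return framed


def routable_at_layer3(layers: List[str]) -> bool:
    """Only a stack that contains IP can be forwarded hop by hop by IP routers."""
    return "IP" in layers


if __name__ == "__main__":
    for name, layers in (("IP encapsulated", IP_ENCAPSULATED), ("direct", DIRECT)):
        print(name, encapsulate("voice", layers), routable_at_layer3(layers))
```

The same model applies to Frame Relay by substituting "FR" for "ATM" in the layer lists.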
The audio codec depicted in FIG. 1 enables voice encoding/decoding, including voice digitization, compression, silence elimination and formatting. The audio codec is defined by ITU-T standards such as G.711 (PCM of Voice Frequencies), G.722 (7 kHz Audio-Coding within 64 kbps), G.723 (Dual Rate Speech Coder for Multimedia Telecommunications Transmitting at 6.4 and 5.3 kbps), and G.728 (Speech Encoding at 16 kbps).
The "Voice over ATM/FR layer" depicted in FIG. 1 is referred to as the multimedia multiplex and synchronization layer, an example of which is defined in ITU-T standard H.222. ITU-T is currently defining the H.323 standard, which specifies point-to-point and multipoint audio-visual communications between terminals (such as PCs) attached to LANs. This standard defines the components of an H.323 system including H.323 terminals, gate-keepers, and multi-point control units (MCUs). PCs that communicate through the Internet can use the H.323 standards for communication with each other on the same LAN or across routed data networks. In addition to H.323, the ITU-T is in the process of defining similar audio-visual component standards for B-ISDN (ATM) in the H.310 standard, and for N-ISDN in the H.320 standard. The previously mentioned standards also define call signaling formats. For example, IP networks use Q.931 call controls over a new ITU-T standard known as H.225 (for H.323 terminals). Telephony networks use Q.931 signaling and ATM networks use Q.2931 signaling.
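The call signaling formats named above can be collected into a simple lookup table. The sketch below is illustrative only; the `Network` enumeration and the helper `same_signaling` are hypothetical names, and the table holds nothing beyond the three pairings stated in the text.

```python
from enum import Enum


class Network(Enum):
    TELEPHONY = "telephony"
    IP = "ip"
    ATM = "atm"


# Call signaling per network type, as stated in the text.
CALL_SIGNALING = {
    Network.TELEPHONY: "Q.931",
    Network.IP: "Q.931 over H.225",
    Network.ATM: "Q.2931",
}


def same_signaling(a: Network, b: Network) -> bool:
    """True when two terminals' networks use the same call signaling format."""
    return CALL_SIGNALING[a] == CALL_SIGNALING[b]


if __name__ == "__main__":
    print(same_signaling(Network.TELEPHONY, Network.TELEPHONY))
    print(same_signaling(Network.TELEPHONY, Network.ATM))
```

Any pair of terminals on different network types yields a mismatch in this table, so a call between them cannot be set up with a single signaling format.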
Many standards bodies are in the process of defining how voice (and video) can be transported within a given homogeneous network such as the telephony, IP, FR and ATM networks. However, there is currently no arrangement for transmitting voice over a heterogeneous network that consists of two or more such networks employing different transmission standards.
In accordance with the principles of the invention, the foregoing problem is addressed by employing a gateway which connects to the telephony network, the Internet and the ATM/FR network. Such gateway facilities are needed if communication between users on different networks is to be allowed. The telephony network, Internet and FR/ATM networks all use different schemes for establishing a voice session (i.e., call set-up protocols), and different formats for controlling a session and transporting voice. The gateway of the present invention provides conversion of the transmission format, control, call signaling and audio stream (and potentially video and data streams) between different transmission standards.
Embodiments of the disclosed gateway provide some or all of the following functions: call-signaling protocol conversion, audio mixing/bridging or generation of composite audio and switching, address registration, address translation, audio format conversion, audio coding translation, session management/control, address translation between different address types, interfacing with other gateways, interfacing with the SS7 signaling network, and interfacing with an Internet signaling network.
The apparatus establishes a communication session between first and second terminals that may be resident in networks that employ differing transmission standards. The different networks may, illustratively, be a circuit switched network (e.g., a telephony network), a connectionless packet switched network (e.g., the Internet) or a connection-oriented packet switched network (e.g., an ATM or frame relay network). The communication session may be an audio session, a video session or a multimedia session.
The apparatus includes a call set-up translator for translating among call set-up protocols associated with the circuit switched network, the connectionless packet switched network and the connection-oriented packet switched network. An encoding format translator is provided for translating among encoding protocols associated with the circuit switched network, the connectionless packet switched network and the connection-oriented packet switched network. Also provided is an address database for storing a plurality of addresses in different formats for each registered terminal, which includes the first and second terminals. The apparatus also includes a session manager for storing control information relating to the first and second terminals. The control information includes an identification of the first and second terminals that participate in the communication session. Participation in a conversation by more than a pair of terminals is easily accommodated by the disclosed gateway.
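The address database and session manager described above can be sketched as two small data structures. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class and method names, the address format labels ("e164", "ip"), and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class AddressDatabase:
    """Stores several addresses, in different formats, per registered terminal."""

    def __init__(self) -> None:
        # terminal id -> {address format -> address}
        self._records: Dict[str, Dict[str, str]] = {}

    def register(self, terminal_id: str, address_format: str, address: str) -> None:
        self._records.setdefault(terminal_id, {})[address_format] = address

    def translate(self, address: str, target_format: str) -> Optional[str]:
        """Find the terminal that owns `address` and return its address in another format."""
        for addresses in self._records.values():
            if address in addresses.values():
                return addresses.get(target_format)
        return None


@dataclass
class Session:
    session_id: int
    terminals: List[str] = field(default_factory=list)


class SessionManager:
    """Keeps control information, here just the participating terminals."""

    def __init__(self) -> None:
        self._sessions: Dict[int, Session] = {}
        self._next_id = 1

    def open(self, *terminal_ids: str) -> Session:
        if len(terminal_ids) < 2:
            raise ValueError("a session needs at least two terminals")
        session = Session(self._next_id, list(terminal_ids))
        self._sessions[session.session_id] = session
        self._next_id += 1
        return session


if __name__ == "__main__":
    db = AddressDatabase()
    db.register("terminal-a", "e164", "+15550100")  # hypothetical telephone number
    db.register("terminal-a", "ip", "192.0.2.10")   # documentation-range IP address
    print(db.translate("+15550100", "ip"))
    print(SessionManager().open("terminal-a", "terminal-b", "terminal-c"))
```

Because a session holds a list of terminals rather than a pair, the same structure covers both point-to-point and multipoint sessions.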