1. Field of the Invention
The present invention relates to telecommunications system architectures, and more specifically to Voice over Internet Protocol (VoIP) architectures. Improved access methodologies and system components, including an access switch and signaling gateway, are described within the context of a novel VoIP architecture. These improved system components and methodologies manage session state more efficiently and provide a clear migration path from conventional system architectures.
2. Description of the Related Art
Although VoIP has been in existence for many years, evolving service demands are forcing a rapid evolution of VoIP technology. The pace at which new services are being integrated into existing networks continues to increase as VoIP products and services develop.
Though still evolving, packet-based telephony is becoming increasingly sophisticated. Indeed, voice protocols have developed to offer a richer set of features, greater scalability, and a degree of standardization that was unavailable only a few years ago.
Critical to the success of any telecommunications system is the ability to deploy value-added and high-margin services. For many reasons, a few of which are discussed below, VoIP and other IP-based technologies are best positioned to realize these profitable services.
Economics is perhaps the single greatest motivation for the development and deployment of VoIP. Increased competition among existing and emerging voice service vendors has brought tremendous downward pressure on the cost of voice services in the telecommunications market. This trend is likely to continue with further declines in the price of voice services.
Additionally, many existing networks support mostly data (Internet) services that are based on IP. Many service providers already own and will continue to build out IP infrastructures. New service providers wishing to enter the voice services market will almost certainly transport voice traffic across the existing IP backbone. Building a parallel voice services network based on legacy circuit-switching equipment is simply not a cost-effective option.
Many emerging competitive local exchange carriers (CLECs) are sensitive to the cost of developing voice service networks. For many, the cost of legacy voice circuit-switching equipment is prohibitively high. Also, the cost of the space, personnel, and operations capabilities required to maintain such networks is prohibitive. These carriers need a network that can be leveraged to realize other data services, such as Internet, virtual private networks, and managed network offerings. For carriers that must provision such services while also providing voice services, VoIP is an ideal solution.
Major telecommunications carriers are looking for ways to cut the cost of running and upgrading existing voice networks. These carriers want to replace and augment their existing networks with VoIP solutions for many of the reasons described above. Additionally, VoIP offers the carriers a path to circumvent existing tariff regulations. That is, carriers may use data services to transport voice calls to get around traditional (regulated) pricing structures and reduce the total cost of voice services.
In addition to cost advantages, VoIP networks offer compelling technical advantages over circuit-switching networks. VoIP networks are more open to technical improvement and competition-driven improvement than circuit-switching networks, which are dominated by entrenched equipment vendors. The open, standards-based architecture provided by VoIP networks allows greater interchangeability and more modularity than proprietary, monolithic, circuit-switching networks. Open standards also translate into the realization of new services that are rapidly developed and deployed, rather than waiting for a particular vendor to develop a proprietary solution.
In sum, the promise offered by VoIP technology is just beginning to be realized. As will be discussed hereafter, the contemporary state of VoIP technology and architecture is not without limitations and disadvantages. Yet, the utility offered by VoIP networks, as compared to traditional circuit-switching networks, is immense and beyond question. The issue is really one of optimally defining a VoIP architecture and providing enabling system components and access methodologies.
The advantages and preferred implementation of the present invention are best viewed and understood against the backdrop of the contemporary VoIP architectures. However, an understanding of contemporary VoIP technology requires at least some understanding of the public switched telephone network (PSTN).
There are four major tasks performed by the PSTN to connect a call. While the PSTN is capable of providing other services beyond point-to-point voice calls, such services are predicated upon the following basic tasks: (1) signaling; (2) database services; (3) call set-up and tear-down; and (4) analog voice to digital data conversion.
Phone calls are inherently connection oriented. That is, a connection to the called entity must be established before a conversation can occur. Switches, the central components in the PSTN, are responsible for creating this connection. Between the circuit switches are connections (trunk lines) that carry the voice traffic. These links vary in data communication speed from T-1 and E-1 to OC-192/STM-64, with each individual channel (DS-0) in a link representing one voice channel. Switches are also responsible for converting the analog voice signal into a digital data format that may be transported across the network.
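By way of illustration only, the relationship between a trunk line and its DS-0 voice channels can be worked through numerically. The sketch below assumes the commonly cited figures of 64 kbit/s per DS-0, 24 voice channels on a T-1, and 30 voice channels on an E-1; these figures are not stated in the text above and are labeled as assumptions in the code.

```python
DS0_KBPS = 64  # assumed: one DS-0 carries 8,000 samples/s x 8 bits = 64 kbit/s

# Assumed, commonly cited link parameters (not taken from the text above).
# name: (voice-capable DS-0 channels, line rate in kbit/s)
LINKS = {
    "T-1": (24, 1544),
    "E-1": (30, 2048),
}


def voice_payload_kbps(link: str) -> int:
    """Aggregate voice payload carried by the DS-0 channels of one link."""
    channels, _line_rate = LINKS[link]
    return channels * DS0_KBPS


for name, (channels, line_rate) in LINKS.items():
    payload = voice_payload_kbps(name)
    print(f"{name}: {channels} DS-0s carry {payload} of {line_rate} kbit/s")
```

The difference between the voice payload and the line rate is framing and signaling overhead; each simultaneous call occupies exactly one DS-0 for its full duration.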
Signaling notifies both the network and its users of important events. Examples of signaling range from telephone ringer activation to the dialing of digits used to identify a called entity. Network elements also use signaling to create connections through the network.
The Signaling System Seven (SS7) is a standard, packet-based network that transports signaling traffic between the switches involved in the call. FIG. 1 illustrates the basic flow of a telephone call through the PSTN and SS7 network. A call begins when a user dials a destination phone number using call origination equipment 1. A local switch (not shown) typically analyses the destination number to determine if the call is a local one. If the call is local, the local switch may directly connect the call. More typically, the dialed telephone number results in a query to one or more service control points (SCPs) 3. SCPs are databases that execute queries and translate the dialed telephone numbers into a set of circuit switching commands. SCPs also allow such common telephone features as 800 number support, 911 service, and caller ID. Service switching points (SSPs, not shown) are the interface between switches 4 and the SS7 network. It is here that SS7 messages are translated into the connection details required by the switches to physically connect the call origination point to the call destination point.
The SS7 control network is “out of band,” that is, it is not transmitted within the same links used to carry the actual voice channels. Specialized equipment called Signal Transfer Points (STPs) 6 transport the SS7 signaling messages. As will be seen hereafter, these STPs are analogous to IP routers in that the messages are carried in data packets called message transfer parts.
The SS7 network is very expansive and has been built up over many decades of effort and expense. Indeed, the SS7 network is actually a collection of signaling networks deployed throughout the developed world. There are many technical and historical reasons why the signaling portion of the network is broken out from the rest of the system. However, the greatest value in such a design is the ability to add network intelligence and features without a dependency on the underlying circuit-switching infrastructure.
Ultimately, the destination switch signals the destination equipment 5 that a call has arrived, typically by activating a ringer. When the called party picks up the receiver and completes the circuit, a conversation may take place. Throughout the conversation, switches convert analog voice signals into digital data capable of being transported over the network.
Once the call is complete, the switches notify the SS7 network, which “tears down” the circuit connections. In contrast to the “setup” function that identifies and enables the collection of circuit connections that facilitate the call, the tear-down function disables the connections, thus freeing up the switching resources for subsequent use by the network. As one would suspect, there are many more details involved in making a telephone call, but these steps describe the basic flow of events. For example, a great many supervisory messages are communicated via the SS7 network during any one completed call.
Of particular note are the concepts of “call state” and “call control.” Call state is a general term referring to a great quantity of data that is maintained within the network. Call state may be stored at numerous points within the network. It includes at least data identifying the origination point of a call, the destination point for the call, and the switching control data required to connect the two points. Call state may also include data identifying particular mechanisms being used to implement and transport the call, such as analog-to-digital conversion type, echo cancellation parameters, noise elimination techniques, and silence suppression conditions, etc. Call state is maintained (stored) within the network throughout the entire pendency of the call, i.e., between call set-up and call tear-down.
Whereas call state typically defines the nature and quality of a call, call control defines “actions” associated with the call. Call control is often software code communicated in relation to a call. Execution of the code accomplishes the desired action. Actions include, as examples, commands to activate a ringer, send a dial tone, detect dialed digits, and send signaling data.
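The distinction between call state (stored data) and call control (actions) may be easier to see in a small sketch. The following is an illustrative model only; the class names, field names, and the `CallTable` helper are hypothetical and are not drawn from any PSTN or VoIP standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """Call control: actions associated with a call (examples from the text)."""
    ACTIVATE_RINGER = "activate ringer"
    SEND_DIAL_TONE = "send dial tone"
    DETECT_DIGITS = "detect dialed digits"
    SEND_SIGNALING = "send signaling data"


@dataclass
class CallState:
    """Call state: data held in the network for the pendency of a call."""
    origination: str
    destination: str
    switching_data: list = field(default_factory=list)
    conversion_type: str = "unspecified"  # analog-to-digital conversion type
    echo_cancellation: bool = False
    silence_suppression: bool = False


class CallTable:
    """Hypothetical store that keeps state between set-up and tear-down."""

    def __init__(self):
        self._calls = {}
        self.log = []  # (call_id, Action) pairs, in the order issued

    def set_up(self, call_id, state):
        self._calls[call_id] = state

    def control(self, call_id, action):
        if call_id not in self._calls:
            raise KeyError(f"no state held for call {call_id}")
        self.log.append((call_id, action))

    def tear_down(self, call_id):
        # Releasing the state frees the resources for subsequent calls.
        del self._calls[call_id]

    def active(self):
        return len(self._calls)


table = CallTable()
table.set_up("call-1", CallState("5550001", "5550002", echo_cancellation=True))
table.control("call-1", Action.ACTIVATE_RINGER)
table.tear_down("call-1")
```

In this model, state exists only between `set_up` and `tear_down`, while control actions are transient events that refer to that state.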
Like the PSTN, components forming a VoIP network must perform four basic functions: (1) signaling; (2) database services; (3) call connection and disconnection (bearer control); and (4) CODEC operations. FIG. 2 conceptually illustrates the VoIP network.
For purposes of this description, the IP Backbone can be viewed as one logical switch. The logical switch is, however, a distributed system. The IP backbone provides connectivity amongst a great number of distributed elements. Depending on the VoIP protocols used, this system as a whole is sometimes referred to as a softswitch architecture.
Signaling in a VoIP network is just as critical as it is in the legacy phone system. The signaling in a VoIP network activates and coordinates the various components to complete a call. Although the underlying nature of the signaling is the same, there are some technical and architectural differences.
Signaling in a VoIP network is accomplished by the exchange of IP datagram messages between network components. The format for these messages is covered by a number of standards, or protocols. Regardless of the protocol or hardware components used, the messages are critical to the function of a voice-enabled IP network and need special treatment to guarantee their delivery across the IP backbone.
Among other functions, database services are a way of locating an endpoint and translating an address communicated between two (usually heterogeneous) networks linked by the VoIP backbone. For example, the PSTN uses phone numbers to identify endpoints, while a VoIP network might use a Uniform Resource Locator (URL) or an IP address with port numbers to identify an endpoint. A call control database contains the necessary mappings and translations to identify endpoints. Functionally similar to call state and call control in the PSTN, “session state” in a VoIP network defines the nature of the call and controls the activities of components in the network.
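As a minimal sketch of such a mapping, the following models a call control database that translates a dialed phone number into an IP address and port. The `TranslationDB` and `Endpoint` names, the normalization rule, and the sample addresses (taken from the documentation address range) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    """A VoIP endpoint identified by IP address and port number."""
    host: str
    port: int


def normalize(number: str) -> str:
    """Keep digits only, so '555-1234' and '5551234' map to the same key."""
    return "".join(ch for ch in number if ch.isdigit())


class TranslationDB:
    """Hypothetical phone-number-to-endpoint mapping table."""

    def __init__(self):
        self._by_number = {}

    def register(self, number: str, endpoint: Endpoint) -> None:
        self._by_number[normalize(number)] = endpoint

    def resolve(self, number: str) -> Optional[Endpoint]:
        # None means the number is unknown to this database.
        return self._by_number.get(normalize(number))


db = TranslationDB()
db.register("555-1234", Endpoint("192.0.2.10", 5004))
print(db.resolve("5551234"))
```

A production database would also handle number portability, routing policy, and translation in the reverse direction; none of that is modeled here.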
The connection of a call in a VoIP network is made by two endpoints opening a communication session between each other. In the PSTN, the public (or private) switch connects logical DS-0 channels through the network to complete a call. In a VoIP implementation, this connection is a multimedia stream (audio, video, or both) transported in real time between endpoints. This connection is referred to as the bearer (or payload) channel and represents the voice or video content being delivered. When communication is complete, the IP session is ended and network resources are released.
As already noted, voice communication is analog, while data networking requires digital data. The process of converting analog voice into digital data is done by a coder-decoder (CODEC). There are many well known ways to convert analog voice into digital data. The processes used to convert, compress, and otherwise manipulate voice data in digital form are numerous and complex. Most are governed by publicly available standards.
The major components of a VoIP network are similar in functionality to the components forming a circuit-switched network. VoIP networks must perform all of the same tasks performed by the PSTN, in addition to providing a gateway between the VoIP network and the existing PSTN. Although using different technologies and approaches, some of the same component concepts that make up the PSTN are found in VoIP networks. As illustrated in FIG. 2, there are three general components forming a VoIP network: (1) media gateways 8; (2) media gateway controllers 9; and (3) the IP backbone 7.
Media gateways are responsible for call origination, call detection, and CODEC functions, including at least analog-to-digital conversion and voice packet creation. In addition, media gateways have optional features, such as data compression, echo cancellation, silence suppression, and statistics gathering. Media gateways form the interface allowing voice data to be transported across the IP network. In other words, media gateways are the source of bearer traffic. Typically, each call is a single IP session transported by the Real-Time Transport Protocol (RTP), which runs over the User Datagram Protocol (UDP). Media gateways exist in several forms, ranging from a dedicated telecommunication equipment chassis to a generic PC running VoIP software.
Media gateway controllers house the signaling and control services that coordinate the media gateway functions. Media gateway controllers are responsible for call signaling coordination, phone number translations, host lookup, resources management, and signaling gateway services to the PSTN.
The IP infrastructure must ensure smooth delivery of the voice and signaling packets to the VoIP components. Due to their dissimilarities, the IP network must treat voice and data traffic differently. That is, voice and data traffic require different transport handling consideration and prioritization.
While there is correlation between VoIP and circuit-switching components, there are also many significant differences. One difference is found in the transport of voice traffic. Circuit-switching telecommunications can best be described as a Time-Division-Multiplexing (TDM) network that dedicates channels or reserves bandwidth as it is needed out of the trunk lines interconnecting an array of switches. For example, each phone call reserves a single DS-0 channel, and an associated end-to-end connection is formed to enable the call.
In contrast to the circuit-switching network, IP networks are packet-based networks based on the idea of statistical availability. Individual packets may be transported via a multiplicity of different paths through the IP network. This phenomenon often requires data packets to stop and be stored during their journey across the VoIP network. When this happens, some form of queue management is required to assess and handle the packets according to their priority indication. Accordingly, a packet prioritization schedule or Class of Service (CoS) definition is used to direct packets from specific VoIP applications in an appropriate manner. Such prioritization of IP network resources allows voice applications to coherently function across the IP network without being adversely affected by other data traffic on the IP backbone.
As noted above, the VoIP environment is one defined by standards. Two major standards bodies govern multimedia delivery over packet-based networks: the International Telecommunication Union (ITU) and the Internet Engineering Task Force (IETF). These two organizations are responsible for multiple protocols and equipment standards commonly used by participants in the VoIP market. A basic understanding of existing protocols is necessary to fully appreciate the present invention. These standards will be briefly described below, but a complete description may be readily obtained from numerous public sources including the organizations noted above.
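The queue management described above can be sketched as a strict-priority queue. This is one simple scheduling discipline chosen for illustration, not necessarily the one used in any particular network; the three class names and their relative ordering are assumptions.

```python
import heapq
from itertools import count

# Assumed CoS classes; a lower number is served first.
PRIORITY = {"signaling": 0, "voice": 1, "data": 2}


class CosQueue:
    """Strict-priority queue: higher classes always drain before lower ones."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, cos: str, packet: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[cos], next(self._arrival), packet))

    def dequeue(self) -> str:
        _priority, _order, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)


q = CosQueue()
q.enqueue("data", "d1")
q.enqueue("voice", "v1")
q.enqueue("signaling", "s1")
q.enqueue("voice", "v2")
order = [q.dequeue() for _ in range(len(q))]
print(order)
```

Strict priority protects voice and signaling from bulk data, at the cost of possibly starving the lowest class under sustained load; weighted schemes trade some of that protection for fairness.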
Signaling System Seven (SS7)
SS7 has been previously referenced in the discussion of the PSTN. FIG. 3 conceptually illustrates SS7 which is a widely used suite of telephony protocols expressly designed to establish and terminate phone calls. The SS7 signaling protocol is implemented as a packet-switched network. SS7 is both a protocol and a network designed to signal voice services. SS7 is a unified interface for the establishment of circuit-switching, translation, and billing services.
SS7 is not built on top of other protocols. Rather, as shown in FIG. 3, it is entirely its own protocol suite from physical layer to application layer. For networks transporting SS7, it is important that these services be either translated or tunneled through the IP network reliably. Given the importance of SS7 signaling, it is necessary to ensure that these messages are given priority in the network. VoIP networks require access to the SS7 facilities in order to bridge calls onto the PSTN.
H.323
ITU recommendation H.323 specifies a packet-based multimedia communication system. The specification defines various signaling functions, as well as media formats related to packetized audio and video services.
H.323 standards were generally the first to classify and solve multimedia delivery issues over Local Area Network (LAN) technologies. However, as IP networking and the Internet became prevalent, many Internet RFC standard protocols and technologies were developed, some of them based on H.323 ideas. H.323 networks consist of media gateways and gatekeepers. Gateways serve as both H.323 termination endpoints and interfaces with non-H.323 networks, such as the PSTN. Gatekeepers function as a central unit for call admission control, bandwidth management, and call signaling. A gatekeeper and all its managed gateways form an H.323 “zone.” Although the gatekeeper is not a required element in H.323, it can help H.323 networks to scale to a larger size by separating call control and management functions from the gateways.
H.323 specifications tend to be heavy handed and initially focused on LAN networking. Not surprisingly, the standard has some scalability shortcomings. One H.323 scalability issue is its dependency on TCP-based (connection-oriented) signaling. It is difficult to maintain large numbers of TCP sessions because of the greater overhead involved.
With each call initiated, a first TCP session is created using a first protocol defining a set of messages. A TCP connection is maintained throughout the duration of the call. A second session is established using another protocol. This TCP-based process allows an exchange of capabilities, master-slave determination, and the establishment and release of media streams.
The H.323 quality of service (QoS) delivery mechanism of choice is the Resource Reservation Protocol (RSVP). This protocol does not have good scaling properties because it focuses on, and manages, individual application traffic flows. For these and other reasons, H.323 has historically been deemed ill-suited for service provider applications, and has been relegated to enterprise VoIP applications.
Real-Time Transport Protocol (RTP)
The RTP protocol provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video. Services include payload type identification, sequence numbering, time stamping, and delivery monitoring. Further, the RTP protocol provides features for real-time applications, including timing reconstruction, loss detection, content delivery, and identification of encoding schemes. Many media gateways that digitize voice applications use RTP to deliver the voice traffic. For each session participant, the session is defined by a particular pair of destination transport addresses (a network address plus a port pair), which facilitates a single RTP session for each call.
RTP is an application service built on UDP, so it is connectionless with best-effort delivery. Although RTP is connectionless, it does have a sequencing system that allows for the detection of missing packets. As part of its specification, RTP identifies the encoding scheme used by the media gateway to digitize voice content. With different types of encoding schemes and packet creation rates, RTP packets can vary in size and interval. The combined parameters of an RTP session dictate how much bandwidth is consumed by the voice bearer traffic. RTP transporting voice traffic is the single biggest data contributor to conventional VoIP traffic.
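The payload type, sequence number, and time stamp mentioned above are carried in a fixed 12-byte RTP header. The sketch below packs and parses that header and uses the sequence numbers for loss detection. It assumes the header carries no contributing-source entries or extensions and that packets arrive in order; the helper names are hypothetical.

```python
import struct

# Fixed RTP header: flags, marker/payload type, sequence, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")  # network byte order, 12 bytes


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    byte0 = 2 << 6  # version 2; padding, extension, and CSRC count all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,  # identifies the encoding scheme
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def missing_sequences(received):
    """Sequence numbers skipped between consecutive packets (mod 2**16).

    Assumes in-order arrival; reordered packets are not handled here.
    """
    missing = []
    for prev, cur in zip(received, received[1:]):
        gap = (cur - prev) % 65536
        missing.extend((prev + i) % 65536 for i in range(1, gap))
    return missing


header = pack_rtp_header(payload_type=0, sequence=41, timestamp=160, ssrc=0x1234)
fields = parse_rtp_header(header)
```

Because the sequence number is 16 bits wide, the gap calculation is done modulo 65536 so that a wrap from 65535 to 0 is not mistaken for a loss.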
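The dependence of bearer bandwidth on encoding scheme and packet creation rate can be made concrete with a short calculation. The sketch below assumes IPv4 without options (20 bytes), UDP (8 bytes), and the fixed RTP header (12 bytes), and ignores link-layer overhead; the codec rates used in the example (64 kbit/s and 8 kbit/s) are illustrative inputs, not values from the text above.

```python
IP_HEADER_BYTES = 20   # assumed: IPv4 header without options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12  # fixed RTP header, no CSRC entries


def rtp_bandwidth_bps(codec_bps, packet_interval_ms):
    """IP-layer bandwidth of one direction of a voice stream, in bit/s."""
    payload_bytes = codec_bps * packet_interval_ms / 1000 / 8
    packets_per_second = 1000 / packet_interval_ms
    packet_bytes = (payload_bytes + RTP_HEADER_BYTES
                    + UDP_HEADER_BYTES + IP_HEADER_BYTES)
    return packet_bytes * 8 * packets_per_second


# A 64 kbit/s encoding sent every 20 ms: 160 payload bytes per packet.
uncompressed = rtp_bandwidth_bps(64_000, 20)
# An 8 kbit/s encoding sent every 20 ms: 20 payload bytes per packet.
compressed = rtp_bandwidth_bps(8_000, 20)
print(uncompressed, compressed)
```

The fixed 40 bytes of headers per packet dominate at low codec rates: in the second case, two thirds of each packet is header, which is why the packet interval matters as much as the encoding scheme.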
Media Gateway Control Protocol (MGCP)
The MGCP (like its predecessor SGCP) largely defines the contemporary softswitch architecture. It is a master-slave control protocol that coordinates the action of media gateways. In effect, the MGCP divides the functional role of traditional voice switches between the media gateway and the media gateway controller.
In MGCP nomenclature, the media gateway controller is often referred to as a “call agent.” The call agent manages the call-related signaling control intelligence, while the media gateway informs the call agent of service events. The call agent instructs the media gateway to setup and tear-down connections when calls are generated. In most cases, the call agent informs the media gateway to start an RTP session between two endpoints.
As one might imagine, the control and response sequence orchestrated by the MGCP between the media gateway and the call agent requires a substantial quantity of call state data to be stored at both the media gateway and call agent. The signaling exchanges between call agent and media gateway are accomplished by structured messages inside UDP packets. The call agent and media gateways typically have retransmission facilities for these messages, but the MGCP itself is stateless. Hence, a lost message is timed out by the VoIP components, as compared with a TCP delivery mechanism where the protocol attempts to retransmit a lost packet. Accordingly, MGCP messages are usually given very high priority in the VoIP network over other non-real-time data packets.
A number of routine MGCP functions executed during a call will be described in relation to FIG. 4. A call begins when a user picks up the origination phone 10 and dials a destination number. The media gateway 11 then notifies the media gateway controller (call agent) 12 that a call is incoming. The media gateway controller 12 looks up the dialed phone number and directs the media gateways to create an RTP connection (i.e., it identifies an IP address and port number). Media gateway controller 12 also informs the destination media gateway 13 of the incoming call, and the destination phone 14 rings. Thereafter, media gateways 11 and 13 open an RTP session across the IP network 15 when destination phone 14 is answered.
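The sequence described in relation to FIG. 4 can be traced with a small simulation. This is a simplified model of the roles only; it does not reproduce the MGCP message format, and the class names, method names, phone number, and addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MediaGateway:
    name: str
    rtp_address: tuple  # (IP address, UDP port) offered for bearer traffic
    events: list = field(default_factory=list)

    def create_connection(self, remote):
        self.events.append(f"prepare RTP connection to {remote[0]}:{remote[1]}")

    def ring(self):
        self.events.append("ring destination phone")

    def open_rtp(self):
        self.events.append("RTP session open")


class CallAgent:
    """Media gateway controller: holds the number-to-gateway directory."""

    def __init__(self, directory):
        self.directory = directory

    def handle_dialed_number(self, origin, dialed):
        # Look up the dialed number, then direct both gateways.
        destination = self.directory[dialed]
        origin.create_connection(destination.rtp_address)
        destination.create_connection(origin.rtp_address)
        destination.ring()
        return destination


def answer(origin, destination):
    # The bearer session opens only once the destination phone is answered.
    origin.open_rtp()
    destination.open_rtp()


gw11 = MediaGateway("media gateway 11", ("192.0.2.11", 5004))
gw13 = MediaGateway("media gateway 13", ("192.0.2.13", 5004))
agent = CallAgent({"5551234": gw13})
dest = agent.handle_dialed_number(gw11, "5551234")
answer(gw11, dest)
```

Note that the call agent never touches the bearer stream in this model; it only tells each gateway where to send it, which mirrors the master-slave division of roles described above.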
Session Initiation Protocol (SIP)
SIP is part of the IETF's multimedia data and control protocol framework. SIP is a powerful client-server signaling protocol used in VoIP networks. SIP handles the setup and tear down of multimedia sessions. Such sessions may include multimedia conferences, telephone calls, and multimedia content deliveries.
SIP is a text-based signaling protocol transported using either TCP (transmission control protocol) or UDP (user datagram protocol). SIP is relatively simple, efficient, and extendable, owing much of its design philosophy and architecture to the Hypertext Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP). Thus, SIP uses invitations to create Session Description Protocol (SDP) messages to carry out capability exchange and set up call control channel use. Such invitations allow “participants” to agree on a set of compatible media types.
SIP supports user mobility by proxying and redirecting requests to the user's current location. Users may inform the server of their current location (an IP address or URL) by sending a registration message to a registrar. Such a capability is increasingly attractive given a large and growing base of mobile users. With its mobile features, SIP implementations tend to be more discrete and SIP clients tend to be larger in number and more geographically distributed. Hence, one of the great challenges to implementing SIP services is mapping CoS delivery for the signaling and bearer traffic.
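Because SIP is text-based, an invitation carrying an SDP body can be assembled with ordinary string handling. The sketch below builds a minimal INVITE offering a single audio stream. The user names, host, port, Call-ID, branch, and tag values are placeholders chosen for illustration, and a real user agent would add further headers and generate unique identifiers.

```python
def build_sdp(user, host, rtp_port):
    """Minimal SDP body offering one audio stream (payload type 0, PCMU)."""
    lines = [
        "v=0",
        f"o={user} 0 0 IN IP4 {host}",
        "s=-",
        f"c=IN IP4 {host}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller, callee, host, rtp_port, call_id, branch, tag):
    """Assemble a SIP INVITE whose body is the SDP offer."""
    user = caller.split("@")[0]
    body = build_sdp(user, host, rtp_port)
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body, as in HTTP.
    return "\r\n".join(headers) + "\r\n\r\n" + body


invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.org",
    host="192.0.2.20",
    rtp_port=49170,
    call_id="example-call-id@192.0.2.20",  # placeholder value
    branch="z9hG4bK-example",              # placeholder value
    tag="example-tag",                     # placeholder value
)
print(invite)
```

The `m=` line in the body is where the media types are proposed; the called party answers with its own SDP, and the intersection of the two becomes the agreed set of compatible media types.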
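The registration mechanism can be sketched as a table of time-limited bindings from a user's public address to the user's current contact location. The `Registrar` class, its method names, and the default lifetime of 3600 seconds are assumptions made for illustration; the clock is passed in explicitly so that the example is deterministic.

```python
class Registrar:
    """Hypothetical registrar: maps a public address to a current contact."""

    def __init__(self):
        self._bindings = {}  # public address -> (contact, expiry time)

    def register(self, address, contact, now, expires=3600):
        # A later registration for the same address replaces the earlier one.
        self._bindings[address] = (contact, now + expires)

    def locate(self, address, now):
        """Return the current contact, or None if unknown or expired."""
        binding = self._bindings.get(address)
        if binding is None or binding[1] <= now:
            return None
        return binding[0]


registrar = Registrar()
registrar.register("sip:alice@example.com", "sip:alice@192.0.2.20", now=0)
first = registrar.locate("sip:alice@example.com", now=10)

# The user moves and registers again from a new location.
registrar.register("sip:alice@example.com", "sip:alice@198.51.100.7", now=20)
second = registrar.locate("sip:alice@example.com", now=30)
expired = registrar.locate("sip:alice@example.com", now=4000)
```

A proxy or redirect server consulting such a table can forward or redirect a request to wherever the user last registered, which is the basis of the mobility support described above.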