The present invention relates generally to voice over IP networks and more particularly relates to a method of dynamically assigning priority to packets over an Internet Protocol (IP) based network.
Currently, there is a growing trend to converge voice and data networks so that both utilize the same network infrastructure. The objective is to create a single fabric that is designed to deliver both voice and data, thus enhancing the effectiveness and productivity of individuals in an enterprise. The currently available systems that combine voice and data have limited applications and scope. An example of such a system is the Automatic Call Distribution (ACD), which permits service agents in call centers to access customer files in conjunction with incoming telephone calls. ACD centers, however, remain costly and difficult to deploy, requiring custom systems integration in most cases. Another example system is the voice logging/auditing system used by emergency call centers (e.g., 911) and financial institutions. Deployment has been limited due to the limited scalability of the system since voice is on one network and data is on another, both tied together by awkward database linkages.
The aim of packet telephony is to provision voice over packet switched fabrics in both the local area network (LAN) and the wide area network (WAN). Currently, voice and data generally flow over separate networks, the goal being to transmit them both over a single medium.
A block diagram illustrating example separate prior art data and voice networks is shown in FIG. 1. The LAN portion, generally referenced 10, comprises the LAN cabling infrastructure 12 and one or more network devices connected to the LAN. Examples of typical network devices include servers 14 and workstations 16. The voice portion, generally referenced 20, has at its core a private branch exchange (PBX) 24 which comprises one or more trunk line interfaces and one or more telephone extension interfaces. The PBX is connected to the public switched telephone network (PSTN) 22 via one or more trunk lines 28, e.g., analog, T1, E1, T3, ISDN, etc. A plurality of user telephones 26 is also connected directly to the PBX via phone line extensions 29.
The paradigm currently in widespread use consists of a circuit switched fabric 20 for voice networks and a completely separate LAN infrastructure 10 for data. Most enterprises today use proprietary PBX equipment for voice traffic.
An increasingly common packet telephony paradigm consists of telephone and data tightly coupled on packet based, switched, multimedia networks where voice and data share a common transport mechanism. It is expected that this paradigm will spur the development of a wealth of new applications that take advantage of the simultaneous delivery of voice and data over a single unified fabric.
A block diagram illustrating a voice over packet network where voice and data share a common infrastructure is shown in FIG. 2. The IP telephony system, generally referenced 30, comprises a LAN infrastructure represented by an Ethernet switch 32, one or more telephones 36, workstations 34, a gateway 42, a gatekeeper 46 and a Layer 3 switch 38. The key components of an IP telephony system 30 are the modified desktop, gatekeeper and gateway entities. For the desktop, users may have an Ethernet phone 36 that plugs into an Ethernet RJ-45 jack or a handset or headset 35 that plugs into a PC 37.
Today, all LAN based telephony systems need to connect to the PSTN 44. The gateway is the entity that is specifically designed to convert voice from the packet domain to the circuit switched domain. The gatekeeper is primarily designed to control the creation of real time connections over the network.
The IP telephony traffic is supported by a packet infrastructure such as an Ethernet based network. Telephony calls traversing the intranet may pass through a Layer 3 switch 38 connecting a corporate intranet 40. The Layer 3 switch should support Quality of Service (QoS) features such as IEEE 802.1p and 802.1Q and Resource ReSerVation Protocol (RSVP) described in IETF RFC 2205 entitled "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification".
The Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has issued a number of standards related to telecommunications. The Series H standards deal with audiovisual and multimedia systems and describe standards for systems and terminal equipment for audiovisual services. The H.323 standard is an umbrella standard that covers various audio and video encoding standards. Related standards include H.225.0, which covers media stream packetization and call signaling protocols, and H.245, which covers audio and video capability exchange, management of logical channels and transport of control and indication signals. Details describing these standards can be found in ITU-T Recommendation H.323 (Draft 4 Aug. 1999), ITU-T Recommendation H.225.0 (February 1998), H.245 (Jun. 3, 1999), H.261, H.263, T.120, G.722, G.723, G.728 and G.729.
A block diagram illustrating example prior art H.323 compliant terminal equipment is shown in FIG. 3. The H.323 terminal 50 comprises a video codec 52, audio codec 54, system control 56 and H.225.0 layer 64. The system control comprises H.245 control 58, call control 60 and Registration, Admission and Status (RAS) control 62.
Attached video equipment 66 includes any type of video equipment, such as cameras and monitors including their control and selection, and various video processing equipment. Attached audio equipment 70 includes devices such as those providing voice activation sensing, microphones, loudspeakers, telephone instruments and microphone mixers. Data applications and associated user interfaces 72 include those that use the T.120 real time audiographics conferencing standard or other data services over the data channel. The attached system control and user interface 74 provides the human user interface for system control. The network interface 68 provides the interface to the packet based network and supports the appropriate signaling and voltage levels in accordance with appropriate standards. Typical packet based networks include Ethernet and Token Ring.
The video codec 52 functions to encode video signals from the video source (e.g., video camera) for transmission over the network and to decode the received video data for output to a video display. If a terminal incorporates video communications, it must be capable of encoding and decoding video information in accordance with H.261. A terminal may also optionally support encoding and decoding video in accordance with other recommendations such as H.263.
The audio codec 54 functions to encode audio signals from the audio source (e.g., microphone) for transmission over the network and to decode the received audio data for output to a loudspeaker. All H.323 terminals must be capable of encoding and decoding speech in accordance with G.711 including both A-law and μ-law encoding. Other types of audio that may be supported include G.722, G.723, G.728 and G.729.
The data channel supports telematic applications such as electronic whiteboards, still image transfer, file exchange, database access, real time audiographics conferencing (T.120), etc. The system control unit 56 provides services as defined in the H.245 and H.225.0 standards. For example, the system control unit provides signaling for proper operation of the H.323 terminal, call control, capability exchange, signaling of commands and indications and messaging to describe the content of logical channels. The H.225.0 Layer 64 is operative to format the transmitted video, audio, data and control streams into messages for output to the network interface. It also functions to retrieve the received video, audio, data and control streams from messages received from the network interface 68. In addition, it performs logical framing, sequence numbering, error detection and error correction in accordance with the particular media type.
The gateway functions to convert voice from the packet domain to the circuit switched domain. In particular, it converts packetized voice to a format that can be accepted by the Switched Circuit Network (SCN) such as the PSTN. Packetized voice denotes voice that has been digitized and placed into an Ethernet frame. The gateway provides the appropriate translation between different video, audio and data transmission formats and between different communications procedures.
Note that since the digitization format for voice on the packet network is often different from that on the SCN, the gateway needs to provide this type of conversion, which is known as transcoding. Gateways also function to pass signaling information such as dial tone, busy tone, etc. Typical connections supported by the gateway include analog, T1, E1, ISDN and ATM at OC-3 and higher rates. Additional functions performed by the gateway include call setup and clearing on both the network side and the Switched Circuit Network (SCN) side. The gateway may be omitted if communication with SCN terminals is not required.
The gatekeeper functions to provide call control services to terminal endpoints on the network. It is primarily designed to control the creation of real time connections over the network. Applications register themselves with the gatekeeper before attempting to bring up a session. The gatekeeper may deny a request to bring up a session or may grant the request at a reduced data rate. This is particularly relevant to video connections that typically consume huge amounts of bandwidth for a high quality connection. The gatekeeper also is responsible for call control, call management and other voice intelligent functions.
The gatekeeper is optional and more than one may be present in the network. In particular, the services provided by the gatekeeper include call control signaling, call authorization, bandwidth management, call management and address translation.
Call control signaling is optional. The gatekeeper may choose to complete the call signaling with the H.323 endpoints and process the call signaling itself, or it may direct the endpoints to connect the call signaling channel directly to each other, in which case the gatekeeper avoids handling the H.225.0 call control signals.
Through the use of H.225.0 signaling, the gatekeeper may reject calls from a terminal due to authorization failure. The reasons for rejection may include restricted access to or from particular terminals or gateways, or restricted access during certain time periods.
Bandwidth management entails controlling the number of H.323 terminals that are allowed to simultaneously access the network. Via H.225.0 signaling, the gatekeeper may reject calls from a terminal due to bandwidth limitations. This may occur if the gatekeeper determined that there is not sufficient bandwidth available on the network to support the call.
The call management function performed by the gatekeeper includes maintaining a list of currently active H.323 calls. This information is used to indicate that a terminal is busy and to provide information for the bandwidth management function.
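By way of illustration only, the bandwidth management and call management functions described above can be sketched as follows. The class name, the units (kbps) and the `min_kbps` parameter are assumptions made for this sketch; it does not model actual H.225.0 RAS messages, only the decision to grant a call, grant it at a reduced rate, or reject it.

```python
class Gatekeeper:
    """Toy admission control: tracks active calls against a fixed capacity."""

    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps
        self.active_calls = {}  # call id -> granted bandwidth in kbps

    def available(self):
        # Bandwidth not yet granted to any active call.
        return self.capacity_kbps - sum(self.active_calls.values())

    def admit(self, call_id, requested_kbps, min_kbps=None):
        """Return the granted rate in kbps, or None if the call is rejected."""
        free = self.available()
        if requested_kbps <= free:
            granted = requested_kbps      # grant the full request
        elif min_kbps is not None and min_kbps <= free:
            granted = free                # grant at a reduced data rate
        else:
            return None                   # reject: insufficient bandwidth
        self.active_calls[call_id] = granted
        return granted

    def release(self, call_id):
        # Remove the call from the active list, freeing its bandwidth.
        self.active_calls.pop(call_id, None)
```

The list of active calls serves both purposes named in the text: it indicates which calls are in progress and it supplies the figure from which the remaining bandwidth is computed.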
The gatekeeper also provides address translation whereby an alias address is translated to a Transport Address. This is performed using a translation table that is updated using Registration messages, for example.
The H.225.0 standard defines the Real-Time Transport Protocol (RTP) which provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video. These services include payload type identification, sequence numbering, time stamping and delivery monitoring. Typically, applications run RTP on top of UDP to take advantage of the multiplexing and checksum features of UDP. RTP may, however, be used with other suitable underlying network or transport protocols.
RTP does not itself provide any mechanism to ensure timely delivery or other QoS guarantees, but relies on lower layer services to do so. It also does not guarantee delivery or prevent out of order delivery, nor does it assume that the underlying network is reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence.
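As a minimal sketch of how a receiver might use the sequence numbers, the following parses the 12-byte fixed RTP header and restores the sender's order. The field layout follows the published RTP specification rather than anything stated in this description, the function names are illustrative, and sequence number wraparound at 65535 is deliberately ignored for brevity.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def reorder(packets):
    """Return packets sorted by sequence number (no wraparound handling)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p).sequence)
```

A production receiver would additionally handle wraparound and bound the reordering window so that late packets do not stall playback.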
RTP is intended to be flexible so as to provide the information required by a particular application. Unlike conventional protocols in which additional functions may be accommodated by making the protocol more general or by adding an option mechanism that requires parsing, RTP can be tailored through modifications and/or additions to the headers.
The RTP Control Protocol (RTCP) functions to periodically transmit control packets to all participants in a session. The primary function of RTCP is to provide feedback on the quality of the data distribution that may be useful for the control of adaptive encodings. RTCP also carries a transport level identifier for an RTP source called the canonical name or CNAME. Receivers require the CNAME to associate multiple data streams from a given participant in a set of related RTP sessions. The RTCP protocol can also be used to convey session control information such as participant identification.
Each RTCP packet begins with a fixed header followed by structured elements of variable length.
The voice quality perceived by the user is most affected by transmission delays across the network fabric. The transmission delays cause gaps in the conversation when one party stops talking and the other begins speaking. Delays can be introduced at any number of points in the network. One source of delay is the encoding of voice from analog which is exacerbated by voice compression algorithms that require large numbers of samples of the voice stream before compressing and packetizing the data. A similar delay occurs at the remote end when the voice is converted back to analog.
Additional delays are incurred as the packet propagates through the packet network. Propagation delays can be minimized in cases where the packet remains on a LAN with wire speed switches. There can be considerable delays, however, at the LAN/WAN boundary where Layer 3 switches or other access devices may add potentially large queuing delays created by the disparities in speed between LAN and WAN.
Another problem, known as jitter, is caused by the variation in delay as may be produced in a shared network such as Ethernet. Jitter delays occur primarily because of traffic conditions on the backbone network. In particular, the delays generated as a packet travels from user to user (i.e., network entity to network entity) may be generated due to the statistical nature of packet collisions. Telephone conversations are particularly sensitive to a problem known as head-of-line blocking whereby a voice packet gets stuck in a queue behind other less time critical packets. This happens when all the packets arriving in a switch or other network device are treated with the same priority. Packets following this one, however, may not experience the same delay thus giving rise to an oscillation situation.
In an IP network, the problem is emphasized by the deployment of network elements such as bridges and routers which also suffer from the statistical nature of packet collisions. Routers are network elements that incorporate a number of ports, each connected to a different subnet. Routers generally use ingress queuing to transfer packets from subnet to subnet, which causes packet delays.
Another source of jitter is the lack of a network-synchronized clock. Thus, each node uses its own internal clock which is not synchronized to any of the clocks in other nodes.
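For illustration, jitter can be quantified with the running interarrival jitter estimate used in RTP receiver reports, which smooths the change in transit time between consecutive packets with a gain of 1/16. The sketch below assumes two equal-length lists of send and receive times in the same units; the function name is illustrative.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate: J += (|D| - J) / 16 for each packet pair."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D is the change in transit time between packet i-1 and packet i.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

Because only differences of times taken at the same node are used, a constant offset between the sender's and receiver's clocks cancels out; clock drift between the two does not.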
Another problem is loss of packets. Most end stations using data connections are designed to accept and recover from an occasional lost packet. The human ear, however, is not very tolerant of lost packets. A user will hear annoying gaps and clicks if the network does not provide techniques such as forward error correction to combat this. The effect on users depends on the packet loss rate and the compression algorithm used. The higher compression schemes are less forgiving.
Therefore, when a packet travels from one point in the network to another via the shared network (e.g., Ethernet), it is normally not possible to predict what the end-to-end delay will be. In a packet telephony application, as more and more calls are established, a congestion problem is likely to ensue as it is impossible to know which packets will go where.
One prior art attempt to solve the unpredictable nature of the jitter delay problem is to assign a priority to the specific stream. The use of priority was intended to address the huge growth in network traffic brought about by the increased use and reliance on business process applications. No matter how fast the underlying network, e.g., 10 Mbps, 100 Mbps or Gigabit Ethernet, congestion may still occur such as when two traffic streams compete for a single output port or when the speed of the incoming traffic exceeds the transmission rate of the outgoing port. Under these conditions, the network element (e.g., switch or router) must buffer some traffic while transmitting the rest. Prioritizing the traffic can help especially when the network carries real-time traffic such as voice or video which must be delivered with sufficient quality of service to be acceptable.
The IEEE 802.1p standard provides a mechanism whereby priority tags can be used to indicate traffic priority. Traffic is assigned one of eight traffic class values, which is carried in a priority field in the packet header. For Ethernet frames, 802.1Q VLAN tags include a three-bit priority indication. The 802.1D LAN bridging standard provides expedited traffic capabilities to support transmission of time critical information in a LAN environment.
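As a brief sketch of where the three-bit priority indication sits, the following extracts it from an 802.1Q-tagged Ethernet frame. It assumes the usual tag layout, namely a 16-bit tag protocol identifier of 0x8100 following the two 6-byte MAC addresses, then a 16-bit tag control field whose top three bits carry the priority. The function name is illustrative.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier marking an 802.1Q tag


def vlan_priority(frame: bytes):
    """Return the 3-bit priority (0-7) of a tagged frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-13 hold the TPID, bytes 14-15 the tag control information.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci >> 13  # the top three bits are the priority field
```

Note that this priority is fixed by the tag for every frame that carries it, which is the static behavior contrasted with the dynamic scheme introduced later in this description.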
Another prior art attempt to solve the problem involves reserving bandwidth for higher priority traffic (e.g., RSVP protocol).
These prior art solutions, however, have the disadvantage that they are either wasteful of bandwidth or they do not solve the congestion/jitter problem when some segments of the path are transmitting packets in sessions that have the same priority. In other words, the problem is how to prioritize (and ultimately differentiate) multiple packets when they all have the same priority.
The present invention solves the problems associated with the prior art by providing an apparatus for and a method of dynamically prioritizing packets over a packet based network. Such a packet network may comprise, for example, an IP based network running over Ethernet, Token Ring, etc. The present invention provides a solution to the congestion problem that is not resolved by prior art priority schemes.
The present invention dynamically prioritizes packets on the basis of their 'time to live' in the network. In other words, packets are assigned a priority in accordance with their degree of freshness or staleness. Packets with a relatively long time left are given lower priority than those with relatively little time left before they are of no use. In general, packets generated by real-time multimedia applications such as audio, video, etc. have a finite life span. Once that life span has passed, they are of little value. Thus, in a real-time audio application, for example, packets must arrive at their destinations in sufficient time to be played to the user in analog format. Packets representing a user's voice corresponding to a point in time earlier than the current time are useless because their window of playback has passed, and they are thus discarded.
The present invention is operative to add a field to the packet as it travels from entity to entity in the network. The contents of the time to live (TTL) field represent how 'young' or 'old' the packet is and convey the time left before the packet is no longer of any use to the application that is to receive it. Each network entity that receives a packet with a TTL field subtracts from it the time the packet spends passing through that entity. Thus, the TTL value decreases as the packet hops from network entity to network entity.
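The per-hop accounting described above may be sketched as follows. The `Packet` class, the use of milliseconds and the function name are assumptions made for illustration only; no particular unit or data layout is implied.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: bytes
    ttl_ms: float  # time left before the packet is of no use to the receiver


def pass_through(packet: Packet, residence_ms: float) -> bool:
    """Subtract the time spent in this entity; return False once expired."""
    packet.ttl_ms -= residence_ms
    return packet.ttl_ms > 0
```

For example, a packet created with 150 ms to live that spends 20 ms in a first entity leaves it with 130 ms remaining; if a second entity then holds it for 140 ms, the value falls below zero and the packet may be discarded as stale.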
The source or originator of the packet is operative to set the initial value of the TTL field and place it into the packet. For example, the RTP packet can be used with the TTL information placed in the payload or in a header extension as provided for in the ITU-T Recommendation H.225.0.
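One possible encoding, offered only as a sketch, carries the TTL in an RTP header extension. It relies on the standard extension layout (the X bit set in the first byte, followed after the fixed header and any CSRC entries by a 16-bit profile-defined identifier, a 16-bit length in 32-bit words, and the extension data). The identifier value 0x5454, the microsecond unit and both function names are hypothetical and are not defined by any standard or by this description.

```python
import struct

TTL_PROFILE_ID = 0x5454  # hypothetical identifier chosen for this sketch


def build_rtp_with_ttl(seq, timestamp, ssrc, payload_type, ttl_us, payload):
    """Build an RTP packet whose header extension holds a 32-bit TTL."""
    b0 = (2 << 6) | 0x10           # version 2, extension bit set, no CSRCs
    b1 = payload_type & 0x7F       # marker bit clear
    header = struct.pack("!BBHII", b0, b1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # Extension: identifier, length of 1 word, then the TTL itself.
    ext = struct.pack("!HHI", TTL_PROFILE_ID, 1, ttl_us)
    return header + ext + payload


def read_ttl(packet):
    """Return the TTL from the extension, or None if it is not present."""
    if len(packet) < 12 or not packet[0] & 0x10:
        return None
    offset = 12 + 4 * (packet[0] & 0x0F)   # skip fixed header and CSRC list
    if len(packet) < offset + 8:
        return None
    profile, words = struct.unpack("!HH", packet[offset:offset + 4])
    if profile != TTL_PROFILE_ID or words < 1:
        return None
    (ttl_us,) = struct.unpack("!I", packet[offset + 4:offset + 8])
    return ttl_us
```

Placing the value in a header extension rather than the payload lets an intermediate entity read and rewrite it at a fixed offset without understanding the media encoding.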
In accordance with the invention, network entities maintain one or more queues corresponding to different classes of priorities and/or delay sensitive data streams. The entity is configured with a mapping that converts a TTL value to a corresponding priority. The packet is then placed in the appropriate queue in accordance with its priority. Packets are retrieved from the queues and processed in order of priority. A minimum allocation of processor resources, however, is assigned to each queue.
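A minimal sketch of such a mapping and queue arrangement is given below. The threshold values, the choice of four priority levels, the class name and the particular way the minimum allocation is enforced (one guaranteed service slot per non-empty queue in each round) are all assumptions made for illustration.

```python
import bisect
from collections import deque

# Hypothetical thresholds in ms: TTL below 20 maps to priority 0 (most urgent),
# 20 up to 50 to priority 1, 50 up to 100 to priority 2, 100 and above to 3.
TTL_THRESHOLDS_MS = [20, 50, 100]


def priority_for(ttl_ms):
    """Map a remaining TTL to a priority level; a smaller TTL is more urgent."""
    return bisect.bisect_right(TTL_THRESHOLDS_MS, ttl_ms)


class TtlScheduler:
    def __init__(self, min_per_round=1):
        self.queues = [deque() for _ in range(len(TTL_THRESHOLDS_MS) + 1)]
        self.min_per_round = min_per_round

    def enqueue(self, packet_id, ttl_ms):
        self.queues[priority_for(ttl_ms)].append(packet_id)

    def service_round(self, budget):
        """Serve up to `budget` packets in one scheduling round."""
        served = []
        # Phase 1: guaranteed minimum for every queue, most urgent first.
        for q in self.queues:
            for _ in range(self.min_per_round):
                if q and len(served) < budget:
                    served.append(q.popleft())
        # Phase 2: spend what remains of the budget in strict priority order.
        for q in self.queues:
            while q and len(served) < budget:
                served.append(q.popleft())
        return served
```

The first phase keeps packets with ample time remaining from being starved outright, while the second gives the rest of the capacity to the packets closest to expiry.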
After processing, the time the packet spent in the entity is calculated and subtracted from the TTL field. The packet is then forwarded to the next hop along the path.
Note that the present invention is effective to dynamically assign a priority to the individual packets within a data stream belonging to a particular session. The priority is dynamically assigned based on the level of congestion experienced by the individual packet along the path. Prior art protocols cannot assign different priorities to individual packets on a dynamic basis. Existing protocols such as RSVP assign priority on a session basis where the same priority is assigned to the entire data stream for that session.
There is provided in accordance with the present invention a method for dynamically prioritizing time sensitive packets for transmission over a packet based network, the method comprising the steps of adding, at an originator of the packet, a time to live (TTL) value field to time sensitive packets wherein the TTL value is set to an initial value, determining, at each network entity along the path, a priority level for each time sensitive packet received, the priority level determined in accordance with the TTL value extracted from the packet, distributing each time sensitive packet to one of a plurality of queues in accordance with the priority associated with the packet and replacing the TTL value field in the packet with the previous TTL value modified to reflect the time the packet spent in the current network element.
There is also provided in accordance with the present invention an apparatus for dynamically prioritizing packets in a network entity according to their sensitivity to time delays comprising means for adding, at an originator of the packet, a time to live (TTL) value field to time sensitive packets wherein the TTL value is set to an initial value, means for determining a priority level for each time sensitive packet received by the network entity, the priority level determined in accordance with the TTL value associated with the packet, a plurality of queues for holding time sensitive packets, the plurality of queues divided into a plurality of groups, each group corresponding to a different priority level, a distributor adapted to place each time sensitive packet into one of the plurality of queues in accordance with the priority associated with the packet and means for replacing the TTL value field in the packet with the previous TTL value modified to reflect the time the packet spent in the network element.