The Internet has long been usable for file transfers and e-mail by packet switched communication. A different technology called circuit switched communication is used in the PSTN (public switched telephone network), wherein a circuit is dedicated to each phone call regardless of whether anything is being communicated over the circuit during silent periods. Packet switched networks do not dedicate a channel, and instead share a pipe or channel among many communications and their users. Packets may vary in their length, and have a header for source information, destination information, number of bits in the packet, how many items, priority information, and security information. A packet of data often traverses several nodes as it goes across the network in “hops.” In a stream of data, the packets representative thereof may, and often do, take different paths through the network to get to the destination. The packets sometimes arrive out of order. The packets are not merely delayed relative to the source, but also have delay jitter. Delay jitter is variability in packet delay, or variation in timing of packets relative to each other, due to buffering within nodes in the same routing path and to differing delays and/or numbers of hops in different routing paths. Packets may even be lost outright and never reach their destination. Delay jitter is a packet-to-packet concept for the present purposes, and jitter of bits within a given packet is a less emphasized subject herein.
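By way of illustration only, the packet-to-packet notion of delay jitter described above can be sketched as follows. The function names and the timestamp values are hypothetical and chosen for the example; jitter is taken here, as one simple assumption among several possible measures, to be the absolute difference in delay between consecutive packets.

```python
def packet_delays(send_ms, recv_ms):
    """Per-packet network delay: arrival time minus send time (milliseconds)."""
    return [r - s for s, r in zip(send_ms, recv_ms)]


def delay_jitter(delays_ms):
    """Packet-to-packet jitter: absolute change in delay between neighbours."""
    return [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]


# Hypothetical example: packets sent every 20 ms, arriving unevenly.
send = [0, 20, 40, 60]
recv = [50, 75, 85, 130]

delays = packet_delays(send, recv)   # [50, 55, 45, 70]
jitter = delay_jitter(delays)        # [5, 10, 25]
print(delays, jitter)
```

A stream with constant delay would yield zero jitter in this sketch, however large that delay might be.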
Voice over Packet (VOP) and Voice over Internet Protocol (VoIP) are sensitive to delay jitter to a qualitatively greater extent than, for example, text data files. Delay jitter produces interruptions, clicks, pops, hisses and blurring of the sound and/or images as perceived by the user, unless the delay jitter problem can be ameliorated or obviated. Packets that are not literally lost, but are substantially delayed when received, may nonetheless have to be discarded at the destination because they have lost their usefulness at the receiving end. Thus, packets that are discarded, as well as those that are literally lost, are all called “lost packets” herein except where a more specific distinction is made explicit or is plain from the context.
The user can rarely tolerate as much as half a second (500 milliseconds) of delay, and even then may avoid using VOP if its quality is perceptibly inferior to other readily available, albeit more expensive, transmission alternatives. Such avoidance may occur with delays of 250 milliseconds or even less, while Internet phone technology hitherto may have suffered from end-to-end delays of as much as 600 milliseconds or more.
Hitherto, one approach has stored the arriving packets in a buffer, but if the buffer is too short, packets are lost. If the buffer is too long, it contributes to delay.
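The buffer trade-off just described can be illustrated with a minimal sketch. It assumes, purely for the example, that the buffer is modeled as a fixed playout delay and that a packet whose network delay exceeds that playout delay is counted as lost; the function name and the delay values are hypothetical.

```python
def late_packets(delays_ms, playout_delay_ms):
    """Count packets whose network delay exceeds the fixed playout delay."""
    return sum(1 for d in delays_ms if d > playout_delay_ms)


# Hypothetical per-packet network delays in milliseconds.
delays = [50, 55, 45, 70, 120, 60]

# A shorter buffer loses more packets; a longer one adds end-to-end delay.
for playout in (60, 80, 150):
    print(f"playout delay {playout} ms -> {late_packets(delays, playout)} late")
```

In this sketch the 60 ms setting discards two packets, the 80 ms setting discards one, and the 150 ms setting discards none at the cost of 150 ms of added delay.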
VOP quality requires a low lost packet ratio measured in a relatively short time window interval (length of oral utterance for instance, with each packet representing a compressed few centiseconds of speech). By contrast, text file reception can reorder packets during a relatively much longer window interval of reception of text and readying it for printing, viewing, editing, or other use. Voice can be multiplexed along with other data on a packet network over long distances and internationally, at low expense compared with circuit-switched PSTN charges.
The Transmission Control Protocol (TCP), sometimes used in connection with IP (Internet Protocol), can provide for packet tags, detection of lost and out-of-order packets by examination of the packet tags, and retransmission of the lost packets from the source. TCP is useful for maintaining transmission quality of e-mail and other non-real-time data. However, the delay inherent in the request-for-retransmission process currently may reduce the usefulness of TCP and other ARQ (automatic retransmission request) approaches as a means of enhancing VOP communications.
RTP (Real Time Transport Protocol) and RTCP (RTP Control Protocol) add time stamps and sequence numbers to the packets, augmenting the operations of the network protocol such as IP.
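As a hedged illustration of how sequence numbers allow a receiver to detect lost and out-of-order packets, consider the sketch below. It is not the RTP or TCP mechanism itself: the function name is hypothetical, the input is assumed to be a non-empty list, and sequence-number wraparound (which real 16-bit RTP sequence numbers exhibit) is ignored for simplicity.

```python
def classify(received_seqs):
    """Return (out_of_order, missing) for a non-empty list of sequence numbers.

    A packet is flagged out of order if it arrives after a packet with a
    higher sequence number. Missing numbers are gaps in the observed range.
    Wraparound is deliberately ignored in this sketch.
    """
    out_of_order = []
    highest = None
    for seq in received_seqs:
        if highest is not None and seq < highest:
            out_of_order.append(seq)
        else:
            highest = seq
    seen = set(received_seqs)
    first, last = min(received_seqs), max(received_seqs)
    missing = [n for n in range(first, last + 1) if n not in seen]
    return out_of_order, missing


# Hypothetical arrival order: 3 and 6 arrive late, 5 never arrives.
result = classify([1, 2, 4, 3, 7, 6])
print(result)
```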
For real-time communication some solution to the problem of packet loss is imperative, and the packet loss problem is exacerbated in heavily-loaded packet networks. Also, even a lightly-loaded packet network, with a packet loss ratio of perhaps 0.1%, still requires some mechanism to deal with the circumstances of late and lost packets.
A conventional speech compression process has a portion that samples, digitizes and buffers speech in a frame buffer in frame intervals (e.g. 20 milliseconds), or frames, and another portion that compresses the sampled digitized speech from one of the frames while more speech is being added to the buffer. If the speech is sampled at 8 kilohertz, then each 20 millisecond example frame has 160 analog speech samples (8 × 20). If an 8-bit analog to digital converter (ADC) is used, then 1280 bits (160 × 8) result as the digitized form of the sampled speech in that 20 millisecond frame. Next the compression process converts the 1280 bits to fewer bits carrying the same or almost the same speech information. Suppose the process provides 8:1 compression. Then 1280/8 bits, or 160 bits of compressed or coded speech result from compression. The compressed speech is then put in the format of a packet, thus called packetized, by a packetizer process.
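The frame arithmetic in the preceding paragraph can be checked with a few lines of code. The constant names are hypothetical; the values are those of the example in the text, and the resulting bit rate is simply derived from them.

```python
SAMPLE_RATE_HZ = 8000      # 8 kilohertz sampling
FRAME_MS = 20              # frame interval in milliseconds
BITS_PER_SAMPLE = 8        # 8-bit ADC
COMPRESSION_RATIO = 8      # 8:1 compression

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000      # 160 samples
raw_bits_per_frame = samples_per_frame * BITS_PER_SAMPLE   # 1280 bits
compressed_bits_per_frame = raw_bits_per_frame // COMPRESSION_RATIO  # 160 bits

# Derived figure: 160 compressed bits every 20 ms.
compressed_bit_rate_bps = compressed_bits_per_frame * 1000 // FRAME_MS

print(samples_per_frame, raw_bits_per_frame,
      compressed_bits_per_frame, compressed_bit_rate_bps)
```

Under these example values the compressed stream carries 8000 bits per second before any packet header overhead is added.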
Loss of a packet means loss of every frame of compressed speech in that packet. There then arises the problem of how to recreate the 160 bits or more of lost compressed speech. Reduction of packet loss and a strategy for handling late packets are very important challenges in advancing VOP technology.
Telephony represents a duplex channel. In the case of packet telephony one side (the ingress side) receives voice or digitized voice (PCM data) and produces packets by using any of several compression processes. This ingress side is almost completely ‘synchronous’. Voice is changed into frames. The size of the frames for a given data compression process is fixed. Thus the appearance of frames in the system is both clock-like and fully predictable. The time of execution of a task that compacts the “PCM data” frames into packets (the frame tasks) is known. The appearance of the packets on the output is both predictable and quasi-periodic.
On the other side (the egress side) of packet telephony, the packets are converted to PCM frames, and these frames are added to output buffers for each channel. The packets arrive at a rate for which only the average is known. This average depends on the process used and thus on the frame size to be produced. The data from the output buffer is output at a constant rate. If not replenished in time, the data will run out at some 10 msec boundary.
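The running-out of the output buffer can be illustrated with a simplified simulation. The sketch assumes, for the example only, that each packet carries one 20 msec frame of PCM data, that the buffer is drained in 10 msec steps, and that a packet arriving at or before a step boundary is available at that boundary. The function name and arrival times are hypothetical, and late packets are simply buffered here rather than discarded.

```python
def find_underruns(arrivals_ms, frame_ms=20, tick_ms=10, end_ms=80):
    """Return the tick times at which the output buffer has run out of data."""
    pending = sorted(arrivals_ms)
    level_ms = 0          # buffered PCM data, expressed in milliseconds
    next_packet = 0
    underruns = []
    for t in range(0, end_ms, tick_ms):
        # Add every packet that has arrived by this boundary.
        while next_packet < len(pending) and pending[next_packet] <= t:
            level_ms += frame_ms
            next_packet += 1
        # Drain one tick of data at the constant output rate.
        if level_ms >= tick_ms:
            level_ms -= tick_ms
        else:
            underruns.append(t)
    return underruns


# Hypothetical arrivals: the third packet is due at 40 ms but arrives at 55 ms.
underruns = find_underruns([0, 20, 55, 60])
print(underruns)
```

With these arrival times the buffer is empty at the 40 msec and 50 msec boundaries, which is where the listener would perceive a gap.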
Each packet may be considerably off ‘its’ ideal position in the timeline. Since the density of arrival of packets is only known statistically, the egress side becomes essentially asynchronous. Yet each packet must meet its deadline or be thrown away.