The present invention relates to techniques for processing time-constrained signals received via asynchronous links.
It relates more particularly to a mechanism that can be implemented in the case of asynchronous transmission with high temporal jitter, for example in the case of communications over networks operating according to the Internet protocol (IP). This mechanism is embedded into terminals, bridges, gateways, and more generally into any element of the network capable of intervening on the data transported.
More specifically, the invention applies particularly in any equipment that receives streams (audio, video and/or data) consisting of regularly sent packets and that is furnished with a memory organized in first-in first-out (FIFO) mode so as to account for the network jitter phenomenon. This is the case in particular in respect of terminals supporting voice over IP (VOIP), which all incorporate a FIFO-managed buffer memory to absorb in particular the network jitter and which regularly exchange audio data streams through IP packets transported by means of the UDP protocol (“User Datagram Protocol”). For example, two terminals communicating by means of the G.723.1 speech coder as standardized by the International Telecommunication Union (ITU-T) conventionally exchange 24 bytes of audio data every 30 milliseconds.
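By way of illustration only, the payload rate implied by the figures quoted above can be checked with a few lines of Python. The function and constant names are hypothetical. The result, 6.4 kbit/s, is slightly above the nominal 6.3 kbit/s of the higher-rate G.723.1 mode because each frame is padded to a whole number of bytes.

```python
# Payload bit rate implied by "24 bytes of audio data every 30 milliseconds".
# FRAME_BYTES, FRAME_PERIOD_MS and payload_bitrate_bps are illustrative names.
FRAME_BYTES = 24
FRAME_PERIOD_MS = 30


def payload_bitrate_bps(frame_bytes: int, period_ms: int) -> float:
    """Return the audio payload rate in bits per second (headers excluded)."""
    return frame_bytes * 8 * 1000 / period_ms


print(payload_bitrate_bps(FRAME_BYTES, FRAME_PERIOD_MS))  # 6400.0
```

This figure covers the audio payload only; the IP, UDP and RTP headers added to each packet are not counted.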
The invention supplements or adapts the already used conventional mechanisms for FIFO management.
The invention is particularly aimed at interactive applications of voice communication type for example, but may also be of benefit in less interactive applications such as in particular reading in transit (“streaming”).
In any packet mode asynchronous communication, the network introduces a fixed delay as well as a variable delay called “network jitter”. The reception of the packets that pass through the network is delayed with respect to their instant of transmission. The fixed delay, if it remains small, is not the most constraining. Its effect is essentially felt as posing a problem of interactivity in the communication. Network jitter is more of a nuisance since it gives rise on the one hand to voids (lack of signal to be restored, the packet arriving too late) and on the other hand, at other moments, to an overabundance of packets to be restored (simultaneous arrival of several consecutive packets forming a burst), this possibly introducing a further delay that is detrimental to the interactivity of the communication, for example in the case of VOIP.
A mechanism whereby these variations in transmission delay can be managed to a certain extent must therefore be introduced on reception. This mechanism is to be placed at the receiver end and not at the transmitter, since the latter transmits packets periodically, the variations being introduced by the asynchronous network. As this network cannot be controlled either by the transmitter or by the receiver, it is necessary to accommodate its nondeterministic and nonpredictable behavior.
The mechanism generally used to control the jitter phenomenon is the implementation of a FIFO that makes it possible to compensate for the delays of the packets received at the restoring system.
The packets may possibly be received in a different order from that in which they were transmitted. This phenomenon, called “desequencing”, is due to the fact that the packets sent travel independently over the IP networks. Nevertheless, this is a relatively rare phenomenon on the Internet (probability of the order of 0.01%). The real-time protocols employed make it possible, by virtue of a sequence number allocated to each packet, to put them back into the right order on reception, or else to destroy the desequenced packets if their instant of restoration has passed. Such is the case in particular with the RTP protocol (“Real-time Transport Protocol”) described in RFC (“Request for Comments”) 1889 and 1890 published in January 1996 by the IETF (“Internet Engineering Task Force”).
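The resequencing principle described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the behavior of any particular RTP stack: the class `Resequencer` and its methods are hypothetical, sequence numbers are plain integers starting at 0, and the wrap-around of the 16-bit RTP sequence number is deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Resequencer:
    """Hypothetical helper: reorders packets by sequence number on reception."""
    next_seq: int = 0                                   # next packet due for restoration
    pending: Dict[int, bytes] = field(default_factory=dict)

    def receive(self, seq: int, payload: bytes) -> bool:
        """Store a packet; return False if it is destroyed as too late."""
        if seq < self.next_seq:
            return False        # its instant of restoration has passed
        self.pending[seq] = payload
        return True

    def restore(self) -> Optional[bytes]:
        """Called once per packet period; None means the packet is missing."""
        payload = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        return payload


rs = Resequencer()
for seq in (0, 2, 1):                 # packets 1 and 2 arrive swapped
    rs.receive(seq, bytes([seq]))
print([rs.restore() for _ in range(3)])  # restored in transmission order
```

A packet arriving after its slot has been consumed is rejected by `receive`, which corresponds to the destruction of a desequenced packet whose restoration instant has passed.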
The aforesaid FIFO may be located at various places within the reception chain.
FIGS. 1 and 2 show a VOIP-type receiver comprising a network interface 1, customarily consisting of a modem or a network card, linked to a module 2 implementing the IP, UDP and RTP protocols for receiving IP packets and extracting their content (“depacketization” operation). This content is fed to a speech decoder 3 corresponding to the coder (G.723.1 or the like) used by the transmitter and effecting the digital decompression of the audio signal. The speech is ultimately restored by means of a sound card 4 provided with a restore buffer and with a loudspeaker 5.
In the configuration illustrated by FIG. 1, the FIFO 6 is situated between the depacketization module 2 and the decoder 3. In the configuration illustrated by FIG. 2, it is situated between the decoder 3 and the sound card 4. This FIFO 6 is associated with a control module 7 which implements the network jitter compensation algorithms.
FIGS. 3 and 4 illustrate other possible environments of the jitter FIFO 6, other than in a receiver restoring the speech transmitted. In practice, the possible configurations are very numerous.
The diagram of FIG. 3 does not comprise an audio decoder. It corresponds for example to a gateway placed between the IP-type asynchronous network and a synchronous network. The information stream read from the FIFO 6 is provided to the network interface 8 which shapes it for transmission over the synchronous network.
The exemplary item of equipment represented in FIG. 4 ensures a transcoding function, for example between a G.723.1 compression on the asynchronous network and a higher bit rate coding such as G.711 on a local area network (LAN). The jitter FIFO 6 can be placed after the G.723.1 type decoder 3, as represented, or before the latter. The decoded audio stream read from the FIFO 6 is recoded by the G.711-type coder 9 and then provided to the LAN interface circuit 10.
The position of the FIFO can have an influence on the management of the latter. Specifically, in the case of VOIP for example, when the FIFO is placed before the decoder (FIG. 1 or 3), the algorithm cannot access the signal itself since it then only has a coded version of the parameters characterizing the signal. When the FIFO is placed after the decoder (FIG. 2 or 4), it is then possible to adapt the management of the FIFO to the decoded signal that it contains.
One of the first jitter management techniques which was proposed consists in the use of a fixed threshold: when the FIFO is full and a packet arrives, the latter cannot be incorporated into the FIFO, thereby causing its destruction. It is then the size of the FIFO which imposes the maximum delay that it can absorb. It is also this size which makes it possible to define the interactivity/loss compromise.
The network jitter can take relatively high values (for example 300 ms). If the FIFO is dimensioned to absorb only a smaller maximum jitter, then when the jitter of a packet exceeds this limit, the restoring system detects an absence of packet, and must compensate for the lack of signal by generating a replacement signal chunk corresponding to a packet (for simplicity of expression, it will be said to generate a replacement or substitution packet for the missing packet, even if the precise mechanism used does not generally comprise the actual production of a packet). This absence of packet is therefore managed in the same way as a loss of the packet from the network. The duration of the signal stored in the FIFO is thus an important parameter in respect of interactivity and quality of communication. Too large a FIFO reduces interactivity but preserves quality; too small a FIFO improves interactivity but may degrade quality by entailing the too frequent generation of replacement packets.
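The fixed-threshold behavior and its interactivity/loss compromise may be illustrated by the following minimal sketch. The class name `FixedThresholdFifo`, its counters and the default 30 ms packet period are assumptions made for the example; returning `None` on an empty FIFO merely signals that the restoring system would have to generate a substitution signal, whose generation is not modeled here.

```python
from collections import deque


class FixedThresholdFifo:
    """Hypothetical jitter FIFO with a fixed size, in packets."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.destroyed = 0    # packets refused because the FIFO was full
        self.replaced = 0     # read attempts that found no packet

    def write(self, packet) -> bool:
        """Store an arriving packet, or destroy it if the FIFO is full."""
        if len(self.queue) >= self.capacity:
            self.destroyed += 1
            return False
        self.queue.append(packet)
        return True

    def read(self):
        """Called once per packet period by the restoring system."""
        if self.queue:
            return self.queue.popleft()
        self.replaced += 1
        return None           # caller generates a replacement signal chunk

    def buffered_ms(self, period_ms: int = 30) -> int:
        """Delay currently added by the FIFO, assuming one packet per period."""
        return len(self.queue) * period_ms


fifo = FixedThresholdFifo(capacity=3)
for n in range(5):            # five packets arrive before any is read
    fifo.write(n)
print(fifo.destroyed, fifo.buffered_ms())  # 2 packets destroyed, 90 ms stored
```

A larger `capacity` lowers `destroyed` and `replaced` at the cost of a larger `buffered_ms`, which is the compromise described above.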
When the FIFO is full, another possibility is to retain the new packets but to destroy the older packets already present in the FIFO. This scheme for emptying the FIFO impairs the quality of the signal in an equivalent manner. However, this fairly blunt scheme is commonly used in practice, so as to favor the thread of the communication by updating the FIFO with the most recent data. The emptying may be total or partial. In the latter case, the joint use of a VAD technique (Voice Activity Detection) allows judicious deletion of the signal frames comprising only background noise. Likewise, in certain embodiments, the FIFO emptying decision can be taken before it is full.
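The alternative scheme, in which the older packets are destroyed to make room for the new one, may be sketched as below. The function `write_with_flush` and its `keep` parameter are hypothetical: `keep=0` models total emptying and a positive value models a partial emptying that retains only the most recent packets. The VAD-based selection of frames to delete is not modeled.

```python
from collections import deque


def write_with_flush(queue: deque, packet, capacity: int, keep: int = 0) -> int:
    """Append `packet`; if the FIFO is full, first destroy the oldest packets
    until only `keep` of them remain. Returns the number of packets destroyed."""
    destroyed = 0
    if len(queue) >= capacity:
        while len(queue) > keep:
            queue.popleft()       # oldest packet is deleted first
            destroyed += 1
    queue.append(packet)
    return destroyed


total = deque([0, 1, 2, 3])
write_with_flush(total, 4, capacity=4)            # total emptying
partial = deque([0, 1, 2, 3])
write_with_flush(partial, 4, capacity=4, keep=2)  # partial emptying
print(list(total), list(partial))  # [4] [2, 3, 4]
```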
Whichever techniques, more or less complex, are adopted to manage the jitter FIFO, the latter has a finite size and is exposed to the following problem.
It has been noted that on networks with non-guaranteed quality of service and relatively large jitter, e.g. the Internet, bursts of packets are frequent and sometimes of very considerable size. In the case considered of the regular transmission of packets, when a packet is held in a router of the IP network for a time greater than the transmission period, a certain number of packets may accumulate in this same router and be released almost instantaneously together with the oldest packet. The longer the holding time in the router, the larger the burst. The size of the burst may then be greater than the finite size of the FIFO and hence give rise to saturation of the latter.
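The formation of a burst can be illustrated by a deliberately simplified model. It assumes a single router that holds every packet sent during one holding interval and releases them all at the same instant, and it assumes a zero fixed network delay; the function `simulate_hold` and its parameters are hypothetical.

```python
def simulate_hold(n_packets: int, period_ms: int, hold_start_ms: int, hold_ms: int):
    """Return (arrival times in ms, burst size) for packets sent every
    `period_ms`, where a router holds all packets sent in the interval
    [hold_start_ms, hold_start_ms + hold_ms) and releases them together."""
    release_ms = hold_start_ms + hold_ms
    arrivals = []
    burst = 0
    for n in range(n_packets):
        sent_ms = n * period_ms
        if hold_start_ms <= sent_ms < release_ms:
            arrivals.append(release_ms)   # released with the oldest held packet
            burst += 1
        else:
            arrivals.append(sent_ms)      # zero fixed delay assumed
    return arrivals, burst


_, burst = simulate_hold(n_packets=20, period_ms=30, hold_start_ms=60, hold_ms=300)
print(burst)  # 10
```

With a 30 ms transmission period, a 300 ms hold produces a burst of 10 packets, whereas a hold shorter than the period affects a single packet and produces no accumulation.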
This saturation phenomenon is managed by the aforesaid FIFO management mechanisms, either by no longer allowing the FIFO to be written to when it is full, or by performing a partial or global emptying of the FIFO so as to be able to continue to write thereto.
In the first case, the FIFO is full at the end of the burst, and therefore introduces a maximum delay into the communication.
In the second case, a significant quantity of signal may have been deleted, and the filling degree of the FIFO is in a state dependent on the size of the burst and that of the FIFO. Specifically, let us take the example of a burst of size just less than twice the size of the FIFO. Immediately upon receipt of the first part of this burst in the FIFO, an emptying of the latter is performed, and therefore the second part is placed wholly in the FIFO. One is then in a state much like the first case, with a maximum delay in the communication. If on the contrary the size of the burst had been equal to or slightly greater than the size of the FIFO, the latter would have undergone emptying around the end of reception of this burst and would then be empty or nearly so, i.e. in a state where the least delay for the next packet gives rise to a problem. Specifically, this small delay then requires the generation of a replacement packet, with a resulting impairment of quality, although valid packets have just been deleted.
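The two cases above can be made concrete with a small worked example. The sketch assumes that the FIFO is empty when the burst begins, that the whole burst arrives before any packet is read, and, for the second case, that a total emptying is performed as soon as a write makes the FIFO full. The function names and the FIFO size of 10 packets are illustrative only.

```python
def refuse_when_full(burst_size: int, capacity: int):
    """First case: packets arriving while the FIFO is full are destroyed.
    Returns (fill after the burst, packets destroyed)."""
    fill = min(burst_size, capacity)
    return fill, burst_size - fill


def empty_when_full(burst_size: int, capacity: int):
    """Second case: total emptying as soon as the FIFO becomes full.
    Returns (fill after the burst, packets deleted)."""
    fill = 0
    deleted = 0
    for _ in range(burst_size):
        fill += 1
        if fill == capacity:
            deleted += fill
            fill = 0
    return fill, deleted


CAPACITY = 10
print(refuse_when_full(19, CAPACITY))  # (10, 9): FIFO full, maximum delay
print(empty_when_full(19, CAPACITY))   # (9, 10): nearly full again
print(empty_when_full(10, CAPACITY))   # (0, 10): FIFO left empty
print(empty_when_full(11, CAPACITY))   # (1, 10): FIFO left almost empty
```

In the last two lines ten valid packets have been deleted, yet the FIFO holds at most one packet, so a small delay of the next packet already requires a replacement packet.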
In all cases, the appearance of bursts of excessive size gives rise to a degradation in the quality of the communication. In certain cases, it is accompanied by a substantial increase in the delay in the communication, and therefore a substantial degradation of the interactivity of the latter, this state lasting for a longer or shorter time depending on the jitter FIFO management mechanisms set in place and the type of transmission.
The known mechanisms for managing this jitter FIFO do not comprise particular procedures for managing such bursts of excessive size. They merely manage saturations of the FIFO, by occasionally providing for states of transition to this saturation, in an a posteriori examination of its degree of fill.
An object of the present invention is to limit the inevitable degradation in quality due to the network jitter phenomenon, and in particular to temporally limit the impact of the strong disturbances caused by the network.