The present invention relates to data transmission of streaming data. The invention is particularly suited for voice over packet data networks, for example Voice over Internet Protocol (VoIP) networks.
For VoIP networks, audio signals are digitized into frames and transmitted as packets over an IP network. The transmitter sends these packets at a constant transmission rate. An appropriately configured receiver will receive the packets, extract the frames of digital data and convert the digital data into analog output using a digital-to-analog (D/A) converter. One of the characteristics of an IP network is that packets will not necessarily arrive at their destination at a constant rate, due to variable delays through the network. However, digital audio data (for example a digitized voice conversation) must be played out at a constant output rate in order to reconstruct the audio signal, and the D/A converter operates at such a constant output rate.
A known solution for this problem is to implement a jitter buffer in the receiver. A jitter buffer stores frames as they are received from the network. After several frames are loaded into the buffer, the frames in the buffer are output at the constant output rate. As long as the average rate of reception of the packets is equal to the constant output rate, the jitter buffer allows the packets to be output at the constant output rate even though they are not necessarily received at a constant rate.
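The basic jitter buffer described above can be illustrated with a short Python sketch. It is not taken from the source: the class name `JitterBuffer`, the `prefill` depth of three frames and the per-tick arrival pattern are hypothetical. Frames are queued as they arrive in irregular bursts; once the prefill depth is reached, one frame is released per output tick, so playout is continuous even though arrivals are not.

```python
from collections import deque


class JitterBuffer:
    """Minimal FIFO jitter buffer (illustrative sketch, not the patented design).

    Frames are queued as they arrive; playout starts only after `prefill`
    frames have accumulated, then one frame is released per output tick.
    """

    def __init__(self, prefill=3):
        self.prefill = prefill  # hypothetical start-up depth, in frames
        self.frames = deque()
        self.playing = False

    def push(self, frame):
        # Called whenever a frame arrives from the network (irregular timing).
        self.frames.append(frame)
        if not self.playing and len(self.frames) >= self.prefill:
            self.playing = True

    def pop(self):
        # Called once per output tick (constant rate); None means no frame
        # is available (playout not yet started, or an underrun).
        if not self.playing or not self.frames:
            return None
        return self.frames.popleft()


# Number of frames that arrive during each output tick (bursty arrivals).
arrivals_per_tick = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1]
jb = JitterBuffer(prefill=3)
seq = 0
played = []
for n in arrivals_per_tick:
    for _ in range(n):
        jb.push(seq)
        seq += 1
    played.append(jb.pop())
```

After the two start-up ticks, frames 0 to 7 are played one per tick in order, despite ticks with zero or two arrivals.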
In traditional digital telephony systems (e.g., the PSTN), endpoints are synchronized to a common master clock so that the D/A and A/D converters at both ends operate at the same data rate. In other words, the PSTN is a synchronous network, so the constant transmission rate equals the constant output rate. In a packet-based system, however, there is no common clock to synchronize the data rates, and the two endpoints will typically run at marginally different rates. Consequently, the constant output rate from the jitter buffer will differ from the far-end constant transmission rate.
For example, assume that the clock of the A/D converter in the far-end transmitter runs slightly faster than the clock of the D/A converter in the local receiver. The far-end transmitter then sends digital audio samples at a faster rate than the local receiver converts them to analog, so the output rate of the jitter buffer is slower than the far-end transmission rate. Eventually the jitter buffer becomes full. In traditional jitter buffer designs, this results in a frame being discarded at random, which degrades audio quality.
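How quickly such a mismatch fills the buffer can be estimated with simple arithmetic. The sketch below assumes illustrative values that are not given in the source: 8 kHz sampling, 20 ms frames (160 samples), a 100 ppm clock mismatch and 5 frames of headroom above the buffer's nominal depth. Under these assumptions the far end gains 0.8 samples per second, i.e. one whole frame about every 200 s, and the buffer overflows after about 1000 s (roughly 17 minutes).

```python
def seconds_per_extra_frame(sample_rate_hz, frame_samples, mismatch_ppm):
    """Time for the far end's surplus samples to add up to one whole frame.

    Assumes the far-end A/D clock is faster than the local D/A clock by
    `mismatch_ppm` parts per million, so the jitter buffer slowly fills.
    """
    surplus_samples_per_s = sample_rate_hz * mismatch_ppm / 1e6
    return frame_samples / surplus_samples_per_s


def seconds_until_full(headroom_frames, sample_rate_hz, frame_samples, mismatch_ppm):
    """Time until `headroom_frames` of spare buffer capacity are used up by drift."""
    return headroom_frames * seconds_per_extra_frame(
        sample_rate_hz, frame_samples, mismatch_ppm)


# Illustrative values (assumptions, not from the source text):
t_frame = seconds_per_extra_frame(8000, 160, 100)  # about 200 s
t_full = seconds_until_full(5, 8000, 160, 100)     # about 1000 s
```

Even a mismatch this small forces an insertion or deletion every few minutes in a long call, which is why the choice of frame to drop or add matters.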
Thus, while known jitter buffer techniques can compensate for variable transmission delays through the network (provided the average rate of reception is equal to the constant output rate), the jitter buffer can be either depleted or filled to capacity due to a rate mismatch between the far-end transmitter and the local receiver.
There exists a need to overcome this problem.
An object of the present invention is to provide a system which monitors the jitter buffer in order to determine when a frame needs to be deleted or inserted. When such a condition exists, the system intelligently selects frames for insertion or deletion based on a criterion which reduces the impact of such an insertion or deletion. Thus, the system includes a detector for detecting frames which satisfy a criterion indicative of the impact of the insertion or deletion of a frame.
For example, in a voice conversation there are silence frames which result from inherent gaps in speech. It is better to delete a silence frame than a frame with actual speech content. Furthermore, if a frame needs to be inserted, it is better to insert a silence frame immediately after a silence frame than between two content frames. Thus in one embodiment, the criterion includes the detection of silence, and the detector includes a silence detector (for example a Voice Activity Detector (VAD) or envelope detector) to detect silent frames. Such a system extends silence intervals by detecting silence and inserting one or more silence frames at the head of the jitter buffer when the jitter buffer is depleted below a depletion threshold (called the low water mark). Similarly, when the jitter buffer is filled beyond a filled threshold (called the high water mark), the system deletes silence frames. Note that in this specification silence (or a silent frame) can include background and/or comfort noise.
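A silence detector can be as simple as a short-term energy threshold. The sketch below is a simplified stand-in for the VAD or envelope detector mentioned above, not a standards-compliant VAD; it assumes 16-bit PCM frames, and the function names and the -50 dBFS threshold are hypothetical. Background or comfort noise whose level falls below the threshold is classed as silence, in line with the note above.

```python
import math


def frame_energy_dbfs(samples, full_scale=32768):
    """Mean power of a frame of 16-bit PCM samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / (full_scale * full_scale))


def is_silent(samples, threshold_dbfs=-50.0):
    """Hypothetical criterion: frame energy below a fixed threshold counts as silence."""
    return frame_energy_dbfs(samples) < threshold_dbfs


quiet_frame = [3, -3] * 80         # 160 samples of very low-level noise
loud_frame = [10000, -10000] * 80  # 160 samples of strong signal
```

A production detector would typically add an adaptive noise floor and a hangover period so that short dips inside words are not classed as silence.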
Another example of a criterion indicative of the impact of the insertion or deletion of a frame is whether a frame is received with errors. A further criterion is information, associated with received frames, about the mode of operation of the apparatus when the apparatus uses some kind of echo control or switched-loss algorithm (for example, during handsfree operation). For example, an indication that a frame was received while the terminal was transmitting (i.e., the near end was talking) or was in quiescent mode would indicate that such a frame can be deleted with minimal audible impact.
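These criteria can be combined into a single deletion cost used to pick the least audible frame. In the sketch below, the `Frame` fields, the cost ordering (errored frames first, then silent, near-end-talking or quiescent frames, then active far-end speech) and the tie-break toward the oldest frame are all assumptions for illustration; the source does not rank the criteria against each other.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int
    silent: bool = False            # from a silence detector (VAD/envelope)
    has_errors: bool = False        # e.g. failed checksum or decoder error
    near_end_talking: bool = False  # handsfree: terminal was transmitting
    quiescent: bool = False         # handsfree: echo control in quiescent mode


def deletion_cost(frame):
    """Lower cost = less audible if deleted (hypothetical ordering)."""
    if frame.has_errors:
        return 0
    if frame.silent or frame.near_end_talking or frame.quiescent:
        return 1
    return 2  # active far-end speech: most audible to delete


def pick_frame_to_delete(frames):
    """Index of the cheapest frame; min() keeps the first (oldest) on ties.

    Raises ValueError if `frames` is empty.
    """
    return min(range(len(frames)), key=lambda i: deletion_cost(frames[i]))


buffered = [Frame(0), Frame(1, near_end_talking=True),
            Frame(2, has_errors=True), Frame(3, silent=True)]
victim = pick_frame_to_delete(buffered)
```

Here the errored frame at index 2 is chosen ahead of the near-end-talking and silent frames.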
In accordance with a broad aspect of the present invention there is provided a method of managing a jitter buffer comprising the steps of:
receiving frames from a data network;
storing received frames into said jitter buffer;
detecting frames which satisfy a criterion; and
controlling the frames stored in said jitter buffer based on the condition of said buffer and on frames which satisfy said criterion.
According to a further aspect of the invention, said condition comprises a high water mark and a low water mark, and said controlling step comprises:
deleting a frame from said jitter buffer when the high water mark is exceeded and when said criterion is satisfied; and
inserting a frame into said jitter buffer when said buffer is depleted below said low water mark and when said criterion is satisfied.
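The water-mark control in the method above can be sketched as one function called after each arrival or output tick. This is a minimal illustration under stated assumptions: frames are plain strings, `manage`, `is_silent` and `make_silence` are hypothetical names, the buffer is an unbounded `collections.deque` whose left end is the head (next frame to play), deletion removes the oldest frame satisfying the criterion, and insertion happens only when the head frame itself is silent, so the new silence frame extends a pause already in progress.

```python
from collections import deque


def manage(buffer, is_silent, make_silence, high_water, low_water):
    """One control step on `buffer` (deque; index 0 = head = next frame out).

    Returns "deleted", "inserted", or None if the buffer was left unchanged.
    """
    if len(buffer) > high_water:
        # Too full: drop the oldest frame that satisfies the criterion, if any.
        idx = next((i for i, f in enumerate(buffer) if is_silent(f)), None)
        if idx is not None:
            del buffer[idx]
            return "deleted"
    elif len(buffer) < low_water and buffer and is_silent(buffer[0]):
        # Too empty and a pause is at the head: lengthen it by one frame.
        buffer.appendleft(make_silence())
        return "inserted"
    return None


def silent(frame):
    # Hypothetical convention for this example: names starting with "s" are silence.
    return frame.startswith("s")


full = deque(["v0", "s1", "v2", "v3", "v4"])
r1 = manage(full, silent, lambda: "s*", high_water=4, low_water=2)
low = deque(["s5"])
r2 = manage(low, silent, lambda: "s*", high_water=4, low_water=2)
```

Between the two water marks the buffer is left untouched, and above the high water mark nothing is deleted until a frame satisfying the criterion is present; this trades a temporarily deeper buffer for fewer audible glitches.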
In accordance with another broad aspect of the present invention, there is provided an apparatus comprising:
a data interface for receiving frames from a data network;
a jitter buffer for temporarily storing said frames;
a detector for detecting frames which satisfy a criterion; and
a buffer manager for controlling the frames stored in said jitter buffer based on the condition of said buffer and on frames which satisfy said criterion.