In a communication system a communication network is provided, which can link together two communication terminals so that the terminals can send information to each other in a call or other communication event. Information may include speech, text, images or video.
Modern communication systems are based on the transmission of digital signals. Analogue information such as speech is input into an analogue to digital converter at the transmitter of one terminal and converted into a digital signal. The digital signal is then encoded and placed in data packets for transmission over a channel to the receiver of another terminal.
One type of communication network suitable for transmitting data packets is the internet. Protocols which are used to carry voice signals over an Internet Protocol network are commonly referred to as Voice over IP (VoIP). VoIP is the routing of voice conversations over the Internet or through any other IP-based network.
A data packet includes a header portion and a payload portion. The header portion of the data packet contains data for transmitting and processing the data packet. This data may include an identification number and source address that together uniquely identify the packet, a header checksum used to detect processing errors, and the destination address. The payload portion of the data packet includes information from the digital signal intended for transmission. This information may be included in the payload as encoded frames such as voice frames, wherein each frame represents a portion of the analogue signal.
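By way of illustration only, the header and payload arrangement described above may be sketched as follows. The class name `Packet`, its field names, the text encoding of the header and the helper `checksum16` are all hypothetical and are not taken from any particular protocol; the checksum is a 16-bit ones' complement sum of the general kind used for header checksums.

```python
from dataclasses import dataclass
from typing import List


def checksum16(data: bytes) -> int:
    """Return a 16-bit ones' complement checksum of data (illustrative)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


@dataclass
class Packet:
    # Header portion (hypothetical field names)
    ident: int
    source: str
    destination: str
    # Payload portion: encoded frames, each representing part of the signal
    frames: List[bytes]
    header_checksum: int = 0

    def header_bytes(self) -> bytes:
        # Assumed serialisation of the header, for checksum purposes only
        return f"{self.ident}|{self.source}|{self.destination}".encode()

    def seal(self) -> "Packet":
        """Compute and store the header checksum before transmission."""
        self.header_checksum = checksum16(self.header_bytes())
        return self

    def header_ok(self) -> bool:
        """Check at the receiver that the header was not corrupted."""
        return self.header_checksum == checksum16(self.header_bytes())
```

In this sketch a receiver would call `header_ok()` before passing the frames in the payload onwards, and would discard a packet whose header fails the check.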
Degradations in the channel on which the information is sent will affect the information received at the receiving terminal. Degradations in the channel can cause changes in the packet sequence, delay the arrival of some packets at the receiver and cause the loss of other packets. The degradations may be caused by channel imperfections, noise and overload in the channel. This ultimately results in a reduction of the quality of the signal output by the receiving terminal.
In order to ensure that the data in the data packets may be output continuously at the destination terminal, it is necessary to introduce a delay between receiving a data packet and outputting the data in the packet, in order to overcome random variations in the delay between packets arriving at the terminal.
A jitter buffer is used at the receiving terminal to introduce a delay between receiving data packets from the network and outputting the data from the terminal. The jitter buffer stores packets or frames temporarily to cope with the variations in the arrival times of packets, such that the jitter buffer may continuously provide frames to be output to a decoder.
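By way of illustration only, a jitter buffer of the kind described above may be sketched as a store that accepts frames in any arrival order and releases them in sequence order. The class `JitterBuffer` and its methods are hypothetical names, and the sketch assumes that each frame carries an integer sequence number starting at zero.

```python
import heapq


class JitterBuffer:
    """Minimal sketch: reorders frames by sequence number for the decoder."""

    def __init__(self):
        self._heap = []      # (sequence number, frame) pairs, smallest first
        self._next_seq = 0   # sequence number the decoder expects next

    def push(self, seq, frame):
        """Store a frame received from the network; drop it if too late."""
        if seq >= self._next_seq:
            heapq.heappush(self._heap, (seq, frame))

    def pop(self):
        """Return the next frame in sequence, or None if it has not arrived."""
        # Discard duplicates of frames that have already been played out
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        frame = None
        if self._heap and self._heap[0][0] == self._next_seq:
            frame = heapq.heappop(self._heap)[1]
        self._next_seq += 1
        return frame

    def __len__(self):
        return len(self._heap)
```

A `None` result from `pop()` corresponds to a frame that is lost or late, which the decoder would have to conceal.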
A jitter buffer manager is arranged to control the number of frames in the jitter buffer over time. The jitter buffer manager may control the number of frames in the jitter buffer, thereby adjusting the delay introduced by the jitter buffer, by requesting that the decoder performs an action that will affect the time at which the decoder requires the next frame from the jitter buffer.
In order to delay the time at which the decoder requires the next frame, the jitter buffer manager may be arranged to request that the decoder inserts a copy of the last frame or extends the play out time of a frame, for example by stretching the length of the frame from 20 ms to 30 ms. Conversely, in order to bring forward the time at which the decoder requires the next frame, the jitter buffer manager may be arranged to request that the decoder skips a frame or shortens the play out time of a frame, for example by compressing the length of the frame from 20 ms to 10 ms. If, however, the delay introduced by the jitter buffer does not need to be altered, the jitter buffer manager may request that the decoder decodes the frame without modifying the signal.
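By way of illustration only, the choice between these requests may be sketched as a comparison of the number of frames currently held against a target number. The names `Action`, `choose_action` and `playout_ms`, and the `tolerance` parameter, are assumptions made for this sketch; the 20 ms, 30 ms and 10 ms durations are the example values given above.

```python
from enum import Enum


class Action(Enum):
    DECODE = "decode"      # decode the frame without modifying the signal
    STRETCH = "stretch"    # extend play out, so the next frame is needed later
    COMPRESS = "compress"  # shorten play out, so the next frame is needed sooner


def choose_action(frames_in_buffer, target_frames, tolerance=1):
    """Pick the request the jitter buffer manager sends to the decoder."""
    if frames_in_buffer < target_frames - tolerance:
        return Action.STRETCH   # buffer running low: increase the delay
    if frames_in_buffer > target_frames + tolerance:
        return Action.COMPRESS  # buffer too full: reduce the delay
    return Action.DECODE


def playout_ms(action):
    """Play out time of one 20 ms frame under each action (example values)."""
    return {Action.DECODE: 20, Action.STRETCH: 30, Action.COMPRESS: 10}[action]
```

The tolerance band is included so that small fluctuations around the target do not trigger a modification of the signal on every frame; other implementations may use a different rule.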
Simple jitter buffers adapt the delay such that a predetermined number of packets or frames is held in the jitter buffer. However, it is advantageous to adapt the number of packets held in the buffer in order to handle changing network conditions effectively. Therefore, in some methods known in the art, a target number of frames to be stored in the jitter buffer may be calculated adaptively.
Altering the time at which the decoder takes the next frame from the jitter buffer by the above described methods will often result in a distortion of the output signal, e.g. resulting from stretching or compression of frames. Loss and jitter concealment (LJC) methods are designed to minimise the distortion caused by adapting the delay. The operation of a jitter buffer and an LJC unit will now be described with reference to FIG. 2.
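By way of illustration only, one simple way in which a target number of frames might be derived from observed network behaviour is sketched below. The rule used here, which sizes the buffer to cover the largest recent deviation of the packet inter-arrival time from the nominal frame interval, is an assumption made for this sketch and is not asserted to be the rule used by any particular known method. The function name `target_frames` and its parameters are hypothetical.

```python
import math


def target_frames(arrival_ms, frame_ms=20, min_frames=1):
    """Estimate how many frames to hold, from recent packet arrival times.

    arrival_ms: arrival times in milliseconds, in order of arrival.
    frame_ms:   nominal interval between packets (one frame per packet).
    """
    gaps = [later - earlier for earlier, later in zip(arrival_ms, arrival_ms[1:])]
    if not gaps:
        return min_frames
    # Largest deviation of an inter-arrival gap from the nominal interval
    worst_jitter = max(abs(gap - frame_ms) for gap in gaps)
    # Hold enough frames to play through the worst deviation observed
    return max(min_frames, math.ceil(worst_jitter / frame_ms))
```

With perfectly regular arrivals the sketch returns the minimum of one frame; as the arrival times become more irregular the target grows, at the cost of a higher buffering delay.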
FIG. 2 shows an example of receiving circuitry 10 in a terminal used to receive data packets transmitted from the network 104, according to the prior art. The receiving circuitry includes a jitter buffer block 12, a decoder block 14, an LJC unit 15 and a digital to analogue converter 16.
The jitter buffer block 12 receives data packets from the network 104. The jitter buffer block 12 comprises a jitter buffer storage arranged to temporarily store data packets received from the network, and a jitter buffer manager that is arranged to determine the action required by the decoder block 14. The required action is reported to the decoder block 14 as shown by connection 22.
The decoder block 14 receives data provided in the payload of the data packets in the form of a bit stream output from the jitter buffer block 12, as shown by connection 20. The decoder block 14 decodes the bit stream according to the applied encoding scheme.
The parameters of the signal are analysed to determine the presence of voice activity on the signal. From this the LJC unit 15 is arranged to determine whether the action output from the jitter buffer block on connection 22 can be applied to the signal in the decoder. Typically, actions that adjust the delay introduced at the jitter buffer are preferred during periods of silence so that modifications to the delay are less audible in the signal. However, the delay may also be adjusted during active voice periods by analysing the parameters in the signal that indicate the type of voice data in the signal. As an example, it is known that adjusting the delay during stable speech sounds such as the ‘s’ sound in ‘sing’ or the ‘a’ sound in ‘car’ causes less distortion to the signal than adjusting it during unstable plosive speech sounds such as the ‘k’ sound in ‘kid’. In some known methods the response of the decoder or LJC unit to the required action is reported by the decoder block 14 to the jitter buffer block 12 as shown by the connection 24.
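By way of illustration only, the decision as to whether a requested action can be applied may be sketched as follows. The sketch uses mean frame energy as a crude stand-in for the signal parameters referred to above: a frame of very low energy is treated as silence, and an active frame whose energy is close to that of the preceding frame is treated as a stable sound. The function names, the thresholds and the use of energy alone are all assumptions made for this sketch; practical voice activity and stability detectors are considerably more elaborate.

```python
def mean_energy(samples):
    """Mean squared amplitude of one frame of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def can_apply_action(prev_frame, cur_frame,
                     silence_threshold=1e-4, max_energy_ratio=2.0):
    """Decide whether a delay adjustment may be applied to cur_frame."""
    e_prev = mean_energy(prev_frame)
    e_cur = mean_energy(cur_frame)
    if e_cur < silence_threshold:
        return True   # silence: the adjustment is least audible here
    if e_prev < silence_threshold:
        return False  # onset out of silence: treated as unstable
    ratio = max(e_cur, e_prev) / min(e_cur, e_prev)
    # Similar energy in consecutive frames is taken to indicate a stable sound
    return ratio <= max_energy_ratio
```

When the function returns `False`, the requested action would be deferred and the frame decoded without modification, and this outcome could be reported back to the jitter buffer block.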
It should be noted that the action may be carried out in the decoder 14 or in the LJC unit 15. This is an implementation issue.
The delayed signal is output via the decoder 14 as a decoded digital signal to the digital to analogue converter 16. The digital to analogue converter 16 converts the decoded digital signal to an analogue signal. The analogue signal may then be output by an output device such as a loudspeaker.
Controlling the delay in accordance with the type of voice data in the signal, in order to minimise the distortion in the signal, reduces the degree to which the delay introduced to the received signal can be adjusted in accordance with the changing network conditions. This can cause problems, such as missing data and perceptual artefacts in the concealment, or unnecessarily high buffering delay, especially when the conditions of the network change rapidly.
It is therefore an aim of the present invention to improve the perceived quality of the received signal. It is a further aim of the present invention to provide a method of improving the quality of the received signal without the use of complex computational methods.