Telephony over packet switched networks, such as IP (Internet Protocol) based networks (mainly the Internet or intranets), has become increasingly attractive due to a number of features. These features include relatively low operating costs, easy integration of new services, and a single network for voice and data. The speech or audio signal in packet switched systems is converted into a digital signal, i.e. into a bitstream, which is divided into portions of suitable size in order to be transmitted in data packets over the packet switched network from a transmitter end to a receiver end.
Packet switched networks were originally designed for transmission of non-real-time data, and voice transmission over such networks causes some problems. Data packets can be lost during transmission, as they can be deliberately discarded by the network due to congestion problems or transmission errors. In non-real-time applications this is not a problem since a lost packet can be retransmitted. However, retransmission is not a possible solution for real-time applications. A packet that arrives too late at a real-time application cannot be used to reconstruct the corresponding signal since this signal has already been, or should have been, delivered to the receiving speaker. Therefore, a packet that arrives too late is equivalent to a lost packet.
One characteristic of an IP network is that a packet that is received can, in practice, be treated as undamaged. Error-detection codes, such as a CRC (Cyclic Redundancy Check) at the link layer and checksums in the IP and transport headers, are used to check whether a packet has been damaged. If an error is detected, the packet is discarded. In other words, from the point of view of the application, bit errors do not exist, only packet losses.
The main problem with lost or delayed data packets is the introduction of distortion in the reconstructed speech or audio signal. The distortion results from the fact that signal segments conveyed by lost or delayed data packets cannot be reconstructed. The speech coders in use today were originally designed for circuit switched networks with error free channels or with channels having bit-error characteristics. Therefore, a problem with these speech coders is that they do not handle packet losses well.
Considering what has been described above as well as other particulars of a packet switched network, there are problems connected with how to provide the same quality in telephony over packet switched networks as in ordinary telephony over circuit switched networks. In order to solve these problems, the characteristics of a packet switched network have to be taken into consideration.
In order to overcome the problems associated with lost or delayed data packets during real-time transmissions, it is suitable to introduce diversity for the transmission over the packet switched network. Diversity is a method which increases robustness in transmission by spreading information in time (as in interleaving in mobile telephony) or over some physical entity (as when using multiple receiving antennas). In packet transmission, diversity is introduced on a packet level by finding some way to create diversity between packets. The simplest way of creating diversity in a packet switched network is to transmit the same packet payload twice in two different packets. In this way, a lost or delayed packet will not disturb the transmission of the payload information since another packet with identical payload will, most probably, be received in due time. It is evident that transmission of information in a diversity system will require more bandwidth than transmission of information in a regular system.
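By way of illustration only, the simplest diversity scheme described above, in which each payload is carried by two different packets, may be sketched as follows. The function names, the use of a sequence number to pair the two copies, and the in-memory list standing in for the network are assumptions made for this sketch and are not taken from any particular system.

```python
def send_with_duplication(payloads):
    """Return the packets to transmit: every payload is placed in two packets."""
    packets = []
    for seq, payload in enumerate(payloads):
        packets.append((seq, payload))  # first packet carrying this payload
        packets.append((seq, payload))  # second packet with identical payload
    return packets


def receive(packets, total):
    """Rebuild the payload sequence from whatever packets arrived in time.

    None marks a segment for which neither copy was received.
    """
    received = {}
    for seq, payload in packets:
        received.setdefault(seq, payload)  # keep the first copy, ignore the duplicate
    return [received.get(seq) for seq in range(total)]


payloads = [b"seg0", b"seg1", b"seg2"]
sent = send_with_duplication(payloads)

# Hypothetical loss pattern: the first copy of seg0 and the second copy of seg1 are lost.
survived = [p for i, p in enumerate(sent) if i not in {0, 3}]
recovered = receive(survived, len(payloads))
```

In this sketch every payload is recovered as long as at least one of its two packets arrives, at the cost of transmitting twice the number of packets.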
Many of the diversity schemes or diversity systems in the prior art have the disadvantage that the transmission of a sound signal does not benefit from the additional bandwidth needed by the transmitted redundant information under normal operating conditions. Thus, for most of the time, when there are no packet losses or delays, the additional bandwidth will merely be used for transmission of overhead information.
Since bandwidth is most often a limited resource, it would be desirable if a transmitted sound signal could somehow benefit from the additional bandwidth required by a diversity system. In particular, it would be desirable if the additional bandwidth could be used for improving the quality of the decoded sound signal at the receiving end.
In “Design of Multiple Description Scalar Quantizers”, V. A. Vaishampayan, IEEE Transactions on Information Theory, Vol. 39, No. 3, May 1993, the use of multiple descriptions in a diversity system is disclosed. The encoder sends two different descriptions of the same source signal over two different channels, and the decoder reconstructs the source signal based on information received from the channel(s) that are currently working. Thus, the quality of the reconstructed signal will be based on one description if only one channel is working. If both channels work, the reproduced source signal will be based on two descriptions and higher quality will be obtained at the receiving end. In the article, the author addresses the problem of index assignment in order to maximize the benefit of multiple descriptions in a diversity system.
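The principle of multiple descriptions may be illustrated with a deliberately simple construction: two uniform scalar quantizers whose cells are offset by half a step. This is only a sketch of the general idea and is not the index assignment method of the cited article; the step size and the function names are assumptions made for the example.

```python
import math

STEP = 1.0  # assumed quantizer step size


def encode(x):
    """Return two descriptions of x, one per channel."""
    i1 = math.floor(x / STEP)        # cells [i1*STEP, (i1+1)*STEP)
    i2 = math.floor(x / STEP + 0.5)  # cells [(i2-0.5)*STEP, (i2+0.5)*STEP)
    return i1, i2


def decode(i1=None, i2=None):
    """Reconstruct from whichever descriptions were received."""
    if i1 is not None and i2 is not None:
        # Both channels work: the value lies in the intersection of the two cells.
        lo = max(i1 * STEP, (i2 - 0.5) * STEP)
        hi = min((i1 + 1) * STEP, (i2 + 0.5) * STEP)
        return (lo + hi) / 2
    if i1 is not None:
        return (i1 + 0.5) * STEP  # midpoint of the first quantizer's cell
    if i2 is not None:
        return i2 * STEP          # midpoint of the second quantizer's cell
    return None                   # nothing received


i1, i2 = encode(0.3)
both = decode(i1, i2)        # 0.25
only_first = decode(i1=i1)   # 0.5
only_second = decode(i2=i2)  # 0.0
```

With one description the reconstruction error in this sketch is at most half a step, and with both descriptions it is at most a quarter of a step, so the second description improves quality when it arrives rather than merely repeating the first.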
In a system that transmits data over packet switched networks, one or more headers are added to each data packet. These headers contain data fields with information about the destination of the packet, the sender address, the size of the data within the packet, as well as other packet transport related data fields. The headers added to the packets constitute overhead information whose size must be taken into account. To keep the packet assembling delay of data packets small, the payload of each data packet has a limited size. The payload is the information within a packet which is used by an application. The size of the payload, compared to the size of the actually transmitted data packet with its included overhead information, is an important measure when considering the amount of available bandwidth. A problem with transmitting several relatively small data packets is that the size of the headers will be substantial in comparison with the size of the information which is useful for the application. In fact, the size of the headers will often be greater than the size of the useful information.
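The relation between header size and payload size can be made concrete with a short calculation. The example below assumes, purely for illustration, an RTP/UDP/IPv4 protocol stack with minimum-size headers and a speech coder producing 8 kbit/s in 20 ms frames; neither the stack nor the coder parameters are specified in the description above.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed RTP header without CSRC list or extension


def payload_bytes(bit_rate_bps, frame_ms):
    """Payload size in bytes for one frame of coded speech."""
    return bit_rate_bps * frame_ms // (8 * 1000)


def payload_fraction(payload, headers=IPV4_HEADER + UDP_HEADER + RTP_HEADER):
    """Fraction of the transmitted packet that is useful payload."""
    return payload / (payload + headers)


payload = payload_bytes(8000, 20)     # assumed coder: 8 kbit/s, 20 ms frames
fraction = payload_fraction(payload)  # share of each packet used by the application
```

Under these assumptions each packet carries 20 bytes of payload and 40 bytes of headers, so only one third of the transmitted bytes are useful to the application.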
To alleviate bandwidth problems, it is desirable to reduce the bit rate by suitable coding of the information to be transmitted. One scheme frequently used is to code information data using predictions of the data. These predictions are generated based on previous information data of the same information signal. However, due to the phenomenon that packets can be lost during transmission, it is not a good idea to insert dependencies between different packets. If a packet is lost and the reconstruction of a following information segment is dependent on the information contained in the lost packet, then the reconstruction of the following information segment will suffer. It is important that this type of error propagation is avoided. Therefore, the ordinary way of using prediction to reduce the bit rate of a speech or audio signal is not efficient for these kinds of transmission channels, since such prediction would lead to error propagation. Thus, there is a problem in how to provide prediction in a packet switched system when transmitting data packets with voice or audio signal information.
The use of prediction is a common method in speech coding to improve coding efficiency, i.e. to decrease the bit rate. An example is the predictive coding technique for Differential PCM (DPCM) coders disclosed in “Digital Coding of Waveforms: Principles and Applications to Speech and Video”, N. S. Jayant and P. Noll, Prentice Hall, ISBN 0-13-211913-7 01, 1984. The prediction of a signal sample is computed by a predictor based on a previous quantized signal sample, i.e. the prediction is backward adaptive. The computed prediction sample is then subtracted from the original sample which is to be predicted. The result of the subtraction is the error obtained when predicting the signal sample using the predictor. This resulting prediction error is then quantized and transmitted to a receiving end. At the receiver, the prediction error is added to a regenerated prediction signal from a predictor corresponding to the predictor at the transmitting end. This combination of the received prediction error with a calculated prediction value enables a reconstruction of the original signal sample at the receiver end. This kind of coding leads to bit rate savings since redundancy is removed and the prediction error signal has lower power than the original signal, so that fewer bits are needed for the quantization of the error signal at a given noise level.
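The predictive scheme described above may be sketched as follows. This is a minimal illustration and not the coder of the cited reference: it assumes a fixed first-order predictor with coefficient 0.9, a uniform quantizer with step 0.1, and a sinusoidal test signal, all of which are chosen only for the example.

```python
import math


def dpcm_encode(samples, a=0.9, step=0.1):
    """Quantize the prediction error of each sample; return the quantizer indices."""
    indices, state = [], 0.0  # state holds the previous reconstructed sample
    for x in samples:
        prediction = a * state                   # prediction from the previous quantized sample
        index = round((x - prediction) / step)   # quantized prediction error
        indices.append(index)
        state = prediction + index * step        # same reconstruction the decoder will form
    return indices


def dpcm_decode(indices, a=0.9, step=0.1):
    """Add each received prediction error to the locally regenerated prediction."""
    out, state = [], 0.0
    for index in indices:
        state = a * state + index * step
        out.append(state)
    return out


samples = [math.sin(0.1 * n) for n in range(50)]
indices = dpcm_encode(samples)
decoded = dpcm_decode(indices)
```

Because the encoder predicts from its own reconstructed samples rather than from the original samples, the decoder can regenerate exactly the same predictions, and the reconstruction error of each sample stays within half a quantizer step as long as every index is received.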
As stated above, this kind of encoding/decoding of speech or audio over a packet switched network leads to error propagation if a packet is lost. When a packet is not received, the prediction value calculated in the decoder will be based on samples of the last packet that was received. This will result in a prediction value in the decoder that differs from the corresponding prediction value in the encoder. Thus, the received quantized prediction error will be added to the wrong prediction value in the decoder. Hence, a lost packet will lead to error propagation. If the prediction state were reset after each transmitted/received packet, there would be no error propagation. However, this would lead to a low quality of the decoded signal. The reason is that if the predictor state is set to zero, the prediction value during encoding will be of low quality and, thus, the generated prediction error will have more information content. This in turn will result in a low quality of the quantized signal with a high noise level, since the quantizer is not adapted to quantize signals with such high information content.
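The error propagation described above can be demonstrated with a small simulation. The sketch assumes a first-order predictor with coefficient 0.9, a quantizer step of 0.1, ten samples per packet, and a decoder that conceals a lost packet by treating its prediction errors as zero; these choices are illustrative assumptions only.

```python
import math

A, STEP, PACKET = 0.9, 0.1, 10  # assumed predictor coefficient, step size, samples per packet


def encode(samples):
    """Return quantizer indices and the encoder's own reconstruction."""
    indices, recon, state = [], [], 0.0
    for x in samples:
        index = round((x - A * state) / STEP)
        state = A * state + index * STEP
        indices.append(index)
        recon.append(state)
    return indices, recon


def decode(packets):
    """Decode packet by packet; None stands for a lost packet."""
    out, state = [], 0.0
    for packet in packets:
        # Assumed concealment: a lost packet is replaced by zero prediction errors.
        for index in (packet if packet is not None else [0] * PACKET):
            state = A * state + index * STEP
            out.append(state)
    return out


samples = [math.sin(0.1 * n) for n in range(3 * PACKET)]
indices, reference = encode(samples)
packets = [indices[i:i + PACKET] for i in range(0, len(indices), PACKET)]
packets[1] = None  # the second packet is lost
decoded = decode(packets)

# Difference between decoder output and encoder reconstruction, per sample.
mismatch = [abs(d - r) for d, r in zip(decoded, reference)]
```

In this sketch the first packet is decoded without mismatch, whereas the third packet, although received intact, is decoded with an error well above half a quantizer step because the decoder's predictor state no longer matches the encoder's.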
If a diversity system is implemented based on multiple descriptions, the incorporation of prediction will face additional problems which are due to the fact that the sound signal has several representations. If the above described scheme for predictive encoding/decoding is used together with multiple description quantizers, one of two problems will be present. The problem will be dependent on how the predictors are utilized at the transmitting/receiving end.
If each of the multiple description quantizers at the receiving end were to feed independent prediction filters, the prediction value for each description would be independent of the arrival of the other multiple descriptions. However, with this solution the offset of the different encoded representations will be different between different independent predictor outputs. Thereby the regular spacing between representations from the multiple quantizers is lost, and with that the optimized improvement from receiving multiple descriptions is also lost.
Alternatively, all multiple descriptions could be constructed from the same predictor, thereby maintaining the optimized improvement from receiving multiple descriptions. However, if this prediction is from a pre-defined representation, for example, a best representation obtained from a merger of all descriptions, then synchronization of the decoder with the encoder is lost if one (or more) description of the multiple descriptions is not received due to a packet loss when transmitting that description from the encoder at the transmitting end to the decoder at the receiving end.
Thus, as stated above, there is a problem in how to use prediction for reducing the bit rate of a speech or audio signal for transmission over a packet network, since the loss of a packet carrying a signal information segment will negatively affect the reconstruction of the following signal information segment.
When using multiple descriptions, the transmission of the sound signal will require more bandwidth than if a single description were used. In such a system, it would be even more attractive to use prediction in order to reduce the required bandwidth. However, as described above, there is a problem in how to implement the predictive encoding/decoding mechanism in such a system while maintaining the basic gain of multiple description quantization.