Telephony over packet switched networks, such as IP (Internet Protocol) based networks (mainly the Internet or intranets), has become increasingly attractive due to a number of features. These features include relatively low operating costs, easy integration of new services, and a single network for voice and data. The speech or audio signal in packet switched systems is converted into a digital signal, i.e. into a bitstream, which is divided into portions of suitable size in order to be transmitted in data packets over the packet switched network from a transmitter end to a receiver end.
Packet switched networks were originally designed for transmission of non-real-time data, and voice transmission over such networks causes some problems. Data packets can be lost during transmission, as they can be deliberately discarded by the network due to congestion problems or transmission errors. In non-real-time applications this is not a problem since a lost packet can be retransmitted. However, retransmission is not a possible solution for real-time applications. A packet that arrives too late at a real-time application cannot be used to reconstruct the corresponding signal since this signal already has been, or should have been, delivered to the receiving speaker. Therefore, a packet that arrives too late is equivalent to a lost packet.
One characteristic of an IP-network is that if a packet is received, the content of the packet is necessarily undamaged. An IP-packet has a header which includes a CRC (Cyclic Redundancy Check) field. The CRC is used to check if the content of the packet is undamaged. If the CRC indicates an error, the packet is discarded. In other words, bit errors do not exist, only packet losses.
The main problem with lost or delayed data packets is the introduction of distortion in the reconstructed speech or audio signal. The distortion results from the fact that signal segments conveyed by lost or delayed data packets cannot be reconstructed. The speech coders in use today were originally designed for circuit switched networks with error-free channels or with channels having bit-error characteristics. Therefore, a problem with these speech coders is that they do not handle packet losses well.
Considering what has been described above as well as other particulars of a packet switched network, there are problems connected with how to provide the same quality in telephony over packet switched networks as in ordinary telephony over circuit switched networks. In order to solve these problems, the characteristics of a packet switched network have to be taken into consideration.
In a system that transmits data over packet switched networks, one or more headers are added to each data packet. These headers contain data fields with information about the destination of the packet, the sender address, the size of the data within the packet, as well as other packet transport related data fields. The headers added to the packets constitute overhead information whose size must be taken into account. To keep the packet assembling delay of data packets small, the payload of each data packet has a limited size. The payload is the information within a packet which is used by an application. The size of the payload, compared to the size of the actually transmitted data packet with its included overhead information, is an important measure when considering the amount of available bandwidth. A problem with transmitting several relatively small data packets is that the size of the headers will be substantial in comparison with the size of the information which is useful for the application. In fact, the size of the headers will often be greater than the size of the useful information.
To alleviate bandwidth problems, it is desirable to reduce the bit rate by suitable coding of the information to be transmitted. However, the advantage of the bit rate reduction by coding is less significant, and bandwidth remains a problem, if a very large overhead in the form of a header is added to the application information before transmission of the data packet.
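The relation between header size and payload size can be illustrated with a small calculation. The following is a minimal sketch only: the header sizes are assumptions (the minimum sizes of an IPv4, a UDP and an RTP header), as are the 8 kbit/s coder rate and the 20 ms frame length; none of these figures is taken from the description above.

```python
# Assumed header sizes in bytes (minimum IPv4, UDP and RTP headers).
# The protocol stack is an illustrative assumption, not part of the text.
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12


def payload_bytes(bit_rate_bps: int, frame_ms: int) -> int:
    """Bytes of coded speech produced during one frame of frame_ms milliseconds."""
    return bit_rate_bps * frame_ms // (8 * 1000)


def payload_fraction(payload: int, header: int) -> float:
    """Fraction of the transmitted packet that is payload useful to the application."""
    return payload / (payload + header)


header = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
payload = payload_bytes(bit_rate_bps=8000, frame_ms=20)  # assumed coder rate and frame
fraction = payload_fraction(payload, header)

print(f"header={header} B, payload={payload} B, payload fraction={fraction:.2f}")
```

Under these assumed figures the headers are twice the size of the payload, so only one third of the transmitted bits are useful to the application.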
One scheme frequently used for reducing the bit rate is to code information data using predictions of the data. These predictions are generated based on previous information data of the same information signal. However, due to the phenomenon that packets can be lost during transmission, it is not a good idea to insert dependencies between different packets. If a packet is lost and the reconstruction of a following information segment is dependent on the information contained in the lost packet, then the reconstruction of the following information segment will suffer. It is important that this type of error propagation is avoided. Therefore, the ordinary way of using prediction to reduce the bit rate of a speech or audio signal is not efficient for these kinds of transmission channels, since such prediction would lead to error propagation. Thus, there is a problem in how to provide prediction in a packet switched system when transmitting data packets with voice or audio signal information.
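The error propagation described above can be illustrated with a very simple predictive scheme. The sketch below assumes, purely for illustration, that the prediction of each sample is the previous sample, that each residual travels in its own packet, and that a lost residual is replaced by zero at the receiver; the function names are hypothetical.

```python
def encode_differences(samples):
    """Code each sample as its difference from the previous sample."""
    previous = 0
    residuals = []
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def decode_differences(residuals):
    """Rebuild samples by accumulating residuals; a lost residual (None) counts as zero."""
    previous = 0
    decoded = []
    for residual in residuals:
        previous += 0 if residual is None else residual
        decoded.append(previous)
    return decoded


samples = [10, 12, 15, 15, 13, 9]
residuals = encode_differences(samples)

received = list(residuals)
received[2] = None  # the packet carrying the third residual is lost

decoded = decode_differences(received)
errors = [d - s for d, s in zip(decoded, samples)]
print(errors)  # [0, 0, -3, -3, -3, -3]
```

A single lost packet leaves an error in every subsequent sample, not only in the sample whose residual was lost.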
In order to overcome the problems associated with lost or delayed data packets during real-time transmissions, it is suitable to introduce diversity for the transmission over the packet switched network. Diversity is a method which increases robustness in transmission by spreading information in time (as in interleaving in mobile telephony) or over some physical entity (as when using multiple receiving antennas). In packet transmission for one embodiment, diversity is introduced on a packet level by finding some way to create diversity between packets. The simplest way of creating diversity in a packet switched network is to transmit the same packet payload twice in two different packets. In this way, a lost or delayed packet will not disturb the transmission of the payload information since another packet with identical payload, most probably, will be received in due time. A disadvantage with this is that it is not very efficient in terms of bandwidth since the network or channel is loaded with twice the amount of information.
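The simplest form of packet diversity, sending each payload twice, can be sketched as follows. The packet representation (a sequence number paired with a payload), the function names, and the assumption that packet losses are independent are all illustrative and not taken from the description.

```python
def send_with_duplicates(payloads):
    """Emit every payload twice, as two separate packets with the same sequence number."""
    packets = []
    for seq, payload in enumerate(payloads):
        packets.append((seq, payload))
        packets.append((seq, payload))
    return packets


def receive(packets, lost_indices):
    """Drop the packets at lost_indices and keep one copy of each payload that arrived."""
    received = {}
    for index, (seq, payload) in enumerate(packets):
        if index in lost_indices:
            continue
        received.setdefault(seq, payload)
    return received


def residual_loss(p):
    """Probability that both copies are lost, assuming independent losses with rate p."""
    return p * p


payloads = [b"frame0", b"frame1", b"frame2"]
packets = send_with_duplicates(payloads)

# Lose the first copy of frame0 and both copies of frame2.
got = receive(packets, lost_indices={0, 4, 5})
print(sorted(got))  # [0, 1]
```

The payload of frame0 survives the loss of one copy, whereas frame2 is lost because both copies were dropped. The number of transmitted packets is doubled in every case, which is the bandwidth cost noted above.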
An example of the use of diversity for decreasing the impact of packet loss on audio quality in Internet telephony applications is disclosed by Bolot, S. et al. in “Adaptive FEC-Based Error Control for Interactive Audio in the Internet”, IEEE Infocom '99, New York, USA, March 1999. Bolot describes how Forward Error Correction (FEC) schemes are used for creating diversity. In these FEC schemes, a redundant version of an audio packet is transmitted along with the original information of a later packet. If a packet with original information is lost, the redundant information in a later packet can be used for partly reconstructing the samples representing the original information. This is achieved by coding the signal with a low rate coder (much lower rate than the original coder) and transmitting this lower rate signal as redundant information. There are, however, a number of disadvantages with this solution. The complexity of the coding system will be increased since an additional and different coding scheme will be needed for the redundant information. Also, the coder will be more hardware demanding in order to give reasonable quality at the lower rate. Furthermore, the receiving end will correspondingly need two different types of decoders and, in case of packet loss, will need to be able to seamlessly reproduce speech based on interleaved information from the two different types of decoders.
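The packet layout of such an FEC scheme can be sketched as below. This is an illustration under stated assumptions and not the scheme of the cited paper: the primary coder is replaced by the raw frame samples, the low rate coder is replaced by a coarse rounding of the samples, and the redundant copy of a frame is assumed to travel in the immediately following packet. All names are hypothetical.

```python
def coarse(frame, step=8):
    """Stand-in for a low rate coder: round each sample down to a multiple of step."""
    return [(sample // step) * step for sample in frame]


def build_packets(frames):
    """Packet n carries frame n plus a coarse, redundant version of frame n-1."""
    packets = []
    for n, frame in enumerate(frames):
        redundant = coarse(frames[n - 1]) if n > 0 else None
        packets.append({"seq": n, "primary": frame, "redundant": redundant})
    return packets


def reconstruct(packets, lost, frame_count):
    """Use the primary copy when it arrived, else the redundant copy from the next packet."""
    arrived = {p["seq"]: p for p in packets if p["seq"] not in lost}
    output = []
    for n in range(frame_count):
        if n in arrived:
            output.append(("primary", arrived[n]["primary"]))
        elif n + 1 in arrived:
            output.append(("redundant", arrived[n + 1]["redundant"]))
        else:
            output.append(("lost", None))
    return output


frames = [[3, 17, 42], [9, 30, 55], [21, 8, 64]]
packets = build_packets(frames)
result = reconstruct(packets, lost={1}, frame_count=len(frames))
for kind, samples in result:
    print(kind, samples)
```

When packet 1 is lost, frame 1 is recovered from the coarse copy carried in packet 2, at reduced quality. The receiver must handle two kinds of decoded output and switch between them, which reflects the added complexity noted above.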
The above-mentioned diversity schemes or diversity systems have the disadvantage that the transmission of a sound signal does not benefit from the additional bandwidth needed by the transmitted redundant information under normal operating conditions. Thus, for most of the time, when there are no packet losses or delays, the additional bandwidth will merely be used for transmission of overhead information.
Since bandwidth most often is a limited resource, it would be desirable if a transmitted sound signal somehow could benefit from the additional bandwidth required by a diversity system. In one embodiment, it would be desirable if the additional bandwidth could be used for improving the quality of the decoded sound signal at the receiving end.
In “Design of Multiple Description Scalar Quantizers”, V. A. Vaishampayan, IEEE Transactions on Information Theory, Vol. 39, No. 3, May 1993, the use of multiple descriptions in a diversity system is disclosed. The encoder sends two different descriptions of the same source signal over two different channels, and the decoder reconstructs the source signal based on information received from the channel(s) that are currently working. Thus, the quality of the reconstructed signal will be based on one description if only one channel is working. If both channels work, the reproduced source signal will be based on two descriptions and higher quality will be obtained at the receiving end. In the article, the author addresses the problem of index assignment in order to maximize the benefit of multiple descriptions in a diversity system.
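The principle of multiple descriptions can be illustrated with two staggered uniform scalar quantizers. This is a deliberately simple sketch and not the index assignment design of the cited article: the second quantizer is assumed to be offset by half a step relative to the first, and the function names and the step size are illustrative assumptions.

```python
import math


def encode(x, step):
    """Return two descriptions of x: indices from two quantizers offset by half a step."""
    index_1 = math.floor(x / step)
    index_2 = math.floor((x + step / 2) / step)
    return index_1, index_2


def decode(step, index_1=None, index_2=None):
    """Reconstruct x as the midpoint of the interval implied by the received indices."""
    cells = []
    if index_1 is not None:
        cells.append((index_1 * step, (index_1 + 1) * step))
    if index_2 is not None:
        cells.append(((index_2 - 0.5) * step, (index_2 + 0.5) * step))
    if not cells:
        raise ValueError("no description received")
    low = max(cell[0] for cell in cells)
    high = min(cell[1] for cell in cells)
    return (low + high) / 2


step = 1.0
x = 0.3
index_1, index_2 = encode(x, step)

only_first = decode(step, index_1=index_1)
only_second = decode(step, index_2=index_2)
both = decode(step, index_1=index_1, index_2=index_2)
print(only_first, only_second, both)
```

With one description the reconstruction error is at most half a step; with both descriptions the two cells overlap in an interval of half a step, so the error is at most a quarter of a step. The second description therefore improves quality when both arrive, rather than serving only as a spare copy.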
In EP 0 856 956 A1, a multiple description coding communication system for image coding is disclosed. The invention uses transform coding where pairs of coefficients are transformed with a pairing transform to get a new pair of coefficients with substantially equal energy. These coefficients are coded separately and transmitted in different packets. In this way, information about both of the original coefficients is present in both packets, and robustness to loss of one packet is obtained since the inverse pairing transform will produce two coefficients from the one received, although with less resolution. A disadvantage with this system is that the efficiency will be low since two different types of quantizers are used to complement each other for redundancy purposes only, and not for improving the image quality when receiving both coefficient pairs.
Thus, in connection with transmission of a sound signal over a packet switched network, the problem to be solved is how to implement a diversity system that uses multiple description, provides good operating characteristics, is bandwidth efficient, and keeps the complexity low.