The present invention generally relates to voice communication over packet networks, and more specifically relates to a method and apparatus for improving voice quality in voice-over-packet networks.
A typical architecture of a voice-over-packet system (focusing only on the voice communication part) is illustrated in FIG. 1. The voice encoders/decoders 10 and 12 shown in FIG. 1 are those most commonly used under present ITU-T recommendations. However, such details may change over time, and are given in FIG. 1 for illustration purposes only. Many readily available sources provide a detailed description of the various components of a voice-over-packet system.
Due to the inherent nature of packet-based data communication networks, although a voice-over-packet communication device sends packets to the other end at equal time intervals, the packets do not arrive from the network at equal time intervals. This variation in the time of arrival of packets is called “jitter.” Sometimes, depending on the network protocol used and the network conditions, the packets may even arrive in a sequence that is different from the sequence in which they were sent. As shown in FIG. 1, a voice-over-packet system typically includes a network jitter compensator or jitter buffer 14. The network jitter compensator 14 temporarily holds the packets received from the network and, if necessary, restores them to the sequence in which they were sent.
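By way of illustration only, the hold-and-reorder behavior of a jitter buffer can be sketched as follows. This is a minimal sketch, not the implementation of the jitter compensator 14: the class name `JitterBuffer`, the methods `push` and `pop`, and the use of a per-packet sequence number starting at zero are all assumptions made for the example.

```python
class JitterBuffer:
    """Toy jitter buffer: holds packets and releases them in sequence order.

    Hypothetical sketch; assumes each packet carries an integer sequence
    number starting at 0.
    """

    def __init__(self):
        self._held = {}      # sequence number -> payload, in any arrival order
        self._next_seq = 0   # sequence number due at the next playout tick

    def push(self, seq, payload):
        """Store an arriving packet; reject it if its playout slot has passed."""
        if seq < self._next_seq:
            return False     # too late: this slot has already been played out
        self._held[seq] = payload
        return True

    def pop(self):
        """Return the payload due now, or None if it has not arrived yet."""
        payload = self._held.pop(self._next_seq, None)
        self._next_seq += 1  # playout moves on whether or not the packet came
        return payload
```

In this sketch, packets pushed in the order 1, 0, 2 are returned by successive calls to `pop` in the order 0, 1, 2. A call to `pop` that returns `None` corresponds to the case in which no packet is available for playout.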
However, sometimes a packet arrives so late that there are no packets left in the jitter buffer that could be played out. In that case, the decoder has to perform the task of “filling-in” by extending the current speech signal further. In a typical system, such a “filling-in” operation is performed on a frame-by-frame basis, where the frame size is the length of speech that is encapsulated in a packet. Typically, when the system is implemented on a DSP, a DMA and an associated memory buffer are used to transmit the speech samples to the PCM TELCO interface. Every time the buffer begins to run empty, a packet is decoded, and the corresponding speech is added to the buffer. Once a piece of data is placed in that buffer, it cannot be modified. Therefore, when a packet arrives too late, one full frame is lost. Even if the packet arrives just after the first speech sample of the corresponding frame has begun playing out at the PCM TELCO interface, the entire packet is discarded.
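The frame-granular decision described above can be sketched as follows. The function name `playout_schedule`, the frame size of 80 samples, and the initial playout delay of 160 samples are assumptions chosen for illustration; none of these values is taken from the system of FIG. 1.

```python
def playout_schedule(arrival_times, frame_samples=80, initial_delay=160):
    """Classify each frame as 'decoded' or 'filled' under frame-by-frame playout.

    arrival_times[n] is the arrival time of packet n in sample periods, or
    None if the packet never arrives. Frame n is committed to the output
    buffer at initial_delay + n * frame_samples (assumed values); after that
    instant the buffer contents cannot be modified.
    """
    schedule = []
    for n, arrival in enumerate(arrival_times):
        deadline = initial_delay + n * frame_samples
        if arrival is not None and arrival <= deadline:
            schedule.append("decoded")
        else:
            # The packet is late or lost: the whole frame is filled in, and a
            # late packet is discarded even if it missed by a single sample.
            schedule.append("filled")
    return schedule
```

With arrival times of 0, 100, 321 and 350 sample periods, the deadlines under these assumptions are 160, 240, 320 and 400. The third packet misses its deadline by one sample period, yet the full 80-sample frame is filled in and the packet is discarded.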
When code-excited linear prediction (CELP) coders are used in such a system, another problem is presented. CELP coders (and sometimes even waveform coders) typically have a “coder state” associated with them. When the communication channel is lossless, the encoder state and the decoder state are in synchronization. However, when a packet is lost, and the decoder has to “fill in” for the lost packet, the state of the decoder changes, and the encoder and the decoder lose synchronization. Thereafter, it may take several frames for the two to regain synchronization, resulting in further degradation in voice quality. The present invention is directed at solving both of these problems.
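The loss of synchronization can be illustrated with a deliberately simplified stand-in for a stateful coder. The sketch below is not a CELP coder: it assumes a first-order predictor with a hypothetical coefficient `a`, one sample per frame, and a fill-in rule that uses the prediction alone. The function names `encode` and `decode` are likewise assumptions.

```python
def encode(samples, a=0.5):
    """Toy predictive encoder: sends each sample minus a prediction from state."""
    state = 0.0
    residuals = []
    for x in samples:
        residuals.append(x - a * state)
        state = x  # residual is sent unquantized, so the reconstruction equals x
    return residuals


def decode(residuals, a=0.5):
    """Toy predictive decoder; a residual of None marks a lost frame."""
    state = 0.0
    output = []
    for r in residuals:
        if r is None:
            y = a * state      # fill-in: extend the signal from the state alone
        else:
            y = a * state + r  # normal decoding
        output.append(y)
        state = y              # after a fill-in this differs from the encoder state
    return output


samples = [8.0, 8.0, 8.0, 8.0, 8.0]
residuals = encode(samples)
residuals[1] = None            # simulate the loss of the second frame
decoded = decode(residuals)
errors = [x - y for x, y in zip(samples, decoded)]
```

Under these assumptions the errors are 0, 4, 2, 1 and 0.5: the frames following the lost one are received intact, yet each is decoded incorrectly because the decoder state has diverged, and the error only decays over several frames.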