1. Field of the Invention
This invention relates to a voice decoding device capable of effectively reproducing voice information that has been compression-coded in units of a predetermined frame and transmitted in packets.
2. Description of the Related Art
Packet transmission has recently attracted attention as a highly efficient method of transmitting information, and attempts are also being made to communicate voice information efficiently over packet communication networks.
In packet communication for ordinary data transmission, packets may overtake one another, i.e., arrive out of order, because they travel over different transmission paths within the network. This is usually handled by buffering the packets and rearranging them into the correct order, by retransmitting packets, or by similar measures.
In packet transmission of voice signals, however, the naturalness of conversation matters more than the correctness of the transmitted information. Hence, when packets arrive out of order and restoring the correct order would require excessive delay, one of the exchanged packets is discarded, the remaining packet data are decoded without reordering, and the sound signal is reproduced. Discarding a packet, however, leaves a discontinuity in the reproduced voice waveform where the data of the missing packet should have been. This discontinuity produces an uncomfortable sound and reduces the clarity of the reproduced voice.
Accordingly, in the conventional system shown, for example, in FIG. 1, the voice signal X(n), sampled at a predetermined period, is divided into frames of M samples each, and the samples at the same position in each of L consecutive frames are gathered one by one to form a packet, which is then transmitted. That is, writing the voice signal X(n) frame by frame as

$$X_f(l,m) = X(lM + m),$$

where $l$ ($0 \le l < L$) is the frame number and $m$ ($0 \le m < M$) is the sample index within each frame, the M packets to be transmitted contain the following data:

$$
\begin{aligned}
\text{Packet } 1 &: \{X_f(0,0),\ X_f(1,0),\ \ldots,\ X_f(L-1,0)\}\\
\text{Packet } 2 &: \{X_f(0,1),\ X_f(1,1),\ \ldots,\ X_f(L-1,1)\}\\
&\ \ \vdots\\
\text{Packet } M &: \{X_f(0,M-1),\ X_f(1,M-1),\ \ldots,\ X_f(L-1,M-1)\}
\end{aligned}
$$
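The interleaving above can be sketched in Python. This is a minimal illustration that assumes raw, uncompressed samples and a buffer holding exactly L frames; the function names `interleave` and `deinterleave` are hypothetical, and packets are indexed from 0 in the code, so the text's packet k corresponds to `packets[k-1]`.

```python
from typing import List, Sequence


def interleave(x: Sequence[float], M: int, L: int) -> List[List[float]]:
    """Split L*M samples into M packets; packet m holds sample m of every frame.

    Uses X_f(l, m) = X(l*M + m), so packet m = [X_f(0,m), ..., X_f(L-1,m)].
    """
    if len(x) != L * M:
        raise ValueError("expected exactly L*M samples")
    return [[x[l * M + m] for l in range(L)] for m in range(M)]


def deinterleave(packets: Sequence[Sequence[float]], M: int, L: int) -> List[float]:
    """Receiver side: put the samples carried by the M packets back in time order."""
    x = [0.0] * (L * M)
    for m, packet in enumerate(packets):
        for l, sample in enumerate(packet):
            x[l * M + m] = sample
    return x


# Usage: L = 3 frames of M = 4 samples each.
samples = list(range(12))
packets = interleave(samples, M=4, L=3)
print(packets)
```

Here `packets` comes out as `[[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]`: each packet spans all three frames, and `deinterleave` performs the receiver-side rearrangement that restores the original order.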
At the receiving side (the decoding device), the data $X_f(l,m)$ carried in the M packets are rearranged back into frame order, the resulting sequence of voice data $X_f(l,m)$ is decoded, and the voice signal is reproduced.
With this arrangement, even if a packet is lost (packet 3 in this example), the reproduced data lack only one sample per frame, as shown in FIG. 1, so each missing sample can be compensated for by interpolation from the preceding and succeeding samples or by a similar method. As a result, the quality of the packet-transmitted sound can be maintained and the uncomfortable sound described above can be prevented.
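To illustrate why one lost packet costs only one sample per frame, the following sketch (again assuming raw, uncompressed samples) marks the samples of a lost packet as `None` and fills each gap by linear interpolation between the nearest received samples. Linear interpolation is an assumed choice, since the text only says "interpolation or the like", and the helper names `deinterleave_with_loss` and `conceal` are hypothetical.

```python
from typing import List, Optional, Sequence


def deinterleave_with_loss(packets: Sequence[Optional[Sequence[float]]],
                           M: int, L: int) -> List[Optional[float]]:
    """Rebuild the sample stream; samples belonging to a lost packet stay None."""
    x: List[Optional[float]] = [None] * (L * M)
    for m, packet in enumerate(packets):
        if packet is None:  # this packet was lost or discarded
            continue
        for l, sample in enumerate(packet):
            x[l * M + m] = sample
    return x


def conceal(x: Sequence[Optional[float]]) -> List[float]:
    """Fill each missing sample by linear interpolation between the nearest received samples.

    At the edges of the buffer the nearest received sample is simply repeated.
    """
    known = [i for i, v in enumerate(x) if v is not None]
    if not known:
        raise ValueError("no samples were received")
    out: List[float] = []
    for i, v in enumerate(x):
        if v is not None:
            out.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(float(x[right]))
        elif right is None:
            out.append(float(x[left]))
        else:
            t = (i - left) / (right - left)
            out.append(x[left] + t * (x[right] - x[left]))
    return out


# Usage: M = 4, L = 3, ramp signal 0..11; packet index 2 (the text's "packet 3") is lost.
received = [[0, 4, 8], [1, 5, 9], None, [3, 7, 11]]
x_rx = deinterleave_with_loss(received, M=4, L=3)
restored = conceal(x_rx)
```

Each frame of `x_rx` contains exactly one `None`, and because the test signal is a ramp, linear interpolation recovers it exactly. Real voice would leave a small residual error at each gap.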
In packet transmission, however, each packet carries overhead, such as a header addressing the receiving side, so from the viewpoint of transmission efficiency a packet cannot be made too short. Because each packet in the above technique carries only one sample from each of the L frames, a reasonable packet length requires the number L of frames to be relatively large, which means that voice data spanning L frames must be stored for packet transmission. Hence, a large time delay inevitably occurs both before the input voice is packet-transmitted and before the received packets are decoded and reproduced as sound signals.
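The trade-off in the preceding paragraph can be made concrete with a little arithmetic. The numbers below are purely hypothetical assumptions (8 kHz sampling, 1 byte per sample, M = 16, a 16-byte header) and are not taken from the source. Each packet carries L samples, so header efficiency grows with L, while the transmitter must collect L frames of M samples before the last packet of the group can be completed.

```python
from typing import Tuple


def interleaving_cost(fs_hz: float, M: int, L: int, header_bytes: int,
                      bytes_per_sample: int = 1) -> Tuple[float, float]:
    """Return (payload share of each packet, buffering delay in ms) for the interleaved scheme."""
    payload = L * bytes_per_sample              # one sample from each of the L frames
    efficiency = payload / (payload + header_bytes)
    buffer_delay_ms = 1000.0 * L * M / fs_hz    # time to collect L frames of M samples
    return efficiency, buffer_delay_ms


# Hypothetical parameters: 8 kHz sampling, M = 16 samples per frame, 16-byte header.
for L in (8, 32, 128):
    eff, delay = interleaving_cost(8000, M=16, L=L, header_bytes=16)
    print(f"L={L:4d}  efficiency={eff:.2f}  buffering delay={delay:.0f} ms")
```

With these assumed values, raising L from 8 to 128 lifts the payload share from about 33% to about 89% but stretches the buffering delay from 16 ms to 256 ms, which is the conflict between efficiency and delay described above.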
Moreover, this method is applicable only to compression coding schemes in which all transmitted data within a frame have the same meaning, such as ADPCM or ADM (compression ratio of no more than 1/2). Furthermore, even when the conventional method is applied to predictive residual signals, the interpolation gain for the residual signal is small, so the degradation of the decoded sound is not negligible.
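The small interpolation gain for residual signals can be checked numerically. The sketch below assumes, as a simplification, that a prediction residual behaves like white noise, and compares it with a smooth 200 Hz tone sampled at 8 kHz. Both test signals and the gain definition used here (signal energy divided by the energy of the error left after replacing each sample with the mean of its two neighbours) are illustrative assumptions, not the source's definitions.

```python
import math
import random
from typing import Sequence


def interpolation_gain_db(x: Sequence[float]) -> float:
    """Signal energy over interpolation-error energy, in dB, for neighbour-mean interpolation."""
    sig = sum(v * v for v in x[1:-1])
    err = sum((x[n] - 0.5 * (x[n - 1] + x[n + 1])) ** 2 for n in range(1, len(x) - 1))
    return 10.0 * math.log10(sig / err)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 8000
smooth = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(N)]  # 200 Hz tone at 8 kHz
residual_like = [rng.gauss(0.0, 1.0) for _ in range(N)]              # white-noise stand-in for a residual

print(f"tone:     {interpolation_gain_db(smooth):6.1f} dB")
print(f"residual: {interpolation_gain_db(residual_like):6.1f} dB")
```

For white noise the interpolation error has about 1.5 times the signal variance, so the gain is roughly -1.8 dB, worse than simply inserting zero, whereas the smooth tone gives a gain of about 38 dB. This is the qualitative gap the paragraph refers to.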
On the other hand, consider the case where the frame configuration shown in FIG. 2 is adopted and voice information is compression-coded frame by frame and transmitted in packets. Such a configuration allows highly efficient compression coding of each frame; for example, compression coding with a compression ratio greater than 4 can be realized on a frame basis. In packet transmission of such frame-processed voice data, however, each field of a packet carries information with a different meaning. Consequently, when a packet is lost, measures such as the interpolation described above cannot be applied.
As described above, conventional packet transmission of voice suffers from various problems: uncomfortable sound caused by lost packets, the delay from the input of packet data to their decoding and output, and the impossibility of compensating for lost packets when compression coding with frame processing is used.
The present invention has been made in view of these circumstances. Its object is to provide a highly practical voice decoding device that enables effective packet transmission of voice signals without suffering from the problems of lost packets or delay.