In general, transmission rate is often used as a criterion for evaluating the quality of access networks. Classified by transmission band, networks fall into two types: the band-assured type and the best-effort type. In a band-assured network, the network operator always guarantees each user a constant transmission band regardless of the number of users, whereas a best-effort network does not always guarantee users a constant transmission band.
In recent years, with the explosive growth of the Internet, VoIP (Voice over IP) communication systems have been introduced, in which voice packets containing packetized voice data are transmitted over an IP (Internet Protocol) network. Since an IP network is a best-effort network, the transmission band for voice packets sent from a transmitting terminal to a receiving terminal is not guaranteed, and the transmission delay may fluctuate due to, for example, congestion on the transmission channel. That is, voice packets transmitted at regular intervals by the transmitting terminal may arrive at the receiving terminal at irregular intervals.
On the receiving side, the receiving terminal sequentially stores received voice packets in a receive-packet buffer (receive buffer), from which a voice reproducing device in the receiving terminal reads the data of one voice packet at a time to reproduce the voice. The data of one voice packet is, for example, N samples (N being a natural number) of PCM (Pulse Code Modulation) data.
To maintain high voice quality, the receiving terminal must receive every voice packet. Delayed voice packets can therefore result in missing voice packets, which degrades voice quality. For this reason, the receive-packet buffer of a receiving terminal is generally designed to absorb fluctuations in voice packet transmission delay.
FIG. 12 is an illustrative view of voice packet transmission delay fluctuation. A receive-packet buffer 50 shown in FIG. 12 receives voice packets arriving at irregular intervals, stores each voice packet as it arrives, and outputs the stored voice packets to a voice reproducing device 51 at regular (constant) intervals. The voice reproducing device 51 then reads out the voice packets at regular intervals to reproduce the voice. By temporarily storing the irregularly arriving voice packets in the receive-packet buffer 50, the receiving terminal absorbs the transmission delay fluctuation and compensates for its impact.
However, excessively increasing the capacity of the receive-packet buffer 50 to absorb transmission delay lengthens the time packets spend in the receive-packet buffer 50, and hence the call delay, which hinders interactive conversation. The capacity of the receive-packet buffer 50 is therefore limited. Voice packets whose transmission delay exceeds what the receive-packet buffer 50 can absorb arrive too late to be read out. Consequently, the voice reproducing device 51 reproduces the voice with some voice packets missing, degrading voice quality.
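The trade-off between buffer capacity and late packets can be illustrated with a minimal Python sketch of a fixed-delay playout schedule. It assumes, for simplicity, that packet `seq` is sent at `seq * frame_ms` on a clock shared with the receiver and is due for readout `buffer_ms` later; the function name `schedule_playout` and the 20 ms frame and 60 ms buffer defaults are illustrative assumptions, not values from the source.

```python
from typing import Dict, List, Tuple


def schedule_playout(arrivals: Dict[int, float],
                     frame_ms: float = 20.0,
                     buffer_ms: float = 60.0) -> List[Tuple[int, str]]:
    """Classify each packet as 'play', 'late' or 'lost' under a fixed-delay buffer.

    arrivals maps sequence number -> arrival time in ms. Simplifying assumption:
    packet `seq` was sent at seq * frame_ms on a clock shared with the receiver
    and is read out of the buffer at seq * frame_ms + buffer_ms.
    """
    if not arrivals:
        return []
    result = []
    for seq in range(max(arrivals) + 1):
        deadline = seq * frame_ms + buffer_ms   # readout time for this packet
        t = arrivals.get(seq)
        if t is None:
            result.append((seq, "lost"))        # never arrived
        elif t <= deadline:
            result.append((seq, "play"))        # arrived in time for readout
        else:
            result.append((seq, "late"))        # arrived after its readout time
    return result
```

With `arrivals = {0: 40, 1: 70, 2: 130, 4: 125}`, packet 2 is late and packet 3 never arrives, so both would need concealment. Raising `buffer_ms` to 100 lets packet 2 be played, at the cost of 40 ms of additional call delay.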
In addition to transmission delay fluctuation, voice packets may also be lost in an IP network.
FIGS. 13(a) and 13(b) are illustrative views of a voice packet loss concealment method. A partial loss in a voice packet stream, as shown in FIG. 13(a), degrades voice quality, for example by causing voice interruptions. The receiving terminal therefore inserts a voice packet for concealing the loss (hereinafter referred to as a “loss concealment packet”) into the lost segment (packet loss segment), as shown in FIG. 13(b), thereby suppressing the degradation of voice quality. That is, the receiving terminal replaces the packet that should originally have been received with a loss concealment packet, and the voice reproducing device 51 reproduces the voice using the loss concealment packet.
One known voice packet loss concealment method of this kind is described in ITU-T (Telecommunication Standardization Sector of the International Telecommunication Union) Recommendation G.711 Appendix I (hereinafter referred to as “publicly known document 1”). The packet loss concealment method described in publicly known document 1 uses the pitch cycle, one of the physical characteristics of voice.
The pitch cycle is the vibration cycle of the vocal cords and corresponds to the interval between the “peaks” (or “troughs”) of successive repeating units of a repeating waveform, as shown, for example, in FIG. 10(a). As is well known, voiced sounds such as vowels are produced by vibration of the vocal cords, which vibrate with a substantially constant cycle. For this reason, the waveform of a voiced sound repeats at a substantially constant cycle. The vibration cycle of the vocal cords nevertheless fluctuates slightly, which is observed as pitch cycle fluctuation (pitch fluctuation).
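As a rough illustration of how a pitch cycle can be measured, the Python sketch below picks the lag that maximizes the normalized autocorrelation of a voiced segment. This is a generic technique, not the exact procedure of publicly known document 1; the function name and the 40 to 120 sample search range (5 ms to 15 ms at an assumed 8 kHz sampling rate) are illustrative assumptions.

```python
import math
from typing import Sequence


def estimate_pitch_period(x: Sequence[float], min_lag: int = 40, max_lag: int = 120) -> int:
    """Return the lag (in samples) with the highest normalized autocorrelation.

    x must be longer than max_lag. At an assumed 8 kHz sampling rate the
    default range covers pitch periods of 5 ms to 15 ms.
    """
    if len(x) <= max_lag:
        raise ValueError("signal must be longer than max_lag")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[lag:], x[:-lag]                 # signal and its lagged copy
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic "voiced" signal: fundamental plus one harmonic, period 80 samples.
signal = [math.sin(2 * math.pi * n / 80) + 0.5 * math.sin(4 * math.pi * n / 80)
          for n in range(400)]
print(estimate_pitch_period(signal))
```

For this synthetic signal, whose waveform repeats every 80 samples (100 Hz at 8 kHz), the function returns 80. Real speech would additionally need a voicing decision, since unvoiced segments have no meaningful pitch cycle.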
The voice packet loss concealment method described in publicly known document 1 comprises the following steps (X1) to (X3):
(X1) extracting a pitch pattern from the voice signal of the normal voice packet received immediately before a lost voice packet, and filling the portion corresponding to the lost voice packet by repeating the extracted pitch pattern, to generate a loss concealment signal;
(X2) at the junction with the normal voice immediately after the lost voice packet, applying weighted addition to the loss concealment signal and the normal signal; and
(X3) in the case of consecutive voice packet losses, gradually attenuating the loss concealment signal.
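Steps (X1) to (X3) can be sketched in Python as follows. This is a simplified illustration, not the normative G.711 Appendix I algorithm: the function names, the 80-sample frame (10 ms at an assumed 8 kHz), the linear cross-fade weights, and the fixed 20% gain reduction per additional lost frame are all assumptions chosen for clarity.

```python
import math
from typing import List, Sequence


def conceal(history: Sequence[float], pitch: int, frame_len: int, n_lost: int,
            extra: int = 0, atten_step: float = 0.2) -> List[float]:
    """(X1)+(X3): fill n_lost frames by repeating the last pitch period of history.

    Sample k of the output is history[-pitch + (k % pitch)], which continues a
    periodic waveform without a phase jump. Each lost frame after the first has
    its gain reduced by atten_step (hypothetical schedule). `extra` samples past
    the lost frames are generated for the cross-fade of step (X2) and keep the
    last lost frame's gain.
    """
    if n_lost < 1:
        raise ValueError("n_lost must be at least 1")
    pattern = list(history[-pitch:])
    out = []
    for k in range(n_lost * frame_len + extra):
        frame_idx = min(k // frame_len, n_lost - 1)
        gain = max(0.0, 1.0 - atten_step * frame_idx)
        out.append(gain * pattern[k % pitch])
    return out


def crossfade(concealed_overlap: Sequence[float], next_frame: Sequence[float]) -> List[float]:
    """(X2): weighted addition over the first len(concealed_overlap) samples.

    The concealment signal fades out and the normally received signal fades in
    with linear weights; the remaining samples of next_frame pass through.
    """
    n = len(concealed_overlap)
    merged = []
    for i, x in enumerate(next_frame):
        if i < n:
            w = (i + 1) / (n + 1)
            merged.append((1.0 - w) * concealed_overlap[i] + w * x)
        else:
            merged.append(x)
    return merged


history = [math.sin(2 * math.pi * n / 80) for n in range(400)]
filled = conceal(history, pitch=80, frame_len=80, n_lost=2, extra=4)
lost_part, overlap = filled[:160], filled[160:]   # 2 lost frames + 4 overlap samples
```

Because the pattern is repeated at a fixed period, any drift of the true pitch during the gap is ignored, which is exactly the phase-shift weakness of (X1) noted below.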
However, the method of (X1), which simply repeats the extracted pitch pattern at a fixed pitch cycle, cannot follow the pitch cycle fluctuation originally contained in the voice, which causes a phase shift. In (X2), where weighted addition is applied at the junction with the normal voice, pitch mismatch at the junction may produce abnormal noise. Further, in (X3), where voice packet losses occur consecutively, the gradual attenuation of the loss concealment signal is perceived by the listener as muting (a “feeling of mute”).
An example of a method that addresses the phase shift, abnormal noise, and feeling of mute described above is disclosed in Japanese Patent Laid-Open No. 2001-228896 (hereinafter referred to as “publicly known document 2”). The alternative replacement method for lost voice packets disclosed in publicly known document 2 comprises the following steps (Y1) to (Y3):
(Y1) estimating the pitch cycle fluctuation and the signal power fluctuation from voice data received normally immediately before the loss;
(Y2) generating a loss concealment signal by applying the estimated pitch cycle fluctuation and power fluctuation to a pitch pattern extracted from the signal immediately before the lost voice packet; and
(Y3) at the junction with the normal voice immediately after the lost voice packet, applying weighted addition to the loss concealment signal and the normal signal.
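One way to picture step (Y2) is to tile the extracted pitch pattern period by period while stretching each period and scaling its amplitude according to the estimated trends. The Python sketch below is an illustrative assumption, not the method of publicly known document 2: it models pitch fluctuation as a constant change of `pitch_step` samples per period and power fluctuation as a constant amplitude ratio `gain_ratio` per period, and it resamples each period by linear interpolation. All names are hypothetical.

```python
from typing import List, Sequence


def resample_period(pattern: Sequence[float], new_len: int) -> List[float]:
    """Stretch or compress one pitch period to new_len samples (linear interpolation)."""
    old_len = len(pattern)
    out = []
    for j in range(new_len):
        pos = j * old_len / new_len
        i = int(pos)
        frac = pos - i
        nxt = pattern[(i + 1) % old_len]        # wrap around: the period repeats
        out.append((1.0 - frac) * pattern[i] + frac * nxt)
    return out


def conceal_with_fluctuation(pattern: Sequence[float], n_samples: int,
                             pitch_step: float, gain_ratio: float) -> List[float]:
    """Generate n_samples by tiling pitch periods whose length changes by
    pitch_step samples and whose amplitude is multiplied by gain_ratio per period
    (hypothetical constant-trend model of the estimated fluctuations)."""
    out: List[float] = []
    period = float(len(pattern))
    gain = 1.0
    while len(out) < n_samples:
        period += pitch_step                    # continue the pitch trend
        gain *= gain_ratio                      # continue the power trend
        cur = resample_period(pattern, max(1, round(period)))
        out.extend(gain * s for s in cur)
    return out[:n_samples]
```

Compared with plain repetition, each generated period can drift in length and level, so the concealment signal is more likely to line up in pitch and power with the voice that resumes after the gap, provided the estimated trends are right.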
Because the alternative replacement method described in publicly known document 2 generates the loss concealment signal taking the pitch fluctuation and the signal power fluctuation into account, it solves the problems of publicly known document 1. The method estimates the pitch fluctuation and the signal power fluctuation of the lost segment from the voice packets preceding the lost voice packet.
FIG. 14 is an illustrative view of a voice packet estimation method using preceding voice packets. Among the segments (packet segments) 1 to 5 shown in FIG. 14, the receiving terminal receives voice packets normally in segments 1 to 4. If the voice packet in segment 5 is lost, the receiving terminal generates a loss concealment packet from the voice packets in segments 1 to 4. Specifically, the receiving terminal estimates the pitch fluctuation and the signal power fluctuation of the loss concealment packet to be inserted into segment 5, generates the loss concealment packet accordingly, and inserts it into segment 5. This estimation method therefore relies on the assumption (presumption) that the most recent (preceding) fluctuation pattern will continue unchanged.
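The assumption that the preceding fluctuation continues can be made concrete with a small Python sketch that fits a least-squares straight line to per-segment values (for example, pitch period or signal power) observed in segments 1 to 4 and extrapolates it to segment 5. Linear extrapolation is an assumption chosen for illustration; publicly known document 2 is not claimed to use it, and the sample values below are made up.

```python
from typing import Sequence


def extrapolate_next(values: Sequence[float]) -> float:
    """Fit v = a + b*t to values at t = 0..n-1 by least squares; return the value at t = n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    if n == 1:
        return float(values[0])                 # no trend information: hold the value
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return v_mean + slope * (n - t_mean)


pitch_seg1_to_4 = [80, 82, 84, 86]              # pitch period in samples (made-up)
power_seg1_to_4 = [0.01, 0.01, 0.01, 0.01]      # near-silence before an onset (made-up)
print(extrapolate_next(pitch_seg1_to_4))
print(extrapolate_next(power_seg1_to_4))
```

The pitch trend 80, 82, 84, 86 extrapolates to 88.0 samples, which is plausible for a gradually changing vowel. The power example shows the weakness of relying only on preceding segments: if segment 5 is actually the onset of an utterance, the estimate stays at about 0.01, the level of the preceding silence, however loud the onset is.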
This estimation method achieves reasonably accurate estimation when the voice changes gradually over segments 1 to 5. However, when many segments are lost, when a lost segment is long, or when the voice changes drastically in a lost segment corresponding to the start or end of an utterance, accurate voice packet estimation is impossible, and the problems of publicly known document 1 are not sufficiently resolved.
Meanwhile, a method for interpolating a lost segment of voice packets is disclosed in Japanese Patent Laid-Open No. 2000-59391 (hereinafter referred to as “publicly known document 3”). In that method, the transmitting terminal combines two or more temporary packets, taken at a predetermined interval from each other, in a predetermined combination and sends them as voice packets over an asynchronous digital communication network to a voice packet receiving terminal. When a voice packet is lost, each temporary packet it contained can easily be reproduced by interpolating between the preceding and following temporary packets, which are stored temporarily at the receiving terminal.
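The interpolation idea can be illustrated with a short Python sketch, under the assumption that the “temporary packets” are interleaved sample groups: the sender splits the sample stream so that adjacent samples travel in different packets, and the receiver fills the samples of a lost packet by linear interpolation between the nearest received samples. The function names and the sample-level interleaving are hypothetical choices, not the specific scheme of publicly known document 3. The sender-side `interleave` step is what ties this kind of approach to the transmitting terminal.

```python
import bisect
from typing import List, Optional, Sequence


def interleave(samples: Sequence[float], n_ways: int = 2) -> List[List[float]]:
    """Sender side: packet j carries samples j, j + n_ways, j + 2*n_ways, ..."""
    return [list(samples[j::n_ways]) for j in range(n_ways)]


def deinterleave(packets: Sequence[Optional[Sequence[float]]], n_total: int) -> List[float]:
    """Receiver side: rebuild n_total samples; a lost packet is passed as None.

    Each missing sample is linearly interpolated between the nearest received
    samples on either side, or copied from the one side that exists.
    """
    n_ways = len(packets)
    out: List[Optional[float]] = [None] * n_total
    for j, pkt in enumerate(packets):
        if pkt is not None:
            for m, s in enumerate(pkt):
                out[j + m * n_ways] = s
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        return [0.0] * n_total                  # nothing received: output silence
    filled = list(out)
    for i in range(n_total):
        if out[i] is not None:
            continue
        pos = bisect.bisect_left(known, i)      # index of first received sample after i
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            filled[i] = out[right]
        elif right is None:
            filled[i] = out[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = out[left] + frac * (out[right] - out[left])
    return filled
```

For example, if the packet carrying the odd-numbered samples of `[0, 1, ..., 7]` is lost, the even samples survive and the odd ones are recovered as `1.0, 3.0, 5.0`, with the final sample held at 6.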
However, the interpolation method described in publicly known document 3 requires dedicated processing at both the transmitting and receiving terminals, which increases cost.
Accordingly, the prior art fails to improve voice quality at relatively low cost when the voice changes drastically.
The present invention has been made in view of the above problems, and an object thereof is to provide a voice packet loss concealment device, a voice packet loss concealment method, a receiving terminal, and a voice communication system that suppress abnormal noise caused by pitch cycle mismatch and the feeling of mute caused by signal power attenuation, reduce the subjective loss of naturalness and continuity caused by voice packet loss, and further improve voice packet loss concealment performance in, for example, the rising (onset) part of a voice.