Currently, voice communication using VoIP technology over a network such as the internet is in common use.
In communication over a network such as the internet, in which communication quality is not assured, packets may be lost during transmission (packet loss), so that part of the audio data that should be received in time series goes missing (audio loss) comparatively frequently. When audio data in which an audio loss has occurred is decoded and the decoding result is output unchanged, the voice is frequently interrupted, which impairs voice quality. As a method of compensating for this impairment, for example, the art of Non-Patent Document 1 mentioned below is already known. The encoding method in Non-Patent Document 1 is premised on the PCM (pulse code modulation) encoding method described in Non-Patent Document 2 mentioned below.
In the art of Non-Patent Document 1, audio encoded data, which is an audio signal coded by the PCM encoding method of Non-Patent Document 2, is decoded to obtain a decoded audio signal, and the decoded audio signal (hereinafter referred to as a “decoding result”) is stored in a functional block (e.g., a memory) capable of storing the decoding result. Meanwhile, audio loss is monitored for each audio frame (a frame), which is the unit of the decoding processing, and compensation processing is performed every time an audio loss occurs.
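The per-frame flow described above (store each decoding result, check each frame for loss, compensate on loss) can be sketched as follows. This is a minimal illustration and not the procedure of Non-Patent Document 1: the names play_out and repeat_tail are hypothetical, frames are assumed to be already decoded, a lost frame is represented by None, and the placeholder concealment simply reuses the newest stored samples.

```python
from collections import deque


def play_out(frames, frame_len, history_len, conceal):
    """Return the output frames, calling `conceal` for each lost frame.

    frames: decoded frames in arrival order; None marks an audio loss.
    conceal: hypothetical callback (history, frame_len) -> substitute frame.
    """
    history = deque(maxlen=history_len)  # most recent decoding results
    output = []
    for frame in frames:
        if frame is None:
            # Loss detected for this frame: run compensation processing.
            frame = conceal(list(history), frame_len)
        else:
            # Normal frame: remember the decoding result for later use.
            history.extend(frame)
        output.append(frame)
    return output


def repeat_tail(history, frame_len):
    """Placeholder concealment: reuse the newest frame_len samples."""
    return history[-frame_len:]


received = [[1, 2], [3, 4], None, [5, 6]]
result = play_out(received, frame_len=2, history_len=6, conceal=repeat_tail)
```

In this sketch the memory is updated only with genuinely decoded frames; whether interpolation audio data is also written back is left open here.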
The operation of the compensation processing is shown in FIGS. 2A to 2E.
Referring to FIG. 2A, reference symbols F1 to F7 denote frames (i.e., decoded audio signals) which are to be received in time series. In FIG. 2A, the earliest received frame is F1, and the other frames follow in sequence as F2, F3, and so on. In the example of FIG. 2A, however, since the three frames F4 to F6 are successively lost as a result of packet loss, an audio loss is detected in the three sections corresponding to these frames F4 to F6.
FIG. 2B shows the decoding result stored in the memory as a waveform. Since each of T1, T2 and T3 corresponds to one fundamental period, the decoding result of three fundamental periods is stored in the memory. Although the length of the fundamental period T is shorter than one frame of the decoding result in the example shown, the fundamental period T may also be longer than one frame.
FIG. 2C shows compensation processing in a section corresponding to the frame F4, FIG. 2D shows compensation processing in a section corresponding to the frame F5, and FIG. 2E shows compensation processing in a section corresponding to the frame F6.
When an audio loss (the first audio loss) in the section corresponding to the frame F4 is detected, as shown in FIG. 2C, interpolation audio data for compensating the audio loss is generated in accordance with the decoding result of one fundamental period, i.e., the decoding result of a section Ta, which was stored in the memory immediately before the frame F4. The section Ta corresponds to the fundamental period T1.
In the one fundamental period, the oldest position B4 of the section Ta is regarded as the beginning position of the interpolation audio data, and the interpolation audio data is generated by obtaining one frame of data from that position. As shown in the figure, however, if one fundamental period is shorter than one frame, the decoding result S41 of one fundamental period alone is insufficient to fill the frame. Accordingly, the reading position returns to the oldest position B4, and the decoding result S42 is obtained to make up the shortfall. The decoding results S41 and S42 are then joined and inserted into the section corresponding to the frame F4 as interpolation audio data. Processing such as overlap-add is performed so that the waveform is continuous at the joint between the decoding results S41 and S42.
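The reading pattern for the first lost frame can be illustrated with a short sketch. It assumes the memory is a list whose last element is the newest sample and that the fundamental period is given in samples; the function name fill_frame_from_last_period is hypothetical, and the overlap-add smoothing at the joint is deliberately omitted, so this shows only the order in which samples are read.

```python
def fill_frame_from_last_period(history, period, frame_len):
    """Build one frame by reading the newest `period` samples cyclically.

    Reading starts at the oldest sample of the last fundamental period
    (position B4) and returns to that position whenever the end of the
    stored decoding result is reached.
    """
    if not 0 < period <= len(history):
        raise ValueError("period must be between 1 and len(history)")
    start = len(history) - period  # position B4
    return [history[start + (i % period)] for i in range(frame_len)]


history = list(range(10))  # stand-in for stored decoded samples
frame = fill_frame_from_last_period(history, period=4, frame_len=6)
```

Here the first four samples of frame play the role of S41 and the remaining two the role of S42.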
Subsequent to the detection of the audio loss in the frame F4, if an audio loss is detected also in the section corresponding to the frame F5, in accordance with the decoding results of two fundamental periods of the section Tb, interpolation audio data for compensating audio losses is generated, as shown in FIG. 2D. The section Tb corresponds to the above-mentioned fundamental periods T1 and T2.
Within the two fundamental periods of the section Tb, the position B5, from which the interpolation audio data starts to be obtained, is determined as follows. In general, the position E4 (the right end of S42), at which the decoding result S42 previously obtained in FIG. 2C terminates, is selected as the position B5. However, in a case where the position E4 is not included in the section T2, which is the oldest fundamental period in the section Tb, as shown in the figures, the position B5 is determined by shifting the position E4 toward the oldest side by one fundamental period T at a time until it falls within the section T2. In the example shown, the position B5 is set at the position obtained by shifting the position E4 by one fundamental period toward the oldest side.
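The rule stated above for the position B5 can be sketched as follows. Positions are assumed to be sample indices into the stored decoding result, with larger indices being newer; the function name start_position and its parameters are hypothetical and are introduced only for illustration.

```python
def start_position(prev_end, period, num_periods, history_len):
    """Shift prev_end toward the oldest side by whole fundamental periods
    until it lies in the oldest period of the section in use."""
    oldest_start = history_len - num_periods * period
    if oldest_start < 0:
        raise ValueError("history is shorter than the requested section")
    oldest_end = oldest_start + period  # exclusive upper bound
    pos = prev_end
    while pos >= oldest_end:
        pos -= period
    if pos < oldest_start:
        raise ValueError("prev_end lies before the section")
    return pos


# 100 stored samples, fundamental period of 30 samples, section Tb = 2 periods.
# T1 covers indices 70..99 and T2 covers 40..69; E4 = 85 lies in T1.
b5 = start_position(prev_end=85, period=30, num_periods=2, history_len=100)
```

If the previous end position already lies in the oldest period, the loop does not run and the position is used unchanged.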
After the position B5 is thus determined, data S51 and S52 amounting to one frame are obtained from the position B5 toward the latest side (i.e., up to a position E5) to generate the interpolation audio data, which is inserted into the section corresponding to the frame F5. In the example shown, the data S52, whose right end is the position E5, is a part of the section T1.
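Reading one frame from a given start position toward the latest side can be sketched as below. The text does not state what happens if the end of the stored decoding result is reached before a full frame has been read; returning to the oldest position of the section in use is an assumption made here by analogy with FIG. 2C. The function name read_frame and its parameters are hypothetical.

```python
def read_frame(history, start, frame_len, period, num_periods):
    """Read frame_len samples from `start` toward the newest sample.

    Returns (frame, end_pos), where end_pos is the position following
    the last sample read. Wrapping to the start of the section is an
    assumption of this sketch, not something stated in the text.
    """
    section_start = len(history) - num_periods * period
    if section_start < 0 or not section_start <= start < len(history):
        raise ValueError("start must lie inside the section in use")
    frame = []
    pos = start
    for _ in range(frame_len):
        frame.append(history[pos])
        pos += 1
        if pos == len(history):
            pos = section_start  # assumed wrap-around
    return frame, pos


history = list(range(100))  # stand-in for stored decoded samples
frame, end_pos = read_frame(history, start=55, frame_len=40,
                            period=30, num_periods=2)
```

With these values the frame spans indices 55 to 94 without wrapping, so its first 15 samples come from the older period (like S51) and the remaining 25 from the newest period (like S52).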
Subsequent to the detection of the audio losses in the frames F4 and F5, if an audio loss is also detected in the section corresponding to the frame F6, the interpolation audio data for compensating the audio losses is generated in accordance with the decoding results of the three fundamental periods of the section Tc, as shown in FIG. 2E. The section Tc corresponds to a combination of the fundamental periods T1, T2 and T3. In FIG. 2E, in a manner similar to FIG. 2D, the position B6, from which the interpolation audio data starts to be obtained, is determined, and data S61 and S62 amounting to one frame are obtained from the position B6 to generate the interpolation audio data, which is inserted into the section corresponding to the frame F6.
In the example shown, the position B6 (the left end of S61) corresponds to the position obtained by shifting the position E5 by one fundamental period toward the oldest side.
Further, when audio losses occur successively over a plurality of frames, the interpolation audio data is gradually attenuated from the second frame onward (F5 and F6 in FIGS. 2A to 2E). For example, linear attenuation of 20% per 10 ms can be adopted. This makes it possible to suppress an abnormal sound, such as a beep, which may be caused when the same audio data is output repeatedly.
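As an illustration of such attenuation, the following sketch computes per-sample gains for the n-th consecutive lost frame. It assumes that one frame lasts 10 ms, that the first lost frame is left unattenuated, and that the gain ramps down linearly within each later frame; the function name attenuation_gains and the clamping at zero are choices made for this sketch.

```python
def attenuation_gains(lost_index, frame_len, step=0.2):
    """Per-sample gains for the lost_index-th consecutive lost frame.

    lost_index 0 is the first lost frame and is left unattenuated.
    From the second lost frame on, the gain falls linearly by `step`
    per frame (one frame is assumed to last 10 ms) and never goes
    below zero.
    """
    if lost_index == 0:
        return [1.0] * frame_len
    base = 1.0 - step * (lost_index - 1)  # gain at the start of the frame
    return [max(0.0, base - step * n / frame_len) for n in range(frame_len)]


gains_f4 = attenuation_gains(0, 4)  # first lost frame
gains_f5 = attenuation_gains(1, 4)  # second lost frame
gains_f6 = attenuation_gains(2, 4)  # third lost frame
```

Multiplying each sample of the interpolation audio data by the corresponding gain applies the attenuation; under these assumptions the output reaches silence after five attenuated frames.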
Non-Patent Document 1: ITU-T Recommendation G.711 Appendix I
Non-Patent Document 2: ITU-T Recommendation G.711