The present disclosure relates to a coding apparatus, a coding method, a decoding apparatus, a decoding method, and a program and, more particularly, relates to a coding apparatus, a coding method, a decoding apparatus, a decoding method, and a program that are capable of reducing the bit rate of data for interpolation.
Examples of methods for coding an audio signal, in general, include transform coding methods, such as moving picture experts group audio layer-3 (MP3), advanced audio coding (AAC), and adaptive transform acoustic coding (ATRAC).
FIG. 1 is a block diagram illustrating an example of the configuration of a coding apparatus that codes an audio signal.
A coding apparatus 10 of FIG. 1 is constituted by a modified discrete cosine transform (MDCT) unit 11, a normalization unit 12, a quantization unit 13, a coding unit 14, and a multiplexing unit 15.
A pulse code modulation (PCM) signal T of audio of a predetermined channel is input as a PCM signal T[J] to the MDCT unit 11 of the coding apparatus 10 for each fixed section called a frame. J represents the index of a frame.
The MDCT unit 11 applies a window function W[J] to the PCM signal T[J], which is a time domain signal, performs MDCT on the windowed PCM signal T[J] obtained thereby, and obtains a spectrum S[J] that is a frequency domain signal. The MDCT unit 11 supplies the spectrum S[J] to the normalization unit 12.
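The processing of the MDCT unit 11 can be illustrated with a minimal sketch. The sine window, the frame length of 16 samples, and the function names `sine_window` and `mdct` are assumptions made for illustration; the description above does not specify the window shape or the frame length.

```python
import math

def sine_window(length):
    """Sine window (an assumed choice for W[J]; the text does not fix the shape)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """Window a frame of 2N time samples and return N MDCT coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    window = sine_window(two_n)
    windowed = [w * x for w, x in zip(window, frame)]
    phase = 0.5 + n_half / 2.0  # standard MDCT phase offset
    return [
        sum(windowed[n] * math.cos(math.pi / n_half * (n + phase) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]

# T[J]: one frame of 16 samples of a test tone (hypothetical data)
t_j = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
s_j = mdct(t_j)  # spectrum S[J] with 8 coefficients
```

Note that a frame of 2N time samples yields only N coefficients, which is why adjacent windows must overlap for the signal to be recoverable.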
The normalization unit 12 extracts an envelope F[J] from the spectrum S[J], and supplies it to the multiplexing unit 15. Furthermore, the normalization unit 12 normalizes the spectrum S[J] by using the envelope F[J], and supplies a normalized spectrum N[J] obtained thereby to the quantization unit 13.
The quantization unit 13 quantizes the normalized spectrum N[J] that is supplied from the normalization unit 12 on the basis of quantization accuracy information P[J] determined by a predetermined algorithm, and supplies a quantized spectrum Q[J] obtained thereby to the coding unit 14. Furthermore, the quantization unit 13 supplies the quantization accuracy information P[J] to the multiplexing unit 15. As a predetermined algorithm for determining the quantization accuracy information P[J], for example, algorithms that are already widely available can be used.
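As a rough illustration of the normalization unit 12 and the quantization unit 13, the following sketch assumes that the envelope F[J] is the peak magnitude of each band and that the quantization accuracy information P[J] is a single bit count. Both are assumptions; the text only states that an envelope is extracted and that P[J] is determined by a predetermined algorithm. The band size and sample values are hypothetical.

```python
def normalize(spectrum, band_size):
    """Return (envelope F[J], normalized spectrum N[J]) using per-band peaks."""
    envelope, normalized = [], []
    for start in range(0, len(spectrum), band_size):
        band = spectrum[start:start + band_size]
        peak = max(abs(c) for c in band) or 1.0  # guard against an all-zero band
        envelope.append(peak)
        normalized.extend(c / peak for c in band)
    return envelope, normalized

def quantize(normalized, bits):
    """Uniformly quantize values in [-1, 1] to signed integers of `bits` bits."""
    max_level = (1 << (bits - 1)) - 1
    return [round(v * max_level) for v in normalized]

# Hypothetical spectrum S[J] split into two bands of four coefficients
s_j = [4.0, -2.0, 1.0, 0.5, 0.25, -0.125, 0.0, 0.0]
f_j, n_j = normalize(s_j, band_size=4)  # envelope F[J], normalized spectrum N[J]
q_j = quantize(n_j, bits=4)             # quantized spectrum Q[J] with P[J] = 4
```

After normalization every coefficient lies in [-1, 1], so one quantizer step size serves all bands and the per-band scale travels separately in F[J].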
The coding unit 14 codes the quantized spectrum Q[J] supplied from the quantization unit 13, and supplies a code spectrum H[J] obtained thereby to the multiplexing unit 15.
The multiplexing unit 15 multiplexes the envelope F[J] supplied from the normalization unit 12, the quantization accuracy information P[J] supplied from the quantization unit 13, and the code spectrum H[J] supplied from the coding unit 14, and generates a bit stream B[J]. The multiplexing unit 15 outputs the bit stream B[J] as a coded result.
FIG. 2 is a block diagram illustrating a decoding apparatus that decodes the coded result by the coding apparatus 10 of FIG. 1.
A decoding apparatus 20 of FIG. 2 is constituted by a decomposition unit 21, a decoding unit 22, a dequantization unit 23, an inverse normalization unit 24, and an inverse MDCT unit 25.
The bit stream B[J], which is the coded result by the coding apparatus 10 of FIG. 1, is input to the decomposition unit 21 of the decoding apparatus 20.
The decomposition unit 21 decomposes the bit stream B[J] into an envelope F[J] and the quantization accuracy information P[J]. Furthermore, the decomposition unit 21 decomposes the bit stream B[J] into a code spectrum H[J] on the basis of the quantization accuracy information P[J]. The decomposition unit 21 supplies the envelope F[J] to the inverse normalization unit 24 and supplies the quantization accuracy information P[J] to the dequantization unit 23. Furthermore, the decomposition unit 21 supplies the code spectrum H[J] to the decoding unit 22.
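The dependence of the decomposition on the quantization accuracy information P[J] can be sketched as follows. The byte layout here (a two-byte header, 32-bit floating-point envelope values, and fixed-width integer fields standing in for the code spectrum H[J]) is entirely hypothetical and is not the format of any actual codec; it only shows that the field width of H[J] cannot be known until P[J] has been read.

```python
import struct

def multiplex(envelope, bits, quantized):
    """Pack F[J], P[J], and a stand-in code spectrum H[J] into a bit stream B[J]."""
    header = struct.pack("<BB", len(envelope), bits)
    env = struct.pack(f"<{len(envelope)}f", *envelope)
    field = "b" if bits <= 8 else "h"  # field width chosen from P[J]
    body = struct.pack(f"<{len(quantized)}{field}", *quantized)
    return header + env + body

def decompose(stream):
    """Recover F[J] and P[J] first, then use P[J] to parse H[J]."""
    n_bands, bits = struct.unpack_from("<BB", stream, 0)
    envelope = list(struct.unpack_from(f"<{n_bands}f", stream, 2))
    offset = 2 + 4 * n_bands
    field = "b" if bits <= 8 else "h"
    count = (len(stream) - offset) // struct.calcsize(field)
    quantized = list(struct.unpack_from(f"<{count}{field}", stream, offset))
    return envelope, bits, quantized

# Hypothetical frame contents
b_j = multiplex([4.0, 0.25], 4, [7, -4, 2, 1, 7, -4, 0, 0])
f_j, p_j, h_j = decompose(b_j)
```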
The decoding unit 22 decodes the code spectrum H[J] supplied from the decomposition unit 21, and supplies the quantized spectrum Q[J] obtained thereby to the dequantization unit 23.
The dequantization unit 23 dequantizes the quantized spectrum Q[J] supplied from the decoding unit 22 on the basis of the quantization accuracy information P[J] supplied from the decomposition unit 21, and supplies the normalized spectrum N[J] obtained thereby to the inverse normalization unit 24.
The inverse normalization unit 24 inversely normalizes the normalized spectrum N[J] supplied from the dequantization unit 23 by using the envelope F[J] supplied from the decomposition unit 21, and supplies the spectrum S[J] obtained thereby to the inverse MDCT unit 25.
The inverse MDCT unit 25 performs inverse MDCT on the spectrum S[J], which is a frequency domain signal supplied from the inverse normalization unit 24, adds up the time domain signal obtained thereby on the basis of the window function W[J], and obtains an audio PCM signal T′[J]. The inverse MDCT unit 25 outputs the PCM signal T′[J] as an audio signal.
As described above, the coding apparatus 10 generates and outputs the bit stream B[J] for each frame, and the decoding apparatus 20 decodes the bit stream B[J] for each frame. That is, in the coding apparatus 10 and the decoding apparatus 20, the processing unit is a frame.
FIG. 3 illustrates the PCM signal T[J] and the bit stream B[J].
As shown in part A of FIG. 3, the PCM signal T is a time domain signal. In part A of FIG. 3, the horizontal axis represents time t, and the vertical axis represents the level of a PCM signal.
The coding apparatus 10 applies a window function W[J] to the PCM signal T[J], which is divided for each frame. As shown in part B of FIG. 3, the window function W[J] is set in such a manner that the first half section thereof overlaps the second half section of the window function W[J−1] of the previous frame, and the second half section of the window function W[J] overlaps the first half section of the window function W[J+1] of the subsequent frame. In the example of FIG. 3, the section of the window function W[J−1] is a section from time t0 (t0<t1) to time t2 (t2>t1), and the section of the window function W[J] is a section from time t1 to time t3 (t3>t2). The section of the window function W[J+1] is a section from time t2 to time t4 (t4>t3).
The coding apparatus 10 performs MDCT transform, coding, and the like on the PCM signals T[J−1] to T[J+1] obtained by windowing using the window functions W[J−1] to W[J+1], and outputs bit streams B[J−1] to B[J+1] shown in part B of FIG. 3 as coded results.
The decoding apparatus 20 performs decoding, inverse MDCT transform, and the like on the bit streams B[J−1] to B[J+1], and obtains time domain signals of the sections of the window functions W[J−1] to W[J+1]. Then, the decoding apparatus 20 adds the second half section (the section from time t1 to time t2 in the example of FIG. 3) of the time domain signal of the section of the window function W[J−1] and the first half section (the section from time t1 to time t2 in the example of FIG. 3) of the time domain signal of the section of the window function W[J], and obtains a PCM signal T′[J]. Furthermore, the decoding apparatus 20 adds the second half section (the section from time t2 to time t3 in the example of FIG. 3) of the time domain signal of the section of the window function W[J] and the first half section (the section from time t2 to time t3 in the example of FIG. 3) of the time domain signal of the section of the window function W[J+1], and obtains a PCM signal T′[J+1].
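The overlap-and-add procedure described above can be checked numerically with a minimal sketch. It assumes a sine window applied on both the coding side and the decoding side, a frame length of N = 8 samples, and an inverse MDCT scaled by 2/N; these choices and all function names are assumptions for illustration. Quantization and coding are omitted, so the reconstruction here is exact up to rounding error.

```python
import math

N = 8  # samples per frame; each window section spans 2N samples

def window():
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def basis(n, k):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(frame):
    xw = [x * w for x, w in zip(frame, window())]
    return [sum(xw[n] * basis(n, k) for n in range(2 * N)) for k in range(N)]

def imdct(spectrum):
    """Inverse MDCT followed by the synthesis window."""
    w = window()
    return [w[n] * (2.0 / N) * sum(spectrum[k] * basis(n, k) for k in range(N))
            for n in range(2 * N)]

def overlap_add(prev_seg, cur_seg):
    """Add the second half of the previous section to the first half of the current one."""
    return [a + b for a, b in zip(prev_seg[N:], cur_seg[:N])]

# Hypothetical PCM signal covering t0..t4, with t0 = 0, t1 = N, t2 = 2N, t3 = 3N
signal = [math.sin(0.3 * n) + 0.5 * math.cos(0.11 * n) for n in range(4 * N)]
frames = [signal[j * N:j * N + 2 * N] for j in range(3)]  # W[J-1], W[J], W[J+1]
decoded = [imdct(mdct(f)) for f in frames]

t_j = overlap_add(decoded[0], decoded[1])       # T'[J], section t1..t2
t_j_next = overlap_add(decoded[1], decoded[2])  # T'[J+1], section t2..t3
```

Each decoded section on its own contains time domain aliasing; it is only the sum of the two overlapping halves that cancels the aliasing and restores the original samples.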
Since the coding apparatus 10 performs MDCT, the overlapping sections before and after the window function W[J] in FIG. 3 are each 50% of the entire section of the window function W[J]. However, when the coding apparatus 10 performs a discrete Fourier transform (DFT) rather than MDCT, the overlapping section need not be 50% of the entire section. Furthermore, windowing may be performed in only one of the coding apparatus 10 and the decoding apparatus 20.
If a bit stream of a certain frame is lost in the procedures of coding and decoding, the PCM signal of the frame is lost, and audible noise may be generated. A description will be given, with reference to FIG. 4, of this case. Part A of FIG. 4 is similar to part A of FIG. 3, and accordingly, the description is omitted.
As shown in part B of FIG. 4, in the decoding apparatus 20, when the bit stream B[J] is lost, the time domain signal of the section of the window function W[J] that should be obtained as a result of decoding, an inverse MDCT, or the like being performed on the bit stream B[J] is not obtained.
As a result, it is not possible to obtain the PCM signal T′[J] that is generated by using the time domain signal of the first half section of the window function W[J] and the PCM signal T′[J+1] that is generated by using the time domain signal of the second half section of the window function W[J].
Therefore, for example, as shown in part B of FIG. 4, it is conceivable to interpolate the PCM signal T′[J] and the PCM signal T′[J+1] by using a signal of zero. However, in this case, since the PCM signal becomes discontinuous in the section from time t1 to time t3, if audio corresponding to the PCM signal in this section is output, a sputtering sound is heard.
Accordingly, a method is conceivable in which the PCM signal T′[J] of the frame, which is not obtained due to a loss, is interpolated by using a time domain signal that is not lost and that was to be used to generate the PCM signal T′[J], rather than a signal of zero. This method will be described with reference to FIG. 5. Part A of FIG. 5 is similar to part A of FIG. 3, and accordingly, the description thereof is omitted.
According to the above-mentioned method, as shown in part B of FIG. 5, in the decoding apparatus 20, in a case where the bit stream B[J] is lost, the PCM signal T′[J] is interpolated by the time domain signal of the second half section of the window function W[J−1] that is not lost, which was scheduled to be used to generate the PCM signal T′[J]. Furthermore, the PCM signal T′[J+1] is interpolated using the time domain signal of the first half section of the window function W[J+1] that is not lost, which was scheduled to be used to generate the PCM signal T′[J+1].
According to this method, no discontinuity of the PCM signal occurs in the section from time t1 to time t3. However, there is a case in which the time domain signal of the second half section of the window function W[J−1] and the time domain signal of the first half section of the window function W[J+1], which are used for interpolation, markedly differ from the original PCM signal T′[J] and PCM signal T′[J+1]. In this case as well, when audio corresponding to the PCM signal of the section from time t1 to time t3 is output, a sputtering sound may be heard.
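The two interpolation approaches described above, one using a signal of zero and one using the surviving halves of the neighboring window sections, can be compared in a minimal sketch. The function name `conceal`, the `mode` strings, and the sample values are hypothetical; each segment stands for the time domain signal of one window section (2n samples), and `None` marks a section whose bit stream was lost.

```python
def conceal(prev_seg, cur_seg, next_seg, n, mode):
    """Return (T'[J], T'[J+1]) from the sections of W[J-1], W[J], and W[J+1]."""
    if cur_seg is not None:
        # Normal case: overlap-and-add of adjacent halves
        return ([a + b for a, b in zip(prev_seg[n:], cur_seg[:n])],
                [a + b for a, b in zip(cur_seg[n:], next_seg[:n])])
    if mode == "zero":
        # Interpolation by a signal of zero
        return [0.0] * n, [0.0] * n
    # Interpolation by the halves that were not lost
    return list(prev_seg[n:]), list(next_seg[:n])

# Hypothetical sections with n = 2 samples per half
prev_seg = [1.0, 2.0, 3.0, 4.0]
next_seg = [5.0, 6.0, 7.0, 8.0]
zero_fill = conceal(prev_seg, None, next_seg, 2, "zero")
neighbor_fill = conceal(prev_seg, None, next_seg, 2, "neighbor")
```

With MDCT, the surviving halves are windowed and still contain the aliasing that the lost section would have cancelled, which is one reason the second approach can deviate markedly from the original signal.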
Accordingly, in order to suppress this noise, a method in which, in a case where the bit stream of a predetermined frame is lost on the decoding side, the coding side resends the bit stream of the frame, has been devised (see, for example, Japanese Patent No. 3994388). However, in this method, there is a case in which the bit stream that is resent does not arrive on time.
Furthermore, a method has been devised in which the coding side transmits the bit stream of each frame by a plurality of methods and, in a case where the bit stream of a frame transmitted by a predetermined method is lost on the decoding side, the bit stream of that frame transmitted by another method is substituted for the lost bit stream (see, for example, Japanese Patent Application No. 4016709).
FIG. 6 is a block diagram illustrating an example of the configuration of a coding apparatus using this method.
Components shown in FIG. 6, which are identical to the components of FIG. 1, are designated with the same reference numerals. Duplicated descriptions are omitted as appropriate.
The configuration of the coding apparatus 30 of FIG. 6 differs from the configuration of FIG. 1 in that, mainly, a normalization unit 31, a quantization unit 32, a coding unit 33, and a multiplexing unit 34 are newly provided.
The normalization unit 31, the quantization unit 32, the coding unit 33, and the multiplexing unit 34 generate a bit stream C[J] from a spectrum S[J] in the same manner as for the normalization unit 12, the quantization unit 13, the coding unit 14, and the multiplexing unit 15, respectively.
However, since the bit stream C[J] is a preliminary bit stream that is substituted in a case where the bit stream B[J] is lost, as shown in FIG. 7, the bit stream C[J] is coded in accordance with a coding method different from that of the bit stream B[J] so that its bit rate is lower than the bit rate of the bit stream B[J]. Therefore, the sound quality of the audio corresponding to the decoded result of the bit stream C[J] is lower than that of the audio corresponding to the decoded result of the bit stream B[J].
In the coding apparatus 30, the bit stream C[J] that is generated in the manner described above, and the bit stream B[J] that is generated in the same manner as for the coding apparatus 10 are transmitted through different transmission paths.
FIG. 8 is a block diagram illustrating an example of the configuration of a decoding apparatus that decodes a coded result by the coding apparatus 30 of FIG. 6.
A decomposition unit 51, a decoding unit 52, a dequantization unit 53, and an inverse normalization unit 54 of a decoding apparatus 50 of FIG. 8 are basically configured similarly to the decomposition unit 21, the decoding unit 22, the dequantization unit 23, and the inverse normalization unit 24 of FIG. 2, respectively, and differ in that the loss of a bit stream B[J] is detected. The loss of the bit stream B[J] is detected in a case where the bit stream B[J] is lost due to some problem in a transmission path or an error occurs in the received bit stream B[J], and a loss detection result E[J] is supplied from each unit to a switch 59. Furthermore, the spectrum S[J] that is generated from the bit stream B[J] by the decomposition unit 51, the decoding unit 52, the dequantization unit 53, and the inverse normalization unit 54 is supplied to the switch 59.
The decomposition unit 55, the decoding unit 56, the dequantization unit 57, and the inverse normalization unit 58 of the decoding apparatus 50 are the same as the decomposition unit 21, the decoding unit 22, the dequantization unit 23, and the inverse normalization unit 24 of FIG. 2, respectively, except that the target to be processed is a bit stream C[J] and the decoding method is different. The decomposition unit 55, the decoding unit 56, the dequantization unit 57, and the inverse normalization unit 58 decode the bit stream C[J] so as to generate a spectrum S1[J], and supply it to the switch 59.
In a case where the detection result E[J] indicates that the bit stream B[J] is lost, the switch 59 selects the spectrum S1[J] supplied from the inverse normalization unit 58, and supplies it to the inverse MDCT unit 60. On the other hand, in a case where the detection result E[J] indicates that the bit stream B[J] is not lost, the switch 59 selects the spectrum S[J] supplied from the inverse normalization unit 54, and supplies it to the inverse MDCT unit 60.
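The selection performed by the switch 59 can be sketched as follows. Modeling the detection result E[J] as a collection of per-unit boolean flags is an assumption made for illustration, as are the function name `switch_59` and the sample spectra.

```python
def switch_59(e_j, s_j, s1_j):
    """Select S1[J] when any unit reports a loss of B[J]; otherwise select S[J].

    e_j: loss flags reported by the units 51 to 54 (assumed representation).
    """
    return s1_j if any(e_j) else s_j

# Hypothetical spectra: S[J] from bit stream B[J], S1[J] from bit stream C[J]
s_j = [0.9, -0.4, 0.2, 0.1]
s1_j = [1.0, -0.5, 0.0, 0.0]

selected_ok = switch_59([False, False, False, False], s_j, s1_j)
selected_lost = switch_59([False, True, False, False], s_j, s1_j)
```

When B[J] is lost, S[J] is typically unavailable or invalid, so a caller in this sketch could pass any placeholder for `s_j` in that case.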
The inverse MDCT unit 60 performs inverse MDCT on the spectrum S1[J] or the spectrum S[J], which is a frequency domain signal supplied from the switch 59. Then, the inverse MDCT unit 60 adds up the time domain signal obtained thereby on the basis of the window function W[J], and obtains an audio PCM signal T′1[J]. The inverse MDCT unit 60 outputs the PCM signal T′1[J] as an audio signal.
A description will be given, with reference to FIG. 9, of a case in which the bit stream B[J] is lost in the decoding apparatus 50 configured as described above.
As shown in FIG. 9, in a case where the bit stream B[J] is lost, the spectrum S[J] to be generated from the bit stream B[J] is interpolated using the spectrum S1[J] that is generated from the bit stream C[J]. As a result, it is possible to obtain time domain signals of all the sections of the window function W[J], and it is possible to obtain the PCM signal T′1[J] and the PCM signal T′1[J+1] by using the time domain signal.
The sound quality of the audio corresponding to the bit stream C[J] is lower than that of the audio corresponding to the bit stream B[J], but it can be much better than that of audio whose sound quality is deteriorated due to the loss of the bit stream B[J].