High-efficiency coding of audio signals exploits the mechanisms of human hearing (hearing characteristics and the like) to reproduce high-quality sound even when the data amount is reduced to approximately 1/10 to 1/20 of that of a CD (Compact Disc). Products using such technologies are currently distributed in the marketplace, enabling, for example, recording on smaller recording media and delivery via networks.
Among the major hearing characteristics used in such high-efficiency coding of audio signals are simultaneous masking and temporal masking.
Simultaneous masking is a hearing characteristic whereby, when sounds at different frequencies are present at the same time, a small-amplitude sound in the neighborhood of the frequency of a large-amplitude sound is masked and becomes hard to perceive.
Temporal masking, on the other hand, is a masking effect in the temporal direction: a hearing characteristic whereby, for example, a small-amplitude sound occurring shortly before or after a large-amplitude sound is masked and becomes hard to perceive.
There are two phenomena of temporal masking: forward masking, in which a temporally preceding sound masks a temporally following sound, and backward masking, in which a temporally following sound masks a temporally preceding sound.
It is known that forward masking is effective for a period on the order of several tens of msec (milliseconds), while backward masking is effective only for an extremely short period of approximately 1 msec.
In a typical high-efficiency audio coding method, time-domain signals are orthogonally transformed by the MDCT (Modified Discrete Cosine Transform); the obtained MDCT coefficients are then normalized on the frequency axis for each set of a plurality of coefficients, and subsequently quantized and coded. Here, to take advantage of the above-described hearing characteristics, signals are compressed efficiently by adaptively changing the number of quantization steps for each set of MDCT coefficients, thereby controlling where quantization noise is generated.
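The transform/normalize/quantize stages described above can be sketched as follows. This is a minimal illustration, not the method of any particular standard: the naive O(N²) MDCT, the band size of 8 coefficients, and the fixed 4-bit allocation are all assumptions chosen for simplicity.

```python
import numpy as np

def mdct(frame):
    # Naive O(N^2) reference MDCT: 2N time samples -> N coefficients.
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ frame

def code_frame(frame, band_size=8, bits=4):
    # Transform, then normalize and quantize each band of MDCT coefficients.
    # The per-band scale factors play the role of the normalization
    # information; allocating more bits (quantization steps) to a band
    # lowers the quantization noise generated in that band.
    coeffs = mdct(frame)
    bands = coeffs.reshape(-1, band_size)
    scale = np.abs(bands).max(axis=1, keepdims=True) + 1e-12
    levels = 2 ** (bits - 1) - 1
    q = np.round(bands / scale * levels).astype(int)
    return q, scale
```

In a real coder the bit allocation per band would be driven by a psychoacoustic model of simultaneous masking, rather than the fixed value used here.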
The transformation length of the above MDCT is set to approximately 20 to 40 msec in consideration of, for example, the period over which simultaneous masking works effectively. However, for non-stationary signals with sharp attacks, such as those made by a pair of castanets, the quantization noise (quantization error) generated is distributed uniformly over the frame after the inverse MDCT. FIGS. 9 and 10 illustrate this situation.
In the MDCT of a practical coding apparatus, adjacent frames are partially overlapped with each other. For the sake of simplicity, however, it is assumed here that there is no overlap between frames, and the explanation is given in a more general manner.
The signal on the time axis shown in FIG. 10 is obtained by coding the input signal on the time axis shown in FIG. 9, i.e., an input signal with pulse-like attacks, by the above-described method and then decoding the obtained code string. As is apparent from FIG. 10, the quantization noise (quantization error) shown in the shaded area of the figure is distributed uniformly along the time axis within the frame. Note that "Frame" in FIGS. 9 and 10 indicates a frame of one MDCT transformation length; the same applies to FIGS. 11-13 below.
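The uniform spread of quantization error shown in FIG. 10 can be reproduced numerically: quantizing the MDCT coefficients of a frame containing a single pulse and inverse-transforming leaves an error spread over the whole frame, including the region well before the pulse. The transform length of 256 samples, the pulse position, and the quantization step are illustrative assumptions, not values from the document.

```python
import numpy as np

def mdct(x):
    # Naive O(N^2) MDCT: 2N time samples -> N coefficients.
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ x

def imdct(X):
    # Matching inverse transform back to 2N time samples.
    N = len(X)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)
    return (2.0 / N) * (np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ X)

# A frame containing a single pulse-like attack late in the frame (cf. FIG. 9)
frame = np.zeros(256)
frame[200] = 1.0

coeffs = mdct(frame)
step = 0.02                                   # coarse, uniform quantization step
decoded = imdct(np.round(coeffs / step) * step)

# Isolate the quantization noise (the shaded area of FIG. 10): comparing
# against the unquantized round trip cancels the MDCT time aliasing.
error = decoded - imdct(coeffs)
```

The samples of `error` before index 200 are nonzero, which is exactly the pre-echo noise discussed next; the portion after the pulse corresponds to post-echo noise.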
Of the noise so generated, the noise generated temporally before an attack is generally called pre-echo noise, while the noise generated temporally after an attack is called post-echo noise.
The period during which the above-described temporal masking conceals pre-echo and post-echo noise is extremely short, and such noise therefore cannot be prevented with the above-described MDCT transformation length of 20 to 40 msec. General high-efficiency audio coding methods employ various measures to confine such noise to the period during which temporal masking is effective.
For example, MPEG-1 Audio Layer III (MPEG: Moving Picture Experts Group), so-called MP3, suppresses pre/post-echo noise by decomposing the input PCM signal into equal subbands with a subband filter bank and then selecting between two MDCT transformation lengths of different durations according to the stationarity of the signal. When a non-stationary signal with a sharp attack is input, for example, a short MDCT transformation length is selected, so that the period during which pre/post-echo noise is generated is confined within a short MDCT frame and the noise is prevented from being perceived.
On the other hand, as disclosed in Patent Document 1, the coding method used in the so-called MD (MiniDisc™) and the like suppresses pre/post-echo noise as follows: the input PCM signal is decomposed into equal subbands with a subband filter bank; gain control is then performed that changes the gain of each subband signal along the time axis according to the stationarity of the signal, making it an approximately stationary subband signal; and finally, an MDCT of fixed transformation length is applied, followed by normalization, quantization, and coding.
FIGS. 11, 12 and 13 are diagrams for explaining the effect of gain control. Incidentally, although in the above-described Patent Document 1 the explanation is given with adjacent frames partially overlapped, here the explanation is given in a more general manner with no overlap between frames.
First, the coding apparatus obtains gain control functions, such as G_0(t), G_1(t), and G_2(t) of FIG. 11, for the input signal of FIG. 9.
Each gain control function is obtained by equally subdividing a frame along the time axis into subframes, finding the maximum amplitude or power within each subframe, and interpolating the obtained values with a linear function or the like. The input signal is first multiplied by the gain control function, which amplifies the small-amplitude portions and attenuates the large-amplitude portions, making the signal approximately flat in the temporal direction; the signal is then normalized, quantized, and coded, and multiplexed together with gain-function generation information and normalization information, thereby obtaining a code string.
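A simplified form of this gain control function can be sketched as follows, using subframe maximum amplitudes and linear interpolation as described above. The frame length of 64 samples, the four subframes, and the test signal are illustrative assumptions, and the result is only a rough stand-in for the G_0(t), G_1(t), ... of FIG. 11.

```python
import numpy as np

def gain_control_function(frame, n_sub=4, floor=1e-6):
    # Maximum amplitude per equal subframe, linearly interpolated across
    # the frame; the gain is the reciprocal of this envelope, so it
    # amplifies small-amplitude portions and attenuates large ones.
    sub_max = np.abs(frame).reshape(n_sub, -1).max(axis=1)
    centers = (np.arange(n_sub) + 0.5) * (len(frame) / n_sub)
    envelope = np.interp(np.arange(len(frame)), centers,
                         np.maximum(sub_max, floor))
    return 1.0 / envelope

# A frame with a weak stationary part followed by a pulse-like attack
frame = np.full(64, 0.01)
frame[48] = 1.0
g = gain_control_function(frame)
flattened = frame * g   # approximately flat in the temporal direction
```

After multiplication, the quiet leading portion and the attack are brought to comparable amplitudes, which is the precondition for the fixed-length MDCT coding that follows.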
A decoding apparatus sequentially performs, on the input code string, demultiplexing (the inverse of multiplexing), decoding, dequantization, and denormalization (the inverse of normalization). The signal on the time axis obtained by these processes is as shown in FIG. 12, and the quantization error shown in the shaded area of FIG. 12 is distributed uniformly over the entire frame. The decoding apparatus then reconstructs, from the gain-control-function generation information obtained by demultiplexing the above-described code string, the inverse gain control function (a function whose values are the reciprocals of the gain control function) corresponding to the gain control function of FIG. 11, and multiplies the waveform of FIG. 12 by it, thereby obtaining the waveform shown in FIG. 13.
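The decoder-side effect can be sketched numerically: adding uniform noise to the flattened signal (standing in for the quantization error of FIG. 12) and then applying the inverse gain control function shapes the noise so that it is attenuated in the quiet portions and concentrated near the attack, as in FIG. 13. The signal, gain function, and noise level below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same simplified gain control as before: subframe maxima, linearly
# interpolated; g(t) = 1 / envelope(t). The decoder is assumed to have
# rebuilt the same g(t) from the multiplexed generation information.
frame = np.full(64, 0.01)
frame[48] = 1.0
sub_max = np.abs(frame).reshape(4, -1).max(axis=1)
centers = (np.arange(4) + 0.5) * 16
envelope = np.interp(np.arange(64), centers, sub_max)
g = 1.0 / envelope

# Coder side: flatten; quantization then adds roughly uniform noise (FIG. 12)
flattened = frame * g
noise = rng.uniform(-0.005, 0.005, size=64)

# Decoder side: multiply by the inverse gain control function (FIG. 13)
decoded = (flattened + noise) / g
residual = decoded - frame   # = noise * envelope: shaped by the inverse gain
```

The residual noise in the quiet leading portion is reduced by the envelope factor, while the largest residual is pushed to the vicinity of the attack, where temporal masking conceals it.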
As is apparent from FIG. 13, the quantization error shown in the shaded area of the figure is distributed such that, compared with its level in the vicinity of the pulse constituting the attack, its level is attenuated elsewhere; owing to the effect of temporal masking, pre-echo and post-echo noise can thus be suppressed considerably. Gain control itself is part of the optimization of the coding algorithm, and various settings are possible according to the circuit size and application of the coding apparatus; for example, if bits are allocated abundantly and the quantization error generated is extremely small, gain control need not be performed even on an input signal with a large attack.
As described above, high-efficiency audio coding makes it possible to compress a signal efficiently by appropriately controlling the generation of quantization noise according to the properties of the signal, making good use of hearing characteristics.
Meanwhile, since such high-efficiency coding technologies for audio signals require a relatively large amount of computation and memory at the time of coding/decoding, technologies have been proposed that perform simple signal processing directly on a coded code string. Such a technology achieves a desired signal processing with a small amount of computation and memory by directly changing parameters or the like included in the code string, without performing the process of decoding the code string to a signal on the time axis, applying the desired signal processing, and then re-coding the signal.
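As a sketch of this parameter-domain approach, level adjustment can be realized by rewriting only the normalization coefficients carried in the code string, leaving the quantized spectrum untouched. The dictionary-based container below is a hypothetical stand-in for a real code string format, and `adjust_level` is an illustrative helper, not an API from any of the cited documents.

```python
import numpy as np

def adjust_level(code_string, gain_db):
    # Rewrite only the normalization coefficients; the quantized spectrum
    # is carried over unchanged, so no dequantization, time-domain
    # processing, or requantization is needed.
    factor = 10.0 ** (gain_db / 20.0)
    return {'q': code_string['q'], 'scale': code_string['scale'] * factor}
```

Because the quantized values are untouched, the operation costs only one multiplication per normalization coefficient, illustrating why such code-string-domain processing needs far less computation than a decode/process/re-encode cycle.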
For example, Patent Document 2 discloses a technology that makes filtering of a signal possible by directly changing normalization coefficient information in a code string, and Patent Document 3 discloses a technology that makes level adjustment of a signal possible in the same manner.
    [Patent Document 1] Japanese Patent No. 3263881
    [Patent Document 2] Japanese Patent No. 3879249
    [Patent Document 3] Japanese Patent No. 3879250