The present invention relates to a digital audio signal processing method, a digital audio signal processing apparatus and a recording/playback apparatus. The present invention is well suited to, for example, a digital video tape recorder that codes an audio signal in block units before recording it.
Conventional techniques for reducing the amount of data of an audio signal by coding the audio signal in block units include sub-band coding and transform coding. These techniques reduce or eliminate redundant components by exploiting the uneven distribution of the audio signal along the frequency axis. They are adopted, for example, in the audio standards of the MPEG (Moving Picture Experts Group).
A general configuration of an encoder 1 and a decoder 2 for implementing an M-division sub-band encoding technique is shown in FIG. 5. As shown in the figure, in the encoder 1, input audio data D1 is supplied to an analysis filter 3 having M band-pass filters (BPFs) 4A.sub.1 to 4A.sub.M and down-sampling circuits 5A.sub.1 to 5A.sub.M. The input audio data D1 is divided into M frequency bands by the band-pass filters (BPFs) 4A.sub.1 to 4A.sub.M. The frequency-band signals output by the band-pass filters 4A.sub.1 to 4A.sub.M are then down-sampled to 1/M by the down-sampling circuits 5A.sub.1 to 5A.sub.M. The down-sampled frequency-band signals are quantized by quantizers (Qs) 6A.sub.1 to 6A.sub.M, one of which is provided for each frequency band. Subsequently, the signals output by the quantizers (Qs) 6A.sub.1 to 6A.sub.M undergo packet conversion processing in packeting circuits 7A.sub.1 to 7A.sub.M and are thereby converted into sub-band coded data D2.
In the decoder 2, the sub-band coded data D2 is supplied sequentially to unpacketing circuits 8A.sub.1 to 8A.sub.M and then to inverse-quantization circuits 9A.sub.1 to 9A.sub.M to re-form the down-sampled frequency-band signals, which are then fed to a synthesis filter 10. The synthesis filter 10, which has up-sampling circuits 11A.sub.1 to 11A.sub.M and band-pass filters (BPFs) 12A.sub.1 to 12A.sub.M, restores the frequency-band signals to the sampling rate they had before the down-sampling by means of the up-sampling circuits 11A.sub.1 to 11A.sub.M. The frequency-band signals output by the up-sampling circuits 11A.sub.1 to 11A.sub.M are then combined into a single band by means of the band-pass filters (BPFs) 12A.sub.1 to 12A.sub.M to create restored audio data D3.
The encoder 1 detects a maximum absolute value, which is referred to hereafter as a scale factor (SF), for each of the frequency-band signals when the frequency-band signals are quantized by the quantizers (Qs) 6A.sub.1 to 6A.sub.M after completing the down-sampling process as shown in FIG. 6. Data of each sub-band is normalized by using the maximum absolute value for the sub-band before being quantized. At that time, a quantization level is allocated to each of the quantizers (Qs) 6A.sub.1 to 6A.sub.M in accordance with the amount of data in the sub-band associated with the quantizer so as to produce a fixed amount of data as a whole. The allocation of quantization levels is referred to hereafter as bit allocation.
In the actual bit allocation, a quantizing-bit count is determined for each of the sub-bands from the magnitude of the scale factor for the sub-band. Besides being based on the magnitude of the scale factor, the bit allocation can also be determined by using a psychoacoustic model, which exploits properties of human hearing. A compression factor used in the coding process is determined by the total number of bits allocated in the quantization.
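The scale-factor detection, normalization and bit allocation described above can be sketched as follows. This is a minimal illustration only, not the method of any particular standard: the greedy rule in `allocate_bits`, the uniform quantizer, and all function names are assumptions introduced for the example.

```python
def scale_factor(samples):
    """Maximum absolute value of one sub-band's samples (the SF of the text)."""
    return max(abs(s) for s in samples)


def allocate_bits(scale_factors, total_bits, max_bits=15):
    """Assumed greedy rule: hand out bits one at a time to the band whose
    estimated quantization step (sf / 2**bits) is currently the largest."""
    bits = [0] * len(scale_factors)
    for _ in range(total_bits):
        candidates = [b for b in range(len(bits))
                      if bits[b] < max_bits and scale_factors[b] > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda b: scale_factors[b] / (2 ** bits[b]))
        bits[best] += 1
    return bits


def quantize(samples, sf, nbits):
    """Normalize by the scale factor, then quantize uniformly to nbits."""
    if nbits < 2 or sf == 0:
        return [0] * len(samples)      # band carries no data
    levels = 2 ** (nbits - 1) - 1      # symmetric integer range
    return [round(s / sf * levels) for s in samples]


def dequantize(codes, sf, nbits):
    """Inverse of quantize(): rescale the integer codes by the scale factor."""
    if nbits < 2 or sf == 0:
        return [0.0] * len(codes)
    levels = 2 ** (nbits - 1) - 1
    return [c / levels * sf for c in codes]
```

Because the total number of bits handed out is fixed, a louder band (larger scale factor) receives more bits at the expense of quieter bands, which is how a fixed amount of data is produced as a whole.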
In the encoder 1, the packeting circuits 7A.sub.1 to 7A.sub.M add a header denoted by HEADER, bit-allocation information (that is, information on the quantization level) denoted by ALLOC and scale-factor information denoted by SF one after another to sub-band data starting at the head of the sub-band data to create sub-band coded data D2. In the decoder 2, on the other hand, the sub-band coded data D2 is rearranged from data in a bit-stream state into trains of data for the sub-bands, before being restored into the original audio data by using the bit-allocation information and the scale-factor information.
A typical configuration of the bit stream of the sub-band coded data D2 is depicted in more detail in FIG. 7. The bit-allocation information (ALLOC) and the scale-factor information (SF) are each arranged as an array in each sub-band block, starting from the band in the lowest-frequency region and proceeding in order of increasing frequency to cover 32 bands. In the data area (DATA), a data block of 12 samples is created for each sub-band, likewise covering 32 bands starting with the lowest-frequency band and proceeding in order of increasing frequency. In addition, a header (HEADER) describing the state of encoding is added to the beginning of each block. The header also includes information on the compression factor used in the coding process.
A coded block is thus created from 384 audio samples (32 bands of 12 samples each). While the number of bytes for the header (HEADER), the bit-allocation information (ALLOC) and the scale-factor information (SF) is fixed, the number of bytes in the data area (DATA) varies depending upon the compression factor.
In this manner, sub-band coded data D2 is created with a predetermined number of samples treated as a unit coded block. In the case described above, 384 samples of input audio data are used as a unit coded block.
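The relationship between the fixed part and the variable part of a coded block can be illustrated with a small calculation. The band count and the samples per band are taken from the description above; the field widths `HEADER_BITS`, `ALLOC_BITS` and `SF_BITS` are hypothetical values chosen only for illustration and are not stated in this description.

```python
BANDS = 32              # sub-bands per coded block (from the description)
SAMPLES_PER_BAND = 12   # samples per sub-band in the DATA area (from the description)

# Hypothetical field widths, assumed for this sketch only.
HEADER_BITS = 32
ALLOC_BITS = 4
SF_BITS = 6


def samples_per_block():
    """Number of input audio samples represented by one coded block."""
    return BANDS * SAMPLES_PER_BAND


def block_size_bits(allocation):
    """Return (fixed_bits, data_bits) for one coded block.

    allocation[b] is the quantizing-bit count chosen for sub-band b.
    """
    if len(allocation) != BANDS:
        raise ValueError("one bit count per sub-band is required")
    fixed = HEADER_BITS + BANDS * (ALLOC_BITS + SF_BITS)
    data = sum(n * SAMPLES_PER_BAND for n in allocation)
    return fixed, data
```

Under these assumptions, halving every band's bit count halves the DATA area while HEADER, ALLOC and SF stay the same size, which matches the statement that only the data area depends on the compression factor.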
It is desirable to transmit or record coded audio data, which has been coded in block units as described above, in synchronization with frames or fields of a video signal. In actuality, however, according to the MPEG audio standards, audio coded blocks are not synchronized with either frames or fields of a video signal. For this reason, if an attempt is made to edit coded audio data in frame or field units, each of which is the smallest unit of a picture, the continuity of the bit stream of audio coded blocks cannot be retained at the boundary between two consecutive frames or fields, making it impossible to decode the data.
According to the MPEG audio standards, for example, the number of samples of an audio signal associated with a frame of an NTSC picture is prescribed to be 1,601 or 1,602. With the sub-band coding process wherein 384 samples are treated as a unit coded block as described above, on the other hand, the number of coded blocks in a frame is not an integer. That is to say, the frame and the coded blocks are not in a synchronized relation. For this reason, there is a coded block that crosses a boundary between two consecutive frames. As a result, discontinuity occurs within such a coded block when coded blocks are edited in frame units.
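The lack of synchronization can be checked with simple arithmetic. The sketch below takes the figures of 1,601 or 1,602 samples per frame and 384 samples per coded block from the text; the particular five-frame order in `FRAME_SAMPLES` is an assumption made for illustration, and the function names are hypothetical.

```python
BLOCK = 384  # audio samples per coded block (from the description)

# Assumed repeating order of per-frame sample counts; the description
# only states that each frame carries 1,601 or 1,602 samples.
FRAME_SAMPLES = [1602, 1601, 1602, 1601, 1602]


def frame_boundaries(n_frames):
    """Sample positions of the frame boundaries, starting at 0."""
    pos, out = 0, [0]
    for i in range(n_frames):
        pos += FRAME_SAMPLES[i % len(FRAME_SAMPLES)]
        out.append(pos)
    return out


def straddling_blocks(n_frames):
    """For each interior frame boundary, the index of the coded block that
    crosses it, or None if the boundary coincides with a block edge."""
    result = []
    for boundary in frame_boundaries(n_frames)[1:-1]:
        result.append(None if boundary % BLOCK == 0 else boundary // BLOCK)
    return result
```

Neither 1,601 nor 1,602 is a multiple of 384, so under these assumptions every interior frame boundary in the first few frames falls inside some coded block rather than between two blocks.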
This problem is explained by taking a digital VTR as an example. An audio-data recording/playback unit 20 of the digital VTR shown in FIG. 8 has a configuration that allows input audio data D10 to be inserted for recording into a desired position in audio data already recorded in advance. In the audio-data recording/playback unit 20, the input audio data D10 is supplied to a cross-fade processing circuit 21 to undergo cross-fade processing therein. As a result of the cross-fade processing, edited data D11 is created and then supplied to an encoder 22.
As shown in the figure, the cross-fade processing circuit 21 comprises an amplifier 21A for amplifying the input audio data D10, an amplifier 21B for amplifying decoded data D16 and an adder 21C for adding the signal output by the amplifier 21A to the signal output by the amplifier 21B. When the input audio data D10 is recorded at a desired position in the audio data recorded in advance, the cross-fade processing is carried out by gradually reducing the gain of the amplifier 21B while gradually increasing the gain of the amplifier 21A. That is to say, the cross-fade processing is performed in order to remove the unnatural impression caused by an abrupt variation in the audio signal at the boundary between the original data (the decoded data D16) and the insert data (the input audio data D10).
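The cross-fade at an IN point can be sketched as below. The description does not specify the shape of the gain curves, so a linear ramp is assumed here; the function name `cross_fade` and the parameter `fade_len` are hypothetical.

```python
def cross_fade(original, insert, fade_len):
    """Mix two equally long sample sequences at an IN point.

    Over the first fade_len samples the gain applied to `original`
    (amplifier 21B) falls linearly from 1 to 0 while the gain applied to
    `insert` (amplifier 21A) rises from 0 to 1; the two are then summed
    (adder 21C). After the fade only the insert data remains.
    The linear ramp is an assumption of this sketch.
    """
    if len(original) != len(insert):
        raise ValueError("sequences must have the same length")
    out = []
    for n, (o, i) in enumerate(zip(original, insert)):
        gain_in = min(1.0, n / fade_len) if fade_len > 0 else 1.0
        out.append((1.0 - gain_in) * o + gain_in * i)
    return out
```

At an OUT point the roles of the two inputs would simply be exchanged, so that the insert data fades out while the original data fades back in.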
The encoder 22 creates coded edited data D12 by carrying out the sub-band coding processing described above on the edited data D11. The coded edited data D12 is then supplied to an error-code adding circuit 23. The error-code adding circuit 23 creates recording data D13 by adding a predetermined error code to the coded edited data D12. The recording data D13 is supplied to a recording head mounted on a rotating drum 24 to be recorded onto a magnetic tape 25 along with recording video data produced by a video-data coding circuit. It should be noted that the recording head and the video-data coding circuit are not shown in the figure.
At that time, in the audio-data recording/playback unit 20, a pre-read head preceding the recording head reads out the audio data recorded on the magnetic tape 25 in advance, reproducing playback data D14. It should be noted that the pre-read head is also not shown in the figure. The playback data D14 is supplied to an error correcting circuit 26. The error correcting circuit 26 restores playback coded data D15 by using the error codes added to the playback data D14. The playback coded data D15 is then supplied to a decoder 27. The decoder 27 outputs the decoded data D16 by carrying out processing opposite to that performed by the encoder 22. The decoded data D16 is then supplied to the cross-fade processing circuit 21 as well as to external equipment such as a speaker.
The edit processing described above is carried out in frame or field units, which are synchronous with the video signal. Since the coded blocks resulting from the sub-band coding are not synchronous with frames or fields, however, audio coded blocks which cannot be decoded are generated.
This problem is explained in concrete terms by referring to FIG. 9 as follows. When the coded blocks associated with a frame are played back, a coded block crossing the boundary between two consecutive frames cannot be decoded. This is because, when the data is handled in frame units, the pieces of information added to the head of each coded block for decoding purposes cannot be extracted from a coded block that crosses the boundary between two consecutive frames. The pieces of information are the header information, the bit-allocation information and the scale-factor information described earlier.
Even if the playback audio data prior to the edit processing can be completely decoded, in some cases the coded audio data recorded after the edit processing can no longer be decoded. In order to explain this problem, edit processing based on the assumption that the playback audio data prior to the edit processing is decodable is explained as follows.
A typical editing process in which cross-fade processing is carried out is shown in FIG. 10. First, IN and OUT points of the edit processing are set at arbitrary positions. In the case of the editing process shown in the figure, the IN and OUT points are each set at a position in close proximity to a boundary between two consecutive frames. Consider the IN point first. In the editing process shown in the figure, the head of the cross fade is located in frame 2, but the audio coded block that includes the head of the cross fade and is to be recorded crosses the boundary between frames 1 and 2. The smallest rewritable unit in the edit processing is a frame. As a result, when the rewriting is started at frame 2, the audio coded block crossing the boundary between frames 1 and 2 is split into a first half and a second half which belong to entirely different trains of data (or bit streams), making it impossible to decode the block subsequently.
The above explanation also holds true for the OUT point. To be more specific, an audio coded block that cannot be decoded inevitably results at the boundary between two consecutive frames. Since the smallest recording unit of the edit processing is a frame, a discontinuity in an audio coded block inevitably arises at the boundary between two consecutive frames regardless of which frame the actual rewriting is started from, giving rise to the shortcoming that coded data recorded after the edit processing can no longer be decoded.
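The effect of rewriting in frame units on the coded blocks at the IN and OUT points can be modeled roughly as follows. This is a simplified sketch: it treats positions in the bit stream as proportional to sample positions and assumes a constant 1,602 samples per frame, whereas the text gives 1,601 or 1,602. The function name `broken_blocks` is hypothetical.

```python
BLOCK = 384   # audio samples per coded block (from the description)
FRAME = 1602  # assumed constant frame length, for simplicity


def broken_blocks(n_frames, in_frame, out_frame):
    """Indices of coded blocks that end up containing both original and
    rewritten data when frames in_frame .. out_frame-1 are rewritten.

    Frames are numbered from 0. A block is broken when an edge of the
    rewritten region falls strictly inside it.
    """
    total = n_frames * FRAME
    broken = []
    for edge in (in_frame * FRAME, out_frame * FRAME):
        if 0 < edge < total and edge % BLOCK != 0:
            broken.append(edge // BLOCK)
    return broken
```

In this model, rewriting frames 2 to 4 of a ten-frame recording leaves one mixed block at the IN point and another at the OUT point; these correspond to the blocks that can no longer be decoded after the edit.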
The actual edit processing is explained in concrete terms by referring to FIGS. 11A to 11I. In FIGS. 11A to 11I, capital letters denote data which has not undergone a compression coding process (that is, linear PCM data) whereas lowercase letters denote data which has undergone a compression coding process (that is, sub-band coded data). A symbol having the apostrophe "'" as a suffix, such as s' or t', indicates data which has undergone a second compression coding process. Data which has undergone a second compression coding process after a first one in this way is referred to hereafter as second-generation coded data. In addition, a boundary between two consecutive pieces of data shown in FIGS. 11A to 11H is a boundary between two consecutive frames of a video signal. It should be noted that the encoding and decoding processes each delay the data by a time equal to the length of one frame.
Pieces of data t, u, ... (that is, the playback coded data D15 shown in FIG. 11C) output by the error correcting circuit 26 are decoded by the decoder 27 into pieces of data T, U, ... (that is, the decoded data D16 shown in FIG. 11D). The pieces of data T, U, ... output by the decoder 27 are then converted by the cross-fade processing circuit 21 into pieces of edited data T, U, ... (that is, the edited data D11 shown in FIG. 11E). Subsequently, the pieces of edited data T, U, ... are coded by the encoder 22 into pieces of coded edited data t', u', ... (that is, the coded edited data D12 shown in FIG. 11F). The pieces of coded edited data t', u', ... are finally processed by the error-code adding circuit 23 into pieces of recording data t', u', ... (that is, the recording data D13 shown in FIG. 11G).
Suppose that an attempt is made to insert the piece of recording data u' and the subsequent pieces of recording data at a point after the original recording data t, as shown in FIG. 11H. In this attempt, the second-generation recording data u' is concatenated to the original recording data t as inserted data at a boundary between two consecutive frames, resulting in a discontinuity in a coded block BLK2 crossing this boundary, as shown in FIG. 11I. That is to say, the coded block BLK2 includes both the first-generation coded data t and the second-generation recording data u' and thus cannot be decoded.
The present invention addresses the problem described above. To be more specific, the present invention provides a digital audio signal processing method, a digital audio signal processing apparatus and a recording/playback apparatus whereby a discontinuity in a coded block is prevented from arising from edit processing, allowing all pieces of coded data to be decoded in order to reproduce audio data even if the edit processing is carried out on coded blocks that are not synchronous with frames or fields, the smallest edit-processing unit.