A data processing apparatus, a data processing method, a program providing medium, and a recording medium
The present invention relates to a data processing apparatus and a data processing method for dealing with sound data, a program providing medium for providing a program for dealing with sound data, and a recording medium in which sound data is recorded.
In recent years, owing to developments in high-efficiency encoding techniques, it has become common to compress and encode sound data for storage. This creates a need for a method of efficiently retrieving desired sound data from among a large number of encoded sound data pieces.
FIG. 1 shows a functional structure of a conventional sound data retrieving apparatus. Sound data (hereinafter called encoded sound data) which has been subjected to predetermined compression encoding processing, and a retrieving text database which describes attribute information associated with the encoded sound data (e.g., title, creator's name, creation date, classification of the content, and the like) are recorded in advance in the database 156 of this sound data retrieving apparatus.
The retrieving condition input section 151 receives an input of a retrieving condition from a user. For example, attribute information and the signal characteristic or the like of a sample waveform are inputted as a retrieving condition. Further, the retrieving condition input section 151 supplies the attribute retrieving section 152 with the attribute information (e.g., name of the creator and the like) inputted as a retrieving condition, and supplies the comparative determination section 155 with the signal characteristic (e.g., the waveform amplitude and the like) also inputted as a retrieving condition.
The attribute retrieving section 152 retrieves an item which matches with the attribute information inputted through the retrieving condition input section 151, from the retrieving text database recorded in the database 156, and extracts encoded sound data corresponding to the item.
The candidate selection section 153 sequentially outputs the encoded sound data inputted from the attribute retrieving section 152 to the decoding section 154. The decoding section 154 decodes the encoded sound data inputted from the candidate selection section 153 and outputs the data to the comparative determination section 155.
The comparative determination section 155 obtains a level of similarity between the sound data inputted from the decoding section 154 and the signal characteristic of the sample waveform supplied from the retrieving condition input section 151. If the similarity is equal to or greater than a predetermined threshold value, the section 155 outputs the sound data as a retrieving result. To obtain the similarity, for example, correlation factors concerning waveform amplitudes, amplitude average values, power distributions or frequency spectrums, and the like are calculated with respect to the sample waveform and the sound data as a target to be retrieved.
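The comparison against a threshold described above can be illustrated by the following minimal sketch. It assumes that the similarity is a Pearson correlation coefficient computed on waveform amplitudes; the function names `correlation` and `is_match` and the threshold value 0.9 are hypothetical and are not taken from the apparatus of FIG. 1.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length amplitude sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence carries no shape information
    return cov / math.sqrt(var_a * var_b)


def is_match(sample, candidate, threshold=0.9):
    """True if the candidate is at least as similar to the sample as the threshold (assumed value)."""
    return correlation(sample, candidate) >= threshold
```

In this sketch a candidate that differs from the sample only in level still matches, because the correlation coefficient is insensitive to scaling.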
Next, an encoding apparatus which generates the encoded sound data recorded in advance in the database 156 shown in FIG. 1 will be explained. Prior to the explanation of the structure of the encoding apparatus, methods of efficiently compressing and encoding sound data will be explained.
Methods of efficiently compressing and encoding sound data can be roughly classified into a band division encoding system and a conversion encoding system. There are also systems which combine the two.
In the band division encoding system, a discrete-time waveform signal (e.g., sound data) is divided into a plurality of frequency bands by a band division filter such as a quadrature mirror filter (QMF) or the like, and optimal encoding is performed on each of the bands. This system is also called a sub-band encoding system. Details of the quadrature mirror filter are described in, for example, P. L. Chu, "Quadrature mirror filter design for an arbitrary number of equal bandwidth channels", IEEE Trans. Acoust. Speech, Signal Processing, vol. ASSP-33, pp. 203-218, February 1985.
The conversion encoding system is also called a block encoding system, in which a discrete-time waveform signal is divided into blocks each consisting of a predetermined number of samples, and the signal of each block (called a frame in some cases) is converted into frequency spectrums and thereafter encoded. Methods for converting the signal into frequency spectrums include, for example, the DFT (Discrete Fourier Transform), the DCT (Discrete Cosine Transform), and the MDCT (Modified Discrete Cosine Transform). In the MDCT, the conversion sections of blocks adjacent on the time axis overlap each other, and thus efficient conversion can be achieved with less block distortion. The details are described in, for example, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation", J. P. Princen and A. B. Bradley, IEEE Transactions, ASSP-34, No. 5, October 1986, pp. 1153-1161, and "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation", J. P. Princen, A. W. Johnson and A. B. Bradley (ICASSP 1987).
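The overlapping of adjacent blocks in the MDCT can be illustrated by the following minimal sketch. It assumes a sine window, a block length of 2N samples producing N coefficients, and a hop of N samples; the function names are hypothetical, and the direct O(N^2) summation is chosen for readability rather than speed.

```python
import math


def sine_window(n_coef):
    """Sine window of length 2*n_coef; satisfies w[n]^2 + w[n + n_coef]^2 = 1."""
    size = 2 * n_coef
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def _basis(n, k, n_coef):
    """MDCT cosine kernel for sample index n and coefficient index k."""
    return math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))


def mdct(block, window):
    """Window a block of 2N samples and convert it into N spectral coefficients."""
    n_coef = len(block) // 2
    return [
        sum(window[n] * block[n] * _basis(n, k, n_coef) for n in range(2 * n_coef))
        for k in range(n_coef)
    ]


def imdct(coefs, window):
    """Convert N coefficients back into 2N windowed, time-aliased samples."""
    n_coef = len(coefs)
    return [
        window[n] * (2.0 / n_coef) * sum(coefs[k] * _basis(n, k, n_coef) for k in range(n_coef))
        for n in range(2 * n_coef)
    ]


def analyze_and_resynthesize(signal, n_coef):
    """Run MDCT then inverse MDCT with 50% overlap-add; aliasing cancels between neighbours."""
    if n_coef % 2 or len(signal) % n_coef:
        raise ValueError("n_coef must be even and divide the signal length")
    window = sine_window(n_coef)
    # Pad so that every original sample is covered by exactly two blocks.
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        block = padded[start:start + 2 * n_coef]
        for i, value in enumerate(imdct(mdct(block, window), window)):
            output[start + i] += value
    return output[n_coef:n_coef + len(signal)]
```

Each block alone yields only N coefficients for 2N samples, so a single inverse conversion contains time-domain aliasing; it is the sum of two overlapping blocks that restores the original samples.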
The signal which is divided for every frequency band in the case of the band division encoding system, or which is converted into frequency spectrums in the case of the conversion encoding system, is quantized and then encoded. In this manner, the band in which quantization noise arises can be restricted by using an auditory characteristic such as the masking effect. In addition, by normalizing each signal before the quantization, effective encoding can be carried out.
For example, if quantization is carried out in the band division encoding system, the signal should desirably be divided into bandwidths which are called critical bands.
Bit allocation is performed on each signal thus divided by frequency band, and the signal is then encoded. For example, if bit allocation is dynamically carried out based on the absolute value of the amplitude of the signal for each band, the quantization noise spectrum is flattened so that the noise energy is minimized. Note that this method is described in, for example, "Adaptive Transform Coding of Speech Signals", R. Zelinski and P. Noll, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-25, No. 4, August 1977. However, there is a problem that this method is not auditorily most preferred since the masking effect is not used.
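The dynamic bit allocation described above can be sketched as a greedy procedure. This sketch is not the method of the cited reference; it assumes that each additional bit halves the quantization noise amplitude of a band (about 6 dB per bit) and always gives the next bit to the band whose noise is currently largest, which tends to flatten the noise spectrum. The function name `allocate_bits` is hypothetical.

```python
def allocate_bits(band_amplitudes, total_bits):
    """Greedy dynamic allocation: hand out bits one at a time to the noisiest band.

    Assumes the noise of a band starts at its peak amplitude and is halved by
    every bit assigned to that band.
    """
    noise = [float(a) for a in band_amplitudes]
    bits = [0] * len(noise)
    for _ in range(total_bits):
        # Ties are resolved in favour of the lowest band index.
        worst = max(range(len(noise)), key=lambda i: noise[i])
        bits[worst] += 1
        noise[worst] /= 2.0
    return bits
```

For band amplitudes 8, 4, 2 and 1 and a budget of six bits, this sketch assigns 3, 2, 1 and 0 bits respectively, leaving the same residual noise level in every band.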
In addition, if fixed bit allocation is carried out such that an excellent S/N ratio is obtained for every band, for example, a masking effect can be obtained auditorily. However, in cases where the characteristic of a sine wave is measured, there is a problem that an excellent characteristic value cannot be obtained since the bit allocation is fixed. Note that this method is described in, for example, "The critical band coder: digital encoding of the perceptual requirements of the auditory system", M. A. Krasner, MIT (ICASSP 1980).
To solve these problems, there is a method in which all the bits that can be used for bit allocation are divided between dynamic allocation and fixed allocation, and the division ratio is made dependent on the input signal such that, for example, the proportion of the fixed allocation becomes greater as the spectral distribution of the input signal becomes smoother, thus achieving efficient encoding.
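The division of the bit budget between fixed and dynamic allocation can be sketched as follows. The text above does not specify how smoothness is measured, so this sketch assumes the spectral flatness measure (the ratio of the geometric mean to the arithmetic mean of the band energies), which is 1 for a perfectly flat spectrum and approaches 0 for a strongly peaked one. The function names are hypothetical.

```python
import math


def spectral_flatness(band_energies):
    """Geometric mean divided by arithmetic mean of the band energies (assumed smoothness measure)."""
    floor = 1e-12  # avoids log(0) for silent bands
    energies = [max(float(e), floor) for e in band_energies]
    geometric = math.exp(sum(math.log(e) for e in energies) / len(energies))
    arithmetic = sum(energies) / len(energies)
    return geometric / arithmetic


def split_bit_budget(band_energies, total_bits):
    """Return (fixed_bits, dynamic_bits); a smoother spectrum gets a larger fixed share."""
    fixed = round(total_bits * spectral_flatness(band_energies))
    return fixed, total_bits - fixed
```

With this assumed measure, a tonal input such as a sine wave receives mostly dynamic allocation, while a noise-like input receives mostly fixed allocation.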
Meanwhile, in quantization and encoding of sound signals, quantization errors increase in a waveform that includes a sharp change point of amplitude (hereinafter called an attack), at which the amplitude sharply increases or decreases within a part of the sound waveform. Also, in a signal encoded by the conversion encoding system, quantization errors of the spectral coefficients at the attack spread over the entire block in the time domain during reverse spectral conversion (decoding). As a result, auditorily harsh noise called a pre-echo is generated immediately before or after the point of sharp increase or sharp decrease of the amplitude.
To prevent this pre-echo, there is, for example, a method (gain control) of detecting an attack of a waveform signal in advance and amplifying or attenuating the signal before and after the attack, so as to equalize the amplitude within the block in which the attack exists. During encoding according to this method, the position at which the gain control is applied and the level of the gain control are encoded together with the waveform signal subjected to the gain control. During decoding, gain control which reverses that applied during the encoding is performed, based on the position and the level information, and the waveform signal is thereby decoded. Note that this gain control can be performed for every divided frequency band.
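The gain control described above can be illustrated by the following minimal sketch, which handles only a sharp increase of amplitude. It assumes that the block is split into sub-blocks of equal length, that an attack is declared when the peak of a sub-block is at least `ratio` times the peak of everything before it, and that a single gain is applied to the samples preceding the attack. The function names and the default ratio are hypothetical.

```python
def detect_attack(block, sub_len, ratio=2.0):
    """Return (position, gain) of the first rising attack, or (None, 1.0) if there is none."""
    peaks = [
        max(abs(s) for s in block[i:i + sub_len])
        for i in range(0, len(block), sub_len)
    ]
    for idx in range(1, len(peaks)):
        before = max(peaks[:idx])
        if before > 0 and peaks[idx] >= ratio * before:
            return idx * sub_len, peaks[idx] / before
    return None, 1.0


def apply_gain(block, position, gain):
    """Encoder side: amplify the samples before the attack so the block has an even level."""
    if position is None:
        return list(block)
    return [s * gain if i < position else s for i, s in enumerate(block)]


def remove_gain(block, position, gain):
    """Decoder side: reverse the gain control using the transmitted position and level."""
    if position is None:
        return list(block)
    return [s / gain if i < position else s for i, s in enumerate(block)]
```

Because the quiet part before the attack is amplified before quantization, the quantization error that spreads over the block is attenuated again in that part when the decoder reverses the gain.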
FIG. 2 shows the structure of an encoding apparatus which generates the encoded sound data recorded in advance in the database 156 shown in FIG. 1. This encoding apparatus compresses and encodes sound data by the conversion encoding system described above.
A spectral converter section 161 converts an inputted sound waveform signal into a spectral coefficient by means of predetermined spectral conversion processing (e.g., DCT) and outputs the coefficient to a quantization section 162. The quantization section 162 normalizes and quantizes the spectral coefficient inputted from the spectral converter section 161 and outputs a quantization spectral coefficient and a quantization parameter (namely, a normalization coefficient and a quantization width coefficient) thereby obtained, to a Huffman encoder section 163. The Huffman encoder section 163 performs variable-length encoding on the quantization spectral coefficient and the quantization parameter inputted from the quantization section 162, and outputs the results to a bit-multilayering section 164. The bit-multilayering section 164 multilayers the quantization spectral coefficient and the quantization parameter inputted from the Huffman encoder section 163, and other encoding parameters, into a predetermined bit-stream format.
FIG. 3 shows the structure of the decoding section 154 in FIG. 1. This decoding section 154 decodes encoded sound data generated by the encoding apparatus shown in FIG. 2.
In this decoding section 154, a bit-decomposer section 171 which corresponds to the bit-multilayering section 164 shown in FIG. 2 decomposes inputted encoded sound data into an encoding spectral coefficient and an encoding parameter, and outputs the coefficient and parameter to a Huffman decoder section 172. The Huffman decoder section 172 subjects the encoding spectral coefficient and the encoding parameter to decoding which corresponds to the encoding by the Huffman encoder section 163 in FIG. 2, and outputs a quantization spectral coefficient and a quantization parameter thus obtained, to a reverse quantization section 173. The reverse quantization section 173 reversely quantizes and reversely normalizes the quantization spectral coefficient, and outputs a spectral coefficient thus obtained, to a reverse spectral converter section 174. The reverse spectral converter section 174 performs reverse spectral conversion processing which corresponds to the spectral conversion processing by the spectral converter section 161 shown in FIG. 2, on the spectral coefficient inputted from the reverse quantization section 173, and outputs a sound waveform signal thus obtained.
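The normalization and quantization on the encoding side and the reverse processing on the decoding side can be illustrated by the following minimal sketch. It assumes that the normalization coefficient is the peak absolute value of the coefficients and that the quantization width is derived from a bit count; the variable-length (Huffman) stage is omitted. The function names are hypothetical.

```python
def quantize(coefs, bits):
    """Normalize by the peak magnitude, then round to signed integers of the given width.

    Returns the quantized coefficients and the normalization coefficient (scale).
    """
    if bits < 2:
        raise ValueError("at least 2 bits are needed for a signed value")
    scale = max(abs(c) for c in coefs) or 1.0  # all-zero input keeps scale 1.0
    levels = (1 << (bits - 1)) - 1
    return [round(c / scale * levels) for c in coefs], scale


def dequantize(quantized, scale, bits):
    """Reverse quantization followed by reverse normalization."""
    levels = (1 << (bits - 1)) - 1
    return [q * scale / levels for q in quantized]
```

In this sketch the reconstruction error of each coefficient is bounded by half a quantization step, that is, by `0.5 * scale / levels`.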
In retrieval by means of the conventional sound data retrieving apparatus described above, it is necessary to completely decode the sound data when retrieving compressed and encoded sound data. Therefore, a huge memory capacity is required to record the decoded sound data, and in addition, an extremely long processing time is required to carry out the decoding.
The present invention has been made in view of this situation and has an object of retrieving sound data without completely decoding sound data, when retrieving compressed and encoded sound data.
A first data processing apparatus according to the present invention comprises: sound data input means inputted with sound data; spectral characteristic information detector means for detecting spectral characteristic information from the sound data inputted to the sound data input means; waveform characteristic information detector means for detecting waveform characteristic information within a time area from the sound data inputted to the sound data input means; and recording means for recording the spectral characteristic information detected by the spectral characteristic information detector means and the waveform characteristic information detected by the waveform characteristic information detector means, together with information indicating a correspondence relationship with the sound data inputted to the sound data input means.
A first data processing method according to the present invention comprises: a spectral characteristic information detecting step of detecting spectral characteristic information from sound data; a waveform characteristic information detecting step of detecting waveform characteristic information within a time area from the sound data; and a recording step of recording the spectral characteristic information detected in the spectral characteristic information detecting step and the waveform characteristic information detected in the waveform characteristic information detecting step, together with information indicating a correspondence relationship with the sound data.
A first program providing medium according to the present invention provides a program which makes a computer execute processing comprising: a spectral characteristic information detecting step of detecting spectral characteristic information from sound data; a waveform characteristic information detecting step of detecting waveform characteristic information within a time area from the sound data; and a recording step of recording the spectral characteristic information detected in the spectral characteristic information detecting step and the waveform characteristic information detected in the waveform characteristic information detecting step, together with information indicating a correspondence relationship with the sound data.
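The detecting and recording steps described above can be illustrated by the following minimal sketch. The specific characteristics are assumptions made for illustration: the spectral characteristic information is taken to be the average energy per frequency band obtained from a direct DFT, the waveform characteristic information is taken to be the peak amplitude per frame, and the correspondence relationship is represented by a hypothetical `sound_id` key. None of these choices is taken from the text above.

```python
import math


def band_energies(frame, n_bands):
    """Energy per band from a direct DFT of one frame (bins below the Nyquist frequency)."""
    size = len(frame)
    half = size // 2
    bins_per_band = max(1, half // n_bands)
    energies = [0.0] * n_bands
    for k in range(half):
        re = sum(frame[n] * math.cos(2 * math.pi * k * n / size) for n in range(size))
        im = -sum(frame[n] * math.sin(2 * math.pi * k * n / size) for n in range(size))
        band = min(k // bins_per_band, n_bands - 1)
        energies[band] += re * re + im * im
    return energies


def frame_peaks(samples, frame_len):
    """Peak absolute amplitude of each frame (assumed waveform characteristic)."""
    return [
        max(abs(s) for s in samples[i:i + frame_len])
        for i in range(0, len(samples), frame_len)
    ]


def build_record(sound_id, samples, frame_len=16, n_bands=4):
    """Detect both kinds of characteristic information and tie them to the sound data."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("samples must contain at least one full frame")
    spectral = [0.0] * n_bands
    for frame in frames:
        for band, energy in enumerate(band_energies(frame, n_bands)):
            spectral[band] += energy / len(frames)
    return {
        "sound_id": sound_id,  # correspondence with the sound data
        "spectral": spectral,
        "waveform": frame_peaks(samples, frame_len),
    }
```

A record built in this way is small compared with the sound data itself, and can be stored alongside the encoded sound data for later reference.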
A second data processing apparatus according to the present invention comprises: search condition input means inputted with a search condition for sound data; and search means for searching sound data based on the search condition inputted to the search condition input means, wherein the search means searches sound data which satisfies the search condition by referring to at least spectral characteristic information and waveform characteristic information within a time area, which are previously detected and recorded, from sound data.
A second data processing method according to the present invention comprises: a search condition input step in which a search condition for sound data is inputted; and a search step of searching sound data based on the search condition inputted in the search condition input step, wherein in the search step, sound data which satisfies the search condition is searched by referring to at least spectral characteristic information and waveform characteristic information within a time area, which are previously detected and recorded, from sound data.
A second program providing medium for providing a program which makes a computer execute processing, according to the present invention, comprises: a search condition input step in which a search condition for sound data is inputted; and a search step of referring to at least spectral characteristic information and waveform characteristic information within a time area which are previously detected from sound data and recorded, thereby to search sound data which satisfies the search condition inputted in the search condition input step.
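The search step described above can be illustrated by the following minimal sketch, in which only previously recorded characteristic information is consulted and no sound data is decoded. It assumes that each record is a dictionary with hypothetical keys `sound_id`, `spectral` and `waveform`, that the search condition is given as a pair of characteristic vectors, and that cosine similarity with an assumed threshold of 0.9 decides whether the condition is satisfied.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def search(records, query_spectral, query_waveform, threshold=0.9):
    """Return the ids of records whose recorded characteristics both satisfy the condition."""
    hits = []
    for record in records:
        score = min(
            cosine_similarity(record["spectral"], query_spectral),
            cosine_similarity(record["waveform"], query_waveform),
        )
        if score >= threshold:
            hits.append((record["sound_id"], score))
    hits.sort(key=lambda hit: hit[1], reverse=True)  # best match first
    return [sound_id for sound_id, _ in hits]
```

Taking the smaller of the two similarities means that a record is returned only when both its spectral and its waveform characteristic information agree with the search condition.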
Also, a recording medium according to the present invention is a recording medium on which sound data is recorded and spectral characteristic information detected from the sound data and waveform characteristic information within a time area detected from the sound data are recorded together with information indicating a correspondence relationship with the sound data.
As described above, according to the present invention, it is possible to search sound data efficiently without decoding sound data when searching sound data which is compressed and encoded.