1. Field of the Invention
The present invention relates to an information processing apparatus and method, an information recording apparatus and method, a recording medium, and a distribution medium. More particularly, the present invention relates to an information processing apparatus and method for retrieving compressed and coded audio data on the basis of signal characteristics, an information recording apparatus and method, a recording medium, and a distribution medium.
2. Description of the Related Art
In recent years, with the advancement of low-bit-rate coding technology, it has become common to store audio data and image data in such a way that they are compressed and coded, and a method is required for efficiently retrieving desired data from a large amount of coded data.
FIG. 23 shows the functional construction of a conventional audio data retrieval apparatus. A database 156 of this audio data retrieval apparatus stores, in advance, compressed and coded audio data (hereinafter referred to as “coded audio data”) together with a text database for retrieval, in which attribute information of the audio data (for example, title, author name, creation date, and classification of the contents) is written in correspondence with the coded audio data.
A retrieval condition input section 151 accepts retrieval conditions (attribute information and signal characteristics of a sample waveform) input by a user, supplies the attribute information to an attribute retrieval section 152, and supplies the signal characteristics to a comparison and determination section 155.
The attribute retrieval section 152 retrieves data that matches the attribute information (for example, the author name) input from the retrieval condition input section 151 from the text database for retrieval which is stored in the database 156, extracts coded audio data corresponding thereto, and outputs it to a candidate selection section 153.
The candidate selection section 153 outputs the coded audio data input from the attribute retrieval section 152 in sequence to a decoding section 154. The decoding section 154 decodes the coded audio data input from the candidate selection section 153 and outputs it to the comparison and determination section 155.
The comparison and determination section 155 determines the degree of similarity between the audio data input from the decoding section 154 and the signal characteristics (for example, the waveform amplitude) of the sample waveform supplied from the retrieval condition input section 151. If the degree of similarity is equal to or higher than a predetermined threshold value, the audio data is output as a retrieval result. The degree of similarity can be determined, for example, by computing a correlation coefficient between the sample waveform and the retrieved audio data with respect to waveform amplitude, average amplitude, power distribution, frequency spectrum, and so on.
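As an illustration of the similarity check performed by the comparison and determination section 155, the following Python sketch computes a Pearson correlation coefficient between a sample feature sequence (for example, waveform amplitudes) and a candidate's feature sequence, then applies a threshold. The function names and the default threshold of 0.8 are assumptions chosen for illustration; they are not values taken from the conventional apparatus.

```python
import math
from typing import Sequence


def correlation_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length feature sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no shape information
    return sxy / math.sqrt(sxx * syy)


def is_match(sample: Sequence[float], candidate: Sequence[float],
             threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: accept when similarity >= threshold."""
    return correlation_coefficient(sample, candidate) >= threshold
```

Because the coefficient is insensitive to overall gain, a candidate that is simply a louder copy of the sample still scores 1.0, while an inverted copy scores -1.0 and is rejected.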
Next, a description is given of a coding apparatus for creating the coded audio data prerecorded in the database 156 of FIG. 23. First, however, methods for efficiently compressing and coding audio data are described. Such methods can be broadly classified into band division coding methods and transform coding methods, and there are also methods that combine the two.
The band division coding method divides a discrete time waveform signal (for example, audio data) into a plurality of frequency bands using a band division filter such as a quadrature mirror filter (QMF), and performs the most appropriate coding for each band. This is also called “subband coding”. Details of the quadrature mirror filter are described in, for example, P. L. Chu, “Quadrature mirror filter design for an arbitrary number of equal bandwidth channels”, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 203-218, February 1985.
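The simplest possible two-band QMF pair is the 2-tap Haar pair, where the high-pass filter is the low-pass filter mirrored about a quarter of the sampling rate. The Python sketch below uses it only to show the idea of splitting a signal into two critically sampled bands and recombining them; practical band division coders, including the N-channel designs in the reference above, use much longer filters. Applying the split recursively to the low band would yield more bands.

```python
import math

SQRT2 = math.sqrt(2.0)


def qmf_analysis(x):
    """Split x into a low band and a high band, each decimated by 2 (2-tap Haar QMF pair)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    low = [(x[2 * n] + x[2 * n + 1]) / SQRT2 for n in range(half)]
    high = [(x[2 * n] - x[2 * n + 1]) / SQRT2 for n in range(half)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two bands; inverts qmf_analysis up to floating-point rounding."""
    out = []
    for lo_sample, hi_sample in zip(low, high):
        out.append((lo_sample + hi_sample) / SQRT2)
        out.append((lo_sample - hi_sample) / SQRT2)
    return out
```

A constant (DC) input lands entirely in the low band, which is what allows each band to be quantized with a different precision.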
The transform coding method, also called a “block coding method”, divides a discrete time waveform signal into blocks of a predetermined number of samples, converts the signal of each block (also referred to as a “frame”) into a frequency spectrum, and then codes the spectrum. Methods for conversion into a frequency spectrum include the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the modified discrete cosine transform (MDCT). The MDCT achieves efficient conversion with little block distortion because the transform sections of adjacent blocks overlap one another on the time axis. Details are described in, for example, J. P. Princen and A. B. Bradley, “Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation”, IEEE Transactions, ASSP-34, No. 5, Oct. 1986, pp. 1153-1161, and J. P. Princen, A. W. Johnson and A. B. Bradley, “Subband/Transform Coding Using Filter Bank Design Based on Time Domain Aliasing Cancellation”, ICASSP 1987.
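To make the overlapping-block property concrete, the following Python sketch implements a direct (O(N²)) MDCT and inverse MDCT with a sine window, and reconstructs a signal by overlap-adding 50%-overlapped blocks, so that the time-domain aliasing of neighbouring blocks cancels. The sine window, the 2/N inverse scaling, and the zero padding at both ends are choices made for this sketch; a real coder would use a fast algorithm and may use other windows satisfying the same power-complementary condition.

```python
import math


def sine_window(two_n):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _basis(n, k, n_half):
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block, window):
    """2N windowed samples -> N spectrum coefficients."""
    n_half = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased samples (2/N scaling for this window)."""
    n_half = len(coeffs)
    return [window[n] * (2.0 / n_half) * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]


def mdct_roundtrip(signal, n_half):
    """Analyse and resynthesize with 50% overlapped blocks; aliasing cancels on overlap-add."""
    tail = n_half + (-len(signal)) % n_half
    padded = [0.0] * n_half + list(signal) + [0.0] * tail
    window = sine_window(2 * n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        rebuilt = imdct(mdct(padded[start:start + 2 * n_half], window), window)
        for i, value in enumerate(rebuilt):
            out[start + i] += value
    return out[n_half:n_half + len(signal)]
```

Each block yields only N coefficients for 2N input samples, so the overlap does not increase the number of coefficients to be coded.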
In the band division coding method, the signal divided into each frequency band is quantized and then coded; in the transform coding method, the signal converted into a frequency spectrum is quantized and then coded. In either case, the band in which quantization noise occurs can be controlled by exploiting auditory properties such as the so-called “masking effect”. In addition, normalizing each signal before quantization allows more efficient coding.
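A minimal sketch of normalization followed by quantization, assuming that the normalization coefficient is the peak absolute value of the band and that a uniform quantizer with a given word length is used. Both choices, and the function names, are illustrative assumptions; the document does not specify the normalization rule.

```python
def quantize_band(coeffs, word_length):
    """Normalize a band by its peak (the scale factor) and quantize to signed integers."""
    scale = max(abs(c) for c in coeffs) or 1.0  # avoid dividing by zero for a silent band
    levels = (1 << (word_length - 1)) - 1       # e.g. 4 bits -> integers in [-7, 7]
    q = [round(c / scale * levels) for c in coeffs]
    return scale, q


def dequantize_band(scale, q, word_length):
    """Inverse of quantize_band: rescale integers back to coefficient values."""
    levels = (1 << (word_length - 1)) - 1
    return [v / levels * scale for v in q]
```

Because every band is first scaled to the range [-1, 1], the same integer quantizer can be reused for loud and quiet bands, and the reconstruction error of each coefficient is at most half a quantization step (scale / (2 * levels)).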
For example, when quantization is performed in the band division coding method, it is preferable, in view of human auditory characteristics, to divide the signal into bands called “critical bands”, whose width increases toward higher frequencies.
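One common way to obtain critical-band edges is to invert a Bark-scale approximation. The sketch below uses the Zwicker and Terhardt (1980) approximation of the critical-band rate and places band edges at whole Bark units by bisection. This specific formula is a swapped-in illustration, not a division prescribed by the document.

```python
import math


def hz_to_bark(f):
    """Zwicker & Terhardt (1980) approximation of the critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_to_hz(z, lo=0.0, hi=24000.0, iters=60):
    """Invert hz_to_bark by bisection (it is monotonically increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_band_edges(f_max):
    """Band edges (Hz) at each whole Bark value up to f_max."""
    n_bands = int(hz_to_bark(f_max))
    return [0.0] + [bark_to_hz(z, hi=f_max) for z in range(1, n_bands + 1)]
```

With f_max = 16000 Hz this yields 24 bands; the lowest is roughly 100 Hz wide, while the highest are several kilohertz wide.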
The signal divided into frequency bands is coded with bits allocated to each band (bit allocation). For example, if bits are allocated dynamically on the basis of the absolute amplitude of the signal in each band, the quantization noise spectrum becomes flat and the noise energy is minimized. This method is described in, for example, R. Zelinski and P. Noll, “Adaptive Transform Coding of Speech Signals”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977. However, because this method does not use the masking effect, it is not optimal from an auditory point of view.
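Dynamic allocation that flattens the noise spectrum can be sketched with a greedy rule: repeatedly give one bit to the band whose quantization noise is currently largest, assuming noise power proportional to the squared band amplitude and a 6 dB (factor of 4) reduction per bit. This greedy formulation is a common textbook approach chosen for illustration; the cited paper derives its allocation differently.

```python
import heapq


def greedy_bit_allocation(band_amplitudes, total_bits):
    """Give bits one at a time to the band whose quantization noise is currently largest."""
    if not band_amplitudes:
        return []
    bits = [0] * len(band_amplitudes)
    # Max-heap (negated) of current noise power per band; noise starts proportional to amplitude**2.
    heap = [(-(a * a), i) for i, a in enumerate(band_amplitudes)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_noise, i = heapq.heappop(heap)
        bits[i] += 1
        # Each extra bit halves the quantization step, dividing noise power by 4 (about 6 dB).
        heapq.heappush(heap, (neg_noise / 4.0, i))
    return bits
```

For band amplitudes 8, 4, 2, 1 and 6 bits, the result is [3, 2, 1, 0], after which every band has the same residual noise power, which is the flat noise spectrum described above.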
Alternatively, if fixed bit allocation is performed so that a satisfactory S/N ratio is obtained for each band, a masking effect is obtained from an auditory point of view. However, because the bit allocation is fixed, a satisfactory characteristic value cannot be obtained when, for example, the response to a sine wave is measured. This method is described in M. A. Krasner (MIT), “The critical band coder - digital encoding of the perceptual requirements of the auditory system”, ICASSP 1980.
To solve these problems, there is also a method in which all the bits available for bit allocation are divided into a dynamic allocation portion and a fixed allocation portion, and the division ratio is made to depend on the input signal so that the smoother the spectrum of the input signal, the larger the share of the fixed allocation portion.
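One way to quantify how smooth a spectrum is uses spectral flatness, the ratio of the geometric mean to the arithmetic mean of the band powers (1.0 for a perfectly flat spectrum, close to 0 for a peaky one). The sketch below splits the bit pool using that measure. Using flatness directly as the fixed-allocation share is a hypothetical mapping chosen for illustration; the document states only that a smoother spectrum should receive a larger fixed share.

```python
import math


def spectral_flatness(powers):
    """Geometric mean / arithmetic mean of band powers, in (0, 1]."""
    eps = 1e-12  # keeps log() finite for silent bands
    geo = math.exp(sum(math.log(p + eps) for p in powers) / len(powers))
    arith = sum(powers) / len(powers) + eps
    return geo / arith


def split_bit_pool(powers, total_bits):
    """Return (fixed_bits, dynamic_bits); hypothetical rule: fixed share == flatness."""
    fixed = round(total_bits * spectral_flatness(powers))
    return fixed, total_bits - fixed
```

A flat spectrum sends the whole pool to the fixed portion, while a spectrum dominated by one strong band (such as a sine wave) leaves almost everything to dynamic allocation.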
In the quantization and coding of an audio signal, when the waveform contains a point at which the amplitude suddenly increases or decreases (hereinafter referred to as an “attack”), the quantization error becomes large at the attack. Furthermore, in a signal coded by the transform coding method, the quantization error of the spectrum coefficients at an attack is spread over the entire block in the time domain during the inverse spectrum transform (that is, during decoding). As a result, unpleasant noise commonly called “pre-echo” occurs immediately before and after the point of sudden increase or decrease in amplitude.
One method for preventing pre-echo is gain control, in which an attack in the waveform signal is detected in advance and the signals before and after the attack are amplified or attenuated so that the amplitudes within the block containing the attack are made uniform. During coding, information on the position at which gain control is applied and on the gain-control level is coded together with the gain-controlled waveform signal. During decoding, the waveform signal is restored by applying gain control inverse to that applied during coding, on the basis of this position and level information. Gain control may also be performed separately for each divided frequency band.
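The gain control steps above can be sketched as follows, under several assumptions that are not taken from the document: the block is split into equal sub-blocks, an attack is declared when a sub-block's peak exceeds the largest earlier peak by a fixed ratio, only the quiet part before the attack is amplified, and the gain is rounded to a power of two so that the side information (position, gain) is compact and the decoder's inverse operation is exact.

```python
import math


def gain_control_encode(block, n_sub=8, ratio=4.0):
    """Return (processed_block, side_info); side_info is None or (attack_position, gain)."""
    if len(block) < n_sub or len(block) % n_sub:
        raise ValueError("block length must be a positive multiple of n_sub")
    size = len(block) // n_sub
    peaks = [max(abs(s) for s in block[i * size:(i + 1) * size]) for i in range(n_sub)]
    for i in range(1, n_sub):
        before = max(peaks[:i])
        if before > 0.0 and peaks[i] > ratio * before:  # sudden rise: treat as an attack
            gain = 2.0 ** round(math.log2(peaks[i] / before))  # power-of-two gain level
            pos = i * size
            # Amplify the quiet part so that amplitudes within the block are more uniform.
            return [s * gain for s in block[:pos]] + list(block[pos:]), (pos, gain)
    return list(block), None


def gain_control_decode(processed, side_info):
    """Apply the inverse gain before the recorded attack position."""
    if side_info is None:
        return list(processed)
    pos, gain = side_info
    return [s / gain for s in processed[:pos]] + list(processed[pos:])
```

Because quantization noise is added between these two steps in a real coder, dividing by the gain in the decoder also attenuates the noise that would otherwise appear as pre-echo before the attack.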
FIG. 24 shows the construction of a coding apparatus which creates coded audio data which is prerecorded in the database 156 of FIG. 23. This coding apparatus compresses and codes audio data by the above-described transform coding method.
A spectrum transformation section 161 converts an input audio waveform signal into a spectrum coefficient by a predetermined spectrum transform process (for example, a discrete cosine transform process) and outputs it to a quantization section 162. The quantization section 162 performs normalization and quantization on a spectrum coefficient input from the spectrum transformation section 161, and outputs the obtained quantized spectrum coefficient and the quantization parameter (normalization coefficient and quantization width coefficient) to a Huffman coding section 163. The Huffman coding section 163 converts the quantized spectrum coefficient and the quantization parameter input from the quantization section 162 into a variable length code and outputs them to a bit multiplexing section 164. The bit multiplexing section 164 multiplexes the coded quantized spectrum coefficient and the quantization parameter input from the Huffman coding section 163, and other coding parameters into a predetermined bit stream format, and outputs it.
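The Huffman coding step can be illustrated with a standard Huffman code construction over quantized spectrum values. This is a generic sketch: the actual code tables of the coding apparatus of FIG. 24 are not given in this section, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_encode(symbols, code):
    """Concatenate the codewords of the given symbols."""
    return "".join(code[s] for s in symbols)
```

Quantized spectrum coefficients are dominated by small values (often zero), so giving the most frequent values the shortest codewords shortens the bit stream: for [0, 0, 0, 0, 1, -1, 0, 2], zero receives a 1-bit code and the whole sequence takes 13 bits.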
FIG. 25 shows the construction of the decoding section 154 of FIG. 23, which decodes the coded audio data created by the coding apparatus of FIG. 24. A bit decomposition section 171 corresponding to the bit multiplexing section 164 of FIG. 24 decomposes the input coded audio data into a coded spectrum coefficient and a coding parameter and outputs them to a Huffman decoding section 172. The Huffman decoding section 172 performs decoding corresponding to the coding of the Huffman coding section 163 of FIG. 24 on the coded spectrum coefficient and the coding parameter and outputs the obtained quantized spectrum coefficient and the quantization parameter to an inverse quantization section 173. The inverse quantization section 173 inversely quantizes the quantized spectrum coefficient on the basis of the quantization parameter so that it is inversely normalized, and outputs the obtained spectrum coefficient to an inverse spectrum transformation section 174. The inverse spectrum transformation section 174 performs an inverse spectrum transformation process corresponding to the spectrum transformation process of the spectrum transformation section 161 of FIG. 24 on the spectrum coefficient input from the inverse quantization section 173, and outputs the obtained audio waveform signal.
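The Huffman decoding and inverse quantization stages of FIG. 25 can be sketched as follows. The code table and the peak-normalized uniform quantizer (scale factor plus word length) are hypothetical assumptions made for the example, not the actual parameters of the decoding section 154.

```python
def huffman_decode(bits, code):
    """Decode a string of '0'/'1' characters using a prefix-free table {symbol: codeword}."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free: the first full match is the codeword
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols


def inverse_quantize(q, scale, word_length):
    """Map quantized integers back to spectrum coefficients (undoes peak normalization)."""
    levels = (1 << (word_length - 1)) - 1
    return [v * scale / levels for v in q]


CODE_TABLE = {0: "1", 2: "00", 1: "010", -1: "011"}  # hypothetical code table
```

The recovered spectrum coefficients would then be passed to the inverse spectrum transform; that last stage is where most of the decoding cost lies when every item must be fully decoded.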
In retrieval by the above-described conventional audio data retrieval apparatus, the coded audio data must be completely decoded before it can be compared, which causes two problems: an enormous amount of memory is required to store the decoded data, and decoding requires a very long processing time.
The present invention has been achieved in view of such circumstances. An object of the present invention is to make it possible to efficiently retrieve audiovisual (AV) data by decoding a part of AV data which is coded in such a manner as to correspond to retrieval conditions.
To achieve the above-mentioned object, according to a first aspect of the present invention, there is provided an information processing apparatus comprising: accepting means for accepting a retrieval condition; decoding means for decoding a part of the AV data which is coded in such a manner as to correspond to the retrieval condition accepted by the accepting means; computation means for computing a correlation coefficient of the retrieval condition accepted by the accepting means and the AV data decoded by the decoding means; comparison means for comparing the correlation coefficient computed by the computation means with a predetermined threshold value; and incrementing means for incrementing the retrieval condition or the threshold value.
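The accept, partially decode, correlate, compare, and increment loop of the first aspect can be sketched as a coarse-to-fine search. This section does not specify which part of the coded data is decoded, so the sketch assumes that each item is represented by spectrum coefficients ordered from low to high frequency, that “decoding a part” means taking the first k coefficients, and that both k (the retrieval condition) and the threshold are incremented at each pass. All names and parameter values are hypothetical.

```python
import math


def corr(x, y):
    """Pearson correlation coefficient; 0.0 for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def coarse_to_fine_search(query, candidates, start_k=4, max_k=16,
                          threshold=0.5, step=0.2):
    """Keep candidates whose partially decoded data correlates with the query."""
    survivors = dict(candidates)
    k = start_k
    while k <= max_k and survivors:
        # "Decode" only the first k coefficients of each remaining candidate.
        survivors = {name: c for name, c in survivors.items()
                     if corr(query[:k], c[:k]) >= threshold}
        k *= 2                                     # increment the retrieval condition
        threshold = min(threshold + step, 0.99)    # increment the threshold
    return sorted(survivors)
```

Most non-matching items are rejected after only a few coefficients have been examined, so the full decoding cost is paid only for the few candidates that survive the coarse passes.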
According to a second aspect of the present invention, there is provided an information processing method comprising: an accepting step for accepting a retrieval condition; a decoding step for decoding a part of the AV data which is coded in such a manner as to correspond to the retrieval condition accepted in the accepting step; a computing step for computing a correlation coefficient of the retrieval condition accepted in the accepting step and the AV data decoded in the decoding step; a comparing step for comparing the correlation coefficient computed in the computing step with a predetermined threshold value; and an incrementing step for incrementing the retrieval condition or the threshold value.
According to a third aspect of the present invention, there is provided a distribution medium which distributes a computer-readable program to an information processing apparatus in order to execute a process, the process comprising: an accepting step for accepting a retrieval condition; a decoding step for decoding a part of the AV data which is coded in such a manner as to correspond to the retrieval condition accepted in the accepting step; a computing step for computing a correlation coefficient of the retrieval condition accepted in the accepting step and the AV data decoded in the decoding step; a comparing step for comparing the correlation coefficient computed in the computing step with a predetermined threshold value; and an incrementing step for incrementing the retrieval condition or the threshold value.
According to a fourth aspect of the present invention, there is provided an information processing apparatus comprising: accepting means for accepting a retrieval condition; extracting means for extracting a part of signal characteristics from AV data in which signal characteristics are hierarchically recorded in such a manner as to correspond to the retrieval condition accepted by the accepting means; computation means for computing a correlation coefficient of the retrieval condition accepted by the accepting means and the signal characteristics extracted by the extracting means; comparison means for comparing the correlation coefficient computed by the computation means with a predetermined threshold value; and incrementing means for incrementing the retrieval condition or the threshold value.
According to a fifth aspect of the present invention, there is provided an information processing method comprising: an accepting step for accepting a retrieval condition; an extracting step for extracting a part of signal characteristics from AV data in which the signal characteristics are hierarchically recorded in such a manner as to correspond to the retrieval condition accepted in the accepting step; a computing step for computing a correlation coefficient of the retrieval condition accepted in the accepting step and the signal characteristics extracted in the extracting step; a comparing step for comparing the correlation coefficient computed in the computing step with a predetermined threshold value; and an incrementing step for incrementing the retrieval condition or the threshold value.
According to a sixth aspect of the present invention, there is provided a distribution medium which distributes a computer-readable program to an information processing apparatus in order to execute a process, the process comprising: an accepting step for accepting a retrieval condition; an extracting step for extracting a part of signal characteristics from AV data in which the signal characteristics are hierarchically recorded in such a manner as to correspond to the retrieval condition accepted in the accepting step; a computing step for computing a correlation coefficient of the retrieval condition accepted in the accepting step and the signal characteristics extracted in the extracting step; a comparing step for comparing the correlation coefficient computed in the computing step with a predetermined threshold value; and an incrementing step for incrementing the retrieval condition or the threshold value.
According to a seventh aspect of the present invention, there is provided an information recording apparatus comprising: detection means for detecting a signal characteristic of input AV data; and recording means for hierarchically recording the signal characteristic detected by the detection means.
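This section does not specify what “hierarchically recording” a signal characteristic involves. As one possible reading, the Python sketch below detects per-frame RMS amplitude as the signal characteristic and builds a pyramid of successively coarser averages, stored coarsest level first so that a reader can stop after the levels it needs. The feature choice, the pairwise averaging, and the level ordering are all assumptions made for illustration.

```python
import math


def frame_rms(signal, frame_len):
    """Detect a simple signal characteristic: RMS amplitude of each full frame."""
    return [math.sqrt(sum(s * s for s in signal[i:i + frame_len]) / frame_len)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def build_hierarchy(features):
    """Average neighbouring pairs repeatedly; return levels from coarsest to finest."""
    levels = [list(features)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        coarser = [(prev[i] + prev[i + 1]) / 2 if i + 1 < len(prev) else prev[i]
                   for i in range(0, len(prev), 2)]
        levels.append(coarser)
    return levels[::-1]  # coarsest first, so partial reads give a usable summary
```

With this layout, a retrieval process can compare the single top-level value first and read finer levels only for items that remain plausible.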
According to an eighth aspect of the present invention, there is provided an information recording method comprising: a detecting step for detecting a signal characteristic of input AV data; and a recording step for hierarchically recording the signal characteristic detected in the detecting step.
According to a ninth aspect of the present invention, there is provided a distribution medium which distributes a computer-readable program to an information processing apparatus in order to execute a process, the process comprising: a detecting step for detecting a signal characteristic of input AV data; and a recording step for hierarchically recording the signal characteristic detected in the detecting step.
According to a tenth aspect of the present invention, there is provided a recording medium in which AV data having signal characteristics hierarchically formed therein is recorded.
In the information processing apparatus, the information processing method, and the distribution medium in accordance with the present invention, a retrieval condition is accepted, a part of AV data which is coded in such a manner as to correspond to the accepted retrieval condition is decoded, a correlation coefficient of the accepted retrieval condition and the decoded AV data is computed, and the correlation coefficient is compared with a predetermined threshold value. Furthermore, the retrieval condition or the threshold value is incremented.
In the information processing apparatus, the information processing method, and the distribution medium in accordance with the present invention, a retrieval condition is accepted, a part of the signal characteristics is extracted from AV data in which signal characteristics are hierarchically recorded in such a manner as to correspond to the accepted retrieval condition, a correlation coefficient of the accepted retrieval condition and the extracted signal characteristics is computed, and the correlation coefficient is compared with a predetermined threshold value. Furthermore, the retrieval condition or the threshold value is incremented.
In the information recording apparatus, the information recording method, and the distribution medium in accordance with the present invention, signal characteristics of the input AV data are detected, and the detected signal characteristics are hierarchically recorded.
In the recording medium in accordance with the present invention, AV data having signal characteristics hierarchically formed therein is recorded.
The above and further objects, aspects and novel features of the invention will become more apparent from the following detailed description when read in connection with the accompanying drawings.