1. Field of the Invention
This invention relates to a method and an apparatus for efficiently classifying pieces of multimedia information such as video signals and audio signals, to a method and an apparatus for generating descriptors (tags) corresponding to the classification and also to a method and an apparatus for retrieving input signals according to the result of the classification or the generated descriptors.
2. Related Background Art
It has been widely recognized that, in order to handle multimedia information such as video signals and audio signals, it is necessary to classify these signals according to their contents and to attach attribute information (a tag) to each signal according to its contents.
Known techniques for classifying signals according to their contents will now be briefly discussed in terms of audio signals, which are widely used as multimedia information.
Generally, an audio signal comprises sounded spans, where sound exists, and soundless spans, where no sound exists. Accordingly, many known techniques for classifying the attributes of audio signals, which can change continuously, are designed to detect the soundless spans of those signals. A signal whose soundless spans have been detected is tagged to indicate those spans. Subsequent signal processing is then controlled so that it is suspended during the soundless spans indicated by the tag.
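As an illustration only, soundless-span detection and tagging of the kind described above can be sketched with a short-time energy threshold. The frame length, the threshold value, and the function names `detect_soundless_spans` and `process_sounded` are hypothetical choices, not details taken from any cited technique.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start_sample, end_sample), end exclusive

def detect_soundless_spans(samples: List[float], frame_len: int = 160,
                           threshold: float = 1e-4) -> List[Span]:
    """Tag runs of frames whose mean energy falls below an assumed threshold."""
    spans: List[Span] = []
    start = None
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy < threshold:
            if start is None:
                start = i * frame_len  # a soundless span begins here
        elif start is not None:
            spans.append((start, i * frame_len))  # the span ends at this frame
            start = None
    if start is not None:
        spans.append((start, n_frames * frame_len))
    return spans

def process_sounded(samples: List[float], spans: List[Span],
                    op: Callable[[float], float]) -> List[float]:
    """Apply op only outside tagged soundless spans; samples inside pass through unchanged."""
    def soundless(i: int) -> bool:
        return any(s <= i < e for s, e in spans)
    return [x if soundless(i) else op(x) for i, x in enumerate(samples)]

if __name__ == "__main__":
    signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
    tags = detect_soundless_spans(signal, frame_len=160)
    print(tags)  # [(0, 320), (640, 800)]
```

Here the tag is simply a list of sample ranges; `process_sounded` plays the role of the subsequent processing step that is suspended for the tagged spans.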
Meanwhile, Japanese Patent Application Laid-Open No. 10-207491 discloses an audio signal classifying technique that consists in classifying sounds into background sounds and front sounds. With the technique disclosed in that document, the power and the spectrum of the background sound are estimated and compared with the power and the spectrum of the input signal in order to separate background sound spans from front sound spans.
While that technique is effective when the input signal is a voice signal and the background sound is relatively constant and sustained, it cannot correctly classify input signals that include ordinary audio signals such as music and other acoustic signals.
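A rough sketch of the background/front comparison is given below for illustration. The cited document compares both power and spectrum; this sketch simplifies the comparison to frame power only. The assumption that the signal starts in background sound, the smoothing factor `alpha`, the decision `ratio`, and the function names are all hypothetical.

```python
from typing import List

def frame_powers(samples: List[float], frame_len: int) -> List[float]:
    """Mean power of each complete frame of the input signal."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def classify_background_front(powers: List[float], ratio: float = 4.0,
                              alpha: float = 0.9) -> List[str]:
    """Label each frame 'background' or 'front' against a running background estimate."""
    if not powers:
        return []
    background = powers[0]  # assumption: the signal begins with background sound
    labels: List[str] = []
    for p in powers:
        if p > ratio * background:
            labels.append("front")  # input clearly exceeds the estimated background
        else:
            labels.append("background")
            # update the estimate only from frames judged to be background
            background = alpha * background + (1 - alpha) * p
    return labels

if __name__ == "__main__":
    print(classify_background_front([1.0, 1.1, 10.0, 9.0, 1.0]))
    # ['background', 'background', 'front', 'front', 'background']
```

Because the estimate is only updated from frames judged to be background, it tracks a steady background well but has no good reference when the background itself fluctuates, as music does.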
Japanese Patent Application Laid-Open No. 10-187128 discloses a video signal classifying technique that determines the type of picture of input signals containing auxiliary audio signals, such as voice signals and/or music signals, on the basis of the sound information accompanying the video information. This technique therefore makes it possible to classify audio signals such as voice signals and music signals. According to the disclosed technique, signals showing a predetermined spectrum structure are first classified as music signals and removed from the input signals. Signals showing another spectrum structure are then classified as voice signals and removed from the remaining signals.
However, since the technique disclosed in that document regards as music signals only those spans in which a line spectrum structure continues without interruption, it cannot be reliably applied to music signals that contain the sounds of percussion instruments or singing voices.
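To illustrate why requiring a continuous line spectrum structure is fragile, the two-stage cascade can be sketched as follows. The per-frame boolean features `line_spectrum` and `voice_structure`, the minimum run length `min_run`, and the function name are hypothetical stand-ins; how such features would actually be extracted is not described here.

```python
from typing import List

def cascade_classify(line_spectrum: List[bool], voice_structure: List[bool],
                     min_run: int = 5) -> List[str]:
    """Stage 1: sustained line-spectrum runs become 'music'; stage 2: 'voice' among the rest."""
    n = len(line_spectrum)
    labels = ["other"] * n
    i = 0
    while i < n:
        if line_spectrum[i]:
            j = i
            while j < n and line_spectrum[j]:
                j += 1
            if j - i >= min_run:  # only runs that continue long enough count as music
                for k in range(i, j):
                    labels[k] = "music"
            i = j
        else:
            i += 1
    # frames already removed as music are not considered for voice
    for k in range(n):
        if labels[k] != "music" and voice_structure[k]:
            labels[k] = "voice"
    return labels

if __name__ == "__main__":
    sustained = [True] * 6
    interrupted = [True, True, True, False, True, True, True]  # e.g. a drum hit at frame 3
    print(cascade_classify(sustained, [False] * 6))    # all 'music'
    print(cascade_classify(interrupted, [False] * 7))  # all 'other'
```

In this sketch a single frame that breaks the line spectrum, such as a percussive hit, splits a musical passage into runs that are each too short, so none of it is labeled as music.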