In recent years, technologies for recognizing and understanding images and sounds have been developed, and a variety of applications in consumer appliances have used these technologies. One example is ACR (Automatic Content Recognition) built as a client-server system using the Internet, which is called an ACR service and for which various applications have been proposed.
One application of the ACR service is the enforcement of content rights restrictions, such as detecting illegal copies. When content is distributed through the Internet, for example, the content is identified using such a recognition and understanding technology so that whether the content is legitimate can be checked. Recently, with the proliferation of file sharing services, the need to identify content more accurately using images and sounds has increased.
Another application of the ACR service is a service that provides added value to users who view content. For example, broadcast or distributed content is identified, and information relating to the content is provided through the Internet so as to be synchronized with viewing of the content. The CDDB music identification service provided by Gracenote of the US (Gracenote, Inc., Berkeley, Calif.) identifies a compact disc (CD) and provides access to information relating to the identified CD (the album title, artist name, track list, relevant content on the Internet such as the album cover, artist, and fan site, and the like).
In the ACR service, a watermark (electronic watermark) or a fingerprint (feature point information) is extracted from content, and the content is identified at, for example, the final stage in which the content is decoded and displayed. The service thus does not rely on the delivery chain of the content.
When content that includes video information and audio information, such as a broadcast program, is to be identified, two methods are conceivable: identifying the content using only one of the video information and the audio information, and identifying the content using both pieces of information.
For example, as a method of identifying content using only audio information, the following method has been proposed (for example, refer to Patent Literature 1). One or more segments of a digitally sampled waveform are used to form an amplitude signature of the waveform by counting the number of occurrences, within the segment, of the waveform in each of a plurality of amplitude bands or slots. A fuzzy comparison with amplitude signatures in a database is then executed and, when one or more potential matches are found, a more precise comparison is executed. In this way, a match for the waveform is found in a recorded database representing waveforms.
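The two-stage matching described above can be illustrated with a minimal sketch. This is not the implementation of Patent Literature 1; the function names (`amplitude_signature`, `fuzzy_candidates`, `find_matches`), the number of bands, the use of an L1 histogram distance for the fuzzy comparison, the sample-wise mean absolute difference for the precise comparison, and both tolerance values are assumptions chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def amplitude_signature(samples: Sequence[float], num_bands: int = 8,
                        lo: float = -1.0, hi: float = 1.0) -> List[int]:
    """Count how many samples of a segment fall into each amplitude band."""
    counts = [0] * num_bands
    width = (hi - lo) / num_bands
    for s in samples:
        idx = int((s - lo) / width)
        idx = min(max(idx, 0), num_bands - 1)  # clamp out-of-range samples
        counts[idx] += 1
    return counts


def signature_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two amplitude signatures (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fuzzy_candidates(query: Sequence[float], database: Dict[str, Sequence[float]],
                     num_bands: int = 8, fuzzy_tol: float = 0.2) -> List[str]:
    """Stage 1: keep entries whose signature is close to the query's signature."""
    q_sig = amplitude_signature(query, num_bands)
    limit = fuzzy_tol * len(query)  # hypothetical tolerance, scaled by segment length
    names = []
    for name, ref in database.items():
        if len(ref) != len(query):
            continue  # this sketch only compares equal-length segments
        if signature_distance(q_sig, amplitude_signature(ref, num_bands)) <= limit:
            names.append(name)
    return names


def find_matches(query: Sequence[float], database: Dict[str, Sequence[float]],
                 num_bands: int = 8, fuzzy_tol: float = 0.2,
                 precise_tol: float = 0.05) -> List[str]:
    """Stage 2: run a sample-wise comparison only on the stage-1 candidates."""
    if not query:
        return []
    matches = []
    for name in fuzzy_candidates(query, database, num_bands, fuzzy_tol):
        ref = database[name]
        mean_abs_diff = sum(abs(q - r) for q, r in zip(query, ref)) / len(query)
        if mean_abs_diff <= precise_tol:
            matches.append(name)
    return matches


if __name__ == "__main__":
    sine = [math.sin(2 * math.pi * i / 32) for i in range(128)]
    db = {
        "sine": sine,
        "reversed": list(reversed(sine)),  # same amplitude counts, different order
        "square": [1.0 if i % 32 < 16 else -1.0 for i in range(128)],
    }
    query = [s + 0.01 for s in sine]  # slightly offset copy of "sine"
    print(fuzzy_candidates(query, db))
    print(find_matches(query, db))
```

Because the signature only counts amplitudes, a time-reversed copy of a waveform yields the same signature as the original; in this sketch it survives the fuzzy stage and is rejected only by the precise comparison, which is why the second stage is needed.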
Whether content is identified using only one of video information and audio information or using both pieces of information is defined as a part of the service specifications or application specifications during system design, according to the intended application of the ACR service, a target content recognition rate, restrictions on system design and operation costs, and the like.
At present, realizing the ACR service using only audio information is considered most advantageous for reducing the amount of information and the number of processes to be handled, in light of system design and operation costs. This is because, in many cases, the feature point information for identifying content has a smaller data amount for audio information than for video information.
However, as the number of pieces of content to be dealt with grows, the data amount of feature point information to be prepared on the server side of the ACR service increases, even though the data amount of feature point information for each piece of content is small. The increasing physical capacity of the database in an ACR service that deals with audio information is a challenge in terms of both system design and operation.
With regard to system design, if the number of pieces of content to be dealt with increases, the capacity of the database in which feature point information for identifying the content is stored increases, and the processing load and the time taken for identification increase as well. With regard to system operation, the same increase in database capacity raises the investment in facilities for preparing physical databases, maintenance expenses, and the like. The present inventors consider it necessary to take measures against these problems even when only audio information is used.