Currently, various media programs are broadcast over television and radio networks. The term “media program” as used herein may refer to a television (TV) program, radio program, etc. containing an audio signal. Examples of media programs include product advertisements, weather forecasts, and news reports. Such programs typically consist of a number of broadcast segments. For example, a product advertisement program includes various product advertisements, each of which corresponds to one broadcast segment. Typically, these broadcast segments are very short (about 30 to 60 seconds per clip) and offer only concise introductions. On many occasions, viewers are not satisfied with the brief information offered by the broadcast program and wish to obtain more related information.
For example, while watching the “News Reporting” program on TV, a viewer suddenly hears a piece of news that catches his attention and wishes to acquire detailed information about it. However, the program presents only a brief summary of the news. At this point, the viewer could call the television station to inquire about the news, or spend time searching for background information on the Internet, but both options are too cumbersome.
Considering the ever-increasing popularity of mobile devices, it would be convenient if the viewer could point his mobile device toward the TV, push a few buttons, and a few seconds later receive detailed information about the news on his phone or at his default email address. The term “mobile device” as used herein covers various portable terminals equipped with audio recording means (such as a microphone), e.g., cellular phones and Personal Digital Assistants (PDAs).
A key aspect of the above scenario is the identification of a media program containing an audio signal. The prior art offers a number of methods for such identification. One possible approach to identifying broadcast segments containing an audio signal involves audio fingerprinting, in which each segment must be analyzed before broadcast to form its “fingerprint”. In the recognition phase, the decoder analyzes the characteristics of the segment being broadcast and attempts to match them to one of the stored fingerprints, i.e., to recognize its pattern. This approach uses relatively complicated technology and is cumbersome to implement, because the stored patterns must be updated whenever a new broadcast segment is to be recognized. In particular, the approach cannot be applied to live broadcasts, since no corresponding patterns are available in advance.
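The fingerprint-and-match idea can be illustrated with a short Python sketch. It derives one bit per adjacent band pair per frame from the sign of band-energy differences across frequency and time (loosely modeled on published band-energy fingerprinting schemes, not on any specific system referenced here) and identifies a query clip by the lowest bit error rate against a database of fingerprints computed in advance. The frame size, hop, band count, and the names `fingerprint` and `best_match` are assumptions, and the query is assumed to be frame-aligned with its reference.

```python
import numpy as np

FRAME = 2048  # samples per analysis frame (assumed)
HOP = 1024    # hop between successive frames (assumed)
BANDS = 17    # frequency bands per frame, giving 16 bits per frame (assumed)


def fingerprint(x: np.ndarray) -> np.ndarray:
    """Bits from the sign of band-energy differences across frequency and time."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    window = np.hanning(FRAME)
    edges = np.linspace(0, FRAME // 2 + 1, BANDS + 1).astype(int)
    energies = np.empty((n_frames, BANDS))
    for i in range(n_frames):
        frame = x[i * HOP:i * HOP + FRAME] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        energies[i] = [power[a:b].sum() for a, b in zip(edges[:-1], edges[1:])]
    band_diff = np.diff(energies, axis=1)   # differences between adjacent bands
    return np.diff(band_diff, axis=0) > 0   # bit = 1 if that difference grew over time


def best_match(query: np.ndarray, database: dict) -> tuple:
    """Slide the query over each stored fingerprint; return (name, lowest bit error rate)."""
    best_name, best_ber = None, 1.0
    for name, ref in database.items():
        for off in range(len(ref) - len(query) + 1):
            ber = float(np.mean(query != ref[off:off + len(query)]))
            if ber < best_ber:
                best_name, best_ber = name, ber
    return best_name, best_ber
```

Because `best_match` can only return clips whose fingerprints are already in `database`, a segment that was never analyzed beforehand, such as a live broadcast, cannot be identified.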
Another identification approach involves audio watermarking. Technically speaking, digital audio watermarking is a technique for hiding secret signals in host signals in an imperceptible way. The secret signals cannot be removed through standard processing, transmission, and/or recording of the host data, and can be extracted by appropriately designed watermark detectors. Several prior-art inventions discuss media program identification through audio watermarking. For example, in U.S. Pat. No. 5,848,155 to Cox entitled “Spread Spectrum Watermark for Embedded Signaling”, a watermark is embedded into audio/image/video/multimedia data by using spread spectrum technology. U.S. Pat. No. 6,792,542 B1 to Lee et al. entitled “Digital System for Embedding a Pseudo-randomly Modulated Auxiliary Data Sequence in Digital Samples” discloses a scheme for embedding auxiliary digital information by using a pseudo-random sequence to modulate the Least Perceptually Significant Bits (LPSBs) of successive multi-bit samples of the host signal. U.S. Pat. No. 5,893,067 to Bender et al. entitled “Method and Apparatus for Echo Data Hiding in Audio Signals” embeds one or more echoes into the host audio signal. U.S. Pat. No. 5,581,800 to Fardeau et al. entitled “Method And Apparatus for Automatically Identifying a Program Including a Sound Signal” discloses a method for encoding a message in the sound signal by altering the energy of some frequency components in a characteristic manner that is predetermined and repeated. In addition, U.S. Patent Application Publication No. US 2003/0172277 A1 to Yoiti Suzuki et al. entitled “Digital Watermark System” discloses a digital watermark embedding method that inserts a generated echo signal into the original audio signal by spreading the echo signal along the time axis.
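To make the spread spectrum idea concrete, the following Python sketch embeds one payload bit per block by adding a key-seeded ±1 pseudo-random (PN) sequence to the host, and recovers each bit by correlating the block with the same sequence. It is a generic textbook form of additive spread spectrum watermarking, not the specific procedure of any patent cited above; the block length `CHIPS`, the strength `ALPHA`, the key, and the function names are assumptions, and a practical system would additionally shape the watermark perceptually.

```python
import numpy as np

CHIPS = 4096  # PN chips (samples) per payload bit (assumed)
ALPHA = 0.1   # embedding strength relative to a unit-variance host (assumed)


def pn_sequence(key: int, length: int) -> np.ndarray:
    """Key-seeded pseudo-random +/-1 sequence shared by embedder and detector."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)


def embed(host: np.ndarray, bits: list, key: int) -> np.ndarray:
    """Add +ALPHA*pn for bit 1 and -ALPHA*pn for bit 0, one CHIPS-long block per bit."""
    pn = pn_sequence(key, CHIPS)
    out = host.astype(float).copy()
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        out[i * CHIPS:(i + 1) * CHIPS] += ALPHA * sign * pn
    return out


def detect(signal: np.ndarray, n_bits: int, key: int) -> list:
    """Normalized correlation of each block with the PN sequence (sign gives the bit)."""
    pn = pn_sequence(key, CHIPS)
    return [float(np.dot(signal[i * CHIPS:(i + 1) * CHIPS], pn)) / CHIPS
            for i in range(n_bits)]
```

When the detector's blocks line up with the embedder's, each correlation is close to ±`ALPHA`. Running `detect(marked[1:], ...)` on the same watermarked signal, i.e., after dropping a single leading sample, gives correlations near zero, because a PN sequence is almost uncorrelated with a shifted copy of itself. This is the desynchronization weakness of spread spectrum detection.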
The Spread Spectrum method of Cox's patent modulates the data to be hidden onto a set of pseudo-random sequences that are embedded in the host audio signal. This method has the advantages of easy implementation, good security, and robustness to various attacks. However, it has a fatal drawback that hinders its practical application: watermark detection is vulnerable to desynchronization attacks. The Echo Hiding scheme of Bender embeds data into the host signal by introducing an echo in the time domain. It is widely adopted because of several notable features, such as high immunity to synchronization attacks, self-sufficient blind detection, and little noticeable noise. However, Echo Hiding also has serious disadvantages: low capacity and a lenient decoding process.
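The echo hiding principle and its blind cepstral detection can be sketched in Python as follows. Each payload bit is encoded by adding a faint copy of a host segment delayed by one of two lags, and the decoder picks the lag at which the real cepstrum of the segment is larger. This is a simplified textbook version rather than Bender's exact procedure (there is no smoothing mixer window between segments); the delays `D0` and `D1`, the echo amplitude `DECAY`, the segment length `SEG`, and the function names are assumptions.

```python
import numpy as np

D0, D1 = 60, 90  # echo delays in samples encoding bit 0 / bit 1 (assumed)
DECAY = 0.3      # echo amplitude relative to the host (assumed)
SEG = 4096       # host samples carrying one payload bit (assumed)


def embed_echo(host: np.ndarray, bits: list) -> np.ndarray:
    """Add a delayed, attenuated copy of each segment; the delay encodes the bit."""
    out = host.astype(float).copy()
    for i, bit in enumerate(bits):
        d = D1 if bit else D0
        seg = host[i * SEG:(i + 1) * SEG]
        out[i * SEG + d:(i + 1) * SEG] += DECAY * seg[:-d]
    return out


def real_cepstrum(x: np.ndarray) -> np.ndarray:
    """Inverse FFT of the log magnitude spectrum; an echo of delay d peaks at index d."""
    magnitude = np.abs(np.fft.fft(x)) + 1e-12  # small floor avoids log(0)
    return np.real(np.fft.ifft(np.log(magnitude)))


def decode_echo(signal: np.ndarray, n_bits: int, offset: int = 0) -> list:
    """Blind decoding: compare cepstral values at the two candidate delays."""
    bits = []
    for i in range(n_bits):
        c = real_cepstrum(signal[offset + i * SEG:offset + (i + 1) * SEG])
        bits.append(1 if c[D1] > c[D0] else 0)
    return bits
```

Because the cepstral peak depends on the echo delay rather than on absolute sample positions, decoding still succeeds when the analysis windows are offset by a few samples (`offset=3`), unlike a spread spectrum correlation detector. The same sketch also shows the low capacity: one bit per 4096 samples is only about 11 bits per second at a 44.1 kHz sample rate.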
Although Lee's scheme combines Spread Spectrum and Least Perceptually Significant Bit techniques to improve the imperceptibility of the watermarked signal, it is unsatisfactory because it is vulnerable to environmental noise. In particular, it employs a check code to achieve self-synchronization, i.e., the decoder is considered synchronized when the check code received along with the watermark bits matches the check code computed from those bits. This process requires an exhaustive, sample-by-sample search for the synchronization position. Fardeau's scheme requires specialized pager-like equipment to detect the embedded identification message. Additionally, the frequency components selected for encoding the sound signal are chosen to lie in the range near 100 Hz, so the scheme may suffer from low pass filtering attacks, low pass filtering being a common preprocessing operation in various audio compression algorithms. Yoiti's method combines Echo Hiding and Spread Spectrum to improve capacity and security compared with conventional techniques. However, because of the downsampling attack present in media interaction scenarios, the allowed length of the embedded echo array is limited, so the method cannot provide a PN sequence long enough to guarantee good statistical properties. Moreover, this method is vulnerable to echo jitter attacks.
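The cost of an exhaustive sample-by-sample synchronization search can be seen in a minimal Python sketch. Lee's actual criterion is a check-code comparison; here, purely for illustration, a normalized correlation with a known ±1 synchronization sequence stands in for that test, and the name `find_sync` is hypothetical. What matters is the structure: every sample offset of the recording is a candidate and must be tested.

```python
import numpy as np


def find_sync(recording: np.ndarray, sync_seq: np.ndarray) -> tuple:
    """Test every sample offset; return (best offset, best score, number of trials)."""
    length = len(sync_seq)
    best_off, best_score, trials = -1, -np.inf, 0
    for off in range(len(recording) - length + 1):
        # Stand-in synchronization test (assumed): normalized correlation score.
        score = float(np.dot(recording[off:off + length], sync_seq)) / length
        trials += 1
        if score > best_score:
            best_off, best_score = off, score
    return best_off, best_score, trials
```

The number of candidate offsets, `len(recording) - len(sync_seq) + 1`, grows linearly with the recording length, and each candidate costs `len(sync_seq)` multiply-adds, which is why such searches are expensive on a mobile device.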
Therefore, the prior art fails to provide an effective method and apparatus for identifying a media program based on audio watermarking so as to obtain the related information about the media program.
From the standpoint of watermarking technology, there are several typical attacks on an audio watermarking system in the scenario of interaction between media and mobile devices. These attacks include random cropping, AD/DA conversion, resampling, audio compression, environmental noise, reverberation, etc. For watermarking systems in the context of the present invention, random cropping, AD/DA conversion, and resampling are the most serious attacks, for the following reasons:
1) Audiences can randomly record a watermarked audio clip only several seconds long, which is a small portion of the host signal;
2) The encoded audio is captured at the mobile device side by analog recording, whereas the watermark embedding is performed digitally; and
3) The watermark embedding must work at a 44.1 kHz sample rate to ensure the quality of the host signal, while the mobile device may only allow recording at a lower sample rate (such as 8 kHz).
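The resampling problem in item 3) comes down to the Nyquist limit: a recording made at 8 kHz can only represent frequencies below 4 kHz, so any watermark energy placed above that band at 44.1 kHz is lost. The following Python sketch models the mobile device's capture as ideal band-limited resampling, implemented by truncating the FFT (an assumption; real devices use their own anti-aliasing filters, and AD/DA effects are not modeled), and measures how much of a pure tone survives. The function names are hypothetical.

```python
import numpy as np

FS_EMBED = 44_100  # embedding sample rate in Hz (from the text)
FS_PHONE = 8_000   # mobile recording sample rate in Hz (from the text)


def fft_downsample(x: np.ndarray, fs_in: int, fs_out: int) -> np.ndarray:
    """Ideal band-limited resampling: keep only bins below fs_out/2, then inverse FFT."""
    n_in = len(x)
    n_out = int(round(n_in * fs_out / fs_in))
    spectrum = np.fft.rfft(x)
    kept = spectrum[:n_out // 2 + 1]               # discard everything above fs_out/2
    return np.fft.irfft(kept, n=n_out) * (n_out / n_in)  # rescale amplitude


def retained_power(freq_hz: int) -> float:
    """Fraction of a one-second tone's power that survives 44.1 kHz -> 8 kHz."""
    t = np.arange(FS_EMBED) / FS_EMBED
    tone = np.sin(2 * np.pi * freq_hz * t)
    low = fft_downsample(tone, FS_EMBED, FS_PHONE)
    return float(np.mean(low ** 2) / np.mean(tone ** 2))
```

A 1 kHz component passes essentially unchanged, while a 6 kHz component disappears entirely, so a watermark intended for this scenario has to keep its energy well below 4 kHz.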
Accordingly, there exists a need for a method and an apparatus for identifying a media program based on audio watermarking so as to obtain related information about the media program, which enable convenient acquisition of related information about the media program, have no effect on the quality of the media program, and are able to resist various environmental attacks.