The present invention relates to a method and a system for embedding additional information, such as copyright information, in digital audio data, and for detecting the embedded information. In particular, the present invention pertains to a method and a system for precisely detecting embedded additional information even after the audio data carrying it have been transformed in ways that do not drastically deteriorate their sonic quality.
At present, digital music is provided not only on CDs but also across the Internet. Digital audio data are so stable that, however often they are played, their sonic quality does not deteriorate. However, since large quantities of such data can be copied easily, techniques for preventing illegal copying have become ever more important. To prevent illegal copying, copyright information can be embedded in audio data to clearly evidence the existence of a copyright, or information identifying a distribution destination can be embedded so that the route along which an illegal copy travels can be traced. In order not to reduce the value of the audio data, changes in sonic quality due to the embedding of information must not be aurally perceivable by humans. Furthermore, since processing such as filtering, compression/decompression with MPEG, AC3 or ATRAC, digital to analog and analog to digital conversion, trimming, and changing of the replay speed may be performed using digital audio equipment, the embedded information must survive the changing, loss, insertion, or re-sampling of data values, provided these occur within a range in which the sonic quality of the audio data is not drastically deteriorated.
Conventional methods for embedding additional information in audio data are superior at maintaining secrecy. However, when additional information is embedded with only a small modification so that it is imperceptible to human beings, the embedded information can be lost during processing such as compression/decompression, filtering, and digital to analog conversion; that is, the conventional methods lack robustness. The ordinary method for embedding additional information in audio data is a spread spectrum method that uses PN (Pseudo-random Noise) modulation. According to this method, additional information is modulated in the time domain using pseudo-random noise, and the result is embedded. As a result, in the frequency domain, the spectrum of the component that corresponds to the embedded information appears spread. This method is disclosed in U.S. Pat. Nos. 4,979,210, 5,073,925 and 5,319,735.
According to the above methods, bit data $B_m$ is modulated using pseudo-random numbers $R_n$, each +1 or -1, generated by a suitable encryption technique (e.g., DES), and the resulting bit information is embedded in the audio samples $A_n$ as follows.
$$A'_{Nm+n} = A_{Nm+n} + c\,B_m R_{Nm+n} \qquad \text{(Expression 1)}$$
where $B_m$ is +1 or -1 and represents one bit, $n = 0, 1, \ldots, N-1$, and $c$ denotes the embedding strength. To detect the embedded information, the bit is recovered by calculating

$$B_m = \frac{1}{cN} \sum_{n=0}^{N-1} A'_{Nm+n} R_{Nm+n} \qquad \text{(Expression 2)}$$
This works because, if the sequence $R_n$ is random, the terms of $\sum_n A_{Nm+n} R_{Nm+n}$ can be expected to cancel each other out, while $\sum_n R_{Nm+n}^2 = N$, leaving $cNB_m$, which the factor $1/(cN)$ normalizes to $B_m$. When information is embedded in the time domain, however, the frequency characteristics of human auditory perception cannot be exploited, so deterioration of the sonic quality cannot be prevented. Thus, if information is embedded with a modification small enough that human beings cannot perceive the resulting change in sonic quality, the embedded information will not survive post-processing such as compression/decompression.
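As a rough illustration of Expressions 1 and 2, the following Python sketch embeds ±1 bits into a list of samples with pseudo-random ±1 chips and recovers them by correlation. It is a minimal sketch under stated assumptions: Python's seeded `random.Random` stands in for the DES-based generator, the function names are hypothetical, and the final bit decision simply takes the sign of the Expression 2 estimate.

```python
import random


def pn_chips(length, seed):
    # Illustrative stand-in for a DES-based generator: seeded +1/-1 chips.
    rng = random.Random(seed)
    return [rng.choice((1, -1)) for _ in range(length)]


def embed_bits(samples, bits, chips_per_bit, strength, seed):
    """Expression 1: A'[N*m + n] = A[N*m + n] + c * B_m * R[N*m + n]."""
    chips = pn_chips(len(bits) * chips_per_bit, seed)
    marked = list(samples)
    for m, b in enumerate(bits):
        for n in range(chips_per_bit):
            i = chips_per_bit * m + n
            marked[i] += strength * b * chips[i]
    return marked


def detect_bits(samples, num_bits, chips_per_bit, strength, seed):
    """Expression 2, followed by a sign decision (the sign rule is an assumption)."""
    chips = pn_chips(num_bits * chips_per_bit, seed)
    bits = []
    for m in range(num_bits):
        block = range(chips_per_bit * m, chips_per_bit * (m + 1))
        estimate = sum(samples[i] * chips[i] for i in block) / (strength * chips_per_bit)
        bits.append(1 if estimate >= 0 else -1)
    return bits
```

The host terms in the correlation grow only like the square root of N, while the embedded term grows like cN; lowering `strength` toward an imperceptible level shrinks that margin, which is the robustness problem described above.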
On the other hand, according to the technique in U.S. Pat. No. 5,687,191, during the embedding process the samples constituting the time signal are divided into frequency bands by a polyphase filter, and in each frequency band information is modulated with pseudo-random noise and embedded. The advantages of this are that a different embedding strength can be employed for each frequency band, and that the frequency characteristics of human aural perception can be exploited. Therefore, unlike the other conventional methods, this method can embed information robustly without deteriorating the sonic quality.
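A minimal sketch of the per-band idea follows. The polyphase analysis filter bank is not implemented; the subband signals are assumed to be given as lists, and the per-band strengths are hypothetical values that a perceptual model would normally supply. The point is only that each band can carry the same bit at its own strength, while detection sums the correlations over all bands.

```python
import random


def embed_in_subbands(subbands, bit, strengths, seed):
    """Add one +1/-1 bit, spread by per-band PN chips, to each subband signal.

    `subbands` is assumed to come from an analysis filter bank (not shown);
    `strengths[k]` is a hypothetical embedding strength for band k.
    """
    rng = random.Random(seed)
    marked = []
    for band, c_k in zip(subbands, strengths):
        chips = [rng.choice((1, -1)) for _ in band]
        marked.append([x + c_k * bit * r for x, r in zip(band, chips)])
    return marked


def detect_from_subbands(subbands, seed):
    """Regenerate the same chips, correlate every band, and sum; the sign is the bit."""
    rng = random.Random(seed)
    total = 0.0
    for band in subbands:
        chips = [rng.choice((1, -1)) for _ in band]
        total += sum(x * r for x, r in zip(band, chips))
    return 1 if total >= 0 else -1
```

Because the chips are drawn band by band in the same order during embedding and detection, both functions see identical sequences as long as the band lengths match.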
According to the methods disclosed in U.S. Pat. Nos. 5,613,004 and 5,687,236, as in the present invention, information is embedded in frequency components obtained by transforming the audio data, and is thereafter detected in those components. To improve secrecy, these methods propose embedding and detection in the frequency domain as a means of signal spreading. However, these patents do not propose an embedding and detection method that both maintains high sonic quality and is robust. In these methods, uncompressed digital audio samples are divided into non-overlapping areas called windows, and an FFT (Fast Fourier Transform) is applied to each window. A primary mask and a convolutional mask are employed to determine whether a one-bit signal should be embedded in each frequency component obtained by the FFT. Both masks are pseudo-random bit sequences; positions in the primary mask correspond to frequencies, and each window corresponds to a specific position in the convolutional mask. Whether information is embedded in a given frequency component of a window depends on whether the result of a logical operation on the primary-mask bit at the position corresponding to that frequency and the convolutional-mask bit at the position corresponding to that window is true or false.
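The mask logic can be sketched as below. The text does not name the logical operation, so XOR is assumed here purely for illustration; the seeded pseudo-random masks and the function names are likewise hypothetical.

```python
import random


def make_mask(length, seed):
    # Pseudo-random 0/1 mask; the seeded PRNG is an illustrative stand-in.
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def embedding_positions(num_freqs, num_windows, primary_seed, conv_seed):
    """Return, for each window, the frequency indices selected for embedding.

    Assumption: the "logical calculation" is XOR of primary[f] and conv[w];
    the cited patents may use a different operator.
    """
    primary = make_mask(num_freqs, primary_seed)   # one bit per frequency
    conv = make_mask(num_windows, conv_seed)       # one bit per window
    return [[f for f in range(num_freqs) if primary[f] ^ conv[w]]
            for w in range(num_windows)]
```

Under the XOR assumption, every window selects either the frequencies where the primary mask is 1 or exactly the complementary set, depending on its convolutional-mask bit; without both seeds, a third party cannot tell which components carry data.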
In the embedding method disclosed in U.S. Pat. No. 5,613,004, map information bits (redundant bits produced from the additional information) are embedded at specific bit positions of the selected frequency components. In U.S. Pat. No. 5,687,236, bits are embedded by modifying the frequency component values so that they fall on levels determined in advance relative to the original values. In either case, one bit is embedded in one frequency component, and the secrecy of the embedded information is maintained by the primary mask and the convolutional mask. However, the embedded information cannot survive data processing such as compression/decompression or the addition of random noise to each frequency component.
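Embedding by moving a value onto predetermined levels can be illustrated with a generic parity-quantization sketch. This rule is an assumption chosen for illustration, not the exact method of U.S. Pat. No. 5,687,236: even multiples of a step carry 0 and odd multiples carry 1.

```python
def embed_by_quantization(value, bit, step):
    """Move `value` to the nearest multiple of `step` whose index parity equals `bit`.

    Generic illustrative scheme (an assumption): even multiples carry 0,
    odd multiples carry 1.
    """
    q = round(value / step)
    if q % 2 != bit:
        # Pick the neighboring level with the right parity that is closer to value.
        q = q + 1 if value / step > q else q - 1
    return q * step


def extract_by_quantization(value, step):
    """Read the bit back from the parity of the nearest level index."""
    return round(value / step) % 2
```

For example, with a step of 10, a value of 103 moved to 110 still decodes as 1 after +4 of noise, but +6 of noise pushes it to the level of 120 and flips the bit, which is the kind of fragility under added noise that the text attributes to these methods.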
In these patents, the message start delimiter is a code consisting of a relatively large number of bits, and is used to locate both the window boundaries within the detected bits and the start point of the message. According to the specifications of these patents, 64 bits are embedded in one window of 128 samples, so 16 windows yield a 1024-bit code. Since it is highly improbable that such a code occurs by chance, a specific 1024-bit code can be employed as the message start delimiter. To find the window boundaries and the message start point, the window start point is shifted one sample at a time until the message start delimiter is detected. With this method, the longer the embedded information, the heavier the load imposed by the search for the message start point; moreover, the method cannot cope with the re-synchronization required by the loss or insertion of data that frequently occurs during digital to analog conversion.
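The cost of the one-sample-at-a-time search can be seen in a simplified sketch that scans a bit stream for a known delimiter while counting comparisons. It is a hypothetical model: in the cited scheme every shift would also require re-running the FFT-based decoding of the windows, which is omitted here, so the real cost per shift is much higher.

```python
def find_delimiter(stream, delimiter):
    """Shift the start point one position at a time until `delimiter` matches.

    Returns (offset, comparisons), with offset -1 if no match is found.
    Simplified model: per-shift window decoding is not included.
    """
    comparisons = 0
    for offset in range(len(stream) - len(delimiter) + 1):
        for i, bit in enumerate(delimiter):
            comparisons += 1
            if stream[offset + i] != bit:
                break
        else:
            # Every delimiter bit matched at this offset.
            return offset, comparisons
    return -1, comparisons
```

For a stream of 500 zeros followed by the 5-bit delimiter 11101, the sketch returns offset 500 after 505 comparisons; the work grows linearly with how far into the data the message start lies.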
It is, therefore, one object of the present invention to provide a method and a system for embedding additional information, such as copyright information, in audio data so that the change in sonic quality due to the embedding is imperceptible to human beings, and for preserving the embedded information and precisely detecting it after the audio data have been processed to a degree that does not drastically deteriorate their sonic quality.
It is one more object of the present invention to provide a method and a system whereby additional information can be embedded in audio data and can be detected, while a high sonic quality and high robustness are maintained.
It is another object of the present invention to provide a method and a system for transforming samples of audio data into frequency components, and for manipulating the obtained data in the frequency domain to embed additional information.
It is an additional object of the present invention to provide a method and a system for embedding additional information in audio data so that the embedded information is robust against destruction during data processing, such as the compression/decompression of data and the addition of random noise to individual frequency components.
It is a further object of the present invention to provide a method and a system for embedding additional information in audio data and detecting the embedded information, and for reducing the load imposed by the search for the embedded additional information.
It is one further object of the present invention to provide a method and a system for embedding additional information in audio data and for detecting the embedded information, and for coping with a request for re-synchronization due to the loss or the insertion of data that occurs frequently in digital to analog conversion.
To achieve the above objects, according to one aspect of the present invention, there are provided an "embedding system" for embedding additional information, such as copyright information, in uncompressed digital audio data so that the change in sonic quality is imperceptible to human beings; and a "detection system" for determining whether additional information has been embedded and for detecting the embedded information, even when data compression/decompression or trimming has been performed on the audio data.
Uncompressed digital audio data for each channel consist of a sequence of integers, each of which is called a sample. For audio data provided on CDs, each channel consists of 44,100 16-bit samples per second. According to this invention, information is embedded and detected in the frequency domain, so that a psychoacoustic model can be employed. Therefore, in the embedding system and the detection system of the present invention, the audio samples are divided into pieces of constant length, and each piece is transformed into the frequency domain. Each such piece of samples to be transformed is called a frame.
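Framing and transformation can be sketched as follows. The frame length is a hypothetical parameter (at 44,100 samples per second, a 1,024-sample frame would last about 23 ms), and a naive discrete Fourier transform stands in for whatever transform an implementation actually uses; an FFT library would be the practical choice.

```python
import cmath
import math


def split_into_frames(samples, frame_len):
    """Cut one channel into consecutive, non-overlapping frames of `frame_len` samples.

    A trailing partial frame is dropped; `frame_len` is a hypothetical choice.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]
```

A 16-bit sine that completes three cycles per 32-sample frame, for instance, shows its spectral peak at bin 3 of every frame.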
The additional information embedding process according to the present invention is shown in FIG. 4. At step 410, a mask that defines, for each frequency, the phase of the embedding signal is used to embed bit information corresponding to the additional information, together with a synchronization signal, in the frequency components obtained by transforming the individual frames of audio data into the frequency domain; the audio data in the frequency domain are then inversely transformed into audio samples in the time domain. In the embedding process, the frames do not overlap each other, and successive frames need not be adjacent. When robustness against trimming is required, the additional information is embedded repetitively.
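A minimal sketch of step 410 for a single frame follows. It assumes that the mask sets the phase of the embedding signal to 0 or π in each bin, represented as +1 or -1; that a bit is added as a real offset of size `strength` to a hypothetical set of bins; and that the conjugate bins are adjusted so the inverse transform stays real. The synchronization signal, the repetition, and the frame spacing are not modeled, and all names are illustrative.

```python
import cmath
import math
import random


def dft(frame):
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(X * cmath.exp(2j * math.pi * k * t / n) for k, X in enumerate(spectrum)) / n
            for t in range(n)]


def phase_mask(n_bins, seed):
    # Hypothetical mask: +1 or -1 (phase 0 or pi) for each frequency bin.
    rng = random.Random(seed)
    return [rng.choice((1, -1)) for _ in range(n_bins)]


def embed_bit_in_frame(frame, bit, bins, strength, seed):
    """Add strength * bit * mask[k] to each bin k, keep the spectrum Hermitian, invert."""
    n = len(frame)
    spec = dft(frame)
    mask = phase_mask(n, seed)
    for k in bins:  # bins must lie strictly between 0 and n/2
        delta = strength * bit * mask[k]
        spec[k] += delta
        spec[n - k] += delta  # conjugate bin; delta is real, so conj(delta) == delta
    # Imaginary parts are only rounding residue once the spectrum is Hermitian.
    return [z.real for z in idft(spec)], mask
```

Because the offset is real and mirrored in the conjugate bin, the change in the time domain is a sum of cosines, one per selected bin, whose signs follow the mask.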
The additional information detection process is shown in FIG. 5. At step 510, the audio data samples are searched to find the start point of a frame. At step 520, when it is determined that additional information has been embedded, a detection mask is used to detect the bit embedded in the frequency components. At step 530, the point at which a cycle of the repetitively embedded additional information starts is searched for, and the embedded additional information is reproduced.
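Steps 510 and 520 can be illustrated with a correlation detector. Assuming the embedding adds a real offset signed by a ±1 mask to selected bins, the real part of each selected bin correlated with the mask gives a statistic whose sign is the bit, and a frame start search can try every offset within one frame length and keep the one with the largest absolute statistic. The bins, the mask, and the exhaustive offset search are assumptions for illustration, not the invention's actual detection procedure.

```python
import cmath
import math


def dft_bins(frame, bins):
    """DFT coefficients of `frame` at the requested bins only."""
    n = len(frame)
    return {k: sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in bins}


def bit_statistic(frame, bins, mask):
    # Correlate the real part of each selected bin with the +1/-1 detection mask.
    coeffs = dft_bins(frame, bins)
    return sum(coeffs[k].real * mask[k] for k in bins)


def detect_bit(frame, bins, mask):
    return 1 if bit_statistic(frame, bins, mask) >= 0 else -1


def find_frame_start(samples, frame_len, bins, mask):
    """Try every offset within one frame length; keep the largest |statistic|.

    Needs at least 2 * frame_len - 1 samples.
    """
    return max(range(frame_len),
               key=lambda d: abs(bit_statistic(samples[d:d + frame_len], bins, mask)))
```

On a noise-free, frame-periodic marked signal whose frames begin at sample 11, the search returns 11, because any other offset rotates the phase of each bin and reduces the correlation.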
According to the present invention, there are provided a method and a system for embedding, in each frame, information that is aurally imperceptible to human beings but robust, and for detecting the embedded information; a frame synchronization method and system for finding the correct frame start and end points before the embedded information is detected; and a message synchronization method and system for finding the start and end points of the bit cycle in order to reproduce the bits (message) from the bit information detected in each frame.
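Message synchronization can be sketched on the bits detected frame by frame. The sketch assumes that each repetition of the message consists of a known synchronization pattern followed by a fixed-length payload, and that at least two cycles minus one bit have been detected; it scores each candidate cycle phase by how many repetitions show the synchronization pattern there and returns the payload at the best phase. The function name and the scoring rule are hypothetical.

```python
def recover_message(detected, sync, payload_len):
    """Locate the start of the repeated (sync + payload) cycle and return it with the payload.

    `detected` holds one bit per frame, starting at an arbitrary frame
    (for example after trimming).
    """
    cycle = len(sync) + payload_len
    if len(detected) < 2 * cycle - 1:
        raise ValueError("need at least two cycles minus one bit of detected data")

    def sync_hits(start):
        # Count repetitions whose bits at this phase equal the sync pattern.
        return sum(detected[p:p + len(sync)] == sync
                   for p in range(start, len(detected) - cycle + 1, cycle))

    best = max(range(cycle), key=sync_hits)
    payload_start = best + len(sync)
    return best, detected[payload_start:payload_start + payload_len]
```

Counting matches over all repetitions, rather than stopping at the first match, keeps a payload that happens to contain the synchronization pattern from being mistaken for the cycle start.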
Specifically, to embed additional information in audio data, the audio data are first transformed into frequency components. Based on the audio data, the level of modification within which additional information can be embedded is determined for each frequency component, and a mask used for embedding the additional information is generated. Then, using that mask, the additional information is embedded in the frequency components of the transformed audio data within their modification levels. Finally, the transformed audio data in which the additional information has been embedded are inversely transformed into audio data in the time domain. To detect additional information embedded in audio data, synchronization detection means is provided for transforming the audio data into frequency components, producing a mask used for detecting the additional information, and obtaining synchronization for the detection. Then, using this mask, the additional information is detected synchronously from the transformed audio data.
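The text does not specify how the modification level of each component is computed, so the sketch below uses a crude, hypothetical stand-in for a psychoacoustic model: each component may change by a fixed fraction of the loudest magnitude among itself and its immediate neighbors, but never by less than a floor. Embedding then shifts each component by its own level, signed by the bit and a ±1 mask, so no component moves beyond the level assigned to it. The names, the ratio, and the floor are illustrative assumptions.

```python
def modification_levels(magnitudes, ratio=0.1, floor=1.0):
    """Hypothetical per-component modification levels derived from the audio itself.

    Loud neighbors raise the allowed change, a rough nod to spectral masking.
    """
    levels = []
    for k in range(len(magnitudes)):
        neighborhood = magnitudes[max(0, k - 1):k + 2]
        levels.append(max(floor, ratio * max(neighborhood)))
    return levels


def embed_within_levels(components, levels, bit, mask):
    """Shift each real-valued component by exactly its level, signed by bit * mask."""
    return [x + lvl * bit * m for x, lvl, m in zip(components, levels, mask)]
```

A real system would replace `modification_levels` with a proper masking-threshold computation; the embedding step itself would stay the same.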