This invention relates generally to signal processing systems, and more particularly to a signal processing system for providing a digital watermark in an audio signal.
With the advent of computer networks and digital multimedia, protection of intellectual property has become a prime concern for creators and publishers of digitized copies of copyrightable works, such as musical recordings, movies, and video games. One method of protecting copyrights in the digital domain is to use digital "watermarks." Digital watermarks can be used to mark each individual copy of a digitized work with information identifying, inter alia, the title, copyright holder, and even the licensed owner of a particular copy. Watermarks can also serve to allow for secured metering and support of other distribution systems of a given media content. In theory, almost any item of information could be encoded and used as a watermark.
Digital watermarks are created by encoding a data signal, hereinafter referred to as the "watermark signal," "watermark data," or simply "watermark", which is then integrated into a larger content signal, hereinafter referred to as the "audio signal", to create a composite signal. Ideally, the composite signal should contain minimal or no perceptible artifacts of the watermark.
It is known in the art that every audio signal generates a perceptual concealment function which masks audio distortions existing simultaneously with the signal. Accordingly, any distortion, or noise, introduced into the transmission channel, if properly distributed or shaped, will be masked by the audio signal itself. Such masking may be partial or complete, leading either to increased quality compared to a system without noise shaping, or to near-perfect signal quality that is equivalent to a signal without noise. In either case, such "masking" occurs as a result of the inability of the human perceptual mechanism to distinguish between two signal components, one belonging to the audio signal and the other belonging to the noise, in the same spectral, temporal or spatial locality. An important effect of this limitation is that the perceptibility of the noise by a listener can be zero, even if the signal-to-noise ratio is at a measurable level. Ideally, the noise level at all points in the audio signal space is exactly at the level of just-noticeable distortion, which limit is typically referred to as the "perceptual entropy envelope" or "PEE".
Hence, the main goal of noise shaping is to minimize the perceptibility of distortions by advantageously shaping them in time or frequency so that as many of their components as possible are masked by the audio signal itself. See Nikil Jayant et al., Signal Compression Based on Models of Human Perception, 81 Proc. of the IEEE 1385 (1993).
"Perceptual coding" techniques employing the above-discussed principles are presently used in signal compression and are based on three types of masking: frequency domain, time domain and noise level. The basic principle of frequency domain masking is that when certain strong signals are present in the audio band, other lower level signals, close in frequency to the stronger signals, are masked and not perceived by a listener. Time domain masking is based on the fact that certain types of noise and tones are not perceptible immediately before and after a larger signal transient. Noise masking takes advantage of the fact that a relatively high broadband noise level is not perceptible if it occurs simultaneously with various types of stronger signals.
Perceptual coding forms the basis for precision audio sub-band coding (PASC), as well as other coding techniques used in compressing audio signals for mini-disc (MD) and digital compact cassette (DCC) formats. Specifically, such compression algorithms take advantage of the fact that certain signals in an audio channel will be masked by other stronger signals to remove those masked signals in order to be able to compress the remaining signal into a lower bit-rate channel.
One of the deficiencies of conventional systems for adding a watermark to an audio signal is that the watermark is encoded on a single frequency band or channel, which limits the opportunities for inserting the watermark where it will be masked by the PEE of the audio signal. In addition, there exists no option to provide redundancy; that is, the entire watermark is included only once in the audio signal, such that if any part of it is damaged, it is difficult, if not impossible, to recover. Finally, there is no way to "force" an opportunity, such that a minimum time between transmissions of the watermark data can be enforced, or to "create" an opportunity where one almost exists by changing the gain of the audio signal.
Therefore, what is needed is an improved system for providing a digital watermark in an audio signal.
The foregoing problems are solved and a technical advance is achieved by a computer-implemented system for providing a digital watermark in an audio signal. In a preferred embodiment, an audio file, such as a .WAV file, containing an audio signal to be watermarked is processed by an encoder using an algorithm of the present invention, herein referred to as the "PAWS algorithm", to determine and log the location and number of opportunities that exist for inserting a watermark into the audio signal such that it will be masked by the PEE of the audio signal. The user can adjust certain parameters of the PAWS algorithm before the audio file is processed. A/B/X testing between the original and watermarked files is also supported to allow the user to undo or re-encode the watermark, if desired.
In particular, the encoder divides the frequency spectrum into seven "critical bands", each of which includes two carrier frequencies for representing logic 0 and logic 1, respectively. The basic encoding process is as follows. First, the user sets up the desired parameters for the algorithm, including selecting which critical bands are to be active, specifying, in dB, the desired "headroom" between the PEE of the audio signal and the amplitude of the encoded watermark signal transmitted in each active band, and specifying the maximum time between transmissions of the encoded watermark signal.
If the encoding is not being performed in real-time, the user executes a preconditioning phase. During preconditioning, the encoder runs through the entire .WAV file and logs watermark opportunities according to the PAWS algorithm and the parameters specified by the user. In addition, the encoder detects "near-miss" opportunities in the audio signal; that is, points in the audio signal that would constitute opportunities with a small adjustment to the gain. The encoder adjusts the gain of the audio signal at that point to create an opportunity therefrom. The preconditioned audio signal is written back to a .WAV file.
In a preferred embodiment, the watermark is formatted as a frame of 32 characters. During operation, the original or preconditioned .WAV file is input to the encoder, which monitors each active critical band of the audio signal to detect opportunities for inserting watermark data in accordance with the PEE of the signal within the band, as well as the user-defined parameters. The existence and location of each opportunity is logged and the encoder determines how many bytes of the watermark word (a "subframe") may be transmitted during that opportunity, according to the data rate of that band, by measuring the width of an opportunity and dividing by the data rate, which yields the size of the data transmission. The encoder encodes the watermark using Gaussian Minimum Shift Keying ("GMSK") modulation and incorporates the encoded subframes of the watermark data block into the audio signal at the opportunity.
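The capacity computation described above can be sketched as follows. This is a minimal illustration and not the PAWS algorithm itself: the function name and the choice of units (opportunity width in seconds, data rate in bits per second) are assumptions. Under those units the number of bits that fit is the product of width and rate; the division described above would apply if the rate were instead expressed as time per bit.

```python
import math


def subframe_capacity(width_s: float, data_rate_bps: float) -> tuple[int, int]:
    """Return (whole bits, whole bytes) that fit in one opportunity.

    Assumption: width is in seconds and the rate is in bits per second;
    the text does not state the units.
    """
    if width_s < 0 or data_rate_bps <= 0:
        raise ValueError("width must be non-negative and rate must be positive")
    # The small epsilon guards against a floating-point product that lands
    # just below an integer value.
    bits = math.floor(width_s * data_rate_bps + 1e-9)
    return bits, bits // 8


# Hypothetical example: a 20 ms opportunity in a band running at 1200 bps.
bits, whole_bytes = subframe_capacity(0.02, 1200.0)
```

In this example the opportunity holds 24 bits, or three whole bytes, before any preamble or index overhead is subtracted.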
In one aspect, at each opportunity, a timer is reset to a maximum time between opportunities, which is either a default value or a value selected by a user. If the timer times out before the next opportunity is detected, the encoder "forces" an opportunity by cross-fading in an 18 kHz low pass filter ("LPF") to clean out the band above 18 kHz, transmitting the watermark signal using GMSK modulation at carrier frequencies 18.5 kHz (for logic 0) and 19.5 kHz (for logic 1) and a data rate of 1200 bps, and then cross-fading out the LPF.
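The timer behavior can be illustrated with a short scheduling sketch. The function and constant names are hypothetical, and two details are assumptions not stated above: the timer is taken to start at time zero, and it is taken to restart after a forced transmission as well as after a detected opportunity. The filtering and modulation steps are not modeled.

```python
# Parameters of the forced transmission, taken from the text.
FORCED_CARRIER_LOGIC0_HZ = 18_500
FORCED_CARRIER_LOGIC1_HZ = 19_500
FORCED_DATA_RATE_BPS = 1200


def schedule_transmissions(opportunity_times, duration, max_gap):
    """Return a time-ordered list of (time, kind) events.

    kind is "natural" for a detected opportunity and "forced" when the
    timer expires before the next opportunity arrives.
    """
    if max_gap <= 0:
        raise ValueError("max_gap must be positive")
    times = sorted(t for t in opportunity_times if 0 <= t < duration)
    events = []
    deadline = max_gap  # assumption: the timer starts running at time 0
    for t in times:
        # Force as many transmissions as needed to bridge the gap.
        while deadline < t:
            events.append((deadline, "forced"))
            deadline += max_gap
        events.append((t, "natural"))
        deadline = t + max_gap  # timer reset at each opportunity
    # Cover any silence between the last opportunity and the end of the file.
    while deadline < duration:
        events.append((deadline, "forced"))
        deadline += max_gap
    return events


events = schedule_transmissions([1, 2, 9], duration=12, max_gap=3)
```

With opportunities at 1, 2 and 9 seconds and a 3 second limit, the sketch forces transmissions at 5 and 8 seconds to bridge the gap between the second and third opportunities.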
In the preferred embodiment, each portion of watermark data to be inserted at a given opportunity is preceded by a 4-bit preamble. In addition to the four preamble bits, additional bits must be allocated in each subframe to indicate which piece of the overall watermark the present burst carries. If the seven bands are used, there are a minimum of 16 bits per transmission. Therefore, four more bits may be used to indicate which character the present character is and there are at least eight bits left over to carry actual watermark data. If a higher frequency band carries more than 16 bits, then the preamble indicates the index of the first character of the transmission.
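The minimum 16-bit transmission described above (four preamble bits, four index bits, eight data bits) can be sketched as a simple bit-packing routine. The preamble pattern, the bit ordering, and the function names are assumptions made for illustration; the text does not specify them. Note also that a 4-bit index addresses only 16 positions, and how all 32 characters of the frame are addressed is not modeled here.

```python
PREAMBLE = 0b1010  # hypothetical 4-bit pattern; the actual value is not given


def pack_subframe(index: int, data_byte: int) -> int:
    """Pack a 16-bit word laid out as [preamble:4][index:4][data:8]."""
    if not 0 <= index < 16:
        raise ValueError("index must fit in 4 bits")
    if not 0 <= data_byte < 256:
        raise ValueError("data must fit in 8 bits")
    return (PREAMBLE << 12) | (index << 8) | data_byte


def unpack_subframe(word: int):
    """Return (index, data_byte), or None if the preamble does not match."""
    if (word >> 12) & 0xF != PREAMBLE:
        return None
    return (word >> 8) & 0xF, word & 0xFF


word = pack_subframe(3, ord("A"))  # character "A" at position 3
```

A receiver following the same assumed layout would call `unpack_subframe(word)` to recover the position and the character, and would reject any word whose top four bits do not match the preamble.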
Alternatively, rather than using a 4-bit index, one preamble could be assigned to indicate the start of a frame and another assigned to the rest of the frame, in which case 12 bits of each transmission would be left for carrying data.
In any event, each subframe of watermark data is modulated using GMSK modulation centered at the geometric mean of the two carrier frequencies within the band and mixed with the audio signal at a level defined by the user ("headroom"). The resultant watermarked audio signal is stored in a file in memory.
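The two quantities named above can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, the PEE is treated as a linear amplitude, and the headroom is applied with the standard decibel-to-amplitude conversion. The carrier pair used in the example is the 18.5 kHz and 19.5 kHz pair of the forced transmission; the carriers of the seven critical bands are not given in the text.

```python
import math


def gmsk_center_frequency(f_logic0_hz: float, f_logic1_hz: float) -> float:
    """Geometric mean of the two carrier frequencies of a band."""
    if f_logic0_hz <= 0 or f_logic1_hz <= 0:
        raise ValueError("carrier frequencies must be positive")
    return math.sqrt(f_logic0_hz * f_logic1_hz)


def watermark_amplitude(pee_amplitude: float, headroom_db: float) -> float:
    """Linear amplitude that sits headroom_db below the PEE amplitude.

    Assumption: headroom is a positive number of dB below the PEE.
    """
    return pee_amplitude * 10.0 ** (-headroom_db / 20.0)


center_hz = gmsk_center_frequency(18_500.0, 19_500.0)
level = watermark_amplitude(1.0, 6.0)  # 6 dB of headroom, roughly half amplitude
```

For the example pair the geometric mean is approximately 18,993 Hz, slightly below the arithmetic mean of 19,000 Hz.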
Information concerning the total number of opportunities and the average and maximum time between them is made available to the user so that he or she can determine how well the current settings for the algorithm parameters performed. At this point, the user may wish to change some of the parameters, for example, if the average time between transmissions is too great or the total number of opportunities is too small.
Once the audio file has been processed, the user can audition the original .WAV file against the watermarked audio file. A conventional .WAV viewer window is provided for this purpose, with controls for advancing to the next or previous watermark position and for auditioning the original ("A"), watermarked ("B"), or unknown random ("X") version, which allows a user to listen to the original or watermarked version without knowing which version they are listening to, thereby eliminating any personal bias that might affect the user's perception of the watermark. During the auditioning phase, the user may amplify or attenuate the level of each watermark instance via a level control with a range of +/-20 dB. This level will be applied to that instance of the watermark during the next run of the encoder.
Once the user has auditioned the watermarked file, the file can be saved in any one of a number of known formats. The encoding process is now complete.
On the decoding end, a decoder decodes the watermark from the watermarked signal using GMSK demodulation. The result of the GMSK demodulation is, for each band, a "random" stream of 0's and 1's.
The watermark signal is detected from the data stream output by each of the GMSK demodulators as follows. First, the data stream is sampled at a particular sample rate "Fs". If the baud rate ("Fb") is related to the sample rate by a known ratio ("R"), e.g., R=Fs/Fb, then the output from the GMSK demodulator can be routed through a sliding window of width R and observed to detect all 1's or all 0's, indicating what appears to be a valid bit. Using four of these sliding comparators, the full preamble can be detected, thus indicating the start of a transmission of a watermark subframe.
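The sliding-window preamble search can be sketched in a few lines. The function name and the preamble pattern in the example are hypothetical, and R is assumed to be an integer number of samples per bit. Each of the four adjacent windows of width R must be uniform and must match the corresponding preamble bit.

```python
def find_preamble(samples, r, preamble):
    """Return the first sample index where the preamble starts, or None.

    samples:  demodulator output, a sequence of 0/1 values at rate Fs
    r:        samples per bit, R = Fs / Fb (assumed to be an integer)
    preamble: expected bit pattern, e.g. four bits
    """
    span = r * len(preamble)
    for start in range(len(samples) - span + 1):
        # One comparator per preamble bit, each looking at its own window.
        if all(
            all(s == bit for s in samples[start + i * r : start + (i + 1) * r])
            for i, bit in enumerate(preamble)
        ):
            return start
    return None


# Three noisy samples, then the hypothetical preamble 1010 at R = 3.
stream = [0, 1, 0] + [1] * 3 + [0] * 3 + [1] * 3 + [0] * 3 + [1, 0]
start = find_preamble(stream, r=3, preamble=[1, 0, 1, 0])
```

Here the search skips the three leading noise samples and reports the preamble at sample index 3.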
Once a preamble has been detected, a comparator of width R is used to detect each bit of the subframe. If anything but all 0's or 1's is detected in each bit cell, the whole subframe is discarded, since there was either a faulty preamble detection (e.g., it was really audio information that looked like the preamble) or the signal was negatively impacted by noise during transmission. If R-1 or R+1 0's or 1's are detected, the sample rate might be off by a fraction, so the discrepancy is ignored and the bit counter is reset upon the next state change.
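One way to read the bit-cell rule above is as a run-length decoder that resynchronizes at every state change. The sketch below is an interpretation offered under that assumption, not the decoder of the invention: each run of identical samples is rounded to a whole number of bit cells, a deviation of one sample per run is tolerated, and anything else causes the whole subframe to be discarded. The function name is hypothetical.

```python
from itertools import groupby


def decode_subframe(samples, r):
    """Decode the samples of one subframe into bits, or return None to discard.

    samples: 0/1 demodulator output covering exactly one subframe
    r:       nominal samples per bit (integer, assumed to be at least 3)
    """
    bits = []
    for value, run in groupby(samples):
        length = sum(1 for _ in run)
        # Nearest whole number of bit cells, never fewer than one.
        n = max(1, (length + r // 2) // r)
        # Tolerate a run that is one sample short or long; reject the rest.
        if abs(length - n * r) > 1:
            return None
        bits.extend([value] * n)
        # The bit counter implicitly restarts at the next state change.
    return bits


clean = decode_subframe([1] * 4 + [0] * 5 + [1] * 3 + [0] * 8, r=4)
glitch = decode_subframe([1] * 4 + [0] * 2 + [1] * 4, r=4)
```

In the first call the runs of 5 and 3 samples are each accepted as a single bit at R = 4, giving the bits 1, 0, 1, 0, 0. In the second call the two-sample run cannot be a valid bit cell, so the subframe is discarded.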
In one embodiment, the entire watermark is sent once, with the various subframes transmitted in the various active critical bands, such that a portion of the watermark may be sent in each of the active bands, thereby increasing the number of opportunities for inserting the watermark. In another embodiment, the entire watermark is inserted in each of the bands, such that the watermark appears seven times in the watermarked audio signal (assuming all of the bands are designated as active), thereby providing redundancy.
A technical advantage achieved with the invention is that it is capable of "forcing" an opportunity if no opportunities have been detected for a predefined period of time, thereby ensuring that all of the watermark data is transmitted.
A further technical advantage achieved with the invention is that it operates in seven critical bands, thereby providing increased opportunities for including the watermark data and the option for redundancy, where desirable.
Another technical advantage achieved with the invention is that the audio signal can be preconditioned such that if a "near-opportunity" is detected, a filter can be used to change the frequency response of the system to create an opportunity.