Digital distribution of multimedia content (audio, video, etc.) and the impending convergence of the industries that seek to make this goal a reality (computer, telecommunications, media, electric power, etc.) collide with the ease of making perfect digital copies. A vacuum exists in which content creators resist shifting to fully digital distribution systems for their digitized works because they lack a means to protect the copyrights of these works. To make such copyright protection possible, there must be a mechanism to differentiate between a master and any of its derivative copies. The advent of digital watermarks makes such differentiation possible. With differentiation, assigning responsibility for copies as they are distributed can help support and protect the underlying copyrights and other “neighboring rights,” as well as the implementation of secure metering, marketing, and other applications not yet defined. Schemes that promote encryption, cryptographic containers, closed systems, and the like attempt to shift control of copyrights from their owners to third parties, requiring escrow of masters and payment for analysis of suspect, pirated copies. A frame-based, master-independent, multi-channel watermark system is disclosed in U.S. patent application Ser. No. 08/489,172 filed on Jun. 7, 1995 and entitled “STEGANOGRAPHIC METHOD AND DEVICE”, U.S. patent application Ser. No. 08/587,944 filed on Jan. 17, 1996 and entitled “METHOD FOR HUMAN-ASSISTED RANDOM KEY GENERATION AND APPLICATION FOR DIGITAL WATERMARK SYSTEM”, and U.S. patent application Ser. No. 08/587,943 filed on Jan. 16, 1996 and entitled “METHOD FOR STEGA-CIPHER PROTECTION OF COMPUTER CODE”. These applications describe methods by which copyright holders can watermark and maintain control over their own content.
Suspect copies carry all necessary copyright or other “rights” information within the digitized signal, and possession of an authorized “key” and the software (or even hardware) described in these applications would make determining ownership or other important issues a simple operation for the rights holder or enforcer.
Optimizing watermark insertion into a given signal is further described in U.S. patent application Ser. No. 08/677,435 filed on Jul. 2, 1996 and entitled “OPTIMIZATION METHODS FOR THE INSERTION, PROTECTION AND DETECTION OF DIGITAL WATERMARKS IN DIGITIZED DATA”. That application accounts for the wide range of digitally sampled signals, including audio, video, and derivations thereof, that may constitute a “multimedia” signal. The optimization techniques described in that application take into account the two components of all digitization systems: error coding and digital filters. The premise is to provide a better framework or definition of the actual “aesthetic” that comprises the signal being reproduced, whether through commercial standards of output (NTSC, CD-quality audio, etc.) or lossless and lossy compression (MPEG-2, Perceptual Audio Coding, AC-3, Linear Adaptive Coding, and the like), so that a watermark may be targeted at precisely the part of the signal comprising such an “aesthetic” in order to make it as robust as possible (i.e., difficult to remove without damaging the perceptual quality of the signal). However the content is stored, the signal still carries the digital watermark. Additionally, transmission media may be characterized as a set of “filters” that may be pre-analyzed to determine the best “areas” of the signal in which watermarks “should” be encoded, in order to preserve watermarks in derivative copies and to ensure maximum destruction of the main carrier signal when attempts are made to erase or alter the watermarked content.
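Pre-analyzing a transmission medium as a “filter” can be pictured with a short computation. The Python sketch below is an illustration under stated assumptions, not the method of the cited application: it models the channel as a short FIR filter, evaluates its frequency response at the DFT bin frequencies, and keeps the bins the channel passes with at least a chosen gain as a crude map of where information would survive. The 4-tap moving-average channel, the 64-bin resolution, the 0.5 gain threshold (about -6 dB), and the helper names `frequency_response` and `insertion_envelope` are all hypothetical.

```python
import cmath
import math


def frequency_response(taps, num_bins):
    """Evaluate H(e^{jw}) = sum_k h[k] e^{-jwk} at w = 2*pi*b/num_bins, b = 0..num_bins/2."""
    return [
        sum(h * cmath.exp(-1j * 2 * math.pi * b * k / num_bins) for k, h in enumerate(taps))
        for b in range(num_bins // 2 + 1)
    ]


def insertion_envelope(taps, num_bins, min_gain=0.5):
    """Hypothetical helper: bins whose magnitude gain through the modeled channel is >= min_gain."""
    response = frequency_response(taps, num_bins)
    return [b for b, value in enumerate(response) if abs(value) >= min_gain]


# Assumed channel model: a 4-tap moving average, i.e. a crude low-pass filter.
channel_taps = [0.25, 0.25, 0.25, 0.25]
envelope = insertion_envelope(channel_taps, num_bins=64)
print("bins that survive the channel:", envelope)
```

With this low-pass channel only the lowest bins pass at the chosen gain; information placed in the heavily attenuated upper bins would largely be erased by the channel itself, which is the kind of outcome such pre-analysis is meant to anticipate.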
Optimal planning of digital watermark insertion can be based on the inversion of digital filters to establish or map the areas comprising a given content signal's “insertion envelope.” That is, the results of the filter operation are considered in order to “back out” a solution. In the context of this discussion, “inverting” a filter may mean either mathematical inversion or the ordinary computation of the filter to observe what its effect would be, were that filter applied at a later time. Planning operations will vary for given digitized content: audio, video, multimedia, etc. Planning will also vary depending on where a given “watermarker” is in the distribution chain and what particular information that user needs to encode as a given set of information fields in the underlying content. The disclosures described take into account discrete-time signal processing, which can be accomplished with Fast Fourier Transforms, well known in the art of digital signal processing. Signal characteristics are also deemed important: a specific method for analyzing such characteristics and subsequently applying digital watermarks is disclosed in further detail in this application. The antecedents of the present invention cover time and frequency domain processing, which can be used to examine signal characteristics and make modifications to the signal. A third approach is to process the signal with z-transforms, which can establish signal characteristics very precisely over discrete instants of time. In particular, z-transform calculations can be used to separate the deterministic, or readily predictable, components of a signal from the non-deterministic (unpredictable or random) components. It should be apparent to those skilled in the art that “non-deterministic” is a subjective term whose interpretation is implicitly affected by processing power, memory, and time restrictions.
With unlimited DSP (digital signal processing) power, memory, and time to process, we might theoretically predict every component of a signal. However, practicality imposes limitations. The z-transform calculations yield an estimator of the signal in the form of a deterministic approximation. The difference between a signal reconstituted from the deterministic estimator and the real signal can be referred to as error, and the error in an estimator can be further analyzed for statistical characteristics. Those skilled in the art will be aware that Linear Predictive Coding (LPC) techniques make use of these properties. The error can therefore be modeled statistically, but it is difficult to reproduce exactly from compressed representations. In essence, this error represents the randomness in a signal, which is hard to compress or reproduce but may in fact contribute significantly to the gestalt perception of the signal.
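The split between a deterministic estimator and its error can be made concrete with linear prediction. The Python sketch below is a minimal illustration, not the procedure of the disclosures: it fits an order-p linear predictor by the autocorrelation method (Levinson-Durbin recursion), subtracts the prediction from the signal, and reports the ratio of residual power to signal power. The test signals (a sinusoid with a little added noise, and pure white noise), the predictor order, the seeds, and the helper names are assumptions chosen for illustration.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[lag] = sum_n x[n] x[n + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..p] in the predictor x_hat[n] = sum_k a[k] x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def prediction_error_ratio(x, order):
    """Residual power divided by signal power: near 0 is predictable, near 1 is random."""
    coeffs = levinson_durbin(autocorrelation(x, order), order)
    residual = [
        x[n] - sum(c * x[n - k - 1] for k, c in enumerate(coeffs))
        for n in range(order, len(x))
    ]
    signal_power = sum(v * v for v in x[order:]) / len(residual)
    residual_power = sum(e * e for e in residual) / len(residual)
    return residual_power / signal_power


rng = random.Random(0)
tone = [math.sin(2 * math.pi * 0.05 * n) + 0.05 * rng.gauss(0, 1) for n in range(2000)]
noise = [rng.gauss(0, 1) for _ in range(2000)]
print("tone + small noise:", prediction_error_ratio(tone, order=4))
print("white noise:       ", prediction_error_ratio(noise, order=4))
```

The tone leaves only a small residual, roughly the added noise reshaped by the predictor, while white noise passes through almost unchanged: the residual is exactly the part a compact predictive model cannot capture.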
The more elements of error determined with z-transforms, the better able a party is to determine just which parts of a given carrier signal are deterministic, and thus predictable, and which elements are random. As previously disclosed, the less predictable the watermark-bearing portion of a signal is, and the more it contributes to the perception of the signal, the more secure a digital watermark can be made, because unpredictable signal components are difficult to compress or otherwise remove. Z-transform analysis would disclose just which phase components are deterministic and which are random. Error analysis further describes the existence of error function components and would reliably predict which signals or data may later be removed by additional z-transform analysis or other compression techniques. In effect, the error analysis indicates how good an approximation can be made, which is another way of stating how predictable a signal is and, by implication, how much randomness it contains. Z-transforms are thus a specialized means to optimize watermark insertion and maximize the resulting security of encoded data against attempts at tampering. The results of a z-transform of input samples could be analyzed to see “exactly” how they approximate the signal, and how much room there is for encoding watermarks in a manner that they will not be removed by compression techniques which preserve a high degree of reproduction quality.
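Because this analysis is meant to be applied to discrete time frames, one simple way to picture “how much room there is” is to score each frame by how poorly it can be predicted. The sketch below assumes a first-order linear predictor (the first step of the Levinson-Durbin recursion) in place of the fuller z-transform analysis described here: it computes the normalized prediction error 1 - rho^2, where rho = r[1]/r[0] is the frame's lag-1 normalized autocorrelation, and ranks frames from least to most predictable. The frame length, the synthetic test signal, and the function names are illustrative assumptions.

```python
import math
import random


def frame_unpredictability(frame):
    """Normalized error power of the best first-order predictor: 1 - (r[1] / r[0])**2."""
    r0 = sum(v * v for v in frame)
    r1 = sum(frame[i] * frame[i + 1] for i in range(len(frame) - 1))
    rho = r1 / r0
    return 1.0 - rho * rho


def rank_frames(signal, frame_len):
    """Return frame indices ordered from least predictable (most random) to most predictable."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]
    scores = [frame_unpredictability(f) for f in frames]
    return sorted(range(len(frames)), key=lambda idx: scores[idx], reverse=True), scores


# Illustrative signal: frames alternate between a slow tone (predictable) and white noise (random).
rng = random.Random(1)
frame_len = 256
signal = []
for f in range(4):
    if f % 2 == 0:
        signal += [math.sin(2 * math.pi * 0.01 * n) for n in range(frame_len)]
    else:
        signal += [rng.gauss(0, 1) for _ in range(frame_len)]

ranking, scores = rank_frames(signal, frame_len)
print("frames, most random first:", ranking)
print("scores:", [round(s, 3) for s in scores])
```

A higher-order predictor would give a finer score; first order is used only to keep the per-frame computation short.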
Time is typically treated as a single independent variable in signal processing operations, but in many cases operations can be generalized to multidimensional or multichannel signals. Analog signals are defined continuously over time, while digital signals are sampled at discrete time intervals to provide a relatively compact function, suitable for storage on a CD, for instance, that is defined only at regularly demarcated intervals of time. The accumulated samples provide a discrete-time signal that approximates the actual continuous-time analog signal. This discreteness is the basis of a digital signal. A signal that may take on any value within a continuous range is a continuous-valued signal. The process of converting a continuous-time signal into a discrete-time signal is known as sampling, and representing each sample with one of a finite set of values is known as quantization. Digitization requires both, and quantization implies error. Sampling and quantization together are thus an approximation process.
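The error introduced by quantization can be estimated with a short experiment. The Python sketch below quantizes a near-full-scale sinusoid with a uniform B-bit quantizer, measures the signal-to-quantization-noise ratio (SQNR), and compares it with the familiar rule of thumb of about 6.02·B + 1.76 dB for a full-scale sine, which assumes the error is uniformly distributed. The test frequency, the 0.99 amplitude (chosen to avoid clipping), the sample count, and the helper names are assumptions; at 16 bits, the word length of CD audio, the rule gives roughly 98 dB.

```python
import math


def quantize(x, bits):
    """Round x in [-1, 1) to the nearest of 2**bits uniformly spaced levels (step 2 / 2**bits)."""
    step = 2.0 / (2 ** bits)
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    level = max(lowest, min(highest, round(x / step)))
    return level * step


def sqnr_db(bits, amplitude=0.99, freq=0.01234, num_samples=20000):
    """Measured SQNR, in dB, of a sampled sinusoid after uniform quantization."""
    signal = [amplitude * math.sin(2 * math.pi * freq * n) for n in range(num_samples)]
    error = [quantize(s, bits) - s for s in signal]
    signal_power = sum(s * s for s in signal) / num_samples
    noise_power = sum(e * e for e in error) / num_samples
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 12, 16):
    predicted = 6.02 * bits + 1.76 + 20 * math.log10(0.99)
    print(f"{bits:2d} bits: measured {sqnr_db(bits):6.2f} dB, rule of thumb {predicted:6.2f} dB")
```

Each added bit halves the quantization step and so lowers the error power by about 6 dB, but the error never reaches zero: the digital signal remains an approximation.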
Discreteness is typically established in order to perform digital signal processing. Whether a signal is deterministic or random depends on the ability to mathematically predict the output value of a signal function at a specific time given a certain number of previous outputs of the function. Such predictions are the basis of functions that can replicate a given signal for reproduction purposes. When such predictions are mathematically too complicated or not reasonably accurate, statistical techniques may be used to describe the probabilistic characteristics of the signal. In many real-world applications, however, determining whether a signal, or part of a signal, is indeed random is difficult at best. The watermark systems described in the earlier disclosures mentioned above are based on analyzing discrete time frames of a signal in order to insert information into the signal being watermarked. When signal characteristics are measured, a key factor in securely encoding digital watermarks is the ability to encode data into a carrier signal in a way that mimics randomness or pseudo-randomness, so that unauthorized attempts at erasing the watermark necessarily damage the content signal. Any randomness that exists as part of the signal, however, should be estimated so that a party seeking to optimally watermark the input signal can determine the best location for watermark information and can make any subsequent analysis aimed at locating those watermarks more difficult. Again, typical implementations of signal processing that use z-transforms seek to describe which parts of the signal are deterministic so that they may be expressed as a compact, predictable function and the signal may be faithfully reproduced. This is the basis for the so-called linear predictive coding techniques used for compression.
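The distinction between predictable and random signals can be seen with a pure sinusoid: once its frequency ω is known, each sample is fixed exactly by the two preceding ones through the identity x[n] = 2cos(ω)x[n-1] - x[n-2]. The Python sketch below applies that rule to a sampled sinusoid and to the same sinusoid with added noise; the frequency, noise level, seed, and function name are illustrative assumptions.

```python
import math
import random


def recursion_error_rms(x, omega):
    """RMS error of predicting x[n] as 2cos(omega) x[n-1] - x[n-2]."""
    c = 2.0 * math.cos(omega)
    errors = [x[n] - (c * x[n - 1] - x[n - 2]) for n in range(2, len(x))]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


omega = 0.3  # radians per sample (assumed)
rng = random.Random(2)
clean = [math.sin(omega * n) for n in range(1000)]
noisy = [v + 0.1 * rng.gauss(0, 1) for v in clean]

print("deterministic sinusoid:", recursion_error_rms(clean, omega))
print("sinusoid + noise:      ", recursion_error_rms(noisy, omega))
```

The deterministic part is reproduced to floating-point rounding error, while the added noise passes straight through as prediction error that no rule built from previous outputs can remove.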
The present invention is concerned with descriptions of the signal that better define just which parts of the signal are random, so that digital watermarks may be inserted in a manner that makes them substantially tamperproof without damaging the carrier signal. Additional goals of the system include dynamic analysis of a signal at discrete time intervals, so that watermarks may be dynamically adjusted to the needs of users in such instances as on-the-fly encoding of watermarks or distribution via transmission media (telephone, cable, electric power lines, wireless, etc.).
Signal characteristics, if they can be reasonably defined, are also important clues as to which portion or portions of a given signal comprise the “aesthetically valuable” output signal commonly known as music or video. As such, perceptual coding and linear predictive coding are means to reproduce a signal accurately, with significant compression, in a manner that either perfectly replicates the original signal (lossless compression) or nearly replicates it (lossy compression). One tool for better evaluating the underlying signal is the class of linear time-invariant (LTI) systems. As pointed out in Digital Signal Processing (Principles, Algorithms, and Applications), 3rd Ed. (Proakis and Manolakis) (see also Practical DSP Modeling, Techniques, and Programming in C by Don Morgan), the z-transform plays the same role in the analysis of discrete-time signals and LTI systems that the Laplace transform plays for continuous-time signals; in particular, “the convolution of two time domain signals is equivalent to multiplication of their corresponding z-transforms.” It should be clear that characterization and analysis of LTI systems is useful in digital signal processing: DSP can apply the z-transform and its inverse to deterministically summarize and recreate a signal's time domain representation. Z-transforms can thus serve as a mathematical description of a signal's time domain representation where that signal may not be readily processed by means of a Fourier transform. A goal of the present invention is to use such analysis to describe optimal locations for watermarks in signals, which typically have both deterministic and non-deterministic (predictable and unpredictable, respectively) components.
Such insertion would inherently benefit a system seeking to insert digital watermarks that contain sensitive information, such as copyrights, distribution agreements, marketing information, bandwidth rights, more general “neighboring rights,” and the like, in locations in the signal that are not easily accessible to unauthorized parties and that cannot be removed without damaging the signal. Such a technique for determining watermark location will help ensure that “pirates” must damage the content in any attempt at removal, the price paid for lacking a legitimate “key.”
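The convolution property quoted above from Proakis and Manolakis is easy to check numerically for finite-length sequences, whose z-transforms X(z) = Σ x[n] z^{-n} are polynomials in z^{-1} that converge for every z ≠ 0. The Python sketch below uses arbitrary example sequences and evaluation points: it convolves a signal with a short filter and compares Y(z) with X(z)H(z) at several points, including one on the unit circle z = e^{jω}, where the z-transform reduces to the discrete-time Fourier transform.

```python
import cmath


def convolve(x, h):
    """Linear convolution y[n] = sum_k x[k] h[n - k] of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def z_transform(seq, z):
    """X(z) = sum_n x[n] z**(-n) for a finite sequence starting at n = 0."""
    return sum(v * z ** (-n) for n, v in enumerate(seq))


x = [1.0, 2.0, -1.0, 0.5]   # example signal (arbitrary)
h = [0.5, 0.25]             # example filter impulse response (arbitrary)
y = convolve(x, h)

for z in (0.9 + 0.3j, cmath.exp(0.7j), 2.0):
    lhs = z_transform(y, z)
    rhs = z_transform(x, z) * z_transform(h, z)
    print(z, "difference:", abs(lhs - rhs))
```

The differences are at the level of floating-point rounding, which is why filtering (a convolution) can be analyzed, and undone where the inverse exists, by working with products of z-transforms instead.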
Some discussion of proposed systems for frequency-based encoding of “digital watermarks” is necessary to differentiate the antecedents of the present invention, which process signals frame by frame and may insert information into frequencies without requiring the resulting watermark to be continuous throughout the entire clip of the signal. U.S. Pat. No. 5,319,735 to Preuss et al. discusses a spread spectrum method that would allow for jamming via overencoding of a “watermarked” frequency range and is severely limited in the amount of data that can be encoded: 4.3 8-bit symbols per second. Randomization attacks will not result in audible artifacts in the carrier signal or degradation of the content, because the information signal is subaudible due to frequency masking. Decoding can be broken by a slight change in the playback speed. It is important to note the difference between applying spread spectrum in military field use to protect real-time radio signals and applying it to encode information into static audio files. In the protection of real-time communications, spread spectrum has anti-jam features since information is sent over several channels at once, and in order to jam the signal, you have to jam all channels, including your own. In a static audio file, however, an attacker has all the time and processing power in the world to randomize each sub-channel in the signaling band with no penalty to themselves, so the anti-jam features of spread spectrum do not extend to this domain if the encoding is subaudible. Choosing where to encode in a super-audible range of frequencies, as is possible with the present invention's antecedents, can better be accomplished by computing the z-transforms of the underlying content signal in order to ascertain the suitability of particular locations in the signal for watermark information.
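The fragility of spread spectrum decoding to playback speed can be illustrated with a toy direct-sequence scheme. This is not the Preuss et al. method, only a generic sketch under stated assumptions: one bit is spread over a keyed pseudo-random ±1 chip sequence added to a host signal, and the decoder correlates the received signal with the same sequence. A 1% speed change, simulated here by linear-interpolation resampling, slides the chips out of alignment after roughly a hundred samples, and the correlation collapses toward zero. The Gaussian host model, the embedding strength (far louder than any real subaudible watermark), the seeds, and all function names are hypothetical.

```python
import random


def pn_sequence(length, seed):
    """Keyed pseudo-random +/-1 chip sequence; the seed plays the role of the key."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed_bit(host, chips, bit, strength):
    """Add the chip sequence, sign-modulated by the bit, to the host signal."""
    sign = 1.0 if bit else -1.0
    return [h + strength * sign * c for h, c in zip(host, chips)]


def correlate(received, chips):
    """Normalized correlation; a positive value decodes as 1, a negative value as 0."""
    n = min(len(received), len(chips))
    return sum(received[i] * chips[i] for i in range(n)) / n


def change_speed(signal, ratio):
    """Simulate playback at `ratio` times normal speed by linear-interpolation resampling."""
    out = []
    n = 0
    while n * ratio <= len(signal) - 1:
        t = n * ratio
        i = int(t)
        frac = t - i
        nxt = signal[min(i + 1, len(signal) - 1)]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
        n += 1
    return out


length = 8192
host_rng = random.Random(1)
host = [host_rng.gauss(0, 1) for _ in range(length)]
chips = pn_sequence(length, seed=42)
marked = embed_bit(host, chips, bit=True, strength=0.5)

print("correlation, original speed:", correlate(marked, chips))
print("correlation, 1% faster:     ", correlate(change_speed(marked, 1.01), chips))
```

A practical detector could search over candidate speeds to resynchronize; the sketch only shows why a correlator that assumes the original timing fails after a slight speed change.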
Instead of putting a single subaudible digital signature in a sub-band, as is further proposed by such entities as NEC, IBM, Digimarc, and MIT Media Lab, the antecedent inventions improve on these approaches through their emphasis on frame-based encoding, which can result in the decoding of watermarks from clips of the original full signal (say, 10 seconds of a 3-minute song). With signatures described in MIT's PixelTag or Digimarc/NEC proposals, clipping of the “carrier signal” results in clipping of the underlying watermark (these results are presently based only on tests on images, not on video or audio signals, which have time domains). Additionally, the present invention improves on previous implementations by providing an alternative computational medium to time/amplitude or frequency/energy domain (Fourier transform) calculations, and by providing an additional measure by which to distinguish parts of a signal that are better suited to preserve watermarks through various DSP operations and to force damage when attempts at erasure of the watermarks are undertaken. Further, archiving or putting in escrow a master copy for comparison with suspect derivative copies would be unnecessary with the present invention and its antecedents. Likewise, statistical techniques, rather than mathematical formulas, for determining a “match” of a clip of a carrier signal to the original signal, which are both uneconomical and unreasonable, would not be necessary to establish ownership or other information about the suspect clip. Even where such techniques or stochastic processes are used, as in ICE, an audio spread-spectrum-based watermarking system being proposed by Thorn-EMI's CRL, the inability to decode a text file or other similar file would seem archaic and fail to suit the needs of artists, content creators, broadcasters, distributors, and their agents, when compared with a watermark system as previously disclosed by the above-mentioned U.S. patent applications, including “Steganographic Method and Device”, “Method for Human-Assisted Random Key Generation and Application for Digital Watermark System”, “Method for Stega-cipher Protection of Computer Code”, and “Optimization Methods for the Insertion, Protection and Detection of Digital Watermarks in Digitized Data”, where all “watermark information” resides in the derivative copy of a carrier signal and its clips (if there has been clipping). Indeed, reports indicate that decoding untampered watermarks in an audio file with ICE yields “statistical” error rates as high as 40%. This is a poor form of “authentication” and fails to clearly establish “rights” or ownership over a given derivative copy. Human listening tests would appear to be a better means of authentication than such “probabilistic determination.” This would be especially true if, as is probable, such systems contain no provision to prevent purely random false-positive results with “spread spectrum” or similar “embedded signaling”-type “watermarks,” or, more accurately defined, frequency-based digital signatures.