1. Field of the Invention
The present invention relates generally to the field of digital steganography. More particularly, the present invention relates to a process for embedding digital information into a lossy compression medium.
2. Description of the Prior Art
Digital steganography is the art of inconspicuously hiding data within data. Simply put, it is a digital watermark. Steganography's goal in general is to hide data well enough that unintended recipients do not suspect the steganographic medium of containing hidden data. Steganography and data hiding are not new concepts. It is believed that steganography was first practiced during the Golden Age in Greece. An ancient Greek record describes the practice of melting wax off wax tablets used for writing messages and then inscribing a message in the underlying wood. The wax was then reapplied to the wood, giving the appearance of a new, unused tablet. The resulting unused tablets could be innocently transported without anyone suspecting the presence of a message beneath the wax.
Steganography should not be confused with encryption. The two are not the same and are used to achieve separate goals. Encryption encodes data so that an unintended recipient cannot determine its intended meaning. Steganography, by contrast, does not alter data to make it unusable to an unintended recipient. Its purpose is not to keep others from knowing the hidden information, but to keep others from thinking that the information even exists.
The amount of data that can be effectively hidden in a given medium tends to be restricted by the size of the medium itself. Ordering data that does not have an ordering constraint is often an effective method of steganography. Each permutation of a set of objects can be mapped to a positive integer. This mapping can then be used to encode hidden data by altering the order of objects that are not considered ordered by the carrier medium. While this technique generally does not change the information quality, hidden data can easily be lost if the medium is encoded again. More simply, this encoding of data is a digital watermark.
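As an illustration only, the mapping between permutations and integers described above can be sketched with the factorial number system (a Lehmer code). This is a minimal sketch under stated assumptions, not the method of any cited reference: the function names are hypothetical, the objects are assumed to be distinct and sortable, and the integers are zero-based (add 1 if a strictly positive integer is wanted).

```python
from math import factorial


def int_to_permutation(value, items):
    """Return the ordering of `items` that encodes `value` (0 <= value < n!).

    Hypothetical helper: assumes the items are distinct and sortable, so that
    the sorted order can serve as the agreed reference order.
    """
    pool = sorted(items)
    n = len(pool)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit in a permutation of this size")
    ordering = []
    for remaining in range(n, 0, -1):
        # Each factorial-base digit selects one of the remaining objects.
        index, value = divmod(value, factorial(remaining - 1))
        ordering.append(pool.pop(index))
    return ordering


def permutation_to_int(ordering):
    """Recover the hidden integer from the order in which objects appear."""
    pool = sorted(ordering)
    value = 0
    for item in ordering:
        index = pool.index(item)
        value += index * factorial(len(pool) - 1)
        pool.pop(index)
    return value


if __name__ == "__main__":
    carrier_objects = ["a", "b", "c", "d"]  # 4 objects -> 24 orderings
    hidden = int_to_permutation(23, carrier_objects)
    print(hidden, permutation_to_int(hidden))
```

A set of n unordered objects can carry at most log2(n!) bits this way, and any re-encoding that reorders the objects destroys the hidden value, consistent with the fragility noted above.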
Typically, the watermark data is used to show authenticity or ownership of the other data, but it may also be a hidden message unrelated to the underlying carrier media. Digital audio watermarks, for example, are typically used to authenticate audio data copyright and distribution ownership. To preserve audio fidelity, such watermarks should be invisible to inspection, inaudible to the ear, robust against attacks and transformations, and easily provable. Embedded identifying watermarks may also be used to track patterns of distribution of media over electronic networks.
Prior art digital watermarking techniques can be sorted into categories from weakest to strongest. The weakest are non-audio watermarks. An example of this type of watermark is the use of data frames in an MP3 file, which do not affect the audio but are easily removed. The next type is classified as fragile watermarks. These use techniques like least-significant-bit (LSB) alteration, which is easily perturbed by decoding/recoding. Noise-based watermarks are stronger than the fragile watermarks. These use pseudorandom noise patterns to embed data, which requires probabilistic recognition. The next stronger category is key-based or algorithmic watermarks. These types of watermarks rely on a key-path or algorithmic path to map out the locations and/or order of embedded information. The embedded information is unrecoverable without the key and is not easily broken unless the algorithm is known. The strongest category to date is the spread-spectrum watermark. This category hides the data in many seemingly random frequency bins, which is just a specialized mark-construction technique that may be used with a key or algorithm.
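The fragility of least-significant-bit alteration can be illustrated with a short sketch. The code below is an assumption-laden example rather than any cited prior art process: the function names are hypothetical, audio is modeled as a plain list of integer PCM samples, and `requantize` is only a crude stand-in for a lossy decode/recode cycle.

```python
def embed_lsb(samples, bits):
    """Overwrite the least-significant bit of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples to carry the payload")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit (0 or 1).
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples, count):
    """Read back the lowest bit of the first `count` samples."""
    return [s & 1 for s in samples[:count]]


def requantize(samples, step=4):
    """Crude stand-in for lossy re-encoding: round down to a multiple of step."""
    return [(s // step) * step for s in samples]


if __name__ == "__main__":
    pcm = [1000, 1001, -42, 77, 300, 513, 8, 15]  # illustrative sample values
    payload = [1, 0, 1, 1, 0, 0, 1, 0]

    marked = embed_lsb(pcm, payload)
    print(extract_lsb(marked, len(payload)) == payload)              # mark reads back
    print(extract_lsb(requantize(marked), len(payload)) == payload)  # mark is lost
```

Each sample changes by at most one quantization step, which is why the mark is hard to hear, but any processing that discards the low-order bits erases it.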
There are problems with these methods. Key management can become very complex, especially if a unique key must be created for every individual marked file. Keeping track of the keys is difficult, and finding the right key for a particular file without external information can also be problematic. Probabilistic proof/detection for recognition of watermark data lacks certainty, i.e., probabilistic detectors can be wrong. These methods may also introduce significant noise into the media. Many of these methods have already been cracked or bypassed in the market and are thus rendered useless for practical purposes. None of these methods is robust against all common attacks and transformations, especially cycles of decompression and recompression.
Several techniques have been devised for digital watermarks in audio applications. U.S. Pat. No. 6,675,146 (2004, Rhoads) discloses an audio steganographic method. The process uses two carrier bands to encode watermarking data in an audio file and to change sample values in the file. U.S. Pat. No. 6,571,144 (2003, Moses et al.) discloses a system for providing a digital watermark in an audio signal. The process describes using seven critical bands, each of which contains two carrier frequencies for 0 and 1 encoding using modulated noise. Neither the Rhoads process nor the Moses et al. process would survive lossy compression.
The strongest watermarking techniques currently used for digital-rights management applications are both noise-based and probabilistic. They also require altered hardware and/or software to use the marked files. This limits control over the exposure of the precise coding mechanism being used, which may subject the coding mechanism to reverse-engineering and eventual defeat. As an identification method, this form of digital-rights management suffers from its lack of certainty in recognition.
Therefore, what is needed is a steganographic system that is difficult to bypass even if the method is known. What is further needed is a steganographic system that is robust against common attacks and transformations, including decompression and recompression. What is also needed is a steganographic system that introduces minimal noise into the media.