The general field of application of the invention involves improved techniques for the encoding of digital information into audio, image, and video media files, volumetric data files, 2-D and 3-D spline and other data files and the like. The invention is more particularly, though not exclusively, directed to enabling large sequences of data, as distinguished from relatively short sequences such as simple copyright, ownership or related limited information, to be embedded seamlessly and flexibly into such media files, particularly compressed audio, image, video, 3-D and other media files and the like, the techniques also being useful with other types of compressed data files and formats.
Data has heretofore often been embedded in analog representations of media information and formats. This has been extensively used, for example, in television and radio applications as for the transmission of supplemental data, such as text, but the techniques used are not generally capable of transmitting high bit rates of digital data.
Watermarking data has also been embedded so as to be robust to degradation and manipulation of the media. Typical watermarking techniques rely on gross characteristics of the signal being preserved through common types of transformations applied to a media file. These techniques are again limited to fairly low bit rates. Good bit rates on audio watermarking techniques are, indeed, only around a couple of dozen bits of data encoded per second.
While data has been embedded in the low-bit of the signal-domain of digital media enabling use of high bit rates, such data is either uncompressed, or capable of only relatively low compression rates. Many modern compressed file formats, moreover, do not use such signal-domain representations and are thus unsuited to the use of this technique. Additionally, this technique tends to introduce audible noise when used to encode data in sound files.
Among prior patents illustrative of such and related techniques and uses are U.S. Pat. No. 4,379,947 (dealing with the transmitting of data simultaneously with audio), U.S. Pat. No. 5,185,800 (using bit allocation for transformed digital audio broadcasting signals with adaptive quantization based on psychoauditive criteria), U.S. Pat. No. 5,687,236 (steganographic techniques), U.S. Pat. No. 5,710,834 (code signals conveyed through graphic images), U.S. Pat. No. 5,832,119 (controlling systems by control signals embedded in empirical data), U.S. Pat. No. 5,850,481 (embedded documents, but not for arbitrary data or computer code), U.S. Pat. No. 5,889,868 (digital watermarks in digital data), and U.S. Pat. No. 5,893,067 (echo data hiding in audio signals).
Prior publications relating to such techniques include
Bender, W., D. Gruhl, M. Morimoto, and A. Lu, "Techniques for data hiding", IBM Systems Journal, Vol. 35, Nos. 3 and 4, 1996, p. 313-336;
MPEG Spec-ISO/IEC 11172, part 1-3, Information Technology-Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Copyright 1993, ISO/IEC; and
A survey of techniques for multimedia data labeling, and particularly for copyright labeling using watermarks to encode low bit-rate information, is presented by Langelaar, G. C. et al. in "Copy Protection For Multimedia Data based on Labeling Techniques".
In specific connection with the above-cited "MPEG Spec" and "ID3v2 Spec" reference applications, we have disclosed in co-pending U.S. patent application Ser. No. 09/389,942, filed Sep. 3, 1999, entitled "Process Of And System For Seamlessly Embedding Executable Program Code Into Media File Formats Such As MP3 And The Like For Execution By Digital Media Player And Viewing Systems", techniques applying some of the embedding concepts of the present invention, though directed specifically to imbuing one or more of pre-prepared audio, video, still image, 3-D or other generally uncompressed media formats with an extended capability to supplement their pre-prepared presentations with added graphic, interactive and/or e-commerce content presentations at the digital media playback apparatus.
As earlier indicated, however, the present invention is more broadly concerned with data embedding in compressed formats, and with encoding a frequency representation of the data, typically through a Fourier Transform, Discrete Cosine Transform, wavelet transform or other well-known function. The invention embeds high-rate data in compressed digital representations of the media, including through modifying the low-bits of the coefficients of the frequency representation of the compressed data, thereby enabling additional benefits of fast encoding and decoding, because the coefficients of the compressed media can be directly transformed without a lengthy additional decompression/compression process. The technique of the present invention also can be used in combination with watermarking, but with the watermark applied before the data encoding process.
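The low-bit modification described above can be illustrated with a minimal sketch. It assumes the compressed file has already been parsed into a list of integer quantized coefficients; the function names embed_bits and extract_bits are hypothetical, and the choice of which coefficients to modify (here simply the first ones) is an assumption made for illustration, not the selection method of the invention.

```python
def embed_bits(coeffs, bits):
    """Return a copy of coeffs with each payload bit written into the
    least-significant bit of one coefficient (first len(bits) coefficients)."""
    if len(bits) > len(coeffs):
        raise ValueError("not enough coefficients for the payload")
    out = list(coeffs)
    for i, bit in enumerate(bits):
        # Clear the low bit, then set it to the payload bit.
        # Python integers behave as two's complement here, so negative
        # coefficients are handled the same way as positive ones.
        out[i] = (out[i] & ~1) | bit
    return out


def extract_bits(coeffs, count):
    """Read back the low bit of the first `count` coefficients."""
    return [c & 1 for c in coeffs[:count]]


# Hypothetical quantized coefficients and payload, for illustration only.
coeffs = [12, -7, 3, 0, 25, -4]
payload = [1, 0, 1, 1]
marked = embed_bits(coeffs, payload)
recovered = extract_bits(marked, len(payload))
```

Because only the low bit changes, each modified coefficient differs from its original value by at most one, and extraction needs only the modified coefficients.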
The earlier cited Langelaar et al publication, in turn, references and discusses the following additional prior art publications:
J. Zhao, E. Koch: "Embedding Robust Labels into Images for Copyright Protection", Proceedings of the International Congress on Intellectual Property Rights for Specialized Information, Knowledge and New Technologies, Vienna, Austria, August 1995;
E. Koch, J. Zhao: "Towards Robust and Hidden Image Copyright Labeling", Proceedings IEEE Workshop on Nonlinear Signal and Image Processing, Neos Marmaras, June 1995; and
F. M. Boland, J. J. K. O Ruanaidh, C. Dautzenberg: "Watermarking Digital Images for Copyright Protection", Proceedings of the 5th International Conference on Image Processing and its Applications, No. 410, Edinburgh, July 1995.
An additional article by Langelaar also discloses earlier labeling of MPEG compressed video formats:
G. C. Langelaar, R. L. Lagendijk, J. Biemond: "Real-time Labeling Methods for MPEG Compressed Video," 18th Symposium on Information Theory in the Benelux, May 15-16, 1997, Veldhoven, The Netherlands.
These Zhao and Koch, Boland et al and Langelaar et al disclosures, while teaching encoding technique approaches having partial similitude to components of the techniques employed by the present invention, as will now be more fully explained, are not, however, either anticipatory of, or actually adapted for solving the total problems with the desired advantages that are addressed and sought by the present invention.
Considering, first, the approach of Zhao and Koch, above-referenced, they embed a signal in an image by using JPEG-based techniques ([JPEG] Digital Compression and Coding of Continuous-tone Still Images, Part 1: Requirements and guidelines, ISO/IEC DIS 10918-1). They first encode a signal in the ordering of the size of three coefficients, chosen from the middle frequency range of the coefficients in an 8-by-8 block DCT. They divide nine permutations of the ordering relationship among these three coefficients into three groups: one encoding a '1' bit (HML, MHL, and HHL), one encoding a '0' bit (MLH, LMH, and LLH), and a third group encoding "no data" (HLM, LHM, and MMM). They have also extended this technique to the watermarking of video data. While their technique is robust and resilient to modifications, they cannot, however, encode large quantities of data, since they can only modify blocks where the data is already close to the data being encoded; otherwise, they must modify the coefficients to encode "no data". They must also severely modify the data, since they must change large-scale ordering relationships of coefficients. As will later more fully be explained, these are disadvantages overcome by the present invention through its technique of encoding data by changing only a single bit in a coefficient.
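The three pattern groups listed above can be read as a lookup from the ordering of three coefficients to a decoded symbol. The following is a minimal sketch of such a decoder, not Zhao and Koch's implementation: the names pattern and decode are hypothetical, and the rule used to label ties (every value equal to the maximum is H, every value equal to the minimum is L, all-equal is MMM) is an assumption. Orderings outside the nine listed patterns are treated here as "no data", which is also an assumption.

```python
# Pattern groups as listed in the text above.
ONE_PATTERNS = {"HML", "MHL", "HHL"}
ZERO_PATTERNS = {"MLH", "LMH", "LLH"}
NO_DATA_PATTERNS = {"HLM", "LHM", "MMM"}


def pattern(a, b, c):
    """Label three coefficient magnitudes as H (highest), M (middle), L (lowest).

    Tie handling is an assumption made for this sketch.
    """
    values = (a, b, c)
    hi, lo = max(values), min(values)
    if hi == lo:
        return "MMM"
    return "".join("H" if v == hi else "L" if v == lo else "M" for v in values)


def decode(a, b, c):
    """Return 1, 0, or None ("no data") for one coefficient triple."""
    p = pattern(a, b, c)
    if p in ONE_PATTERNS:
        return 1
    if p in ZERO_PATTERNS:
        return 0
    return None  # listed "no data" patterns and any unlisted ordering
```

For example, decode(9, 5, 2) yields 1 because the third coefficient is the lowest (HML), while decode(9, 2, 5) yields None (HLM). Writing a bit into a block whose coefficients are in the wrong order means reordering whole coefficients, which is the large-scale modification the text objects to.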
As for Boland, Ruanaidh, and Dautzenberg, they use a technique of generating the DCT, Walsh Transform, or Wavelet Transform of an image, and then adding one to a selected coefficient to encode a "1" bit, or subtracting one from a selected coefficient to encode a "0" bit. This technique, although at first blush superficially similar to one aspect of one component of the present invention, has the very significant limitation, obviated by the present invention, that information can only be extracted by comparing the encoded image with the original image. This means that a watermarked and a non-watermarked copy of any media file must be sent simultaneously for the watermarking to work. This is a rather severe limitation, overcome in the present invention by the novel incorporation of the use of the least-significant bit encoding technique.
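The dependence on the original image can be seen in a minimal sketch of the add-one/subtract-one scheme as described above. The function names and the choice of the first coefficients as the "selected" ones are hypothetical, made only for illustration; the transform step is omitted and the coefficients are assumed to be integers.

```python
def embed_plus_minus(original, bits):
    """Add one to a coefficient to encode a 1 bit, subtract one to encode a 0 bit."""
    if len(bits) > len(original):
        raise ValueError("not enough coefficients for the payload")
    out = list(original)
    for i, bit in enumerate(bits):
        out[i] += 1 if bit else -1
    return out


def extract_plus_minus(marked, original, count):
    """Recover the bits by comparing each marked coefficient with the original.

    Without `original` there is no way to tell whether a coefficient was
    raised or lowered, which is the limitation discussed in the text.
    """
    return [1 if marked[i] > original[i] else 0 for i in range(count)]


# Hypothetical coefficients and payload, for illustration only.
original = [12, -7, 3, 0]
payload = [1, 0, 0, 1]
marked = embed_plus_minus(original, payload)
recovered = extract_plus_minus(marked, original, len(payload))
```

The extractor's second argument is the unmarked coefficient list, so the receiver must hold both copies. A least-significant-bit scheme avoids this because the bit is read directly from the marked value.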
Such least-significant bit encoding broadly has, however, been earlier proposed, but not as implemented in the present invention. The Langelaar, Lagendijk, and Biemond publication, for example, teaches a technique which encodes data in MPEG video streams by modifying the least significant bit of a variable-length code (VLC) representing DCT coefficients. Langelaar et al.'s encoding keeps the length of the file constant by allowing the replacement of only those VLC values which can be replaced by another value of the same length and which have a magnitude difference of one. The encoding simply traverses the file and modifies all suitable VLC values. A drawback of their technique, however, is that suitable VLC values are relatively rare (167 per second in a 1.4 Mbit/sec video file, thus allowing only 167 bits to be encoded in 1.4 million bits of information).
In comparison, the technique of the present invention, as applied to video, removes such limitation and can achieve much higher bit-rates while keeping file-length constant, by allowing a group or set of nearby coefficients to be modified together. This also allows for much higher quantities of information to be stored without perceptual impact, because it allows for psycho-perceptual models to determine the choice of coefficients to be modified.
The improved techniques of the present invention, indeed, unlike the prior art, allow for the encoding of digital information into an audio, image, or video file at rates several orders of magnitude higher than those previously described in the literature (on the order of 300 bits per second). As will later be disclosed, the present invention, indeed, has easily embedded a 3000 bit/second data stream in a 128,000 bit/second audio file.
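To put the quoted figures side by side, the short calculation below expresses each embedded rate as a fraction of its carrier bit rate. It uses only the numbers stated in the text (167 bits per second in a 1.4 Mbit/sec video file for the Langelaar et al. technique, and 3000 bits per second in a 128,000 bit/second audio file for the present invention); the helper name payload_fraction is hypothetical, and the comparison is only indicative since the two figures refer to different media types.

```python
def payload_fraction(embedded_bps, carrier_bps):
    """Fraction of the carrier bit rate that carries embedded data."""
    return embedded_bps / carrier_bps


# Figures quoted in the text above.
vlc_fraction = payload_fraction(167, 1_400_000)    # Langelaar et al., MPEG video
audio_fraction = payload_fraction(3000, 128_000)   # present invention, audio

print(f"VLC labeling:      {vlc_fraction:.4%} of the carrier rate")
print(f"Present invention: {audio_fraction:.4%} of the carrier rate")
```

On these figures the embedded data is roughly 0.012% of the carrier rate in the first case and roughly 2.3% in the second.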
In the prior art, only relatively short sequences of data have been embedded into the media file, typically encoding simple copyright or ownership information. Our techniques allow for media files to contain entirely new classes of content, such as: entire computer programs, multimedia annotations, or lengthy supplemental communications. As described in said copending application, computer programs embedded in media files allow for expanded integrated transactional media of all kinds, including merchandising, interactive content, interactive and traditional advertising, polls, e-commerce solicitations such as CD or concert ticket purchases, and fully reactive content such as games and interactive music videos which react to the user's mouse motions and are synced to the beat of the music. This enables point of purchase sales integrated with the music on such software and hardware platforms as the television, portable devices like the Sony Walkman, the Nintendo Game Boy, and portable MP3 players such as the Rio and Nomad and the like. This invention even creates new business models. For example, instead of a record company trying to stop the copying of its songs, it might instead encourage the free and open distribution of the music, so that the embedded advertising and e-commerce messages are spread to the largest possible audience of potential customers.
It is accordingly a primary object of the present invention to provide a new and improved process, system and apparatus for embedding data in compressed audio, image, video and other media files and the like that shall not be subject to the limitations and disadvantages of the prior art as above discussed, but that, to the contrary, seamlessly and facilely enables large sequences of data to be embedded into such compressed data media files, enabling adding new classes of content including, but by no means limited to, entire computer programs, multi-media annotations and lengthy supplemental communications, among other supplemental contents.
A further object is to provide such a novel process in which digital watermarking may also be used, but with the watermark applied before the data encoding process.
Still another object is to provide such a novel embedding technique that is more generally and generically applicable, as well, including for volumetric data files, 2-D and 3-D spline datapoint files, and other data files.
Other and further objects will be explained hereinafter and are more particularly pointed out in the appended claims.
In summary, therefore, from one of its broader aspects, the invention embraces a process for embedding supplemental digital data into a pre-prepared compressed digital media file, that comprises, encoding the compressed digital media file as a set of coefficient representations of the pre-prepared media file information, and embedding portions of the supplemental digital data at selected coefficients to produce a media file containing such embedded data for enabling user decoding and playback of both the pre-prepared media file information and the embedded supplemental data.
Preferred and best mode embodiments, designs and techniques are later presented in detail.