The invention is in the field of marking binary coded data sets, particularly concerning image data and audio signals by embedding at least one information unit in a data set.
Protection of intellectual property rights and usage rights is one of the main concerns of producers of information, publishers, media companies and usage rights brokers. One of the paramount concerns of the creators of intellectual property is protection against illicit copying and distribution of copyrighted multimedia data. In many cases, for instance when digitally stored data are distributed, illicit distribution cannot be prevented directly. To counter this, techniques have been developed that allow illicit copying and distribution to be detected and tracked, and the original perpetrator to be traced.
Measures of this kind for embedding information in electronic data sets are so-called steganographic measures: they integrate additional secret information into data by modifying the original data without significantly degrading its quality or appearance.
A number of different methods for slightly modifying original digital data in order to embed additional information have been investigated. Matsui and Tanaka have proposed several steganographic methods for identifying intellectual property in digital images, videos and facsimiles. Refer to the publication by Matsui and K. Tanaka, "Video-Steganography: How to secretly embed a signature in a picture", IMA Intellectual Property Proceedings, Vol. 1, No. 1, 1994, for details.
The underlying principle of their methods is to embed the information so that it appears to be merely a further imprecision (i.e. an increase in the noise level) in the original data.
Researchers at AT&T have investigated ways of embedding information in another class of documents, namely structured text, by inserting distortions in the form of controlled variations of the spacing between successive lines and of the inter-word spacing. Refer to the publication by J. Brassil et al., "Electronic Marking and Identification Techniques to Discourage Document Copying", AT&T Bell Laboratories, Murray Hill, N.J., 1994, for details.
The existing steganographic techniques for digital imagery by no means meet the requirements for protecting and proving intellectual property rights to multimedia productions and information. This is because these techniques offer no protection against either intentional or inadvertent attacks made possible by digital processing, especially of image data. With the older techniques, the embedded information may easily be deleted, modified or grossly distorted by lossy image compression, low-pass filtering and/or a change of the data/image representation. Such processing steps, however, are often unavoidable on the way of a digital multimedia document from its creator to the final consumer or user; hence these techniques are unsuited for proving authenticity or for identifying the holder of the intellectual property rights.
The European publication EP 0 581 317 describes a method for digitally marking digital data sets, e.g. digital image data. The method allows digital signatures, a.k.a. markings, to be integrated into digital image data so that the images can be identified later. This is accomplished by identifying pixels of the image whose values are relative minima or maxima (i.e. extreme values). From the pixels thus identified, pixels are selected for integrating an identification code, the so-called signature. To integrate each bit of the signature at a selected point of the image, the value of the selected pixel and the values of the surrounding pixels are adapted, i.e. modified. This technique has the inherent drawback that the positions selected for integrating the signature values are easily determined and may therefore be assumed to be known to attackers. Because the signature is placed at the positions of the originally present extreme values, the integrated identification codes may easily be detected and removed.
Similarly, the European publication EP 0 614 308 A1 describes a method for encrypting data. Such encryption techniques scramble the entire data set completely so that it is inaccessible without authorization. In this technique, high resolution image components are protected from illicit access by means of a key or an encryption technique, while lower resolution image components may remain freely accessible, so that hierarchical access control for the information is realized. The entire image information may be held on one storage medium, while only authorized users can access all of the image data present. The above-mentioned European publication does not serve the directed marking of, e.g., image information for later identification; it merely encrypts the entire information content of an image so that unauthorized users cannot access the information.
The invention comprises a procedure for marking binary coded data sets, particularly but not limited to image data and audio signals, by embedding at least one information unit in a data set. The embedded information serves to uniquely identify the data set to be protected, and the relationship between the data set and the embedded information is preserved under a number of modifications of the data set. The identification procedure is intended to give creators as well as customers and distributors of multimedia creations the opportunity to confirm and prove possession of intellectual property rights to the data and to supply proof of abuse of the multimedia data.
The procedure according to the invention for marking binary data, particularly but not limited to image data or audio signals, comprises generating a discrete position sequence for integrating information units into the data set to be marked, the sequence depending both on features specific to the data to be marked and on a key, and subsequently reading or writing the integrated (or to-be-integrated) information units at the positions of this sequence in the data set.
The invention also comprises the robust embedding of additional secret, hidden data in multimedia data, particularly digital images. The same procedure can also be used to mark audio signals, which are structured in time rather than in pixel values.
Besides color, greyscale and bilevel still images, the invention also covers use with digital video data, i.e. image sequences. Embedding the additional information does not visibly degrade image quality. The embedded information can be reconstructed with knowledge of the key, which may be secret.
The first step of the procedure is the generation of a pseudo-random position sequence, which determines the positions at which a code or, more generally, an information unit is embedded. Characteristic information, which may for instance be extracted from the image data itself, is used in combination with a secret key as the seed value for generating the positions. In the second step, the information unit is either read or written at the positions given by the position sequence. Different methods exist for reading and writing the information units, depending on the type of image data representation.
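A minimal sketch of the first step in Python follows. It assumes that the secret key and the image characteristics are available as byte strings and that they are combined by hashing them with SHA-256 into a seed for Python's `random.Random`; the text does not prescribe a particular hash or generator, and the function name `position_sequence` is hypothetical.

```python
import hashlib
import random


def position_sequence(key: bytes, features: bytes, num_blocks: int, count: int) -> list[int]:
    """Return `count` distinct block indices in a key- and image-dependent order.

    Hypothetical helper: SHA-256 and random.Random are illustrative choices only.
    """
    # Combine the secret key with the image-derived characteristics into one seed.
    seed = hashlib.sha256(key + b"|" + features).digest()
    # Deterministic PRNG: the same key and features always give the same sequence.
    rng = random.Random(seed)
    # Sample without replacement so that no block is selected twice.
    return rng.sample(range(num_blocks), count)
```

A reader holding the same key and able to recompute the same characteristics from the marked image regenerates the same sequence, so the characteristics must be ones that the marking itself does not change (image dimensions, as used in a call such as `position_sequence(b"secret", b"512x512", 4096, 64)`, are only an illustration).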
Generally speaking, three marking techniques can be distinguished, depending on the data sets to be marked.
A. Frequency Based Marking for Color and Greyscale Images
This technique is based on the idea that typical digital images of people, buildings, natural scenery etc. may be regarded as non-stationary statistical processes that are highly redundant and tolerant of disturbances. The binary coded information is embedded in the frequency domain of the image. The following description assumes a representation of the image in the spatial domain, into which every image can be converted. The image is partitioned into blocks of pixels, and these blocks are transformed from the spatial domain into the frequency domain by means of a transformation function. Arbitrary functions may be selected as the transformation function; one of the preferred functions is the so-called "discrete cosine transform" (DCT). Other transformations are similarly suited, e.g. the wavelet transformation, the Fourier transformation, the Hadamard-Walsh transformations or the Z transformation. Especially with the wavelet transformation, larger block sizes are useful. The blocks of frequency components (i.e. the relevant parts of the blocks) are then quantized. For the quantization step, quantization matrices similar to those used in the JPEG compression standard are preferred; for details of this standard, see Wallace, "The JPEG still picture compression standard", Communications of the ACM, Vol. 34, No. 4, April 1991, pp. 30-40. The position sequence generated in the first step of the procedure of the invention determines the blocks, and the exact positions within the selected blocks, in which the information is embedded. A single bit ("1" or "0") is embedded in a block by establishing patterns of relations (i.e. size relations) between particular elements of the block, the so-called frequency coefficients, with a moderate variance threshold.
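The transform-and-quantize stage can be sketched with a plain 8×8 orthonormal DCT-II and its inverse, written out directly in Python so that no external library is needed. The block size of 8, the function names and the uniform quantization matrix used in the example are assumptions for illustration; a JPEG-style table could be passed as `qmatrix` instead.

```python
import math

N = 8  # assumed block size (8x8, as in JPEG); the text allows other sizes


def _c(k: int) -> float:
    # Orthonormal DCT scaling factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct2(block):
    """2-D orthonormal DCT-II of an N x N block (list of lists of numbers)."""
    return [[_c(u) * _c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (orthonormal DCT-III), back to the spatial domain."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, qmatrix):
    """Divide each frequency coefficient by its quantization step and round."""
    return [[round(coeffs[u][v] / qmatrix[u][v]) for v in range(N)] for u in range(N)]


def dequantize(qcoeffs, qmatrix):
    """Multiply quantized coefficients back by their quantization steps."""
    return [[qcoeffs[u][v] * qmatrix[u][v] for v in range(N)] for u in range(N)]
```

For a flat block of value 128, `dct2` puts all energy into the DC coefficient (128 × 64 / 8 = 1024) and `idct2(dct2(b))` reproduces `b` up to floating-point error.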
Medium frequencies are particularly suitable for embedding information, since higher frequency components may easily be removed without visibly degrading image quality (e.g. by a lossy compression algorithm), while modifications of the low frequency components would generate visible artifacts and degrade overall image quality. In general, however, all frequencies are usable. To make the embedded information units as resistant as possible to illicit access and to processing steps that reformat or reorder the data set, the robustness of the embedded information can be optimized by tuning two parameters. The first is the so-called distance D between the selected quantized frequency components: a larger distance yields better robustness, at the cost of a slightly higher visibility of the modification. The second is the so-called quantization factor Q, which is used to quantize the values selected for embedding the information code: a larger quantization factor results in a smaller modification of the image data, but also in lower robustness against lossy compression algorithms such as the one used in the JPEG standard. To increase robustness against attacks further, the marking may be repeated (redundant marking), i.e. the same information or information units are embedded more than once in the original data. If all blocks of a data set are selected and marked, a kind of "holographic" marking is achieved. When the data set is partitioned into blocks, the block size may be kept variable. In the extreme case the entire data set forms a single large block; in the frequency based procedure this requires transforming the entire data set.
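One way to realize a size relation between two quantized coefficients with distance D is sketched below. It assumes the block has already been transformed, that two medium-frequency positions have been taken from the position sequence, and that "1" is encoded as qa - qb >= D and "0" as qb - qa >= D. Because the text describes Q as a factor for which a larger value means a smaller modification, the sketch does not model Q directly but uses an explicit quantization step `step`, assumed to shrink as Q grows. All function names are hypothetical.

```python
def _quantized_pair(coeffs, pos_a, pos_b, step):
    """Quantize the two selected coefficients with the (assumed) step size."""
    (ua, va), (ub, vb) = pos_a, pos_b
    return round(coeffs[ua][va] / step), round(coeffs[ub][vb] / step)


def embed_relation_bit(coeffs, pos_a, pos_b, bit, D, step):
    """Embed one bit by forcing a size relation between two quantized coefficients.

    bit 1 -> qa - qb >= D, bit 0 -> qb - qa >= D. `coeffs` is modified in place.
    """
    qa, qb = _quantized_pair(coeffs, pos_a, pos_b, step)
    diff = qa - qb if bit else qb - qa
    if diff < D:
        shortfall = D - diff
        # Split the correction between both coefficients to keep each change small.
        up, down = (shortfall + 1) // 2, shortfall // 2
        if bit:
            qa, qb = qa + up, qb - down
        else:
            qa, qb = qa - down, qb + up
    # Write the requantized values back into the coefficient block.
    (ua, va), (ub, vb) = pos_a, pos_b
    coeffs[ua][va], coeffs[ub][vb] = qa * step, qb * step


def extract_relation_bit(coeffs, pos_a, pos_b, step):
    """Read the bit back from the order of the two quantized coefficients."""
    qa, qb = _quantized_pair(coeffs, pos_a, pos_b, step)
    return 1 if qa > qb else 0
```

A larger D leaves a wider gap between the two quantized values, so a later perturbation (e.g. from recompression) must be larger before it reverses the relation; this mirrors the robustness/visibility trade-off for D described above.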
The block size may also be as small as one pixel per block, with reasonable block sizes starting at 2×2 pixels. The blocks in a data set may also differ in size. After the information units have been integrated, the image may be requantized and transformed back into a representation in the spatial domain. The procedure allows a number of variations, e.g. the arbitrary choice of the transformation, the formation of blocks, the selection of the frequency components, the selection and assignment of encodings to the relation patterns, and the distribution of the relation patterns and the associated coding over different blocks.
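Partitioning with a variable block size can be sketched as follows. The helper name is hypothetical, and edge blocks are simply allowed to be smaller when the image size is not a multiple of the block size, which is one assumed way of letting blocks in a data set differ in size. A block size of 1 gives one pixel per block, and a block size at least as large as the image gives the single-block extreme case.

```python
def partition_blocks(height, width, size):
    """Return (row, col, h, w) for blocks of nominal size `size` covering the image.

    Blocks at the right and bottom edges are cropped to the image boundary.
    """
    return [(r, c, min(size, height - r), min(size, width - c))
            for r in range(0, height, size)
            for c in range(0, width, size)]
```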
B. Relation-based Marking of Bilevel Images
The value of each pixel in a bilevel image is either "1" or "0". This leaves no room for inserting noise or disturbances that might carry additional information. To embed binary encoded information, suitable areas of the image must be found in which changes will not significantly degrade the quality of the original image. These areas differ for each individual image, or at least for certain types of images. The proposed procedure for bilevel images is based on the ratio between "0" and "1" bits in a selected block. Let R(b) be the rate of black pixels (for instance the "1" bits) in a selected block b of the image:
$R(b) = N_s / N$
where $N_s$ is the number of black pixels in block b and $N$ is the block size, i.e. the total number of pixels in block b. A bit is embedded in a block b as follows:
A "1" bit is embedded in a block b if R(b) lies within a given range $(T_{1,\min}, T_{1,\max})$; a "0" bit is embedded if R(b) lies within a different given range $(T_{2,\min}, T_{2,\max})$. Both ranges lie between 0% and 100%. To embed a bit, the block under consideration is modified where necessary by changing "1" bits to "0" bits, or vice versa, until R(b) falls into the corresponding range. If too many modifications would be necessary, the block is declared invalid and modified so that R(b) falls into an invalid range outside both the "0" and the "1" range. In addition, a buffer is introduced between the given ranges and the invalid range, which increases robustness against image manipulation applied to the marked image. The buffer thus describes how many bits in a block may be changed by image manipulation without damaging the embedded bits. For example, a buffer of 5% means that changing fewer than 4 bits within an 8×8 block (5% of 64 pixels is 3.2 pixels) does not damage the embedded code. A reasonable choice of the ranges and the buffer (e.g. $T_1 = (55\%, 60\%)$, $T_2 = (40\%, 45\%)$ and a buffer of 5% for an 8×8 block) gives a reasonable balance between robustness against image manipulation on the one hand and visibility of the embedded information on the other. The algorithm used to add the information units at the positions given by the position sequence depends to some degree on the distribution of the "1" and "0" bits. For example, in "dithered" images the modifications are spread evenly across the entire block: the bit with the most neighbors of the same value is changed.
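The bilevel embedding rule can be sketched as below, using the example ranges from the text (55-60% for "1", 40-45% for "0", buffer 5%). Assumptions: blocks are lists of 0/1 rows with 1 meaning black, ranges are inclusive, only the dithered-image rule (change the pixel with the most same-valued 8-neighbors) is implemented, neighboring blocks are ignored, and an invalid block is only reported rather than pushed into the invalid range. The limit `max_changes` is a hypothetical stand-in for "too many modifications", and all function names are hypothetical.

```python
def rate(block):
    """R(b) = N_s / N as a percentage: share of black (1) pixels in the block."""
    n = sum(len(row) for row in block)
    return 100.0 * sum(sum(row) for row in block) / n


def _same_neighbors(block, i, j):
    """Count the 8-neighbors inside the block that share the value of pixel (i, j)."""
    h, w = len(block), len(block[0])
    return sum(block[i + di][j + dj] == block[i][j]
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di or dj) and 0 <= i + di < h and 0 <= j + dj < w)


def embed_bilevel_bit(block, bit, t1=(55, 60), t2=(40, 45), max_changes=8):
    """Flip pixels until R(b) lies in the range for `bit`.

    Returns True on success; returns False and leaves the block unchanged if
    more than `max_changes` flips would be needed (the block is invalid).
    """
    lo, hi = t1 if bit else t2
    work = [row[:] for row in block]  # modify a copy, commit only on success
    changes = 0
    while not lo <= rate(work) <= hi:
        if changes == max_changes:
            return False
        # Too few black pixels: turn a white one black; too many: the reverse.
        source = 0 if rate(work) < lo else 1
        candidates = [(i, j) for i, row in enumerate(work)
                      for j, v in enumerate(row) if v == source]
        # Dithered-image rule: change the pixel with the most same-valued neighbors.
        i, j = max(candidates, key=lambda p: _same_neighbors(work, *p))
        work[i][j] = 1 - source
        changes += 1
    block[:] = work
    return True


def extract_bilevel_bit(block, t1=(55, 60), t2=(40, 45), buffer=5):
    """Read the bit, tolerating a drift of up to `buffer` percentage points."""
    r = rate(block)
    # With the example values the widened ranges meet at 50%; that point reads as 1 here.
    if t1[0] - buffer <= r <= t1[1] + buffer:
        return 1
    if t2[0] - buffer <= r <= t2[1] + buffer:
        return 0
    return None  # outside both widened ranges: invalid block
```

Starting from an 8×8 block with 16 black pixels (25%), embedding a "1" needs 20 flips to reach 36 black pixels (56.25%), so with the default limit of 8 the block is reported invalid.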
In bilevel (black/white) images with sharp contrasts, the modifications are made at the edges between black and white areas: the bit with the most neighbors of the opposite value is changed. In both cases the bits of the neighboring blocks are taken into consideration. As described above, a criterion for selecting suitable blocks is introduced in the form of a threshold T: if the changes to the selected coefficients of a block are less than T, the block is valid; otherwise it is invalid. The following methods are proposed for deciding, during the extraction process, whether the block under consideration is valid:
A statement of the validity of each block for reconstructing the marking is stored as the second part of the key. For example, the sequence "110111 . . ." states that the first, second, fourth, fifth and sixth blocks are valid, whereas the third block is invalid. As before, the sequence of blocks is determined by the key (i.e. the first part of the key) and the characteristic properties of the image.
A second method defines a buffer between valid and invalid blocks. If the modification of the coefficients needed to integrate the marking is larger than the threshold T but smaller than the sum of the threshold and the buffer, the block is modified further so that the modification is larger than the sum of the threshold and the buffer. This technique is applied in both of the cases described above.
If the threshold T is set to zero, no modifications of the original data are allowed for embedding the marking. In this case a natural embedding process is used, i.e. only blocks that do not require a change in the relationships between the frequency coefficients are used for marking. The information about which blocks/positions are used for marking is included as the second part of the key.
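The first validity method, in which the second part of the key records which blocks are valid, can be sketched as follows. It assumes that the number of changes needed per block is known at embedding time and that, at extraction, the bits have already been read from the blocks in position-sequence order; all function names are hypothetical.

```python
def build_validity_key(changes_per_block, threshold):
    """Second key part: '1' if embedding needed fewer than `threshold` changes, else '0'."""
    return "".join("1" if c < threshold else "0" for c in changes_per_block)


def valid_block_indices(validity_key):
    """0-based indices of the blocks flagged valid ('1') in the second key part."""
    return [i for i, flag in enumerate(validity_key) if flag == "1"]


def read_marking(extracted_bits, validity_key):
    """Keep only the bits read from valid blocks, in position-sequence order."""
    return [extracted_bits[i] for i in valid_block_indices(validity_key)]
```

With per-block change counts of 0, 2, 9, 1, 3, 0 and a threshold of 4, the key becomes "110111", matching the example in the text: the third block is skipped when the marking is read.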
C. Procedures for Marking of Image Sequences
The procedures described above mainly concern the marking of still images; however, they may also be applied to image sequences such as videos. Image sequences allow additional modes of attack against markings. For example, the marking of a single frame may be removed by deleting that frame from the sequence, and motion estimating and motion compensating compression techniques, such as those used in the MPEG standard, may also remove markings. To compensate for this, the third technique, for marking image sequences, embeds the markings repeatedly in the still frames of certain subsequences of the entire image sequence; robustness against known attacks increases as the length of the information sequence to be embedded is scaled up. In the extreme case the information is embedded in every frame of the entire video.
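The benefit of repeating the marking in many frames can be illustrated with a per-bit majority vote over the copies read from the surviving frames. This is a sketch under assumptions: each frame is assumed to carry the complete message, frames whose mark cannot be read are represented as `None`, and majority voting is one possible way of combining the copies, not a step prescribed by the text. The function name is hypothetical.

```python
from collections import Counter


def majority_message(per_frame_bits):
    """Combine the copies of the mark read from individual frames by per-bit majority.

    per_frame_bits: one bit list per surviving frame, or None if unreadable.
    Returns None if no frame yielded a readable copy.
    """
    copies = [bits for bits in per_frame_bits if bits is not None]
    if not copies:
        return None
    length = len(copies[0])
    # For each bit position, take the value seen in most of the copies.
    return [Counter(c[i] for c in copies).most_common(1)[0][0] for i in range(length)]
```

Deleting a frame only removes one copy, and a single damaged copy is outvoted by the intact ones, so the message survives both attacks as long as enough marked frames remain.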