The present invention relates to an electronic watermark unit for embedding watermark information in contents such as audio, music, motion pictures, still images, and the like which are digitized as data, and an electronic watermark detection unit for detecting watermark information from contents having embedded watermark information.
Digital watermarking is a technique in which information such as identification information concerning a copyright holder or user of a content, right information of a copyright holder, conditions for use of a content, secret information required when using a content, copy control information, and/or the like (collectively called watermark information) is embedded in contents such as audio, music, motion pictures, still images, and the like which are digitized as data, such that the information cannot be easily discovered. By detecting the watermark information later from the contents when necessary, use of the contents can be limited, or copyright protection including copy control can be achieved. Secondary use of contents is thereby promoted.
[Requirements for Digital Watermarking]
For the purpose of preventing illegal use, the digital watermarking technique must provide a characteristic (robustness) that watermark information is not eliminated or altered by various operations or intentional attacks that are generally supposed to be made on the digital copyrighted material. For example, still images and motion pictures are often subjected to irreversible compression called JPEG (Joint Photographic Experts Group) coding and MPEG (Moving Picture Experts Group) coding, respectively. Therefore, it is an important requirement for the digital watermarking technique to have robustness with respect to irreversible compression of these kinds.
[Classification of Digital Watermarks]
Conventional digital watermarking methods for images can be coarsely classified into a pixel area utilization type and a frequency domain utilization type. In a digital watermarking method of the pixel area utilization type, watermark information is directly embedded by changing pixel values. Meanwhile, in a digital watermarking method of the frequency domain utilization type, the pixel area is transformed to a frequency domain by orthogonal conversion, watermark information is embedded in the frequency domain, and the frequency domain is then transformed back to the pixel area by reverse orthogonal conversion. The watermark information is thus embedded as a wave.
[Frequency Domain Utilization Type Digital Watermarking Method]
A digital watermarking method of the frequency domain utilization type is described in, for example, reference [1] Cox, I. J., Kilian, J., Leighton, T. and Shamoon, T., “Secure Spread Spectrum Watermarking for Multimedia”, NEC Research Institute, Technical Report 95-10, 1995 (hereinafter referred to as Cox et al. method). In this method, a frequency component as a target to be embedded is set within a range from a low frequency to a middle frequency, in which influences from irreversible compression are small. In this manner, the robustness is realized with respect to irreversible compression.
[Digital Watermarking Based on Spread Spectrum]
There is a method of improving the robustness with respect to irreversible compression by adopting the concept of spread spectrum. Spread spectrum means a communication method in which information is transferred after being spread widely within a band sufficiently larger than the band necessary for the signal to be transmitted (Reference [2]: Yamauchi Yukimichi, “Spread Spectrum Communication”, Tokyo-Denki-Daigaku Shuppankyoku, 1994). This method is excellent in tolerance to noise on transfer paths. The concept of spread spectrum is applied to the digital watermarking technique by considering an original content as a carrier wave, watermark information as a desired wave, and influence from irreversible compression as an interference wave (noise). Spreading within a frequency domain (see the reference [1]) has been proposed as a digital watermarking method based on spread spectrum.
[Spread in Frequency Domain (Perturbation Method)]
In the method according to the reference [1] described above, orthogonal conversion is performed on pixel values, and watermark information is spread and embedded in the frequency domain. Spreading is carried out by changing a plurality of frequency components in the frequency domain in accordance with a random number sequence. After the spreading, reverse orthogonal conversion is carried out. To detect watermark information, orthogonal conversion is performed on pixel values, and a determination is made on correlation values between the values of the frequency components where watermark information is embedded and the random number sequence used for embedding the watermark information. Embedded watermark information is spread throughout the entire image (block) within the pixel area, so the watermark information is robust with respect to various operations. If the frequency components where watermark information is embedded are within a low/middle frequency domain, the watermark information is difficult to eliminate by a low-pass filter.
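As a rough illustration of the embedding and correlation-based detection just described, the following minimal sketch operates directly on a list of frequency coefficients. The function names, the additive embedding rule, the strength `alpha`, the threshold of `alpha / 2`, and the synthetic Gaussian host coefficients are all assumptions made for illustration and are not taken from the reference [1]; the orthogonal conversion itself is omitted.

```python
import random

def make_sequence(seed, length):
    """Pseudo-random N(0, 1) sequence; the seed plays the role of a key."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def embed(coeffs, sequence, alpha):
    """Additively perturb each selected frequency coefficient (assumed rule)."""
    return [c + alpha * w for c, w in zip(coeffs, sequence)]

def correlation(coeffs, sequence):
    """Mean product between coefficients and a candidate sequence."""
    return sum(c * w for c, w in zip(coeffs, sequence)) / len(sequence)

m = 4000          # number of low/middle frequency coefficients (hypothetical)
alpha = 2.0       # embedding strength (hypothetical)

# Synthetic stand-in for host coefficients, standard deviation 10.
host = [10.0 * h for h in make_sequence(seed=1, length=m)]
key = make_sequence(seed=2, length=m)      # sequence actually embedded
other = make_sequence(seed=3, length=m)    # sequence that was never embedded

marked = embed(host, key, alpha)

threshold = alpha / 2                      # assumed decision threshold
detected_with_key = correlation(marked, key) > threshold
detected_with_other = correlation(marked, other) > threshold
```

With these parameters the correlation for the embedded sequence concentrates near `alpha`, while the correlation for an unrelated sequence concentrates near zero, so the two cases separate clearly.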
[Fingerprinting]
In the digital watermarking method, an application has been considered in which a user ID (user identification number) is embedded as watermark information in contents, so that information specifying the user of each content is embedded. This application is called “fingerprinting” and is expected to restrict redistribution of illegal copies, e.g., pirated editions.
[Problem of Collusion Attack]
However, when there are a plurality of equal contents which respectively have different embedded watermark information items, there can be an act of using the plurality of contents to alter or eliminate the watermark information items. This act is called “collusion attack”. For example, in the collusion attack, the pixel values are averaged between the plurality of contents to forge a new content, or a part where pixel values or frequency component values differ between the plurality of contents is altered by changing the values at random or in accordance with majority/minority rule.
[Conventional Countermeasures Against Collusion Attack]
As conventional methods for dealing with a collusion attack, there have been proposals for methods based on spread spectrum (the above-mentioned reference [1] and a reference [3]: Tetsuya Yamamoto, Sou Watanabe, and Tadao Kasa, “Digital Watermarking Method Capable of Determining All Collusion Attack Users”, SCIS'98, 10.2.B, 1998), and code-logical methods (a reference [4]: Boneh, Dan and Shaw, James, “Collusion-Secure Fingerprinting for Digital Data”, CRYPTO'95, 452–465, 1995, a reference [5]: Masahiro Suzuoki, Sou Watanabe, Tadao Kasa, “Digital Watermarking Method Robust to Collusion Attack”, SCIS'97, 31B, 1997, and a reference [6]: Jun Yosida, Keiichi Iwamura, and Hideki Imai, “Digital Watermarking Method with Less Image Quality Deterioration and Robust to Collusion Attack”, SCIS'98, 10.2.A, 1998).
According to the reference [1], a different real-valued random number sequence according to N(0, 1) is given to every user. It is supposed that no correlation exists between two different real-valued random number sequences. A collusion attack is modeled as an operation which averages pixel values. The correlation value at the time of detection attenuates due to the collusion attack.
In the reference [1], a colluder is detected by a defined similarity in place of the correlation value. The similarity is defined as the result of dividing a correlation value by the norm of detected watermark information. Since the norm of the watermark information also attenuates due to the collusion attack, the similarity does not attenuate much even if the correlation value attenuates. In this manner, all the colluders can be determined. However, this method has the difficulties that it requires the original image as an embedding target and takes a long time to determine colluders.
The reference [3] proposes a method of determining a colluder, which contrastingly uses the characteristic of attenuation of the correlation value due to averaging when a collusion attack occurs. Since a watermark which is common to colluders does not attenuate while other watermarks attenuate, a set of colluders is determined from a set of watermarks which maintains the level at the time of embedding through the collusion attack. Where n is the number of all users and c is a supposed maximum number of colluders, colluders can be determined by an embedded code having a length on the order of (c+1)(c−1) log_(c+1) n. However, this method utilizes a characteristic particular to the spread spectrum method, and therefore cannot be applied to all digital watermarking methods.
The above-mentioned reference [4] proposes a method which utilizes a characteristic that a bit having a value common to all colluders cannot be observed in codes expressing watermark information by colluders. If such an unobservable bit remains unchanged, a code (called a c-frameproof code), which cannot generate a codeword of other users than colluders no matter how the other bits may be changed, is generated as an embedded codeword and is embedded as watermark information in a content.
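The observation that bits common to all colluders cannot be observed can be illustrated with a short sketch. The helper names and the two example codewords below are hypothetical and chosen only for illustration; the sketch merely lists the positions at which a group of colluders can see a difference, and checks whether a forged word leaves every other position untouched.

```python
def detectable_positions(codewords):
    """Indices at which the colluders' codewords are not all equal."""
    return [i for i, bits in enumerate(zip(*codewords)) if len(set(bits)) > 1]

def keeps_common_bits(forged, codewords):
    """True if the forged word keeps every bit that all colluders share."""
    visible = set(detectable_positions(codewords))
    reference = codewords[0]
    return all(
        f == r
        for i, (f, r) in enumerate(zip(forged, reference))
        if i not in visible
    )

# Two hypothetical colluders' codewords (12 bits each).
colluders = ["000111111111", "000000000111"]
positions = detectable_positions(colluders)

# A forged word that only alters positions where the colluders differ.
plausible = keeps_common_bits("000101011111", colluders)
# A forged word that flips a bit shared by both colluders.
implausible = keeps_common_bits("100111111111", colluders)
```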
In this method, there is a possibility that a codeword which does not belong to any user is generated as an embedded codeword. However, if a user redistributes the user's own content as it is (naive redistribution), the user cannot deny it by claiming that the redistribution results from a collusion attack by other persons.
The n-frameproof code, which places no limitation on the total number of colluders, has a code length of n. The c-frameproof code, in which the total number of colluders is c at most, has a code length of 16c^2 log n (where c is the number of colluders and n is the number of all users).
Further, the reference [4] shows the following. That is, there exists no code (totally c-secure code) in which, if there are two groups of colluders whose intersection is a null set, the intersection of the sets of codewords (feasible sets) which can be generated by a collusion attack in each group is also a null set. In other words, the reference shows that, strictly speaking, there is no code with which a collusion attack can never generate a codeword that does not belong to any colluder.
Hence, in the reference [4], a code (c-secure code with ε-error) is constructed as an embedded code such that, when the number of colluders is c or less, the probability that a non-colluder (an innocent user) is erroneously pointed out is ε or less. First, an n-secure code Γ0(n, 2n^2 log(2n/ε)) with ε-error is constructed. Its code length is 2n^2(n−1) log(2n/ε).
Further, by combining the above code with an idea used in the Traitor Tracing scheme (reference [7]: Chor, B., Fiat, A. and Naor, M., “Tracing traitors”, Proceedings of CRYPTO'94, 257–270, 1994), a c-secure code with ε-error is constructed. This code has a code length of O(c^4 log(n/ε) log(1/ε)).
[Chernoff Bound]
In the reference [7], the number of user-specific keys necessary to determine a colluder in the Traitor Tracing scheme is determined by using the Chernoff bound. In the reference [4] described previously, this approach is used to construct the n-secure code and the c-secure code with ε-error. When there are n independent random variables X_i ∈ {0,1} whose mean is p, the Chernoff bound bounds the probability that the sum of the variables deviates from its expected value np. The upper and lower tail bounds are given by the following formulas, respectively.
\[
\Pr\left[\sum_{i=1}^{n} X_i - np > n\delta\right] < \left\{\frac{\exp(\delta/p)}{(1+\delta/p)^{1+\delta/p}}\right\}^{np}
\]
\[
\Pr\left[\sum_{i=1}^{n} X_i - np < -n\delta\right] < \left\{\frac{\exp(-\delta/p)}{(1-\delta/p)^{1-\delta/p}}\right\}^{np}
\]
Further, the next formula exists as a loose bound.
\[
\Pr\left[\left|\sum_{i=1}^{n} X_i - np\right| > n\delta\right] < 2\cdot\exp\left\{-\delta^{2} n/(2p(1-p))\right\}
\]
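To get a feel for how loose this two-sided bound is, the following sketch compares the right-hand side 2·exp{−δ²n/(2p(1−p))} with the exact binomial tail probability. The parameter values are hypothetical. The case p = 1/2 is chosen deliberately, because there the expression coincides with the well-known Hoeffding bound 2·exp(−2nδ²); the sketch does not attempt to check other values of p.

```python
import math

def two_sided_tail(n, p, delta):
    """Exact Pr[|sum(X_i) - n*p| > n*delta] for independent Bernoulli(p) X_i."""
    total = 0.0
    for k in range(n + 1):
        if abs(k - n * p) > n * delta:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

def loose_bound(n, p, delta):
    """Right-hand side of the loose bound: 2*exp(-delta^2 * n / (2p(1-p)))."""
    return 2.0 * math.exp(-(delta**2) * n / (2.0 * p * (1.0 - p)))

n, p = 100, 0.5   # hypothetical parameters
rows = [
    (delta, two_sided_tail(n, p, delta), loose_bound(n, p, delta))
    for delta in (0.05, 0.1, 0.2)
]

for delta, exact, bound in rows:
    print(f"delta={delta:.2f}  exact={exact:.3e}  bound={bound:.3e}")
```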
When 0≦δ<p(1−p) holds, the following formula is also given.
\[
\Pr\left[\sum_{i=1}^{n} X_i - np < -n\delta\right] < \exp\left\{-\delta^{2} n/(2p^{2})\right\}
\]
[Method for Pointing Out only Two of Colluders]
The n-secure code and c-secure code with ε-error proposed in the reference [4] are designed so as to point out as many colluders as possible. Considering the users as an ordered set, the Γ0(n,d) code can instead be used as a code which specifies only two colluders, namely the uppermost and the lowermost in a set of colluders. In this case, the Γ0(n,d) code can be constructed with a much smaller code length.
The Γ0(n,d) code is a code which is constructed by continuous sequences of 1 and 0, taking d bits as a unit. The d-bit sequences of 1 and 0 are disposed such that the number of units is equal to the number of codewords n minus 1. Accordingly, in this code, 1 and 0 are disposed continuously, taking d bits as a unit for each. A sequence of 1 or 0 smaller than d-bits does not exist isolatedly.
For example, for d=3 and n=5, the Γ0(5,3) code consists of the following five codewords of 12 bits each.
111111111111 000111111111 000000111111 000000000111 000000000000
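The construction of the Γ0(n,d) code can be sketched in a few lines. The function name `gamma0` and the convention that the i-th user (counting from zero) has i leading all-zero blocks are assumptions inferred from the Γ0(5,3) example.

```python
def gamma0(n, d):
    """Codewords of the Γ0(n, d) code, listed for users 1..n.

    Each codeword has n-1 blocks of d bits; user i (1-based) has zeros in
    the first i-1 blocks and ones in the remaining blocks.
    """
    blocks = n - 1
    return ["0" * (d * i) + "1" * (d * (blocks - i)) for i in range(n)]

codewords = gamma0(5, 3)
for word in codewords:
    print(word)
```

Each codeword has (n−1)d bits, so runs of 1 or 0 always come in multiples of d bits.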
The reference [5] proposes an n-secure code in which two codes layered in ascending order and descending order are used to specify two members of a set of colluders. The code length of this code is 2n log_4(2/ε) = n log_2(2/ε). The reference [6] shows a similar method as follows. That is, the minimum s (S_min) which satisfies 0 < weight(x|B_s) and the maximum s (S_max) which satisfies weight(x|B_s) < d are obtained for the Γ0(n,d) code. Two of the colluders are specified by an algorithm which points out that S_min and S_max+1 are colluders. With this code, the n-secure code with ε-error has a code length of (n−1) log_2(2/ε).
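The rule that points out S_min and S_max+1 can be sketched as follows. This is only an illustration under stated assumptions: users and blocks are numbered from 1, user i is assumed to have zeros in blocks before B_i and ones from B_i onward (as in the Γ0(5,3) example), and the handling of words with no qualifying block is a choice made here, not something specified in the reference [6]. The function names and the forged word are hypothetical. The sketch does not model the ε-error analysis, so for a small d such as 3 it may point out an innocent user for other forged words.

```python
def block_weights(word, n, d):
    """Number of 1 bits in each of the n-1 blocks B_1..B_{n-1} of a word."""
    return [word[s * d:(s + 1) * d].count("1") for s in range(n - 1)]

def trace_two(word, n, d):
    """Return the (1-based) users pointed out by the S_min / S_max rule."""
    weights = block_weights(word, n, d)
    nonzero = [s + 1 for s, w in enumerate(weights) if w > 0]
    not_full = [s + 1 for s, w in enumerate(weights) if w < d]
    accused = set()
    if nonzero:
        accused.add(min(nonzero))          # S_min
    if not_full:
        accused.add(max(not_full) + 1)     # S_max + 1
    return sorted(accused)

# Hypothetical forged word from users 2 and 4 of Γ0(5,3): block 1 stays 000
# and block 4 stays 111 because both colluders agree there.
forged = "000" + "101" + "011" + "111"
accused = trace_two(forged, n=5, d=3)
```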
[2-secure Code with ε-Error]
If the total number of colluders is small, the code length of the embedded code can be small. The reference [6] described above shows a code which points out both colluders when the total number of colluders is two and which has a code length of (3n^(1/2) − 1) log_2(6/ε).
[Limit of Collusion-Resistance]
A reference [9] (Ergun, Funda, Joe Kilian and Ravi Kumar, “A Note on the Limits of Collusion Resistant Watermarking”, EUROCRYPT'99, 140–149, 1999) theoretically shows that a limit of resistance to collusion attacks exists without depending on details of digital watermarking methods. Their conclusion is that an attempt to raise the probability of pointing out correct colluders leads to an increase in the probability (false positive rate) of incorrectly pointing out innocent users as colluders.
The collusion attack supposed in the reference [9] is that a plurality of contents (e.g., contents 1, 2, and 3) in which different watermark information items are respectively embedded are averaged as shown in FIG. 1 and a random disturbance is added thereafter. From the viewpoint of Ergun et al., the discussion previously made in the reference [4] will now be reconsidered. In the discussion in the reference [4], the Γ0(n,d) code is used as an element of the stochastic c-secure code. This Γ0(n,d) code is obtained as the n-fold direct product of a coding which takes codewords at (1,1,1) and (−1,−1,−1) as shown in FIG. 2 (d=3).
If the collusion attack supposed in the reference [9] is applied to this Γ0(n,d) code, a barycenter of the averaged content obtained resides on a line connecting (1,1,1) with (−1,−1,−1). In this case, if the content after the collusion attack is near (1,1,1) or (−1,−1,−1), it is determined that the content has not been changed by a collusion attack. Otherwise, if the content after the collusion attack is near the origin, it is determined that the content has been changed by a collusion attack.
With this Γ0(n,d) code, if there is a large bias between the number of persons having the code of (1,1,1) among colluders and the number of persons having the code of (−1,−1,−1), the barycenter is positioned very close to (1,1,1) or (−1,−1,−1) as a result of averaging. Since a random disturbance is given thereafter, the algorithm which specifies colluders may, with non-negligible probability, erroneously determine whether the content has been shifted by the disturbance from (1,1,1) or (−1,−1,−1) or from the barycenter. That is, as Ergun et al. say that their conclusion is applicable to almost all digital watermark algorithms, the coding of Boneh et al. cannot avoid the limit of Ergun et al.
Meanwhile, as for the Γ0(n,d) code, the maximum distance between two codewords is nd, and the minimum distance therebetween is as wide as d (see FIG. 3). This is because codewords are provided sparsely in a space of receiving signals since the Γ0(n,d) code is designed with importance on the resistance to collusion attacks.
A digital watermarking algorithm must embed codewords whose maximum mutual distance is nd without influencing the quality of contents. If a digital watermarking algorithm embeds a codeword of the Γ0(n,d) code in a content space and this embedding has a characteristic that the distance between codewords is approximately proportional to the distance between the contents in which the codewords are embedded, the maximum distance between an original content and a content after embedding watermark information is nd/2 or more. Therefore, if nd is large, influences on the quality of contents are large (embedding 1 in FIG. 4).
If the digital watermarking algorithm sets all codewords situated at a substantially equal distance from the original content by such embedding that does not maintain a relationship between embedded content and the original content in the content space, the grounds for the resistance to collusion attacks, which the Γ0(n,d) code originally has, are lost (embedding 2 in FIG. 4).
That is, with the limit of Ergun et al. in mind, it is desired to realize a digital watermarking method by coding which properly achieves both high resistance to collusion attacks and low influence on the quality of contents.
[Watermark Resistance to Collusion Attacks by Spread Spectrum]
Meanwhile, in the digital watermarking method based on spread spectrum, the embedding strength is set so that embedding might not make large influences on the quality of contents. Based thereon, pseudo-random number sequences used for embedding correspond to codewords.
Orthogonal transformation between the sample value space and the frequency space is a linear mapping. Therefore, the collusion attack of Ergun et al. operates to average the pseudo random number sequences and further to give a disturbance, regardless of whether the digital watermarking method to be attacked is based on a space domain or a frequency domain.
In the digital watermarking method based on spread spectrum, the pseudo random number sequences as codewords are normally selected such that the cross-correlations are substantially zero. Therefore, with respect to a content obtained by averaging k contents, it is considered that the correlation corresponding to a colluder attenuates to 1/k. If the cross-correlation between pseudo random number sequences is sufficiently small and if k is not very large, the correlation value can exceed a predetermined threshold value in detection of a digital watermark, so colluders can be detected.
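The attenuation to 1/k can be observed in a small simulation. The sketch below works directly on the pseudo random number sequences, as if the host content had already been subtracted, and omits the additional disturbance. The sequence length, the number of users, the seeds, and the threshold of 1/(2k) are hypothetical choices for illustration only.

```python
import random

def gaussian_sequence(seed, length):
    """Pseudo-random N(0, 1) sequence used as one user's codeword."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def correlation(a, b):
    """Mean product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

m, k = 4000, 4                      # sequence length, number of colluders
users = [gaussian_sequence(seed, m) for seed in range(10, 16)]   # six users
colluders, innocents = users[:k], users[k:]

# Collusion by averaging: element-wise mean of the colluders' sequences.
averaged = [sum(column) / k for column in zip(*colluders)]

colluder_corr = [correlation(averaged, w) for w in colluders]
innocent_corr = [correlation(averaged, w) for w in innocents]

threshold = 1.0 / (2 * k)           # assumed detection threshold
flagged = [c > threshold for c in colluder_corr + innocent_corr]
```

In this setting each colluder's correlation stays near 1/k while each innocent user's correlation stays near zero, so a threshold between the two still separates them; as k grows, the margin shrinks.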
The method according to the reference [1] is based on a prerequisite of using an original content in detection, and detection is carried out with use of an amount called a similarity, in place of a correlation value. The similarity is obtained by normalizing the cross-correlation value between a difference, which is obtained by subtracting the original content from the detection-target content, and the pseudo random number sequence used for embedding, by the square root of the autocorrelation value of the difference.
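Written as a formula, the similarity described above takes the following form. The symbols are introduced here only for illustration and are assumptions: C' denotes the detection-target content, C the original content, D their difference, and W the pseudo random number sequence used for embedding.

```latex
\[
D = C' - C, \qquad
\mathrm{sim}(D, W) = \frac{\langle D, W \rangle}{\sqrt{\langle D, D \rangle}}
\]
```

Here ⟨·,·⟩ denotes the inner product of two sequences of equal length, so the numerator is the cross-correlation value and ⟨D, D⟩ is the autocorrelation value of the difference.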
In the detection using the similarity, the correlation value in the numerator attenuates to 1/k by averaging in a collusion attack, while the norm of the difference in the denominator also attenuates to 1/k. Therefore, the similarity is expected not to attenuate. However, if noise other than the averaging is added, the influence of the noise is rather enlarged by the normalization.
In a reference [10] (Kilian, Joe, F. Thomas Leighton, Lesley R. Matheson, Talal G. Shamoon, Robert E. Tarjan, and Francis Zane, “Resistance of Digital Watermarks to Collusive Attacks”, Technical Report TR-585-98, Department of Computer Science, Princeton University, 1998), a theoretical consideration is made, by statistical arguments, of how many colluders the digital watermarking method according to the reference [1] can withstand in a collusion attack. The pseudo random number sequence is assumed to be Gaussian noise, and the collusion attack is assumed to be carried out by statistically estimating the original content from the contents which the colluders own. It is concluded that resistance to collusion attacks by several to slightly more than ten colluders can be realized with realistic parameter settings.
Depending on the types of applications of digital watermarks, there are cases where detection using an original content cannot be carried out and detection must be made from only the content as a detection target. In such cases, the detection based on the similarity cannot be made, and it is considered that the tolerable number of colluders is much smaller.
All the discussions in the references [1] and [10] are based on a prerequisite that the cross-correlation between pseudo random number sequences as codewords is sufficiently small. However, it is considered that, as the number of pseudo random number sequences increases, the possibility at which a pair having a large cross-correlation value accidentally appears increases in general, even if they are selected at random.
However, it remains unsolved how many pseudo random number sequences can be provided while keeping the cross-correlation value between any pair small, how pseudo random number sequences having such a characteristic can be selected, and what kind of digital watermarking method should be realized to achieve resistance to collusion attacks by using the selected pseudo random number sequences as codewords.
This problem can be regarded as one aspect of the problem, raised above in connection with the limit shown in the reference [9], of how to realize a digital watermarking method based on coding which properly achieves both high resistance to collusion attacks and low influence on the quality of contents.
A method of using an M-sequence is known as a method for generating binary pseudo random bit sequences which have a small cross-correlation. The M-sequence is generated as the output of a linear feedback shift register (LFSR) whose taps correspond to the coefficients of a primitive polynomial over GF(2). If 1 and 0 in the M-sequence are respectively replaced with +1 and −1, a PN-sequence is obtained. In the M-sequence, the appearance frequency of 0 is substantially equal to that of 1 (in one period, the number of appearances of 0 is smaller by one than that of 1), and the normalized correlation between the PN-sequence and a cyclic shift of itself has a value of 1 at a shift of 0 and a value of −1/L otherwise. Here, L = 2^n − 1, where L is the period of the sequence and n is the number of stages of the register.
If sequences obtained by cyclically shifting a PN-sequence obtained from an M-sequence are adopted as codewords, codewords having a small cross-correlation are obtained. These codewords may be used as pseudo random number sequences when embedding a digital watermark. These random number sequences can be used for digital watermarks based on spread spectrum in both the space domain and the frequency domain.
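The M-sequence construction and the correlation property of its cyclic shifts can be checked with a small sketch. The example assumes a 4-stage register and the primitive polynomial x^4 + x + 1 over GF(2), which corresponds to the recurrence a[t+4] = a[t+1] XOR a[t]; the function names and the initial state are illustrative choices.

```python
def lfsr_sequence(n_stages, offsets):
    """One period of a binary LFSR sequence.

    Recurrence: a[t + n_stages] = XOR of a[t + o] for each o in offsets.
    """
    period = 2 ** n_stages - 1
    a = [1] + [0] * (n_stages - 1)      # any nonzero initial state works
    while len(a) < period:
        t = len(a) - n_stages
        bit = 0
        for o in offsets:
            bit ^= a[t + o]
        a.append(bit)
    return a

def to_pn(bits):
    """Map 1 -> +1 and 0 -> -1."""
    return [1 if b == 1 else -1 for b in bits]

def cyclic_shift(seq, shift):
    return seq[shift:] + seq[:shift]

def correlation(a, b):
    """Unnormalized correlation (divide by the period L to normalize)."""
    return sum(x * y for x, y in zip(a, b))

m_seq = lfsr_sequence(4, (0, 1))        # x^4 + x + 1, period L = 15
pn = to_pn(m_seq)
codewords = [cyclic_shift(pn, s) for s in range(len(pn))]
correlations = [correlation(codewords[0], c) for c in codewords]
```

The unnormalized correlation is L = 15 for the unshifted sequence and −1 for every other shift, which corresponds to 1 and −1/L after dividing by L.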
In the digital watermarking method based on spread spectrum of the frequency domain, a Gaussian noise according to N(0,1) is used as a codeword. To construct a plurality of codewords which have a small cross-correlation, the following method is adopted. That is, random number sequences are generated one after another, and a check that each of them has a small correlation with all random sequences that have been generated before is made. If any of them has a large cross-correlation value, the random number sequence is not adopted as a codeword.
However, in this method, it is not guaranteed that a newly generated random number sequence has a small cross-correlation with the random number sequences which have been generated before. Therefore, a newly generated random number sequence may have to be discarded in some cases, so the processing is wasteful. In particular, as the number of random number sequences increases, the probability of such a discard increases.
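The generate-and-check procedure and its wastefulness can be sketched as follows. The sequence length, the correlation limit `max_corr`, the seed, and the attempt cap are hypothetical parameters chosen only so that the example terminates quickly.

```python
import random

def correlation(a, b):
    """Mean product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def select_codewords(count, length, max_corr, seed=0, max_attempts=10000):
    """Generate N(0, 1) sequences one by one, keeping only those whose
    correlation with every previously accepted sequence is small."""
    rng = random.Random(seed)
    accepted, rejected = [], 0
    for _ in range(max_attempts):
        if len(accepted) == count:
            break
        candidate = [rng.gauss(0.0, 1.0) for _ in range(length)]
        if all(abs(correlation(candidate, w)) < max_corr for w in accepted):
            accepted.append(candidate)
        else:
            rejected += 1               # wasted generation
    return accepted, rejected

codewords, rejected = select_codewords(count=20, length=256, max_corr=0.2)
print(len(codewords), "accepted,", rejected, "rejected")
```

Every new candidate must be compared with all accepted sequences, so both the checking cost and the chance of rejection grow with the number of codewords.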
As has been explained above, in conventional digital watermarking techniques, there is a risk that illegal users cannot be specified even when illegal redistribution is carried out by eliminating or forging watermark information by collusion attacks.
Also, in conventional proposals for realizing robustness to collusion attacks, it is necessary to embed watermark information in a highly redundant manner. Therefore, there is a drawback that a very large total number of users or colluders cannot be assumed. Even if a large total number of users or colluders can be assumed, deterioration of the quality of contents may be induced by embedding a codeword having a large code length as watermark information.
Further, watermark information (or an embedded codeword) must be constructed after correctly estimating a detection error when determining or specifying a colluder. In this respect, however, the conventional digital watermarking techniques do not sufficiently consider their validity. In particular, the conventional techniques do not provide sufficiently practical countermeasures against the case where three or more colluders participate in an alteration. Also, the code length of an embedded codeword as watermark information may be unnecessarily large for the expected detection error.