1. Field of the Invention
The present invention relates to an associated information adding apparatus and method for adding information associated with image data, such as a still image or a moving image sequence, as an electronic watermark (watermark) into the image data, and an associated information detecting apparatus and method for detecting the associated information.
2. Description of the Related Art
There is a technique in which information associated with arbitrary image data (still image or moving image sequence) is added to the image data, and the associated information is detected at the time of reproduction and is used. A typical usage example of this technique is the addition of copyright information.
In the case where unspecified users can use specific image data, it is conceivable that a person having copyright on the image adds copyright information into the image data in advance in order to assert the right. With the copyright information added, in the case where copyright information instructing that the image data must not be displayed is detected in the processing procedure of a reproducing apparatus or reproducing method of the image, it becomes possible to take a countermeasure such as not displaying the image data.
At present, the foregoing addition or detection of copyright information is used in the illegal copying preventing function of a videotape of analog recording (a video signal is recorded in a state of an analog signal) and the like. This function makes it impossible to illegally copy a videotape obtained through a method such as borrowing from a rental agent, and the right of a person having the copyright of the videotape is protected.
In the case of a videotape of analog recording, since image data are recorded in an analog manner, the picture quality is deteriorated when copying is performed. In contrast, in an apparatus for recording and reproducing image data in a digital manner, which has recently come into wide use, the picture quality is in principle not deteriorated by copying, and copying can be repeated any number of times without deterioration of the picture quality. Thus, the damage caused by illegal copying with an apparatus that performs the processing in a digital manner is more serious than in the case of an analog apparatus, and prevention of illegal copying in such a digital apparatus becomes very important.
As methods of adding information associated with image data, such as the foregoing copyright information, into the image data, there are mainly two methods.
The first method is a method of adding information into an auxiliary portion of image data. The auxiliary portion of image data indicates a portion other than the image data in an effective screen region. For example, it is a vertical blanking period of an analog video signal, a header portion or an additional data portion of a digital video signal, or the like. Actually, in an analog videotape, auxiliary associated information of image data is added to a part of a vertical blanking period.
The second method is a method of adding information into a main portion of image data, that is, the image data in an effective screen region. In this method, some specific pattern is added to all or a part of an image to such a degree that it cannot be visually perceived. This is called electronic watermark processing. As a specific example of this, there is a spread spectrum method in which addition and detection of information are performed by employing a key pattern produced by using random numbers, an M-sequence, etc.
In the first method, since information is added into an auxiliary portion of image data, that is, into a place different from the contents themselves of the image, there is a problem that if the auxiliary portion is once separated, the added information is lost.
In contrast, the electronic watermark processing is a processing for embedding information as noise into a part of image data that is unimportant for human perception. The information embedded in the image data by such electronic watermark processing (hereinafter, the information embedded by means of an electronic watermark is referred to as a "watermark") is added to the same frequency region and time region as the image data, so that it is hard to remove the information from the image data. In addition, there is a feature that even after filtering processing or data compression processing is performed on the image data, the embedded watermark can still be detected from the image data.
A method using a watermark as a method of adding associated information into a main portion of image data, which is the second method having the foregoing features, will be described below.
First, the outline of an adding and detecting system of a watermark will be described with reference to FIGS. 1 to 4.
FIG. 1 shows an example of a watermark pattern WM. The watermark pattern WM of this example has a size of vertical × horizontal = 4n pixels × 4n pixels (n is a natural number), and each pixel of the pattern takes either one of the two symbols +1 and −1, as shown in the drawing. In actual usage, it is preferable that the watermark pattern takes either one of the two symbols at random. The shape and size of a region of the watermark pattern are arbitrary.
When associated information is added to an image, a region having the same size as a region of a watermark pattern to be added is set on the image as an object to which the addition is performed. The set region and the watermark pattern WM are superimposed and compared, a value "a" is added for a pixel corresponding to the symbol +1 in the watermark pattern WM, and a value "b" is subtracted for a pixel corresponding to the symbol −1. Here, the value "a" and the value "b" can take arbitrary values.
FIG. 2 shows an example of addition of the watermark pattern WM into an image. In this example, for simplification, it is assumed that every pixel value in a region of an image Pi as an object to which the addition is performed is 100, and the foregoing values "a" and "b" are set to a=1 and b=1. In an image Po obtained as a result of the embedding operation of this watermark pattern WM, the pixel values are divided into 101 and 99 as shown in the drawing.
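The addition just described can be sketched as follows. This is only an illustrative sketch in Python: the function name embed_pattern and the concrete 4 × 4 pattern (n = 1) are assumptions made for the example and do not appear in the drawings.

```python
def embed_pattern(region, pattern, a=1, b=1):
    """Add a where the pattern symbol is +1 and subtract b where it is -1."""
    return [
        [pix + a if sym == +1 else pix - b for pix, sym in zip(img_row, wm_row)]
        for img_row, wm_row in zip(region, pattern)
    ]


# Hypothetical 4 x 4 watermark pattern with equal numbers of +1 and -1.
WM = [
    [+1, -1, +1, -1],
    [-1, +1, -1, +1],
    [+1, +1, -1, -1],
    [-1, -1, +1, +1],
]

Pi = [[100] * 4 for _ in range(4)]  # every pixel value is 100, as in FIG. 2
Po = embed_pattern(Pi, WM)          # pixel values become 101 or 99
```

With a = 1 and b = 1, every pixel of Po is either 101 or 99, matching the situation described for FIG. 2.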
When associated information is detected, a region having the same size as the region of the watermark pattern WM is set on the image as the object to which detection is performed. An evaluation value as to correlation between the set image region and the watermark pattern WM is found. In this case, the sum for all pixels of the set image region is used as the evaluation value.
Specifically, when all pixels are summed to find the evaluation value, the set image region and the watermark pattern WM are superimposed and compared, addition is applied to the pixels of the symbol +1 of the watermark pattern WM, and subtraction is applied to the pixels of the symbol −1. At this time, detection is made by using the same pattern as the watermark pattern used when the associated information was added.
FIG. 3 is a view for explaining calculation of the evaluation value in the case where the watermark is detected from the image Po to which the watermark pattern WM was added in the manner shown in FIG. 2. In this example of FIG. 3, the evaluation value becomes (4n)² (equal to the number of pixels contained in the region).
On the contrary, the image Po of FIG. 4 shows a case where the watermark pattern WM is not added, and the evaluation value of correlation to the watermark pattern with respect to this image Po becomes 0.
In practice, it hardly ever occurs that all pixels of the image Pi to which the watermark pattern is added have the same value. However, in the case where the region of the watermark pattern is sufficiently large and the watermark pattern is sufficiently random, the horizontal and vertical correlation of an image causes the evaluation value of an image Po to which no watermark is added to become almost zero at all times. From this, in the case where the evaluation value exceeds a certain threshold value, it can be judged that the associated information by the watermark is added to the image.
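The evaluation value and the threshold judgement described above can be sketched as follows. The function names, the 4 × 4 pattern, and the threshold value of 8 are assumptions chosen only for illustration.

```python
def evaluation_value(region, pattern):
    """Sum pixels under +1 symbols and subtract pixels under -1 symbols."""
    return sum(
        pix if sym == +1 else -pix
        for img_row, wm_row in zip(region, pattern)
        for pix, sym in zip(img_row, wm_row)
    )


def has_watermark(region, pattern, threshold):
    """Judge the watermark present when the evaluation value exceeds the threshold."""
    return evaluation_value(region, pattern) > threshold


# Hypothetical 4 x 4 pattern (n = 1) with equal numbers of +1 and -1.
WM = [
    [+1, -1, +1, -1],
    [-1, +1, -1, +1],
    [+1, +1, -1, -1],
    [-1, -1, +1, +1],
]
unmarked = [[100] * 4 for _ in range(4)]           # flat image, no watermark
marked = [[100 + sym for sym in row] for row in WM]  # a = b = 1 embedding

THRESHOLD = 8  # assumed value between 0 and (4n)^2 = 16
```

For this flat example the marked region evaluates to 16 = (4n)² and the unmarked region to 0, so any threshold between the two separates them. For real images the unmarked value is only approximately zero, which is why a sufficiently large and random pattern is needed.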
By the foregoing procedure, it becomes possible to add, as associated information, binary information (1 bit) indicating whether the watermark is added or not. In the case where it is desired to add more information (plural bits), it is possible to add K bits of information (2^K different values) to the image by dividing the whole image into K regions in space, in time, or in space and time, and performing the above operation for each region.
As the watermark pattern, for example, a pattern produced by using the M-sequence (maximum length sequence) can be used. The M-sequence is a sequence composed of binary symbols in which the statistical distribution of the respective symbols is constant, and whose code correlation is 1 at the origin and −1/(code length) at other positions. Of course, the watermark pattern may be produced by a method other than using the M-sequence.
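As an illustration of the correlation property mentioned above, the following sketch generates one period of an M-sequence of code length 15 and computes its normalized circular autocorrelation. The generator polynomial x^4 + x + 1, the initial state, and the function names are assumptions chosen for the example; any primitive polynomial could be used instead.

```python
def m_sequence(length=15):
    """One period of the sequence s[n+4] = s[n+1] XOR s[n] (x^4 + x + 1)."""
    state = [1, 0, 0, 0]  # any non-zero initial state works
    bits = []
    for _ in range(length):
        bits.append(state[0])
        # Shift the register and feed back s[n] XOR s[n+1].
        state = state[1:] + [state[0] ^ state[1]]
    return bits


def circular_autocorrelation(symbols, shift):
    """Normalized correlation of a +1/-1 sequence with its cyclic shift."""
    n = len(symbols)
    return sum(symbols[k] * symbols[(k + shift) % n] for k in range(n)) / n


bits = m_sequence()
symbols = [+1 if b else -1 for b in bits]  # map bits to the symbols +1 / -1
```

For this sequence the correlation is 1 at shift 0 and −1/15 at every other shift, which is the property that makes such a pattern easy to distinguish from shifted copies of itself.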
In the case where image data are recorded and reproduced in a digital manner, since the image data without any change have a very large amount of information, the image data are generally compressed. As a method of compressing the image data, a high efficiency encoding system such as JPEG (Joint Photographic Experts Group) or MPEG (Moving Picture Experts Group) is internationally standardized and has been put to practical use. Data compressed by these high efficiency encoding systems are called bit stream data or simply a bit stream. As a paired expression, an image prior to compression and an image after decoding are respectively called a baseband image or simply a baseband.
It is convenient if a watermark can be easily detected by only adding the watermark to a baseband image, even if the image is distributed as the baseband image without any compression or as the bit stream after high efficiency encoding such as JPEG or MPEG has been performed. That is, it is preferable that the watermark added before compression with the high efficiency encoding or the like can be used for detection commonly in both the baseband and the bit stream.
FIG. 5 is a conceptual view for explaining the entire flow of addition and detection of a watermark to image information in this case.
That is, in FIG. 5, baseband image information is supplied to a watermark adding apparatus 10 through an input terminal 1, and information of a unit watermark wm to be added is supplied to the watermark adding apparatus 10 through a unit watermark input terminal 2. The unit watermark wm has a size corresponding to a small region of a part of an image as described later in detail.
In the watermark adding apparatus 10 of this example, a repetitive watermark WMR in a state where the unit watermark wm supplied from the input terminal 2 is repeated vertically and horizontally on the image is added to the baseband image from the input terminal 1. The baseband image information added with the repetitive watermark WMR is outputted through an output terminal 3.
The baseband image added with the repetitive watermark WMR is supplied to an MPEG encoder 20, and is encoded in high efficiency through the MPEG system, so that it is made a bit stream and is outputted through an output terminal 4. The size of an encoded block in the MPEG encoding of this case is designed to be equal to the unit watermark wm, or is designed such that the unit watermark wm becomes equal to an integral number of encoded blocks. In other words, the size of the unit watermark wm is selected to be a size an integral number of times as large as the size of the encoded block.
The image information of the baseband image added with the watermark WMR and outputted from the output terminal 3 is supplied to a watermark baseband detecting apparatus 30 through an input terminal 5. In the watermark baseband detecting apparatus 30, on the basis of the unit watermark wm (equal to the unit watermark wm from the input terminal 2) from an input terminal 6, the watermark is detected from the baseband image information as described later, and the detection result is led out through an output terminal 7.
The image information of the bit stream added with the watermark WMR and outputted from the output terminal 4 is supplied to a watermark bit stream detecting apparatus 40 through an input terminal 8. In the watermark bit stream detecting apparatus 40, on the basis of the unit watermark wm from the input terminal 6, the watermark is detected from the bit stream image information as described later, and the detection result is led out through an output terminal 9.
The watermark adding apparatus 10 of FIG. 5 is constructed as shown in FIG. 6. This watermark adding apparatus is such an apparatus that the watermark WMR in which the unit watermark wm is repeated vertically and horizontally on the image is added to the image, and the image added with the watermark is outputted.
In this case, moving image data inputted through the image input terminal 1 can be expressed by I(x, y, t) (0 ≦ x < width(I), 0 ≦ y < height(I)), and the unit watermark wm inputted through the unit watermark input terminal 2 is expressed by W(x, y) (0 ≦ x < L, 0 ≦ y < L).
Here, x and y designate the coordinates, in the horizontal direction and the vertical direction on the image, of each of the pixels constituting the image, and t designates a time in an image unit. As also shown in FIGS. 8B and 8C, width(I) and height(I) designate the width and the height of the image, respectively. As shown in FIG. 8A, L (a natural number) designates the width and height of the unit watermark wm. Thus, the width and the height of the unit watermark wm are selected to be equal to each other.
The unit watermark wm inputted through the unit watermark input terminal 2 is inputted to a watermark repeating unit 103. The internal structure of this watermark repeating unit 103 is as shown in FIG. 7.
That is, as shown in FIG. 7, the watermark repeating unit 103 is provided with a memory 1031 having such a capacity that all elements of the repetitive watermark WMR corresponding to one image can be stored.
At the time of writing the unit watermark wm into the memory 1031, coordinate elements x and y of a coordinate (x, y) as to the unit watermark wm are inputted through an x-coordinate input terminal 11 and a y-coordinate input terminal 12. Then, memory addresses x and y corresponding to the input coordinate are set by address setting portions 1032 and 1033, and the unit watermark wm from the input terminal 2 is written in the memory 1031.
In this case, the address setting portions 1032 and 1033 supply, to the memory 1031, addresses to specify not only the one memory position corresponding to the input coordinate (x, y) but also positions separated from that memory position in the vertical direction and the horizontal direction by an integral multiple of L (once, twice, three times, . . . ). Thus, as shown in FIGS. 8B and 8C, the watermark WMR in which the unit watermark wm repeats in the vertical direction and the horizontal direction of the image is written into the memory 1031.
That is, every time a value W(x, y) of the unit watermark is inputted, for all pairs of i and j (i and j are each a non-negative integer) satisfying
i%L = x and j%L = y (0 ≦ i < width(I), 0 ≦ j < height(I)),
repeatW(i, j) = W(x, y) is written into the memory repeatW holding the watermark after repetition. In this specification, "%" is an operator for finding a remainder.
This will be described with reference to FIGS. 8A to 8C. When the unit watermark W(x, y) of FIG. 8A is inputted, the watermark repeating unit 103 repeats the unit watermark W(x, y) to form the watermark WMR of FIG. 8B which has the width(I) and the height(I).
FIG. 8B shows the case where the width(I) and the height(I) of the image are each exactly divisible by the width and height L of the unit watermark wm. In the case where they are not exactly divisible, the repetitive watermark becomes as shown in FIG. 8C.
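The repetition described above amounts to reading the unit watermark at the coordinates taken modulo L. A minimal sketch follows; the function name repeat_watermark and the small 2 × 2 unit watermark are assumptions for illustration, and the lists are indexed as [y][x].

```python
def repeat_watermark(unit, width, height):
    """Build repeatW with repeatW(i, j) = W(i % L, j % L); lists are [y][x]."""
    L = len(unit)
    return [[unit[j % L][i % L] for i in range(width)] for j in range(height)]


# Hypothetical 2 x 2 unit watermark (L = 2).
W = [
    [+1, -1],
    [-1, +1],
]

# Width 5 and height 3 are not multiples of L, so the last repetition is
# cut off at the right and bottom edges.
repeatW = repeat_watermark(W, width=5, height=3)
```

Because only the modulo operation is involved, an image size that is not a multiple of L simply cuts off the last repetition at the image edge.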
The coordinate (x, y) corresponding to each pixel position of the image information inputted from the image input terminal 1 is sequentially supplied through the x-coordinate input terminal 11 and the y-coordinate input terminal 12 into the memory 1031, so that the watermark WMR written in the memory 1031 of the watermark repeating unit 103 in the manner described above is read out. The read out repetitive watermark WMR is transferred to a watermark embedding unit 101.
On the other hand, the baseband image information inputted through the image input terminal 1 is sent not only to the watermark embedding unit 101 but also to an embedding amount judging unit 102. The embedding amount judging unit 102 investigates the features of the inputted image, judges, at each place of the image, an embedding amount of the watermark at which the addition scarcely has an influence on picture quality, and transmits the amount to the watermark embedding unit 101.
In the watermark embedding unit 101, in accordance with the repetitive watermark WMR inputted from the watermark repeating unit 103, the watermark is embedded in the image sent from the image input terminal 1. At that time, the embedding amount is adjusted in accordance with the information of the embedding amount inputted from the embedding amount judging unit 102.
Here, the embedding amount judging unit 102 is necessary for improving the picture quality of the image added with the watermark, and is not indispensable for addition and detection of the repetitive watermark WMR. For example, even if addition with the same embedding amount is performed to all portions where addition is indicated by the watermark pattern without using the embedding amount judging unit 102, there does not occur a problem in detection of the watermark.
The image information added with the repetitive watermark WMR made by the watermark embedding unit 101 is outputted from the baseband image output terminal 3.
Here, the operation of the watermark embedding unit 101 will be summarized.
When the embedding amount found by the embedding amount judging unit 102 with respect to the input image I(x, y, t) is denoted by D(x, y, t), and embedding is performed by that amount in the embedding unit 101, the image wmI(x, y, t) after addition of the watermark becomes
wmI(x, y, t) = I(x, y, t) + repeatW(x, y) × D(x, y, t).
When the watermark is added to all pixels by a constant embedding amount, the image wmI(x, y, t) after addition of the watermark is expressed using a constant D by
wmI(x, y, t) = I(x, y, t) + repeatW(x, y) × D.
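The two embedding equations above can be sketched for a single frame as follows. The function name embed, the 2 × 2 unit watermark, and the example embedding amounts are assumptions; how the embedding amount judging unit 102 actually derives D(x, y, t) is not modelled here.

```python
def embed(image, unit, amount):
    """Return wmI with wmI(x, y) = I(x, y) + repeatW(x, y) * D(x, y).

    `amount` is either a constant D or a per-pixel map D[y][x].
    """
    L = len(unit)
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, pix in enumerate(row):
            d = amount if isinstance(amount, (int, float)) else amount[y][x]
            out_row.append(pix + unit[y % L][x % L] * d)  # repeatW(x, y) = W(x%L, y%L)
        out.append(out_row)
    return out


W = [[+1, -1], [-1, +1]]             # hypothetical unit watermark, L = 2
I = [[100] * 4 for _ in range(2)]    # one flat frame, 4 pixels wide, 2 high

constant_result = embed(I, W, 2)     # constant embedding amount D = 2
D_map = [[1, 1, 3, 3], [1, 1, 3, 3]]  # assumed per-pixel amounts
adaptive_result = embed(I, W, D_map)
```

Passing a constant corresponds to the second equation, and passing a per-pixel map corresponds to the first.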
Next, a structural example of the watermark baseband detecting apparatus 30 will be described with reference to FIG. 9.
The watermark baseband detecting apparatus 30 of FIG. 9 reads the image information of the baseband and the pattern of the unit watermark wm, and outputs the information of the watermark contained in the baseband image information. The information of the watermark is such that when the watermark exists in the image, it is the shift amount of position where the unit watermark is detected, and when the watermark does not exist in the image, it is information indicating nonexistence of the watermark. The shift amount of the position where the unit watermark is detected is equal to the shift amount between the pattern of the unit watermark wm to be compared and the unit watermark pattern in the watermark on the detected image.
As shown in FIG. 9, the image I(x, y, t) (see FIG. 10A) inputted from the baseband image input terminal 5 is inputted into a folding accumulator 301, and as shown in FIG. 10B, it is folded and accumulated in units of the size of the unit watermark. Expressed as an equation, the folding accumulation result foldI(x, y) is as follows:
foldI(x, y) = Σ I(i, j, t)
(the sum being taken over all i, j satisfying i%L = x, j%L = y)
Since L is the horizontal and vertical size of the unit watermark wm, the folding accumulation result foldI(x, y) is space information having the same size as the unit watermark wm.
FIG. 11 shows a structural example of the folding accumulator 301. This folding accumulator 301 includes a memory 3012 having a capacity for at least the unit watermark wm of L × L elements, an accumulative adder 3011, and address setting portions 3013 and 3014 for the memory 3012.
To the accumulative adder 3011, the image information from the image input terminal 5 is inputted, and image elements of the unit watermark wm read out from the memory 3012 are supplied, so that accumulative addition is made. The output of the accumulative adder 3011 is written in the memory 3012 at an address indicated by the address setting portions 3013 and 3014, and is read out. By this, a folding accumulation image output is obtained from the memory 3012, and is led out through an output terminal 33.
In this case, coordinate elements x and y of a coordinate (x, y) of a pixel of an image, which are inputted from the input terminals 31 and 32, are supplied to the address setting portions 3013 and 3014, respectively. In the address setting portion 3013, the operation x%L for finding a remainder is performed, and the remainder found is supplied to the memory 3012 as the address x. In the address setting portion 3014, the operation y%L for finding a remainder is performed, and the remainder found is supplied to the memory 3012 as the address y. Through this addressing, folding accumulation is executed by the accumulative adder 3011 and the memory 3012.
In the case of carrying out the folding accumulation in which folding in a unit of the size of the unit watermark and accumulation are performed like this, and in the case where, as shown in FIG. 12A, a folding region AR has no shift in regard to the watermark on the image, the watermark of the input image after folding accumulation comes to have quite the same pattern as the unit watermark wm as shown in FIG. 12B.
In the case where the folding region AR is shifted in regard to the watermark on the image as shown in FIG. 13A, the watermark of the input image after folding accumulation comes to have such a pattern that patterns obtained by dividing the unit watermark wm are gathered as shown in FIG. 13B.
However, when folding accumulation is performed as described above, the accumulated watermark components are always placed at the same positions, not only in the case where there is no image shift between the time of addition of the watermark and the time of detection of the watermark as shown in FIG. 12A, but also in the case where there is such a shift as shown in FIG. 13A. The watermark components are therefore emphasized, whereas, when the number of times of accumulation becomes sufficiently large, the accumulated image components are cancelled out.
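The folding accumulation and its behaviour under an image shift can be sketched as follows. The function name fold_accumulate, the 4 × 4 unit watermark, and the flat test image are assumptions for illustration; a flat image is used so that the result can be checked by hand, whereas for a real image the image components only average out over many accumulations.

```python
def fold_accumulate(image, L):
    """Accumulate pixel (i, j) into fold[j % L][i % L]; lists are [y][x]."""
    fold = [[0] * L for _ in range(L)]
    for j, row in enumerate(image):
        for i, pix in enumerate(row):
            fold[j % L][i % L] += pix
    return fold


# Hypothetical 4 x 4 unit watermark (L = 4).
WM = [
    [+1, -1, +1, -1],
    [-1, +1, -1, +1],
    [+1, +1, -1, -1],
    [-1, -1, +1, +1],
]

# 12 x 12 flat image (value 100) carrying the repeated watermark.
marked = [[100 + WM[j % 4][i % 4] for i in range(12)] for j in range(12)]

aligned = [row[0:8] for row in marked[:8]]  # folding region not shifted
shifted = [row[1:9] for row in marked[:8]]  # folding region shifted by one pixel

fold_aligned = fold_accumulate(aligned, 4)
fold_shifted = fold_accumulate(shifted, 4)
```

In the aligned case each folded element is 400 plus four times the unit watermark symbol. In the shifted case the same holds with the unit watermark cyclically shifted by one column, which corresponds to the rearranged pattern of FIG. 13B.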
If the watermark is contained in the input image I(x, y, t), the folding accumulation result foldI(x, y) and what is obtained by shifting the unit watermark W(x, y) have correlation at all times. This correlation is investigated by using FFT (Fast Fourier Transform) as described below.
That is, the output of the folding accumulator 301 is inputted to an FFT unit 302, and after the FFT for L × L elements is carried out, the output is sent to a convolution arithmetic unit 303. The unit watermark wm (=W(x, y)) inputted from the unit watermark input terminal 6 is also subjected to the FFT for L × L elements by an FFT unit 304, and is then sent to the convolution arithmetic unit 303.
The convolution arithmetic unit 303 performs convolution of coefficients of the foregoing two FFT spaces, and sends the result to a reverse FFT unit 305. The convolution at the convolution arithmetic unit 303 is equivalent to taking correlation of combinations of all shifts between both the unit watermark wm and the image after folding in the space region.
In the reverse FFT unit 305, reverse FFT is performed to the result obtained by the convolution arithmetic unit 303 so that it is returned to the space region. The output of all coefficients obtained by the reverse FFT at the reverse FFT unit 305 is supplied to a maximum value detector 306 and a variance calculator 307.
The maximum value detector 306 searches a maximum coefficient among coefficients obtained by the reverse FFT at the reverse FFT unit 305, and outputs the maximum coefficient and its coordinate.
All coefficients from the reverse FFT unit 305 and the coordinate value of the maximum coefficient from the maximum value detector 306 are inputted to the variance calculator 307, and variance of values other than the maximum value of the coefficient is calculated.
The variance calculated by the variance calculator 307 and the maximum coefficient found by the maximum value detector 306 are inputted to a normalizing unit 308. The output of the normalizing unit 308 is a value normalized by dividing the maximum coefficient by the variance. This value is inputted to a threshold value comparator 309 and is compared with a predetermined threshold value.
In the threshold value comparator 309, when the value obtained by dividing the maximum value inputted to this by the variance is smaller than the threshold value, it is judged that there is no watermark, and an output controller 310 is controlled so that the information indicating that there is no watermark is outputted through the watermark information output terminal 7.
In the threshold value comparator 309, if the value obtained by dividing the maximum value inputted to this by the variance is larger than the threshold value, it is judged that the watermark is contained, and the output controller 310 is controlled, so that the coordinate of the maximum coefficient detected by the maximum value detector 306, that is, the shift amount of the position where the watermark is contained is outputted through the watermark information output terminal 7.
As described above, in the watermark baseband detecting apparatus of FIG. 9, correlation as to all possible shift amounts is obtained using convolution in the FFT region, and it is judged whether the watermark is added to the image on the basis of whether or not the maximum value of the correlation is larger than a predetermined standard.
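The detection flow summarized above can be sketched as follows. As a simplification, this sketch computes the correlation for every cyclic shift directly in the space region rather than through the FFT units; the result for each shift is the same quantity, only the amount of calculation differs. The function names, the 4 × 4 unit watermark, the threshold of 0.1, and the handling of zero variance are assumptions made for the example.

```python
def circular_correlation(fold, unit):
    """corr[dy][dx] = sum of fold[y][x] * unit[(y+dy)%L][(x+dx)%L] over x, y."""
    L = len(unit)
    return [
        [
            sum(fold[y][x] * unit[(y + dy) % L][(x + dx) % L]
                for y in range(L) for x in range(L))
            for dx in range(L)
        ]
        for dy in range(L)
    ]


def detect(fold, unit, threshold):
    """Return the shift (dx, dy) of the watermark, or None if none is judged present."""
    L = len(unit)
    corr = circular_correlation(fold, unit)
    # Maximum coefficient and its coordinate.
    peak, peak_dx, peak_dy = max(
        (corr[dy][dx], dx, dy) for dy in range(L) for dx in range(L)
    )
    # Variance of all coefficients other than the maximum.
    others = [corr[dy][dx] for dy in range(L) for dx in range(L)
              if (dx, dy) != (peak_dx, peak_dy)]
    mean = sum(others) / len(others)
    variance = sum((c - mean) ** 2 for c in others) / len(others)
    if variance == 0:
        return None  # sketch-only rule: a flat correlation surface has no peak
    # Normalize the maximum by the variance and compare with the threshold.
    return (peak_dx, peak_dy) if peak / variance > threshold else None


WM = [
    [+1, -1, +1, -1],
    [-1, +1, -1, +1],
    [+1, +1, -1, -1],
    [-1, -1, +1, +1],
]
sx, sy = 1, 2  # assumed shift of the watermark in the folded image
marked_fold = [[400 + 4 * WM[(y + sy) % 4][(x + sx) % 4] for x in range(4)]
               for y in range(4)]
flat_fold = [[400] * 4 for _ in range(4)]  # folded image without a watermark
```

For marked_fold the maximum coefficient appears at the shift (1, 2) and exceeds the assumed threshold after normalization, while flat_fold yields no detection. Because the normalization divides by the variance rather than by the standard deviation, a suitable threshold depends on the scale of the accumulated values.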
In the case where the watermark is added to the image, although the shift amount is outputted as information, since it is the shift amount between the time of addition of the watermark and the time of detection of the watermark, it is impossible to make the amount have an absolute meaning. However, it is possible to make the amount have a relative meaning as described below.
Two watermarks, namely a certain watermark and a watermark obtained by multiplying each element of the certain watermark by −1 (that is, an inverted pattern), are added at different phases of one image. The maximum value detector 306 of FIG. 9 is replaced by a maximum/minimum value detector for extracting both the maximum value and the minimum value, and the maximum value, the minimum value, and their coordinates among the coefficients from the reverse FFT unit 305 are found. After the found maximum coefficient and minimum coefficient are normalized with the variance, it is judged by the threshold value comparator 309 whether the absolute value of the maximum coefficient or minimum coefficient is larger than a threshold value, and in the case where it is larger, the difference between the shift amounts of the two watermarks, that is, the watermark where the maximum coefficient was detected and the watermark where the minimum coefficient was detected, is taken as the information. Since this information is relative, it does not change even if a shift occurs in the image.
Next, a structural example of the watermark bit stream detecting apparatus 40 will be described with reference to FIG. 14.
In FIG. 14, a bit stream of image data encoded by the MPEG encoder 20 is inputted to a bit stream image input terminal 8. The watermark bit stream detecting apparatus 40 receives input of the bit stream image and the pattern of a unit watermark wm, and outputs information of the watermark added to the image. Similarly to the foregoing, the information of the watermark is a shift amount of a position where the watermark is detected if the watermark exists on the bit stream image, and is information indicating that there is no watermark if the watermark does not exist on the bit stream image.
The watermark bit stream detecting apparatus 40 of FIG. 14 has almost the same structure as the watermark baseband detecting apparatus 30 of FIG. 9 except for one portion. That is, the folding accumulator 301, the FFT unit 302, the convolution arithmetic unit 303, the FFT unit 304, the reverse FFT unit 305, the maximum value detector 306, the variance calculator 307, the normalizing unit 308, the threshold value comparator 309, and the output controller 310 of FIG. 9 correspond to a folding accumulator 401, an FFT unit 402, a convolution arithmetic unit 403, an FFT unit 404, a reverse FFT unit 405, a maximum value detector 406, a variance calculator 407, a normalizing unit 408, a threshold value comparator 409, and an output controller 410 of FIG. 14, respectively.
The difference from the watermark baseband detecting apparatus 30 of FIG. 9 is that the input is changed from baseband image data to bit stream image data, and a DCT (Discrete Cosine Transform) coefficient extractor 411 and a reverse DCT unit 412 are newly provided before and after the folding accumulator 401, respectively. In the following, a description will be made mainly on the points that differ from the watermark baseband detecting apparatus 30 of FIG. 9.
Bit stream image data inputted through the bit stream input terminal 8 are partially decoded by the DCT coefficient extractor 411, and the DCT coefficients of an I picture are extracted.
Here, if reverse DCT is applied to the DCT coefficient to convert it into a pixel value in a space region, it is possible to subsequently make detection by the same apparatus as the watermark baseband detecting apparatus 30 of FIG. 9. However, in order to decrease a calculation amount, in the watermark bit stream detecting apparatus 40 of FIG. 14, the output of the DCT coefficient extractor 411 is supplied to the folding accumulator 401, and after folding accumulation is performed, it is supplied to the reverse DCT unit 412 and reverse DCT is performed.
In the case of an encoding system like MPEG2, DCT of two modes of a frame and a field is adaptively used. FIGS. 15A and 15B show a block in the case of DCT of the frame mode, and FIGS. 16A and 16B show a block in the case of DCT of the field mode.
That is, in the frame mode, a macro block 201 of FIG. 15A is divided into four DCT blocks 202, 203, 204 and 205 shown in FIG. 15B. In the field mode, a macro block 201 of FIG. 16A is divided into four DCT blocks 206, 207, 208 and 209 shown in FIG. 16B.
As shown in FIGS. 15A and 15B and FIGS. 16A and 16B, the ranges of pixels occupied by the block where the frame DCT is performed and the block where the field DCT is performed are different from each other.
However, it is common that a region where two DCT blocks are vertically combined is a region of horizontalxc3x97vertical =8xc3x9716 pixels. In this example, in order to make it possible to deal with any case where DCT is performed in any mode of the frame mode and the field mode, a region of 8xc3x9716 pixels in which two DCT blocks are vertically combined is treated as one unit encoded block.
When an image is divided into the encoded blocks (DCT blocks), the set of coefficients of the block at the x-th position in the horizontal direction and at the y-th position in the vertical direction, that is, the set of coefficients of the two DCT blocks starting from the pixel with a coordinate of horizontal 8x and vertical 16y, is divided into a portion where the frame DCT is used and a portion where the field DCT is used, and these are denoted FrDCT(x, y, t) and FiDCT(x, y, t), respectively. Here, counting begins from zero.
The folding accumulator 401 receives these DCT coefficients from the DCT coefficient extractor 411, performs folding accumulation, and transfers the result to the reverse DCT unit 412. Let the results of the folding accumulation be foldFrDCT(x, y) and foldFiDCT(x, y); they are expressed by the following equations.
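\[ \mathrm{foldFrDCT}(x, y) = \sum_{\substack{i \bmod (L/8) = x \\ j \bmod (L/16) = y}} \mathrm{FrDCT}(i, j, t) \]
\[ \mathrm{foldFiDCT}(x, y) = \sum_{\substack{i \bmod (L/8) = x \\ j \bmod (L/16) = y}} \mathrm{FiDCT}(i, j, t) \]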
However, in order that folding accumulation can be made without any change to this DCT region, it is assumed that L is divisible by both 8 and 16, so that L/8 and L/16 are integers.
For example, in the case of FIG. 17A, the size of the unit watermark is L=32. In this case, in the range of the unit watermark, as shown in FIG. 17A, there are encoded blocks of 8×16 pixels, the number thereof being 4 (=L/8) in the horizontal direction and 2 (=L/16) in the vertical direction. Since L/8=4 and L/16=2 are both integers, folding accumulation can be made without any change to the DCT region. That is, with respect to the eight blocks of 8×16 pixels in the range of the L×L region of the unit watermark, as shown in FIGS. 17B and 17C, accumulation is performed separately for the frame DCT and the field DCT.
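The folding accumulation described above can be illustrated with a minimal sketch. The function name `fold_blocks`, the dictionary layout of `blocks`, and the two-element coefficient lists are assumptions made only for illustration; the text does not specify a data layout. Coefficients of encoded blocks are summed into the slot given by the block position modulo L/8 and L/16, with frame DCT and field DCT blocks kept apart.

```python
BLOCK_W, BLOCK_H = 8, 16  # one encoded block: two 8x8 DCT blocks stacked vertically


def fold_blocks(blocks, L):
    """Fold encoded blocks onto the (L/8) x (L/16) grid of one unit watermark.

    `blocks` maps a block position (i, j) to a pair (mode, coeffs), where
    mode is "frame" or "field" and coeffs is a flat list of DCT coefficients.
    This container layout is an assumption made for the sketch.
    """
    if L % BLOCK_W or L % BLOCK_H:
        raise ValueError("L must be divisible by both 8 and 16")
    nx, ny = L // BLOCK_W, L // BLOCK_H
    folded = {"frame": {}, "field": {}}
    for (i, j), (mode, coeffs) in blocks.items():
        # Slot index is the block position modulo the unit watermark size.
        slot = folded[mode].setdefault((i % nx, j % ny), [0.0] * len(coeffs))
        for k, c in enumerate(coeffs):
            slot[k] += c
    return folded


# A 64 x 32 pixel area holds 8 x 2 encoded blocks; with L = 32 they fold
# onto a 4 x 2 grid. Here even columns use frame DCT, odd columns field DCT.
blocks = {
    (i, j): ("frame" if i % 2 == 0 else "field", [1.0, 0.5])
    for i in range(8)
    for j in range(2)
}
folded = fold_blocks(blocks, L=32)
```

In this toy input each slot receives two blocks (columns i and i+4), so every accumulated coefficient is twice the per-block value.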
In the reverse DCT unit 412, the accumulated DCT coefficients output from the folding accumulator 401 are subjected to reverse DCT to return them into the space region, and the result is sent to the FFT unit 402. The value sent to the FFT unit 402 has L×L elements, each of which is a sum of pixel values, and is the same as the output of the folding accumulator 301 of FIG. 9. The subsequent steps are the same as in the case of the baseband.
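The reason the accumulation may be moved in front of the reverse DCT is that the DCT is linear: the reverse DCT of a sum of coefficient sets equals the sum of the corresponding pixel values. The following is a minimal sketch of that property, assuming a one-dimensional 8-point orthonormal DCT for brevity (the blocks in the text are two-dimensional); the sample pixel rows are arbitrary illustrative values.

```python
import math

N = 8  # length of one DCT row


def _scale(k):
    # Orthonormal DCT-II scaling factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct(v):
    """Orthonormal 8-point DCT-II of a list of N samples."""
    return [
        _scale(k) * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n in range(N))
        for k in range(N)
    ]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [
        sum(_scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]


# Three pixel rows that fold onto the same position (arbitrary values).
rows = [
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 55, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
]

# Baseband route: accumulate pixel values directly.
pixel_sum = [sum(col) for col in zip(*rows)]

# Bit stream route: accumulate DCT coefficients, then one reverse DCT.
coeff_sum = [sum(col) for col in zip(*(dct(r) for r in rows))]
recovered = idct(coeff_sum)
```

Only one reverse DCT is needed per folded position rather than one per input block, which is where the reduction in calculation amount comes from.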
In the adding apparatus and the detecting apparatus of the watermark as described above, by a method in which mapping is performed into the FFT space and a search in the space is performed through convolution in the FFT space, detection of a shift of the watermark over the whole image is made possible. Besides, FFT is applied after the image is subjected to folding accumulation, so that the calculation amount of the FFT is decreased.
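As an illustration of what the shift search computes, the sketch below evaluates a circular correlation directly for every candidate shift and picks the peak. This is a one-dimensional stand-in, not the FFT-based procedure of the text; an FFT yields the same correlation values with less computation. The names `circular_correlation` and `find_shift` and the 7-element pattern are hypothetical.

```python
def circular_correlation(signal, pattern):
    """Correlation of `signal` with `pattern` for every circular shift."""
    n = len(signal)
    return [
        sum(signal[(k + s) % n] * pattern[k] for k in range(n))
        for s in range(n)
    ]


def find_shift(signal, pattern):
    """Return (shift, score) of the correlation peak."""
    scores = circular_correlation(signal, pattern)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical +/-1 pattern with a sharp autocorrelation peak.
pattern = [1, 1, 1, -1, -1, 1, -1]
# The same pattern displaced circularly by three positions.
shifted = [pattern[(m - 3) % len(pattern)] for m in range(len(pattern))]
shift, score = find_shift(shifted, pattern)
```

Because the folded signal has only L×L elements, the search covers all shifts of the unit watermark at a cost that does not grow with the full image size.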
In the format of moving pictures, there are a plurality of resolutions such as a normal resolution SD (Standard Definition) and a high resolution HD (High Definition). The resolution is further divided in the SD and HD, and resolutions as shown in the table of FIG. 18 are generally used.
Here, a letter box conversion and a pan scan set forth in the table of FIG. 18 will be described in brief.
As shown in FIGS. 19A and 19B, an aspect ratio of an image of video material and that of movie material are 4:3 and 16:9, respectively. As a method of converting a movie material into a video material and recording, the following two methods are mainly used.
The first method is such a method as to convert the movie material of a picture frame of FIG. 20A into the video material as shown in FIG. 20C, in which upper and lower portions of a video image are made surplus, and the image is displayed in a region of 16:9 of the center portion. The surplus upper and lower portions are generally filled with black. This display method is called letter box display, and its image is called a letter box image.
The second method is such a method that the movie material of the picture frame of FIG. 20A is scaled down by a factor of ¾ in the horizontal direction as shown at left and lower side in FIG. 20B and is displayed in the whole region (4:3 region) of the video image. The image is contracted in the horizontal direction. This display method is called squeeze display, and its image is called a squeeze image.
Conversion of the squeeze image into the letter box image is called letter box conversion. As shown in the two drawings of FIGS. 20B and 20C, the letter box conversion is such conversion that the squeeze image of FIG. 20B is scaled down by a factor of ¾ in the vertical direction. As described before, the squeeze image is scaled down by a factor of ¾ in the horizontal direction, so that an object is displayed to be thinner than the actual state. Thus, when the image is displayed for appreciation, it is necessary to perform the letter box conversion.
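The scaling factor follows from the two aspect ratios, as the short calculation below shows. The 480-line frame height used in the example and the helper name `letter_box_rows` are assumptions for illustration only.

```python
from fractions import Fraction

MOVIE_ASPECT = Fraction(16, 9)
VIDEO_ASPECT = Fraction(4, 3)

# Factor by which 16:9 material must be narrowed to fill a 4:3 frame.
SQUEEZE = VIDEO_ASPECT / MOVIE_ASPECT


def letter_box_rows(frame_rows):
    """Return (active_rows, bar_rows) after letter box conversion.

    The squeeze image is scaled vertically by the same factor, and the
    remaining rows are split evenly between the upper and lower bars.
    """
    active = frame_rows * SQUEEZE
    bar = (frame_rows - active) / 2
    return active, bar


# Example with an assumed frame height of 480 lines.
active_rows, bar_rows = letter_box_rows(480)
```

With 480 lines, the picture occupies 360 lines and each black bar occupies 60 lines.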
The letter box image has the advantage that the movie image can be viewed with correct proportions when it is displayed as it is. On the other hand, the squeeze image has the advantage that the picture quality is excellent, since the resolution of the video picture frame in the vertical direction is effectively used. In the letter box image, the picture quality is inferior, since the pixels in the upper and lower portions are not effectively used.
The pan scan corresponds to FIG. 19A. In the pan scan, right and left image portions are cut away from the picture frame of the movie image of 16:9 of FIG. 19B and the region of 4:3 is extracted. Since the right and left image portions are cut away, the information is lost. However, all pixels of the 4:3 picture frame are effectively used, and can be watched when the image is displayed as it is. Thus, there are many opportunities where this pan scan is used.
When the foregoing resolution conversion is performed, a problem arises if the watermark embedded in the image can no longer be detected and the associated information, such as copyright information, cannot be extracted. The following two methods are conceivable as methods of solving the problem.
The first solution is a method in which separate watermarks are added for the respective resolutions before and after the resolution conversion. The second solution is such a method as to enable a watermark added at one resolution to be detected at another resolution.
In the first solution, it is possible to consider a method in which the separate watermarks before and after the resolution conversion are added to separate places of the image in space or in time, and a method in which the watermarks are added to the same place of the image in space and in time. In the case where the watermarks are added to the separate places in space or in time, an amount of addition per unit time is decreased for the respective watermarks. Thus, there is a defect that detection accuracy is lowered when detection is made in the same time, while a detection time is prolonged in order to keep the same detection accuracy.
On the other hand, in the case where the watermarks are added to the same place in space and in time, signals of the two watermarks are added to the image to overlap with each other. Thus, there is a problem that the patterns of the watermarks become easy to be seen as noises added to the original image so that the picture quality is deteriorated.
In the case where the first solution is used, it becomes necessary to add separate watermarks for all resolutions to which the image may possibly be converted. Thus, there is a problem that as the number of resolutions to be supported increases, the sacrifice of detection accuracy, detection time, and picture quality becomes large.
Thus, the second solution, that is, the method in which a watermark added at one resolution can be detected at another resolution, is important.
With respect to detection in the baseband, such a method has been considered that an input image (image after resolution conversion) to a detecting device is returned to an image of the original resolution (image before the resolution conversion) through a filter in advance, and the image is inputted to a normal watermark detecting apparatus.
For example, let us consider detection of a watermark in the case where resolution conversion is made from a format of 720 horizontal pixels by 480 vertical pixels (or lines, and the same applies to the following) as the most general resolution of SD to a format of 480 horizontal pixels of ⅔ thereof by 480 vertical pixels.
The watermark is added in the format of 720 horizontal pixels by 480 vertical pixels before the resolution conversion. In order to detect the watermark in the baseband after the resolution conversion to the format of 480 horizontal pixels by 480 vertical pixels, the input image to the detecting apparatus is first enlarged in the horizontal direction by a factor of 3/2 using an enlarging filter, and the enlarged image is inputted to a normal watermark detecting apparatus, where the watermark is detected.
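The horizontal enlargement by a factor of 3/2 can be sketched as follows. Linear interpolation is used here only as one possible enlarging filter; the text does not name a specific filter, and `resize_row` is a hypothetical helper.

```python
def resize_row(row, new_len):
    """Resample one image row to `new_len` samples by linear interpolation."""
    old_len = len(row)
    out = []
    for k in range(new_len):
        # Map the centre of output sample k back to input coordinates.
        pos = (k + 0.5) * old_len / new_len - 0.5
        pos = min(max(pos, 0.0), old_len - 1)  # clamp at the row edges
        i0 = int(pos)
        i1 = min(i0 + 1, old_len - 1)
        frac = pos - i0
        out.append((1 - frac) * row[i0] + frac * row[i1])
    return out


# A 480-sample row restored to the original 720 samples (factor 3/2).
converted_row = [float(v % 16) for v in range(480)]
restored_row = resize_row(converted_row, 720)
```

Such a filter must run over every row of every frame before detection can start, which is the added processing that the text identifies as a drawback.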
However, in this method, for the purpose of detecting the watermark of the image which has been subjected to the resolution conversion, processing equivalent to returning the image to the original resolution becomes necessary, so that the structure becomes complicated. Since it is difficult to apply filtering of resolution conversion to a DCT coefficient in a DCT region, there is a problem that watermark detection in the bit stream is actually impossible in this method. Although there is a method of decoding the image of the baseband from the bit stream, since the processing amount of decoding is large in this method, it is not preferable.
In view of the problem of the first solution and the problem of the second solution, the present invention has an object to provide a method and an apparatus in which in both a baseband and a bit stream, it is possible to detect the same watermark in both cases before and after resolution conversion is performed, and it is possible to suppress a lowering of detection accuracy, an increase of detection time, a deterioration of picture quality, and an increase of detection processing amount.
The invention is an information adding apparatus and method for adding information as a watermark to an image, which generates a unit watermark having a size corresponding to a small region made of vertical×horizontal=M×N (M and N are respectively a positive integer) pixels of a part of the image, generates a repetitive watermark in which the unit watermark pattern from a unit watermark pattern generator is repeated vertically and horizontally, checks the repetitive watermark, and adds the watermark to the image, and is characterized in that
in the unit watermark pattern generator, a size of the unit watermark pattern is determined such that, at all resolutions to which the image is expected to be converted, each of a vertical size and a horizontal size of the unit watermark is an integer multiple of a size of an encoded block at image encoding.
By this, the size of the unit watermark pattern is an integer multiple of the size of the encoded block at the image encoding at all resolutions to which the image is expected to be converted. Thus, at the time of detection of associated information, if the resolution of the image information is known, it is possible to know what multiple of the encoded block the unit watermark is at that resolution, and folding accumulation becomes possible in a unit of an integer number of encoded blocks, which is equal to the size of the unit watermark pattern.
Also in the bit stream, folding accumulation is performed in units of the unit watermark pattern, whose size is an integer multiple of the encoded block, and detection of the watermark can be made.
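The size condition can be checked numerically. The sketch below finds the smallest size, in pixels at the original resolution, that remains an integer multiple of the encoded block after each listed scaling. The function name `min_unit_size` and the particular scale sets (2/3 horizontally, as in the 720-to-480 conversion, and 3/4 vertically, as in the letter box conversion) are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import gcd


def min_unit_size(block, scales):
    """Smallest size s such that s * scale is a multiple of `block` for all scales."""
    size = 1
    for scale in scales:
        # s * scale = k * block  <=>  s is a multiple of the numerator of
        # block / scale once that ratio is reduced to lowest terms.
        need = Fraction(block) / Fraction(scale)
        size = size * need.numerator // gcd(size, need.numerator)  # lcm
    return size


# Horizontal: 8-pixel block width, original resolution and 2/3 conversion.
horizontal = min_unit_size(8, [1, Fraction(2, 3)])

# Vertical: 16-pixel block height, original resolution and 3/4 conversion.
vertical = min_unit_size(16, [1, Fraction(3, 4)])
```

Under these assumptions a width of 24 pixels becomes 16 pixels (two blocks) after the 2/3 conversion, and a height of 64 pixels becomes 48 pixels (three blocks) after the 3/4 conversion.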
Moreover, the invention is an information detecting apparatus and method in which a watermark pattern and an image are checked with each other for every small region made of vertical×horizontal=M×N (M and N are respectively a positive integer) pixels of a part of the image, information is embedded as a watermark in the image on the basis of the watermark pattern, and the information is detected from an image constructed by applying resolution conversion or picture frame conversion to the image embedded with the information, and which is characterized in that the watermark pattern is converted correspondingly to the resolution conversion or picture frame conversion, checking is performed against the watermark pattern made correspondent to at least a part of an encoded block in the image subjected to the resolution conversion or picture frame conversion, an evaluation value as to the image is calculated, and the evaluation value is compared with a predetermined threshold value to judge whether the watermark is added or not.
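The final judging step, comparing an evaluation value with a threshold, might look like the following sketch. A normalized correlation is used here as the evaluation value purely as an assumption; the text does not define the evaluation value at this point, and the threshold of 0.5, the function names, and the sample pattern are hypothetical.

```python
import math


def evaluation_value(folded, pattern):
    """Normalized correlation between folded image data and a watermark pattern."""
    n = len(folded)
    mean_f = sum(folded) / n
    mean_p = sum(pattern) / n
    num = sum((f - mean_f) * (p - mean_p) for f, p in zip(folded, pattern))
    den = math.sqrt(
        sum((f - mean_f) ** 2 for f in folded)
        * sum((p - mean_p) ** 2 for p in pattern)
    )
    # A flat input carries no pattern information; report zero correlation.
    return num / den if den else 0.0


def is_watermarked(folded, pattern, threshold=0.5):
    """Judge the watermark present when the evaluation value reaches the threshold."""
    return evaluation_value(folded, pattern) >= threshold


# Hypothetical converted watermark pattern and two candidate inputs.
pattern = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
flat = [5.0] * len(pattern)
inverted = [-p for p in pattern]
```

In this sketch an input equal to the pattern is judged as watermarked, while a flat input or an inverted pattern is not.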