1. Field of the Invention
The present invention relates to image-encoding methods, image-decoding methods, image-processing methods available for encoding, transmitting and accumulating images, especially regional images showing the occupancy region of a projective image of a substance, and devices thereof.
The present invention relates to a motion vector-detecting device used for image encoding and for format transformation such as frame frequency transformation, an image-encoding device for transmitting and recording images with a small encoded volume, and an image-decoding device.
The present invention relates to image-encoding methods for transmitting and accumulating images with a smaller encoded volume, and a device thereof.
2. Related Art of the Invention
Conventionally, when images are synthesized by computer graphics and the like, information on the opacity (transparency) of a substance, referred to as the “α value”, is required in addition to the luminance of the substance.
The α value is determined for every pixel; an α value of 1 means non-opacity, and an α value of 0 means complete opacity. Namely, when an image of a certain substance is embedded in the background, an image having the α value is necessary. Hereinafter, images having such α values are referred to as the “α plane”. Incidentally, the α value takes intermediate values in [0, 1] for substances such as clouds, frosted glass and the like, but for many substances it tends to take only the two values {0, 1}.
The α plane may be encoded by directly enumerating the pixel values; however, when the α plane is composed of the two values {0, 1}, the binary image-encoding techniques MH, MR and MMR, which are international standards of the CCITT and have conventionally been used for facsimile and the like, may be used. These are generally called “run-length coding”.
In run-length coding, the number of horizontally, or horizontally and vertically, continuous 0 or 1 pixels is entropy-coded so that coding is performed efficiently.
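The run counting described above can be sketched as follows. This is a minimal illustration for a single horizontal row only; the function names `run_lengths` and `expand_runs` are hypothetical, and the entropy coding of the run lengths (as done in MH, MR and MMR) is deliberately omitted.

```python
def run_lengths(row):
    """Return (first_value, runs) for one row of a binary image.

    runs[i] is the number of consecutive identical pixels in the i-th run;
    the pixel value alternates between 0 and 1 from run to run.
    """
    if not row:
        return None, []
    runs = []
    current, count = row[0], 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)          # close the previous run
            current, count = pixel, 1   # start a new run
    runs.append(count)
    return row[0], runs


def expand_runs(first_value, runs):
    """Inverse of run_lengths: rebuild the row from its run lengths."""
    row, value = [], first_value
    for n in runs:
        row.extend([value] * n)
        value = 1 - value               # binary values alternate
    return row
```

Because the row is rebuilt exactly from its runs, the representation is reversible; any saving in coded volume comes from entropy-coding the run lengths afterwards.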
Furthermore, focusing on the contour of the substance boundary, the positional information of each pixel constituting the contour may be coded. In the present specification, encoding of the contour of the substance boundary is hereinafter referred to as contour encoding.
As a typical contour encoding, there can be mentioned chain coding (described in H. Freeman: “Computer Processing of line drawing data”, Computing Surveys, vol. 6, no. 1, pp. 57-96, (1974)).
In an image having a simple contour of the substance boundary, the values of the α plane can be encoded highly efficiently by chain-coding the group of pixels constituting the contour of the region having the α value of 1.
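As an illustration of chain coding, the sketch below encodes an ordered list of contour pixels as a start point plus one 8-direction code per step. The direction numbering in `DIRECTIONS` and the function names are assumptions made for this example, and contour tracing itself (finding the ordered pixel list) is not shown.

```python
# Assumed 8-direction numbering as (dx, dy) steps; y grows downward (image rows).
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1),
              (-1, 0), (-1, 1), (0, 1), (1, 1)]


def chain_encode(contour):
    """Encode an ordered list of 8-connected (x, y) pixels as (start, codes)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        # Each step between neighbouring contour pixels becomes one code 0..7.
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return contour[0], codes


def chain_decode(start, codes):
    """Rebuild the contour pixel list from the start point and direction codes."""
    x, y = start
    contour = [(x, y)]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = x + dx, y + dy
        contour.append((x, y))
    return contour
```

Each contour pixel after the first costs one of only eight symbols, which is why a simple contour can be represented compactly.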
Considering how the decoded α plane affects visual quality, the above-mentioned run-length coding and chain coding methods and the devices thereof have had a defect: since encoding and decoding are carried out for every pixel, the patterns of {0, 1} are coded and decoded more accurately than human visual characteristics require, even though exact decoding of the {0, 1} pattern is not necessarily needed, and a large coded volume therefore becomes necessary.
Namely, to explain concretely, in general image synthesis, processing referred to as “anti-aliasing”, which mixes the image with the color value of the background image, is performed in the vicinity of the boundary of the image to be synthesized. This is equivalent to smoothing the α value in the vicinity of the substance boundary, regarding the α value as a gray scale of [0, 1]. Namely, in an image such as the α plane, spatial resolution is not so important; instead, amplitude resolution becomes necessary in the vicinity of the substance boundary.
In the conventional run-length coding and chain coding, there has been a problem in that, since they are reversible coding, the spatial resolution is higher than necessary from the viewpoint of visual characteristics, so that a large coded volume becomes necessary.
Furthermore, J. Wang and E. Adelson have proposed a method of encoding dynamic images by resolving the dynamic image into layer images, as shown in FIG. 31, in order to transmit and record the dynamic image efficiently.
According to the literature in which this method is disclosed, “Layered Representation for Image Sequence Coding” by J. Wang and E. Adelson, Proc. IEEE Int. Conf. Acoustic Speech Signal Processing '93, pp. V221-V224, 1993, and “Layered Representation for Motion Analysis” by J. Wang and E. Adelson, Proc. Computer Vision and Pattern Recognition, pp. 361-366, 1993, the image processing steps (1) to (3) described below are performed:
(1) A region described by the same motion parameter (in the conventional case, affine transformation parameter) is extracted from the dynamic images.
(2) A layer image is formed by superposing the same motion region. Each layer image is expressed by the opacity and luminance for every pixel showing the occupancy of the superposed region.
(3) The upper and lower relations in the eyes' direction between layer images are examined and sequenced.
Here, the affine transformation parameter means the coefficient of a0-a5 shown in Expression 1, when the horizontal/vertical position in the image is assumed to be (x, y), and the horizontal/vertical component of the motion vector is assumed to be (u, v).
$$(u(x,y),\ v(x,y)) = (a_0 + a_1 x + a_2 y,\ \ a_3 + a_4 x + a_5 y) \qquad (1)$$
It is known that the motion of the projective image of a rigid body located at a sufficient distance from a camera can be approximated by the affine transformation parameter. They utilize this to synthesize dynamic images of from several tens to several hundreds of frames, while transforming several kinds of layer images, each composed of one frame, by the affine transformation. The information required for transmitting and recording this dynamic image is only the image which is the base of deformation for each layer image (hereinafter referred to as the “template”), the affine transformation parameter, and the upper and lower relations of each layer image; therefore, recording and transmission of the dynamic image can be performed at a very high coding efficiency. In addition, the template is expressed by the opacity and the luminance for every pixel showing the occupancy of the region, for the image synthesis.
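To make Expression 1 concrete, the following sketch evaluates the motion vector (u, v) at a pixel position from the six affine parameters. The function name `affine_motion` and the sample parameter values are illustrative assumptions only.

```python
def affine_motion(a, x, y):
    """Evaluate Expression 1: motion vector (u, v) at position (x, y).

    a is the sequence (a0, a1, a2, a3, a4, a5) of affine parameters.
    """
    a0, a1, a2, a3, a4, a5 = a
    u = a0 + a1 * x + a2 * y   # horizontal component
    v = a3 + a4 * x + a5 * y   # vertical component
    return u, v


# Pure translation: the same vector at every position.
translation = (2.0, 0.0, 0.0, -1.0, 0.0, 0.0)
# Uniform expansion about the origin: the vector grows with distance.
zoom = (0.0, 0.1, 0.0, 0.0, 0.0, 0.1)
```

With only six numbers per layer and frame, a whole dense motion field is described, which is the source of the coding efficiency mentioned above.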
In the dynamic image expression by J. Wang and E. Adelson, the projective image deals only with the motion of a rigid body that can be described by the affine transformation. Therefore, their dynamic image expression cannot cope with the case where the motion of the projective image cannot be described by the affine transformation. For example, it cannot be applied when the person shown in FIG. 31 moves as a non-rigid body, or when the camera-substance distance is small and the nonlinear term of the perspective transformation cannot be ignored. Moreover, their technique for determining the motion of the projective image as the affine transformation parameter is composed of the two processing stages described below:
1. To determine a local motion vector at respective positions on the screen by a method based on the relational expression of the space-time gradient of the luminance, in which the time change of the luminance is approximated by the inner product of the spatial luminance gradient and the motion vector (B. Lucas and T. Kanade: “An Iterative Image Registration Technique with an Application to Stereo Vision”, Proc. Image Understanding Workshop, pp. 121-130, April 1981).
2. To determine the affine transformation parameter by clustering the obtained motion vector.
The above-mentioned technique, however, cannot be applied when there is such a large motion in the dynamic image that the relational expression of the time-space gradient of the luminance does not hold. Furthermore, in the two-staged method of predicting the affine transformation parameter from the obtained motion vectors, a large prediction error is caused when the motion vectors on which the parameter prediction is based are wrong. The motion vector is indefinite in a region where there is no luminance change, or in a region where the luminance changes in only one direction. In the above-mentioned two-staged prediction technique, special processing is required for the motion vectors in these uncertain regions. Collectively, the following problems 1 and 2 are not solved.
Problem 1: Efficient encoding of images (template) having luminance and opacity having irregular deformation
Problem 2: Strong prediction of the affine transformation parameter
Furthermore, among the conventional image-encoding methods and the devices thereof, there is, for example, the method and device described in CCITT Recommendation H.261. FIG. 37 is a block diagram showing the structure of the image-encoding device and the decoding device based on H.261, wherein reference numeral 70 represents a predicted image-forming means, 71 represents a motion vector-detecting means, 72 represents a differential device, 73 represents a waveform-encoding means, 74 represents a waveform-decoding means, 75 represents an adder, 76 represents a frame delay means, 77 represents a Huffman encoder, 78 represents a Huffman decoder, 79 represents a waveform-decoding means, 80 represents an adder, 81 represents a frame delay means and 82 represents a predicted image-forming means.
The image-encoding device and image-decoding device constituted as described above will now be described. First, the motion vector-detecting means 71 detects, for each block composed of 16×16 pixels (referred to as a macro block) of the input image, the motion vector giving the minimum sum of absolute differences with respect to the decoded image of the previous frame. The predicted image-forming means 70 forms a predicted image from this motion vector and the decoded image of the previous frame. The differential device 72 outputs the differential image between the input image and the predicted image (hereinafter referred to as the “prediction error image” or “residual difference image”). The waveform-encoding means 73 subjects this differential image to the discrete cosine transform (DCT) in blocks composed of 8×8 pixels, transforming the image into DCT coefficients corresponding to frequency, and the Huffman encoder 77 subjects these to variable-length encoding. In order to make the predicted images formed on the encoding side and on the decoding side identical, the waveform-decoding means 74 has the same structure as the waveform-decoding means 79 on the decoding side, and performs the inverse discrete cosine transform (IDCT) to reconstruct the prediction error image. The adder 75 adds this to the present predicted image to form the image reconstructed on the decoding side. This image is delayed by the frame delay means 76 and used for the prediction of the next frame. On the decoding side, the DCT coefficients are decoded by the Huffman decoder 78; thereafter, the respective blocks perform the same operations as the blocks having the same names on the encoding side, whereby the image is reconstructed.
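The motion vector detection by minimum sum of absolute differences described above can be sketched as an exhaustive block search. This is only an illustration under stated assumptions: images are plain lists of rows, the names `sad` and `find_motion_vector` are hypothetical, and the search range and sign convention (the displacement points from the current block to its match in the previous frame) are choices made for this example, not taken from H.261.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx])
    return total


def find_motion_vector(cur, ref, bx, by, size=16, search=2):
    """Return (dx, dy, cost) minimising the SAD for the block at (bx, by).

    Displacements that would read outside the reference image are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

A cost of zero means the block is predicted without error, in which case only the motion vector would need to be transmitted for that block.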
As described above, in the inter-frame encoding mode of the encoding device based on H.261, when the current frame image is encoded, the predicted image of the present frame is made as a motion-compensating image from the image of the previous frame by the block correlation method (hereinafter this processing is referred to as “motion compensation”), and the prediction error image between this motion-compensating image and the present frame image is encoded. In this encoding device, when the motion-compensating image coincides with the present frame image without error, the information to be transmitted is only the motion vector, so that the image can be transmitted with a small encoded volume. Moreover, even if there is movement in the dynamic image, when it is a simple movement or a local movement, the difference between the predicted image and the input image becomes small, so that the dynamic image can be encoded with a smaller encoded volume than in the case where encoding within the frame is performed without utilizing the correlation between frames.
By the way, H.261 is a specification of an image-encoding method and device recommended for the purpose of transmitting images having a size of at least about 144×176 pixels (height by width) with an encoded volume of about 64 kilobits/sec. When an attempt is made to encode an image of the same size at an encoding speed of about 20 kilobits/sec., the DCT coefficients have to be quantized roughly. As a result, the mosquito noise caused in the vicinity of an edge because a strong edge cannot be expressed by the DCT coefficients, and the block noise generated at block boundaries due to the difference between the average luminance levels of DCT blocks, are perceived as visual disturbances.
In H.261, motion compensation is performed with a motion accuracy of one pixel, and in recent dynamic image-encoding techniques it is performed with a motion accuracy of ½ pixel. When the motion of a substance takes an integer number of pixels, the predicted image ideally coincides with the input image without error. Actually, however, the motion does not generally take an integer number of pixels, and even if the accuracy of motion is increased (for example, to ½-pixel accuracy or ¼-pixel accuracy), the input pixel value is predicted by interpolation or extrapolation of the pixel values in its vicinity, so that an impulse-like prediction error is generated in the vicinity of an edge even if the motion prediction is correct. This is shown in FIG. 34. Referring to FIG. 34(a), the input image moves horizontally toward the right while being deformed. Referring to FIG. 34(b), the predicted image is square, and the position of “B” on the left edge is wrongly predicted due to the deformation. On the contrary, the portion “A” on the right edge coincides roughly.
In the portion “A”, however, although a visually appropriate predicted image is formed by the motion compensation, a prediction error subject to the residual difference encoding is caused, which becomes a factor that makes the whole encoded volume large. Here, in the drawings, (g), (h) and (i) express the luminance levels of the input image, the predicted image and the residual difference image cut along A-B. This problem cannot be solved even if the waveform encoding means 73 is replaced by another transform encoding means such as sub-band coding. Ultimately, the problem is how to select the portions that cause no visual deterioration even if they are not actually subjected to the residual difference encoding. This is not limited to H.261, but is a common problem for methods and devices that encode the residual difference image by forming a predicted image based on a certain image. In the example of FIG. 34, the portion “B” obviously requires the residual difference encoding, but the portion “A” does not require it under a limited encoding speed.
Then considering said conventional problem of the encoded volume, the object of the present invention is to provide image-encoding methods, image-decoding methods, image-processing methods and the devices thereof which can reduce the encoded volume compared to the conventional methods and devices, while suppressing the visual deterioration by adding the visual characteristics.
That is, an image encoding method of the invention comprises:
dividing an image into blocks containing a plurality of pixels;
extracting a block where pixels with different values mingle in the same block, among said divided respective blocks;
obtaining a positional information for identifying a position on said image, of said extracted block and subjecting the positional information to a contour encoding; and
subjecting a pixel pattern in the block to be subjected to said contour encoding to a waveform encoding.
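The first two steps of the method above (dividing the image into blocks and extracting the blocks in which different pixel values mingle) can be sketched as follows. The function name `boundary_blocks` and the representation of the image as a list of rows are assumptions for illustration; the subsequent contour encoding of the block positions and waveform encoding of the block patterns are not shown.

```python
def boundary_blocks(alpha, size):
    """Return (block_row, block_col) indices of blocks containing mixed pixel values.

    alpha is a list of rows of pixel values (for example 0 and 1);
    size is the block edge length in pixels. Blocks at the right and
    bottom edges may be smaller when the image size is not a multiple of size.
    """
    height, width = len(alpha), len(alpha[0])
    found = []
    for by in range(0, height, size):
        for bx in range(0, width, size):
            values = {alpha[y][x]
                      for y in range(by, min(by + size, height))
                      for x in range(bx, min(bx + size, width))}
            if len(values) > 1:            # more than one value: boundary block
                found.append((by // size, bx // size))
    return found
```

Only the extracted blocks lie on the substance boundary, so only they carry a pixel pattern that needs waveform encoding; uniform blocks are fully described by their position relative to the contour.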
Further, the present invention intends to solve problems 1 and 2 and to offer an image encoding device, a decoding device and a motion vector detecting device for efficiently encoding and decoding images of luminance and opacity constituting hierarchical images separated in the direction of the front and back relation on the axis of eyes.
That is, an image encoding device of the first invention, for solving problem 1, comprises
a predicting means for predicting an image of a luminance and an opacity for an image which is a subject to be encoded, by using a correspondence between partial regions from a reference image composed of a luminance and an opacity and an inputted image series composed of a luminance and an opacity of a substance,
a prediction coding means for encoding the correspondence between the partial regions in said predicting means as a prediction code,
an error operational means which determines a difference of the luminance and the opacity between said predicted image and said image to be encoded, as the error image, and
an error coding means for encoding said error image as an error image code, and wherein
said image series are transmitted as the error image code and the prediction code with respect to said reference image.
An image decoding device of the second invention for solving the problem 1, for holding the same reference image as that of the image encoding device according to the first invention and decoding an output of said image encoding device, has;
a prediction code decoding means for decoding the correspondence between the partial regions from the prediction code,
a predicted image formation means for forming a predicted image from the reference image, by using the decoded correspondence between said partial regions,
an error image decoding means for decoding the error image from the error image code, and
an adding means for adding said predicted image and said error image to obtain the image comprising the luminance and the opacity, wherein
an image composed of the luminance and an opacity is decoded as the output of said predicted image formation means or said adding means.
An image encoding device of the third invention for solving the problem 1, comprises
a superposing means which inputs an image composed of a luminance and an opacity of a substance, classifies a region of the image into a transparent region and an opaque region, and forms a luminance image which is superposed with a luminance information and an opacity information in a manner that a luminance value of the substance is for the opaque region and a value outside the range of the luminance value is for the transparent region, wherein
the luminance image superposed with said information of the luminance and the opacity is encoded.
An image decoding device of the fourth invention for solving the problem 1, has
a dividing means for dividing the luminance image into the opacity image and the luminance image by making a transparent region when the luminance value is a value outside the range, and making a luminance value when it is a value inside the range, wherein
the luminance image of the luminance and the opacity is decoded.
An image encoding device of the fifth invention for solving the problem 1,
when an original image is layered by a front and back relation on an axis of eyes and an opacity of a region as well as a luminance,
comprises;
a layer image encoding means for inputting a plurality of such layer images and encoding the luminance and the opacity as a layer image code for every layer image, and
a layer image decoding means for obtaining decoded layer image from an output of said layer image encoding means,
a synthesizing means for synthesizing said decoded plural layer image by the front and back relation, the luminance and the opacity thereof, and
an error image encoding means for determining an error image between said original image and said synthesized image and encoding the error image, and
said original image are transmitted as the plurality of layer image codes and the error code between the original image and the synthesized image.
An image decoding device of the sixth invention for solving the problem 1,
when an original image is layered by a front and back relation on an axis of eyes and an opacity of a region as well as a luminance,
comprises;
a layer image encoding means for inputting a plurality of such layer images and encoding the luminance and the opacity as a layer image code for every layer image, and
a layer image decoding means for decoding the layer image comprising the luminance, the opacity, and the front and back relation on the axis of eyes by using the plurality of layer image codes,
a synthesizing means for forming a synthesized image with said layer image, and
an error image decoding means for decoding the error image from the error code, and decoding the image by adding the error image to said synthesized image.
An image encoding device of the seventh invention for solving the problem 1 comprises;
a reference image encoding means for preliminarily recording and transmitting a plurality of reference images,
an approximating means of correspondence between images which approximates a deviation of positions where a luminance is corresponding between an input image and said plurality of reference images, that is deformation, as a polynomial function which makes a position on a screen a variable, and determines an approximation error, and
a minimum distortion reference image-selecting means which determines a reference image having small approximation error among said plurality of reference images and outputs an identifier for the selected reference image and a coefficient of the polynomial function, and wherein
a plurality of reference images are encoded by said reference image encoding means and the input image are transmitted as at least the identifier for said selected reference image and the coefficient of said polynomial function.
An image decoding device of the eighth invention for solving the problem 1, has
a reference image decoding means for reconstructing a plurality of reference images in advance,
a reference image-selecting means for selecting from said plurality of reference images a reference image corresponding to the identifier of the reference image contained in the output, and
a reference image-deforming means for determining the polynomial function which makes a position on a screen a variable on a basis of the coefficient of the polynomial function contained in the output and for deforming said selected reference image by said polynomial function, and wherein
an image is decoded by using the reference image deformed by said reference image-deforming means.
A motion vector-detecting device of the ninth invention for solving the problem 2 comprises;
a superposing means which inputs a plurality of images composed of a luminance and an opacity of a substance, subjects the opacity to the addition/multiplication of a predetermined value to transform a value range, and forms the luminance image superposed with informations of the luminance and the opacity by adding the transformed value to the luminance, and
an image analyzing means for obtaining a correspondence of the partial regions of two images by a correlation of a luminance, and wherein
the image composed of the luminance and the opacity is transformed to the image composed only of the luminance by said superposing means, and a correspondence of the partial regions is obtained using said image analyzing means between the transformed plural images.
A motion vector-detecting device of the tenth invention for solving the problem 2 is a device for expressing a motion vector at an optional position on a screen as a polynomial function which makes a position a variable, and has
an error calculating means for calculating a correspondence of the partial regions of two different images as an error, with respect to a plurality of partial regions obtained by dividing an image, and for determining a deviation between said partial regions which becomes the minimum error and the error value in a vicinity thereof,
an error function-calculating means for determining a quadratic error function which makes a deviation a variable from said deviation which becomes said minimum error and the error value in the vicinity thereof, and
an optimizing means for expressing a sum total or a partial sum of said quadratic error function with a coefficient of a polynomial function as a variable, and minimizing this sum total or the partial sum with regard to the coefficient, and wherein
the motion vector between different images is output as the coefficient of the polynomial function.
The image-encoding device of the first invention predicts the luminance and the opacity of the image to be encoded from a reference image to form a predicted image, by matching the partial region of the image to be encoded against the reference image (that is, template) by a prediction means. The correspondence of the partial region is output as the prediction signal by a prediction-encoding means. The difference of the luminance and the opacity between the predicted image and the image to be encoded is determined by an error calculation means, and it is encoded by an error-encoding means.
The image-decoding device of the second invention holds the same reference image as that of the image-encoding device of the first invention, decodes the correspondence between partial regions from the prediction code by a prediction code decoding means, and forms the predicted image from the reference image by a predicted image-forming means. On the other hand, the error image is decoded from the error image code by an error image-decoding means. An adding means then adds the predicted image and the error image to obtain the image comprising the luminance and the opacity.
In the above two inventions, on the encoding side, the difference of luminance and opacity between the predicted image and the image to be encoded is determined to be encoded. On the other hand, on the decoding side, the difference of the opacity and luminance is decoded. Thereby, layer image allowing the irregular deformation of template can be encoded.
In the image-encoding device of the third invention, making an image composed of the luminance and the opacity of a substance an input, a superposing means classifies the region into two, that is, a transparent region and an opaque region, and forms a luminance image on which information of the luminance and the opacity are superposed so that the luminance of the substance is taken in the opaque region, and a predetermined value outside the luminance value is taken in the transparent region, thereafter the luminance image is encoded.
In the image-decoding device of the fourth invention, a dividing means divides the image into an opacity image and a luminance image, such that a pixel is treated as belonging to a transparent region when the luminance value of the decoded image is the predetermined value outside the luminance range, and is treated as a luminance value when it is within the range. In the above two inventions, by transforming the two kinds of information constituting the template, the luminance and the opacity, into one luminance image, the deformation of the template can be treated as a variation of this luminance image.
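The superposing and dividing operations of the third and fourth inventions can be sketched as below. The constants `LUMA_MAX` and `TRANSPARENT`, the function names, and the use of `None` for the luminance of transparent pixels are all assumptions made for this illustration; the specification does not fix the luminance range or the out-of-range value.

```python
LUMA_MAX = 235      # assumed upper end of the valid luminance range (hypothetical)
TRANSPARENT = 255   # assumed out-of-range marker for the transparent region (hypothetical)


def superpose(luminance, opaque):
    """Merge luminance and a binary opaque/transparent mask into one luminance image.

    Opaque pixels keep their luminance; transparent pixels get the marker value.
    """
    return [[lum if is_opaque else TRANSPARENT
             for lum, is_opaque in zip(lum_row, mask_row)]
            for lum_row, mask_row in zip(luminance, opaque)]


def divide(superposed):
    """Split a superposed image back into (luminance, opaque mask).

    Values above LUMA_MAX are taken as transparent; their luminance is None.
    """
    opaque = [[0 if v > LUMA_MAX else 1 for v in row] for row in superposed]
    luminance = [[None if v > LUMA_MAX else v for v in row] for row in superposed]
    return luminance, opaque
```

Testing against a range rather than the exact marker value is a design choice for this sketch: it tolerates small changes in the marker caused by lossy coding of the superposed luminance image.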
In the image-encoding device of the fifth invention, the original image is layered by the back and forth relation on the axis of the eyes and the opacity of the region in addition to the luminance. The image-encoding device encodes the luminance and the opacity as the layer image code by a layer image-encoding means for every layer image, making a plurality of layer images an input. On the other hand, said decoded layer image is determined from the results of the layer image-encoding means by a hierarchical image-decoding means, and synthesizes a plurality of decoded layer images from the back and forth relation, the luminance and the opacity thereof by a synthesizing means. Thereby, the synthesized result of the layer image by the decoding means is predicted. And an error image-encoding means determines the error image between the original image and the predicted synthesized image and encodes the error image.
The image-decoding device of the sixth invention decodes the layer image comprising the luminance, the opacity, and the back and forth relation on the axis of the eyes from a plurality of layer image codes by a layer image-decoding means, and forms a synthesized image from the layer images by a synthesizing means. An error image-decoding means decodes the error image from the error code. Lastly, the image is decoded by adding the error image to the synthesized image. The above two inventions treat the synthesis of the layer images as a predicted image, not as the final result, and transmit and record the difference between this predicted image and the original image; they can thereby transmit and record the image without any large visual deterioration, even if the template is irregularly deformed.
In the seventh invention, the template is preliminarily transmitted and recorded by a reference image-encoding means. The correspondence between the input image and a plurality of templates is approximated as a polynomial function of image coordinates by an approximating means of correspondence between images. A minimum distortion reference image-selecting means determines the reference image having a small approximation error among this plurality of templates, irrespective of the time order, and outputs the identifier of the selected reference image and the coefficient of the polynomial function. By preparing a plurality of templates, the degree of approximation by said polynomial function can be improved.
In the image-decoding device of the eighth invention, a plurality of templates are preliminarily constituted by a reference image-decoding means. A reference image-selecting means selects the template corresponding to the identifier of the input template, and a reference image-deforming means deforms the image based on the coefficient of the input polynomial function. Since it is assured that the deformed result of the template by said polynomial function is analogous to the input image on the encoding device side, the image can be decoded with a small encoded volume.
The motion vector-detecting device of the ninth invention, which makes a plurality of images composed of the luminance and the opacity of a substance an input, subjects the opacity to the addition/multiplication of a predetermined value, and if necessary, to the threshold processing by a superposing means to transform the range, and forms a luminance image superposed with information of the luminance and the opacity by adding the transformed value to the luminance. And an image-analyzing means obtains the correspondence of the partial region of two images by the correlation of luminance. Thereby, motion vector detection utilizing the correlation of not only the luminance but also the opacity can be performed.
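The superposition used by the ninth invention can be illustrated as follows: the opacity is rescaled by a multiplication and an addition and then added to the luminance, so that an ordinary luminance-based correlation also responds to differences in opacity. The function name `superpose_opacity` and the default `gain` and `offset` values are hypothetical, and the optional threshold processing is omitted.

```python
def superpose_opacity(luminance, opacity, gain=100.0, offset=0.0):
    """Add a rescaled opacity to the luminance, pixel by pixel.

    gain and offset transform the value range of the opacity before it is
    added; both are assumed tuning parameters, not values from the text.
    """
    return [[lum + gain * op + offset for lum, op in zip(lum_row, op_row)]
            for lum_row, op_row in zip(luminance, opacity)]
```

Two pixels with identical luminance but different opacity become distinguishable in the superposed image, so a luminance-only block matcher applied to it uses both kinds of information.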
In a motion vector-detecting device of the tenth invention in which a motion vector at an optional position on the screen is expressed as a polynomial function of image coordinates, an error-calculating means calculates the correspondence of partial regions of two different images as an error, with regard to a plurality of partial regions obtained by dividing the image, and determines the quadratic error function which makes the deviation a variable, from the deviation which becomes said minimum error and the error value in the vicinity thereof. And an optimizing means expresses the sum total or a partial sum of said quadratic error function using the coefficient of said polynomial function as a variable, and minimizes this sum total or the partial sum with regard to the coefficient. In the present invention, a coefficient of the polynomial function of image coordinates (affine transformation is one example thereof) is determined so that the sum total or the partial sum is minimized, from the quadratic error function which makes the deviation a variable, not from the motion vector.
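A simplified sketch of this optimization is given below for the affine case. It assumes, purely for illustration, that the quadratic error function of each partial region is isotropic, E_i(d) = e_i + w_i |d - d_i|^2, where d_i is the deviation with minimum error and w_i is a curvature estimated from the error values in its vicinity. Under that assumption, minimizing the sum total over the six affine coefficients reduces to two weighted 3×3 least-squares problems. The names `solve3` and `fit_affine` and the input format are hypothetical.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented copy
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_affine(blocks):
    """Return [a0, a1, a2, a3, a4, a5] minimising the summed quadratic errors.

    blocks is a list of (x, y, du, dv, w): block centre, minimum-error
    deviation, and the assumed isotropic curvature of the error function.
    """
    A = [[0.0] * 3 for _ in range(3)]
    bu = [0.0] * 3
    bv = [0.0] * 3
    for x, y, du, dv, w in blocks:
        phi = (1.0, x, y)                 # basis of the affine polynomial
        for i in range(3):
            bu[i] += w * phi[i] * du
            bv[i] += w * phi[i] * dv
            for j in range(3):
                A[i][j] += w * phi[i] * phi[j]
    return solve3(A, bu) + solve3(A, bv)
```

Regions with flat error functions (small w) contribute little to the normal equations, so an indefinite local deviation influences the coefficients only weakly; a general, non-isotropic quadratic error function would couple the six coefficients into a single 6×6 system instead.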
Furthermore, in view of said residual difference encoding problem, it is an object of the present invention to solve the problems generally caused in predictive image encoding, which utilizes the correlation between different images, and to provide an image-encoding method and a device thereof in which the residual difference image is divided into a portion to be subjected to the residual difference encoding and a portion not to be subjected to it, so that image encoding can be performed with little visual disturbance even at a limited encoding speed.
That is, an image-encoding method of the invention comprises: predicting an input image from different images; expressing a region with a large prediction error as pattern information by threshold processing of the prediction error; subjecting said pattern information to morphology processing in which the region is dilated after being eroded, or to equivalent processing, to form a mask pattern; and encoding the predicted error image based on said mask pattern.
And an image encoding device of the invention comprises:
a means for predicting an input image from different images,
a threshold processing means for expressing a region with a large prediction error as a pattern information,
a morphology means for subjecting said pattern information to an equivalent processing to a morphology processing in which a region is dilated after being eroded, thereby to form a mask pattern, and
a waveform encoding means for performing encoding for the predicted error image on the basis of said mask pattern.
First, the morphology processing comprising dilation after erosion will be described. Morphology processing is processing conducted on the shape of a binary image or on the planar shape of the density of a multi-value image, and is explained in detail in Literature 1, "Academic Press" (Henk J. A. M. Heijmans: Morphological Image Operators, Academic Press, Inc. 1994) and Literature 2, "IEEE Transactions on Pattern Analysis and Machine Intelligence" (R. M. Haralick, S. R. Sternberg, and X. Zhuang: Image Analysis Using Mathematical Morphology, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 4, pp. 532-550, July 1987). Here, the action of the present invention will be described with reference to the definition described in Literature 3, Hua-Rong JIN and Hidefumi KOBATAKE: "Extraction of Microcalcifications on Mammogram Using Morphological Filter with Multiple Structuring Elements", IEICE Transaction, D2, Vol. J75-D-II, No. 7, pp. 1170-1176, 1992-7.
(1) Binary Morphology Operation
Let X be the binary image to be processed, and let B be a structuring element (a set of two-dimensional position vectors, that is, a domain). Each element constituting B is expressed by a pixel vector b. Then $B'$ (the prime is used here for convenience) is referred to as the "symmetry of B", and is given by the following expression:
$$B' = \{-b : b \in B\} \tag{101}$$
Furthermore, $B_z$ denotes B translated by z (z is a two-dimensional vector), that is:
$$B_z = \{b + z : b \in B\} \tag{102}$$
$X_{-b}$ denotes X translated by $-b$. The morphology operations are based on the Minkowski difference and the Minkowski sum, which are expressed by the symbols $\ominus$ and $\oplus$. They are defined by the following expressions:
$$X \ominus B = \bigcap_{b \in B} X_b \tag{103}$$
$$X \oplus B = \bigcup_{b \in B} X_b \tag{104}$$
Namely, the Minkowski difference gives the domain (intersection) common to all the copies of X obtained by translating X by each constituent element of the structuring element, whereas the Minkowski sum gives the union of those copies. Based on these basic operations, Erosion and Dilation are expressed by the following expressions:
Erosion:
$$X \ominus B' = \{z : B_z \subset X\} = \bigcap_{b \in B} X_{-b} \tag{105}$$
Dilation:
$$X \oplus B' = \{z : B_z \cap X \neq \emptyset\} = \bigcup_{b \in B} X_{-b} \tag{106}$$
and Opening and Closing are defined as follows:
Opening:
$$X_B = X \circ B = (X \ominus B') \oplus B \tag{107}$$
Closing:
$$X^B = X \bullet B = (X \oplus B') \ominus B \tag{108}$$
Examples of Dilation processing and Erosion processing are shown in FIG. 35. The structuring element is composed of a center pixel and its four neighboring pixels in the horizontal and vertical directions.
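Expressions (101) to (108) can be tried out with a short sketch in which a binary image is represented as a set of (x, y) pixel coordinates. The representation and all function names are assumptions made for illustration only; the structuring element `CROSS` follows the description of the center pixel and its four neighbors.

```python
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}


def reflect(b):
    """Symmetry of B, expression (101): {-b : b in B}."""
    return {(-bx, -by) for bx, by in b}


def translate(x, z):
    """The set X translated by the vector z."""
    return {(px + z[0], py + z[1]) for px, py in x}


def minkowski_difference(x, b):
    """Expression (103): intersection of X translated by every b in B (B non-empty)."""
    return set.intersection(*(translate(x, v) for v in b))


def minkowski_sum(x, b):
    """Expression (104): union of X translated by every b in B."""
    return set().union(*(translate(x, v) for v in b))


def erosion(x, b):
    """Expression (105): all z such that B translated by z lies inside X."""
    return minkowski_difference(x, reflect(b))


def dilation(x, b):
    """Expression (106): all z such that B translated by z meets X."""
    return minkowski_sum(x, reflect(b))


def opening(x, b):
    """Expression (107): erosion followed by the Minkowski sum with B."""
    return minkowski_sum(erosion(x, b), b)


def closing(x, b):
    """Expression (108): dilation followed by the Minkowski difference with B."""
    return minkowski_difference(dilation(x, b), b)
```

For example, opening a 3x3 square plus one isolated pixel with `CROSS` removes the isolated pixel and keeps a cross-shaped part of the square, since only the center of the square survives the erosion.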
(2) Gray-Scale Morphology Operation
Let f(x) be the luminance value, F the region on which f is defined, g the function of the structuring element (scalar-valued), and G the region on which g is defined (domain). The operations are then defined as follows:
Erosion:
$$(f \ominus g)(x) = \min_{z \in G,\ x+z \in F} \{ f(x+z) - g(z) \} \tag{109}$$
Dilation:
$$(f \oplus g)(x) = \max_{z \in G,\ x+z \in F} \{ f(x-z) - g(z) \} \tag{110}$$
Opening:
$$(f \circ g)(x) = (f \ominus g) \oplus g \tag{111}$$
Closing:
$$(f \bullet g)(x) = (f \oplus g) \ominus g \tag{112}$$
If the pattern to be processed is two-valued, the dilation and erosion by the gray-scale morphology operation act in the same way as those shown in FIG. 35.
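A minimal sketch of the gray-scale operations follows, assuming a flat structuring function (g(z) = 0 on G) so that the g(z) term drops out and only the minimum or maximum over the domain remains. The image is a 2-D list, positions falling outside F are skipped, G is assumed to contain the origin, and all names are hypothetical.

```python
CROSS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed domain G


def _flat_operation(f, domain, sign, pick):
    """Apply min or max of f over the domain, reading f at x + sign*z inside F."""
    height, width = len(f), len(f[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [f[y + sign * dy][x + sign * dx]
                      for dx, dy in domain
                      if 0 <= y + sign * dy < height and 0 <= x + sign * dx < width]
            row.append(pick(values))
        out.append(row)
    return out


def gray_erosion(f, domain=CROSS):
    """Flat erosion: minimum of f(x + z) over z in G."""
    return _flat_operation(f, domain, +1, min)


def gray_dilation(f, domain=CROSS):
    """Flat dilation: maximum of f(x - z) over z in G."""
    return _flat_operation(f, domain, -1, max)


def gray_opening(f, domain=CROSS):
    """Erosion followed by dilation, as in expression (111)."""
    return gray_dilation(gray_erosion(f, domain), domain)


def gray_closing(f, domain=CROSS):
    """Dilation followed by erosion, as in expression (112)."""
    return gray_erosion(gray_dilation(f, domain), domain)
```

Applied to an image containing a single bright pixel on a dark background, the opening removes the impulse while the closing leaves the image unchanged, which mirrors the binary behavior.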
In the image-encoding method of the present invention, the input image is first predicted from different images, the prediction error is subjected to threshold processing, and a region having a large residual difference is extracted as pattern information. Thereafter, the pattern information is deformed by subjecting it to the dilation processing after the erosion processing of said morphology operation, that is, to the opening processing. Thereby, in the conventional example shown in FIG. 34, the impulse-shaped region in the vicinity of the edge is eliminated, as shown in (e) and (k) as the results of the morphology operation. By using this result as a mask pattern for encoding the residual difference image, highly efficient encoding can be performed while ignoring the regions where the residual difference encoding is not required. Similarly, in the image-encoding device of the present invention, the prediction means predicts the input image from different images, and the threshold processing means outputs the region having a large residual difference as pattern information. Said morphology means subjects this pattern information to processing equivalent to said opening processing, and outputs a mask pattern in which the impulse-shaped region is eliminated. The waveform encoding means performs encoding based on this mask pattern, ignoring the regions where omitting the residual difference encoding does not cause a large visual deterioration.
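The sequence of threshold processing, opening, and masking described above can be sketched as follows. This is an illustration under assumptions, not the claimed device: the residual image is a 2-D list, the structuring element is the center pixel with its four neighbors, the waveform encoder is left out, and the function names and the threshold value are hypothetical.

```python
CROSS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed structuring element


def threshold_pattern(residual, threshold):
    """Pattern information: coordinates whose absolute residual reaches the threshold."""
    return {(x, y)
            for y, row in enumerate(residual)
            for x, value in enumerate(row)
            if abs(value) >= threshold}


def opening(region, element=CROSS):
    """Erosion followed by dilation of a coordinate set.

    The element is assumed to contain the origin, so eroded pixels are
    always a subset of the input region.
    """
    eroded = {(x, y) for (x, y) in region
              if all((x + dx, y + dy) in region for dx, dy in element)}
    return {(x + dx, y + dy) for (x, y) in eroded for dx, dy in element}


def apply_mask(residual, mask):
    """Keep the residual only inside the mask pattern; elsewhere it is set to 0."""
    return [[value if (x, y) in mask else 0 for x, value in enumerate(row)]
            for y, row in enumerate(residual)]


def masked_residual(residual, threshold):
    """Residual image restricted to the opened pattern, ready for waveform encoding."""
    mask = opening(threshold_pattern(residual, threshold))
    return apply_mask(residual, mask)
```

An isolated large residual, such as an impulse near an edge, passes the threshold but does not survive the opening, so it is excluded from the portion handed to the waveform encoder.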