1. Field of the Invention
This invention relates to a method and apparatus for inputting and encoding a moving image and to an apparatus for decoding the encoded moving image. This invention particularly relates to a technique for encoding an image frame by first partitioning it into multiple regions and to a technique for decoding the encoded image frame.
2. Description of the Related Art
FIG. 1 is a block diagram of a first prior art showing the configuration of a moving image encoder based on ITU-T recommendation H.263, wherein numeral 1 indicates an input digital image signal (hereinafter referred to simply as an input image), numeral 101 indicates a differentiator, numeral 102 indicates a prediction signal, numeral 103 indicates a prediction error signal, numeral 104 indicates an encoder, numeral 105 indicates encoded data, numeral 106 indicates a decoder, numeral 107 indicates a decoded prediction error signal, numeral 108 indicates an adder, numeral 109 indicates a local decoded image signal, numeral 110 indicates a memory, numeral 111 indicates a prediction section, and numeral 112 indicates a motion vector.
The input image 1 to be encoded is first input to differentiator 101. Differentiator 101 takes the difference between input image 1 and prediction signal 102 and outputs it as prediction error signal 103. Encoder 104 encodes either input image 1, which is the original signal, or prediction error signal 103, and outputs encoded data 105. The encoding method in encoder 104 follows the above-mentioned recommendation: prediction error signal 103 is transformed from the spatial domain to the frequency domain using the Discrete Cosine Transform (DCT), a type of orthogonal transform, and the resulting transform coefficients are linearly quantized.
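The transform and quantization performed by encoder 104 can be illustrated with a minimal sketch. It assumes an 8×8 block, an orthonormal DCT-II, and plain rounding to a single step size; the function names, the block size, and the step size are illustrative assumptions, and the exact quantizer defined in the recommendation is not reproduced here.

```python
import math

N = 8  # block size assumed for illustration


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def quantize(coeffs, step):
    """Linear (uniform) quantization: divide by the step and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Inverse of quantize, up to the rounding error."""
    return [[level * step for level in row] for row in levels]
```

For a flat block, all of the energy falls into the single DC coefficient, which is why smooth prediction errors cost few bits after quantization.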
Encoded data 105 is branched in two directions: one branch is transmitted to a receiver, that is, an image decoding apparatus (not shown), and the other is input to decoder 106 within the present apparatus. Decoder 106 performs the inverse of the operation of encoder 104, and generates and outputs decoded prediction error signal 107 from encoded data 105. Adder 108 adds prediction signal 102 to decoded prediction error signal 107 and outputs the result as local decoded image signal 109. Prediction section 111 performs motion-compensated prediction using input image 1 and local decoded image signal 109 of the previous frame stored in memory 110, and outputs prediction signal 102 and motion vector 112. At this time, motion compensation is performed in block units of a fixed size, called a macro block, comprising 16×16 pixels. As an optional function for a block within a region having large movements, motion-compensated prediction can be performed with the macro block partitioned into four sub-block units of 8×8 pixels. The obtained motion vector 112 is transmitted toward the image decoding apparatus, and prediction signal 102 is sent to differentiator 101 and adder 108. According to this apparatus, the amount of data of the moving image can be compressed while maintaining image quality through the use of motion-compensated prediction.
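The block-based motion search performed by prediction section 111 can be sketched as follows. This is an illustration only: it assumes an exhaustive search over a small window with the sum of absolute differences (SAD) as the matching cost, neither of which is specified in the text above. The names `sad` and `find_motion_vector` are hypothetical, and the returned vector is taken to point from the current block to the matching position in the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, size=16, search=2):
    """Exhaustive search in a +/-search window; returns (dx, dy, cost).
    The block at (bx, by) is assumed to lie inside the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

Calling the same function with `size=8` on each quadrant of a macro block corresponds to the optional four-sub-block mode described above.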
In this prior art, the shape of the encoding unit region is limited to two types. Moreover, both shapes are rectangular. Therefore, there is naturally a limit in the encoding which can be adapted to the scene structure or features of an image. For example, if it is desired to increase the amount of code only for an object having large movements, it is preferable, although difficult in this prior art, to define a region having a shape identical to that of the object.
FIG. 2 is a block diagram of an image encoding apparatus concerning a second prior art. This apparatus is based on an encoding method that was proposed in “A Very Low Bit Rate Video Coder Based on Vector Quantization” by L. C. Real et al. (IEEE Transactions on Image Processing, Vol. 5, No. 2, February 1996). In the same figure, numeral 113 indicates a region partitioning section, numeral 114 indicates a prediction section, numeral 115 indicates a region determination section, numeral 116 indicates encoding mode information including inter-frame encoding and intra-frame encoding information, numeral 117 indicates a motion vector, numeral 118 indicates an encoder, and numeral 119 indicates encoded data.
In this apparatus, input image 1 is first partitioned into multiple regions by region partitioning section 113. Region partitioning section 113 determines the size of regions in accordance with the motion-compensated prediction error. Region partitioning section 113 performs a threshold judgment on the dispersion of the inter-frame signal and, from among ten block sizes prepared in advance (4×4, 4×8, 8×4, 8×8, 8×16, 16×8, 16×16, 16×32, 32×16, and 32×32), assigns small blocks to regions having large movement and large blocks to regions having small movement, such as backgrounds. In concrete terms, a dispersion value is calculated by region determination section 115 for the prediction error signal obtained by prediction section 114, and the block size is determined based on that value. Attribute information 116, such as region shape information and encoding mode information, as well as motion vector 117 are determined at this time, and the prediction error signal or the original signal is encoded by encoder 118 in accordance with the encoding mode information to yield encoded data 119. Subsequent processes are the same as those of the first prior art.
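The dispersion-based choice of block size can be sketched as follows. This is a simplified illustration, not the method of the cited paper: it assumes a recursive split of a 32×32 block into four square quadrants whenever the variance of the prediction error exceeds a threshold, so the rectangular sizes such as 4×8 or 16×32 are not produced. The function names and the threshold value are hypothetical.

```python
def variance(err, x0, y0, w, h):
    """Population variance of the prediction error inside a w x h window."""
    vals = [err[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def choose_blocks(err, x0, y0, size, threshold, min_size=4):
    """Return a list of (x, y, size) blocks covering the square at (x0, y0).
    A block is kept whole when its error variance is at or below the
    threshold, and is otherwise split into four quadrants."""
    if size <= min_size or variance(err, x0, y0, size, size) <= threshold:
        return [(x0, y0, size)]
    half = size // 2
    blocks = []
    for oy in (0, half):
        for ox in (0, half):
            blocks += choose_blocks(err, x0 + ox, y0 + oy, half,
                                    threshold, min_size)
    return blocks
```

With this rule, a quiet background stays as one large block, while a small area of large prediction error is surrounded by progressively smaller blocks.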
This prior art is richer in processing flexibility than the first prior art from the viewpoint of preparing multiple sized blocks. However, this apparatus also limits each region to a rectangular shape. Therefore, even with rectangular shapes in ten sizes, there is room for improvement in adaptability with respect to arbitrarily shaped image regions.
The present invention takes into consideration these problems with the object of providing a moving image encoding technique for performing more flexible processing according to the conditions of the image to be processed. The object of this invention, in more concrete terms, is to provide a moving image encoding technique using region partitioning techniques that can accurately handle various image structures. Another object of this invention is to provide a partitioning criterion based on various points of view when partitioning regions for encoding. Still another object of this invention is to provide a technique for correctly decoding the encoded data of regions that have been partitioned into various shapes.
The moving image encoding method of this invention includes three steps. A first step partitions an input image into multiple regions based on a predetermined partitioning judgment criterion. Up to this point, the encoding process is the same as conventional region-based encoding. In a second step, however, this invention integrates each of the partitioned regions with adjacent regions based on a predetermined integration judgment criterion. Thereafter, in a third step, the image signal is encoded for each of the regions remaining after integration. According to this method, the integration process allows regions to take on various shapes. Thus, a region having a shape closely matching the structure of an image or the outline of an object can be generated.
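The way integration produces non-rectangular regions can be illustrated with a minimal sketch. It assumes that the partitioning step has already produced a grid of equally sized blocks, each summarized by a mean value, and it uses a hypothetical integration criterion (adjacent blocks are merged when their means differ by at most `max_diff`); the criterion actually used by the invention is not reproduced here. Merging is tracked with a union-find structure.

```python
def integrate_regions(block_means, max_diff):
    """Merge 4-connected neighbouring blocks whose means differ by at most
    max_diff. Returns a grid of region labels; blocks that share a label
    belong to the same integrated region."""
    rows, cols = len(block_means), len(block_means[0])
    parent = list(range(rows * cols))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as the root so labels are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if (nr < rows and nc < cols
                        and abs(block_means[r][c] - block_means[nr][nc]) <= max_diff):
                    union(r * cols + c, nr * cols + nc)
    return [[find(r * cols + c) for c in range(cols)] for r in range(rows)]
```

In a 3×3 grid where a dark object touches the right edge and bulges into the centre, the background blocks merge into a single C-shaped region, a shape that a set of rectangular block sizes alone cannot represent.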
The moving image encoding apparatus of this invention includes a region partitioning section and an encoder. The region partitioning section includes a partitioning processing section for partitioning the input image into multiple regions based on a predetermined partitioning judgment criterion, and an integration processing section for integrating each of the multiple regions partitioned by the partitioning processing section with adjacent regions based on a predetermined integration judgment criterion. The encoder encodes the image signal for each of the regions remaining after integration by the integration processing section. According to this apparatus, a comparatively high image quality can be achieved at comparatively high data compression ratios while flexibly supporting the structures of images.
The above-mentioned integration processing section performs preliminary encoding and decoding of images for each region, and may examine the amount of code and the encoding distortion. In such a case, the encoding distortion can be minimized under the constraint of a predetermined amount of code.
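One common way to trade encoding distortion against the amount of code is a Lagrangian cost of the form J = D + λR, where D is the distortion, R is the amount of code in bits, and λ is a weighting factor. The sketch below uses that formulation as an assumption; this section does not state the exact cost function, and the names `rd_cost` and `should_merge` are hypothetical. The (distortion, bits) pairs are taken to come from the preliminary encoding and decoding of each candidate.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lam * R (assumed form)."""
    return distortion + lam * bits


def should_merge(separate, merged, lam):
    """Decide whether integrating regions is worthwhile.

    separate: list of (distortion, bits) pairs, one per region coded alone.
    merged:   (distortion, bits) pair for the integrated region.
    Returns True when the merged region costs no more than the sum of
    the separately coded regions."""
    cost_separate = sum(rd_cost(d, b, lam) for d, b in separate)
    return rd_cost(merged[0], merged[1], lam) <= cost_separate
```

A larger λ places more weight on the amount of code and therefore favors integration, while a smaller λ favors keeping regions separate to reduce distortion.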
The above-mentioned partitioning processing section includes a class identifying section for classifying the importance of regions into classes, and may judge whether or not to partition each region based on the class and on an activity to be described later. If the class identifying section references feature parameters in images, the recognition of objects becomes possible, thus facilitating more accurate region partitioning.
On the other hand, the moving image decoding apparatus of this invention inputs and decodes the encoded data of the image that was encoded after being partitioned into multiple regions. This apparatus includes a region shape restoring section and an image data decoder. The region shape restoring section restores, based on region shape information included in the encoded data, the shape of each region that was partitioned during encoding. The image data decoder, after specifying the sequence in which regions were encoded based on the shapes of the restored regions, decodes the image for each region from the encoded data. According to this apparatus, accurate decoding is achieved even if regions having various shapes are generated in the encoding stage.