The recent progress of high-speed digital signal processing systems, together with advances in LSI technology and the development of video processing techniques, has generated increasing interest in making more effective use of video information. In the field of telecommunications, the construction and development of digital networks, represented by ISDN (Integrated Services Digital Network), have made possible video communications services such as videophones, video conferencing and video database services. Furthermore, the spread of mobile communications networks, together with advances in digital information technology, has increased users' demand for mobile video communications services.
Video information is impractical to handle as it is, because it generally contains a very large amount of information. However, that amount can be reduced by removing the redundancy of the video information. Highly efficient video compression techniques are particularly important for mobile communications networks, whose transmission lines have only a small capacity. For this reason, the international organizations ITU-T and ISO/IEC have worked energetically to establish international standards on video coding methods for encoding video signals at very low bit rates.
A video signal contains temporal information, such as changes due to motion in the picture, and spatial information concerning the content of each frame (a video frame or a video field; both are hereinafter called a frame), and each type of information contains redundancy. A widely used approach, motion-compensated interframe prediction combined with orthogonal transform coding, first reduces the temporal redundancy through motion-compensated interframe prediction. In this method, a motion-compensated interframe predicting portion prepares a predicted value of the input video signal from an already encoded signal stored in a frame memory and outputs the difference between the predicted value and the input video signal as a prediction error signal. A prediction error signal encoding portion then encodes the prediction error signal by orthogonal transform coding to further reduce the spatial redundancy. The encoded prediction error signal is locally decoded, stored in the frame memory and used for predicting the succeeding frame.
In very low bit-rate video encoding, a video signal must be represented with a very small amount of information. The amount of information allocated to orthogonal transform coding (i.e., prediction error coding) is therefore considerably limited. Accordingly, it is very important to improve the efficiency of interframe prediction by using a prediction method that predicts the temporal change of a video signal more accurately.
In recent years, many studies have been made on interframe prediction methods using affine transformation and bilinear transformation. Whereas the conventional motion-compensated interframe prediction method represents motion in a video as translation, using one motion vector per unit area, methods using affine or bilinear transformation can represent rotation, enlargement and deformation in addition to translation. They therefore represent motion in a video more accurately and achieve higher prediction efficiency.
An interframe predicting portion of a conventional video-coding device comprises a frame memory portion for storing already coded video signals, a motion vector detecting portion for determining a representative motion vector per unit area from a difference between an input video signal and a video signal read from the frame memory portion, a motion vector interpolating portion for determining a motion vector per pixel from the representative motion vectors, and a pixel value predicting portion for preparing a predicted video signal from a video signal read from the frame memory portion by using the motion vector per pixel.
The operation of the above-mentioned interframe predicting portion is as follows:
The frame memory portion stores already coded video signals as reference video frames for interframe prediction. The motion vector detecting portion receives an input video frame to be encoded and reads a reference video frame stored in the frame memory. It divides the video frame being coded into unit areas and scans the reference video frame to find the area most similar to the current unit area. The displacement of the unit area of the video frame being coded from the area found in the reference video frame is output as a motion vector. This motion vector is a representative motion vector, representing the interframe displacement of a representative point within the unit area (usually the center of the unit area); the relationship between the representative point and the unit area used for the vector search is fixed in advance. In searching for a similar area in the reference video frame, the sum of absolute differences or the sum of squared differences over the pixels of the unit area is used as the measure of similarity. Furthermore, to determine the displacement of the representative point more accurately, the center of the unit area may be emphasized by multiplying the pixel differences near the center by a large coefficient and those at the periphery by a small coefficient before summation.
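The search described above can be illustrated with a small full-search block-matching sketch. It assumes integer-pixel accuracy, frames stored as lists of rows, and a hypothetical weighting function `center_weight` that doubles the contribution of the central pixels; all function names are illustrative, not taken from any standard. Following the text, the returned vector is the displacement of the current unit area relative to the matched area, so the matched area in the reference frame lies at the current position minus the vector.

```python
def weighted_sad(cur, ref, bx, by, dx, dy, size, weight):
    """Weighted sum of absolute differences between the size x size block at
    (bx, by) in `cur` and the block at (bx - dx, by - dy) in `ref`.
    `weight(i, j)` is the coefficient for row i, column j of the block."""
    total = 0
    for i in range(size):
        for j in range(size):
            diff = cur[by + i][bx + j] - ref[by - dy + i][bx - dx + j]
            total += weight(i, j) * abs(diff)
    return total


def search_motion_vector(cur, ref, bx, by, size, search_range, weight):
    """Full search over +/- search_range pixels. Returns the motion vector
    (dx, dy) with the smallest weighted SAD; candidate blocks that would fall
    outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = by - dy, bx - dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = weighted_sad(cur, ref, bx, by, dx, dy, size, weight)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def center_weight(size):
    """Hypothetical weighting: 2 for the central half of the block, 1 elsewhere."""
    lo, hi = size // 4, size - size // 4
    return lambda i, j: 2 if lo <= i < hi and lo <= j < hi else 1


# Example: each pixel of the current frame comes from two pixels to the right
# and one pixel below in the reference frame.
ref = [[100 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[y + 1][x + 2] if y + 1 < 16 and x + 2 < 16 else 0
        for x in range(16)] for y in range(16)]
mv = search_motion_vector(cur, ref, 4, 4, 4, 3, center_weight(4))
```

Replacing `abs(diff)` with `diff * diff` gives the square-sum criterion that the text also mentions; the weighting only changes how strongly each pixel difference counts, not the search itself.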
The representative motion vectors are transferred to the motion vector interpolating portion, which determines a motion vector for each pixel from them. For affine transformation, the motion vector of each pixel within a triangular area bounded by three neighboring representative points (hereinafter called a transformable unit area) is determined by solving an affine transformation expression from the representative motion vectors of those representative points. For bilinear transformation, the motion vector of each pixel within a quadrangular area bounded by four neighboring representative points (also called a transformable unit area) is determined by solving a bilinear transformation expression from the representative motion vectors of those representative points. When the transformable unit area is square or rectangular, this is equivalent to distributing the motion vector of each representative point in the horizontal and vertical directions according to the distances between the pixel of interest and that representative point, so that nearer representative points contribute more.
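For the rectangular case, this distribution of representative motion vectors can be sketched as ordinary bilinear interpolation of the four corner vectors. The function below is a hypothetical illustration, assuming the representative points sit at the corners (x0, y0), (x1, y0), (x0, y1) and (x1, y1) of the transformable unit area. Each corner's weight is the product of the horizontal and vertical fractional distances from the pixel to the opposite corner, so the nearest representative point receives the largest weight.

```python
def bilinear_motion_vector(x, y, x0, y0, x1, y1, v00, v10, v01, v11):
    """Motion vector of pixel (x, y) inside the rectangle (x0, y0)-(x1, y1).

    v00, v10, v01 and v11 are the representative motion vectors (vx, vy) at
    the top-left, top-right, bottom-left and bottom-right corners."""
    u = (x - x0) / (x1 - x0)  # horizontal position in [0, 1]
    t = (y - y0) / (y1 - y0)  # vertical position in [0, 1]
    w00 = (1 - u) * (1 - t)
    w10 = u * (1 - t)
    w01 = (1 - u) * t
    w11 = u * t
    return tuple(w00 * a + w10 * b + w01 * c + w11 * d
                 for a, b, c, d in zip(v00, v10, v01, v11))


# Center of a 16 x 16 unit area whose right-hand corners move 4 pixels
# horizontally and whose bottom corners move 4 pixels vertically.
print(bilinear_motion_vector(8, 8, 0, 0, 16, 16, (0, 0), (4, 0), (0, 4), (4, 4)))
# prints (2.0, 2.0)
```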
The pixel value predicting portion receives the motion vector of each pixel of interest in turn. Treating the motion vector as the displacement of the pixel of interest from its corresponding position in the reference frame, it reads the pixel value at that corresponding position from the frame memory as the predicted value of the pixel of interest, and in this way composes a predicted frame. If the motion vector points to a position in the reference frame where no pixel exists, e.g., when the motion vector (displacement) has a fractional part, the neighboring pixel values in the reference frame are read from the frame memory and the predicted value of the pixel of interest is obtained by interpolation using the well-known bilinear interpolation method.
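The read-out with interpolation can be sketched as follows. This is a minimal illustration under a few assumptions: frames are stored as lists of rows, positions outside the frame are clamped to its border (the text does not say how such positions are handled), and, as stated above, the reference position equals the pixel position minus its motion vector.

```python
import math


def sample_bilinear(frame, x, y):
    """Bilinearly interpolate `frame` (a list of rows) at the real-valued
    position (x, y). Positions outside the frame are clamped to its border."""
    height, width = len(frame), len(frame[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


def predict_pixel(ref, x, y, mv):
    """Predicted value of pixel (x, y) whose motion vector is mv = (dx, dy):
    the reference frame read at (x - dx, y - dy)."""
    return sample_bilinear(ref, x - mv[0], y - mv[1])


ref = [[10 * y + x for x in range(4)] for y in range(4)]
value = predict_pixel(ref, 2, 2, (0.5, 0.0))  # halfway between 21 and 22
```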
In simplified form, the interframe predicting portion of a conventional video-decoding device using the affine or bilinear transformation method comprises a frame memory portion for storing already decoded video signals, a motion vector interpolating portion for determining a motion vector for each pixel from the representative motion vectors input for each unit area, and a pixel value predicting portion for preparing a predicted video signal from a video signal read from the frame memory by using the pixel motion vectors.
The conventional video coding and decoding devices perform well when the entire area bounded by the representative points can be described by a single set of affine or bilinear transformation parameters. However, when the positions of the representative points do not match the structure of the changes between video frames, e.g., the position and movement of an object, the representative motion vectors at different representative points may describe the movements of different objects. The motion vector of a pixel obtained from these representative motion vectors then cannot represent the actual displacement of that pixel. Consequently, in such cases the conventional devices may suffer a considerable decrease in interframe prediction efficiency, i.e., coding efficiency.
Video communication services, e.g., videophones and video conferencing, have been realized over high-speed digital communication networks such as ISDN (Integrated Services Digital Network).
Recently, with the spread of radio transmission networks represented by PHS (Personal Handyphone System), progress of data modulation/demodulation techniques for PSTN (Public Switched Telephone Network) and advance of image compressing techniques, there have been increasing demands for video communication services over lower bit-rate transmission lines.
Video communication services such as videophones and video conferencing systems must transmit a very large amount of information over transmission lines of limited capacity. Accordingly, the video information must be compressed and encoded to reduce its amount, in view of the transmission speed and cost of the available transmission lines.
As is well known, H.261, MPEG-1 (MPEG: Moving Picture Experts Group) and MPEG-2 are established international standard coding methods for compressing video information. In addition, MPEG-4 is now being standardized as a method for coding at very low bit rates of 64 kbps or less.
The standardized video-coding methods adopt a hybrid video coding method performing interframe prediction coding in combination with intraframe prediction coding.
Interframe prediction coding generates a predicted video frame from a reference video frame and encodes the difference between the predicted frame and the current video frame, reducing the amount of code to be transmitted. This enables effective use of the transmission lines.
Known interframe predictive coding methods include the block-matching method, the affine transformation method and the warping prediction method. A conventional video-coding device and video-decoding device using the affine transformation are described below.
The operation of the conventional video-coding device is first explained below.
It is assumed that, in the normal operating condition of the conventional video-coding device performing motion-compensated interframe predictive coding, a reference video frame usable for producing a predicted video frame is stored in a frame memory portion.
An input video frame is fed to a subtracting portion and to a motion-compensated interframe-predicting portion. The motion-compensated interframe-predicting portion estimates the motion between the reference video frame stored in the frame memory portion and the input video frame, and outputs a predicted video frame to the subtracting portion.
The subtracting portion subtracts the predicted video frame supplied by the motion-compensated interframe-predicting portion from the input video frame and outputs the result, i.e., prediction error information, to a video coding portion.
The video coding portion spatially transforms the input prediction error information, e.g., by using the DCT (Discrete Cosine Transform), quantizes the transformed information and outputs coded video information. At the same time, the coded video information output from the video coding portion is locally decoded by a video-decoding portion and then transferred to an adder portion.
The adder portion adds the decoded prediction error information output from the video-decoding portion to the predicted video frame output from the motion-compensated interframe-predicting portion to form a new reference video frame, which is then transferred to the frame memory portion.
The frame memory portion stores the new reference video frame output from the adder portion. This new reference video frame is supplied to the motion-compensated interframe-predicting portion when the succeeding input video frame is encoded.
By repeating the above sequence of operations, the video coding device outputs a series of coded video information (prediction error information) and a series of coded side-information (e.g., motion vectors).
The operation of the conventional video-decoding device is now described below.
It is assumed that, in the normal operating condition of the conventional video-decoding device, a reference video frame usable for producing a predicted video frame is stored in a frame memory portion. Coded video information input to the video-decoding device is fed to a video-decoding portion, which decodes it by performing the same operations as the video-decoding portion of the video-coding device and outputs the resulting differential video frame to an adder portion.
Coded side-information inputted into the video-decoding device enters into a motion-compensated interframe predicting portion.
The motion-compensated interframe predicting portion decodes the received coded side-information and obtains motion vectors, then it produces a predicted video-frame by using the obtained motion vectors and a reference video-frame read from the frame memory portion and transfers the produced predicted video-frame to an adder portion.
The adder portion obtains an output video-frame by adding the differential video-frame from the video-decoding portion to the predicted video-frame from the motion-compensated interframe predicting portion. The output video-frame is outputted from the video-decoding device and at the same time is transferred to the frame memory.
The frame memory stores the video frame input from the adder portion as a new reference video frame, which the motion-compensated interframe-predicting portion will use when decoding the succeeding video frame.
By repeating the above sequence of operations, the video decoding device outputs a series of decoded video frames.
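The coding and decoding loops described above can be summarized in a short sketch. It simplifies several things, all of them assumptions: frames are flat lists of pixel values, motion-compensated prediction is replaced by a caller-supplied `predict` function (a hypothetical zero-motion predictor in the example), and the DCT plus quantization of the video coding portion is replaced by uniform scalar quantization of the prediction error. What it does show is that the encoder builds its reference frame from locally decoded data, exactly as the decoder does, so both frame memories stay identical.

```python
def quantize(residual, step):
    """Stand-in for DCT + quantization: uniform scalar quantization."""
    return [round(r / step) for r in residual]


def dequantize(levels, step):
    """Stand-in for inverse quantization + inverse DCT."""
    return [q * step for q in levels]


def encode_sequence(frames, step, predict):
    """Encoder loop. Returns the coded prediction-error levels per frame."""
    memory = [0] * len(frames[0])                             # frame memory portion
    coded = []
    for frame in frames:
        predicted = predict(memory)                           # interframe prediction
        error = [c - p for c, p in zip(frame, predicted)]     # subtracting portion
        levels = quantize(error, step)                        # video coding portion
        coded.append(levels)
        decoded_error = dequantize(levels, step)              # local video-decoding portion
        memory = [p + e for p, e in zip(predicted, decoded_error)]  # adder portion
    return coded


def decode_sequence(coded, step, predict, frame_size):
    """Decoder loop: the same prediction and reconstruction as the encoder's
    local decoding path. Returns the decoded frames."""
    memory = [0] * frame_size
    output = []
    for levels in coded:
        predicted = predict(memory)
        memory = [p + e for p, e in zip(predicted, dequantize(levels, step))]
        output.append(memory)
    return output


def zero_motion(reference):
    """Hypothetical predictor: reuse the reference frame unchanged."""
    return list(reference)


frames = [[10, 20, 30, 40], [12, 21, 33, 41]]
coded = encode_sequence(frames, 4, zero_motion)
decoded = decode_sequence(coded, 4, zero_motion, 4)
```

Because the encoder predicts from its locally decoded frames rather than from the original input, quantization errors do not accumulate as a mismatch between encoder and decoder; each reconstructed pixel stays within half a quantization step of the input.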
The operation of the motion-compensated interframe-predicting portion in the video-coding and video-decoding devices, and the prior-art techniques used in it, are described below.
An example structure of the motion-compensated interframe-predicting portion and its operation are described first.
The motion-compensated interframe-predicting portion comprises a motion-vector searching portion, a fixed-area-size predicted-frame generating portion and a motion-vector coding portion.
The motion-vector searching portion searches for motion vectors using an input video frame and a reference video frame read from a frame memory, and outputs the detected motion vectors to the fixed-area-size predicted-frame generating portion.
When searching for motion vectors, the motion-vector searching portion determines the motion vector of each control grid point with a weighting that emphasizes the center pixel of the processable area, and outputs the motion vector of each control grid point to the fixed-area-size predicted-frame generating portion.
In the conventional method, the processable area has a fixed size, with one control grid point placed every 16 or 8 pixels.
The fixed-area-size predicted-frame generating portion performs interframe prediction on each processable area having a fixed size (generally a rectangular block of 16×16 pixels) by using the motion vectors input from the motion-vector searching portion and a preceding video frame read from the frame memory portion.
The interframe prediction is performed by determining affine parameters from the positions of the three vertices of each triangular area and the motion vectors at those vertices, applying the affine transformation to every pixel within the triangular area, and repeating the same processing for all triangular areas to produce a predicted video frame.
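The per-triangle affine prediction can be sketched as below. This is an illustrative sketch, not the conventional device's actual implementation: it writes the motion vector inside a triangle as an affine function of pixel position, (a0 + a1·x + a2·y, b0 + b1·x + b2·y), solves the six parameters exactly from the three vertex motion vectors by Cramer's rule, and reads each pixel from the reference frame at its position minus its motion vector. Integer vertex coordinates are assumed, and nearest-neighbor sampling with clamping is used for brevity where a real device would use bilinear interpolation at fractional positions.

```python
def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve3(m, rhs):
    """Solve m * p = rhs for a 3x3 system by Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("triangle vertices are collinear")
    return [det3([row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(m)]) / d
            for col in range(3)]


def affine_parameters(vertices, vectors):
    """Parameters (a, b) such that the motion vector at (x, y) is
    (a[0] + a[1]*x + a[2]*y, b[0] + b[1]*x + b[2]*y), fitted exactly to the
    motion vectors `vectors` at the three `vertices`."""
    m = [[1.0, x, y] for x, y in vertices]
    return solve3(m, [v[0] for v in vectors]), solve3(m, [v[1] for v in vectors])


def affine_motion_vector(params, x, y):
    """Evaluate the affine motion field at pixel (x, y)."""
    a, b = params
    return (a[0] + a[1] * x + a[2] * y, b[0] + b[1] * x + b[2] * y)


def inside_triangle(p, a, b, c):
    """True if p lies inside triangle abc or on its edges."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (min(d) < 0 < max(d))


def predict_triangle(ref, pred, vertices, vectors):
    """Write into `pred` the prediction of every integer pixel inside the
    triangle. Each pixel is read from `ref` at its position minus its motion
    vector (nearest neighbor, clamped to the frame)."""
    params = affine_parameters(vertices, vectors)
    height, width = len(ref), len(ref[0])
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            if not inside_triangle((x, y), *vertices):
                continue
            vx, vy = affine_motion_vector(params, x, y)
            rx = min(max(round(x - vx), 0), width - 1)
            ry = min(max(round(y - vy), 0), height - 1)
            pred[y][x] = ref[ry][rx]


# A triangle whose vertex vectors describe a uniform 25% stretch.
params = affine_parameters([(0, 0), (16, 0), (0, 16)], [(0, 0), (4, 0), (0, 4)])
```

Repeating `predict_triangle` over every triangle of the control grid builds the whole predicted frame; when all three vertex vectors are equal, the affine model reduces to ordinary translational block prediction.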
The produced predicted video frame is transferred to the previously described subtracting portion and adder portion, and the motion vectors are transferred to the motion-vector coding portion.
The motion-vector coding portion encodes the motion vectors received from the fixed-area-size predicted-frame generating portion and outputs them as coded side-information.
The motion-vector coding portion encodes the motion vectors as follows:
Motion vectors are generally encoded not directly but predictively: a predicted value is determined for each motion vector to be coded, and the difference between the predicted value and the motion vector is encoded. This reduces the amount of information and thus improves coding efficiency.
The simplest way of determining a predicted value is to use the most recently coded vector as the predicted value of the next motion vector. With this method, the differences between successive neighboring motion vectors are encoded in turn. Because neighboring motion vectors are highly correlated, motion vectors can be encoded efficiently, particularly when neighboring motion vectors have similar values. This predictive coding method is adopted in the motion-vector coding method defined by ITU-T Recommendation H.261.
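Prediction from the just-coded vector amounts to simple differential coding, as the following sketch shows. It is illustrative only: it starts from a (0, 0) predictor and omits the predictor-reset rules and variable-length code tables of the actual H.261 Recommendation.

```python
def encode_mv_differences(vectors):
    """Code each motion vector as its difference from the previously coded
    vector; the first vector is predicted from (0, 0)."""
    prev = (0, 0)
    diffs = []
    for vx, vy in vectors:
        diffs.append((vx - prev[0], vy - prev[1]))
        prev = (vx, vy)
    return diffs


def decode_mv_differences(diffs):
    """Inverse of encode_mv_differences: accumulate the differences."""
    prev = (0, 0)
    vectors = []
    for dx, dy in diffs:
        prev = (prev[0] + dx, prev[1] + dy)
        vectors.append(prev)
    return vectors


vectors = [(3, 1), (4, 1), (4, 2), (4, 2)]
diffs = encode_mv_differences(vectors)  # [(3, 1), (1, 0), (0, 1), (0, 0)]
```

Similar neighboring vectors yield small differences, which a variable-length code can then represent with short code words.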
Another method determines the predicted value from several motion vectors, including the immediately preceding one. In this case, the predicted value is determined from three motion vectors located, respectively, to the left of, directly above and above-right of the motion vector to be coded. The predicted value may be the mean or the median of these three neighboring motion vectors. Compared with prediction from only the immediately preceding motion vector, this method exploits correlation over a wider range, i.e., a higher correlation between motion vectors, and thus further improves coding efficiency. The video coding system defined by ITU-T Recommendation H.263 adopts this prediction method, coding each motion vector with the median of the three neighboring motion vectors as its predicted value, which is well known to be more effective.
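The median predictor can be sketched as follows. Assumptions: motion vectors are integer pairs stored in a grid of rows and coded in raster order, and neighbors outside the grid are treated as (0, 0); H.263 defines its own rules for picture and GOB borders, which this sketch does not reproduce.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return a + b + c - min(a, b, c) - max(a, b, c)


def median_predictor(left, above, above_right):
    """Component-wise median of three neighboring motion vectors."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))


def encode_mv_grid(grid):
    """Difference of every vector in `grid` (rows of (vx, vy)) from the median
    of its left, upper and upper-right neighbors, in raster order."""
    rows, cols = len(grid), len(grid[0])

    def neighbor(r, c):
        # Simplification: neighbors outside the grid count as (0, 0).
        if 0 <= r < rows and 0 <= c < cols:
            return grid[r][c]
        return (0, 0)

    diffs = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pred = median_predictor(neighbor(r, c - 1), neighbor(r - 1, c),
                                    neighbor(r - 1, c + 1))
            vx, vy = grid[r][c]
            row.append((vx - pred[0], vy - pred[1]))
        diffs.append(row)
    return diffs
```

Replacing `median3` with the arithmetic mean gives the mean-based predictor mentioned above, at the cost of being more sensitive to a single outlying neighbor.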
The structure and operation of the motion-compensated interframe-predicting portion of a conventional video-decoding device are described below.
The motion-compensated interframe-predicting portion is composed of a fixed-area-size predicted-frame generating portion and a motion-vector decoding portion.
Coded side-information input to the motion-compensated interframe-predicting portion is transferred to the motion-vector decoding portion, which decodes it to obtain motion vectors and outputs them to the fixed-area-size predicted-frame generating portion.
The fixed-area-size predicted-frame generating portion performs interframe prediction processing by using the motion vectors received from the motion vector decoding portion and a reference video-frame received from the frame memory portion.
The interframe prediction determines affine parameters from the positions of the three vertices of each triangular area and their motion vectors, and then applies the affine transformation with the determined parameters to all pixels within the triangular area. This processing is repeated for all triangular areas to generate a predicted video frame.
The fixed-area-size predicted-frame generating portion outputs the obtained predicted video-frame to an adder portion.
Several methods used for motion-compensated interframe prediction in the above video-coding and video-decoding devices are described below.
A motion vector searching method is as follows:
A so-called "pixel-matching" method, which generally weights the center pixel of a processable area, is used to determine the motion vector of a control grid point. The processable area consists of, for instance, 21 pixels in each of the X and Y directions. In pixel matching, the area of the reference frame that best matches the processable area of the current frame is found by calculation, and the motion vector is determined as the displacement in the X and Y directions. In the matching calculation, the difference between the processable area of the current video frame and the candidate area of the reference frame is multiplied by a weighting coefficient, so that the motion vector search puts more weight on the center pixel of the processable area.
Motion vectors at control grid points located at the periphery of a video-frame are set as follows:
Motion vectors at the corner control grid points:
Motion vectors at the top and bottom control grid points:
Motion vectors at the left-side and right-side control grid points:
(1) A video coding device comprising frame memory means for storing already encoded video signals; motion vector detecting means for determining a representative motion vector per unit area from a difference between an input video signal and a video signal read from the frame memory means; motion vector interpolating means for determining a motion vector per pixel from the representative motion vectors; and pixel value predicting means for preparing a predicted video signal from a video signal read from the frame memory means by using the motion vector per pixel, wherein weighting coefficient control means is provided for determining a weighting coefficient for each representative motion vector determined by the motion vector detecting means and instructing the motion vector interpolating means to weight each representative motion vector accordingly;
(2) A video coding device as defined in item (1), characterized in that the weighting coefficient control means selects one of previously prepared patterns of weighting coefficients for the representative motion vectors and instructs the motion vector interpolating means accordingly;
(3) A video coding device as defined in item (1) or (2), characterized in that the weighting coefficient control means determines a vector weighting coefficient according to the direction of each representative motion vector; and
(4) A video coding device as defined in item (1) or (2), characterized in that the weighting coefficient control means determines a vector weighting coefficient according to the value of each representative motion vector.
(5) A video decoding device comprising frame memory means for storing already decoded video signals; vector interpolating means for determining a motion vector per pixel from a representative motion vector input per unit area; and pixel value predicting means for preparing a predicted video signal from a video signal read from the frame memory means by using the motion vector per pixel, wherein weighting coefficient control means is provided for determining a weighting coefficient for each representative motion vector and instructing the vector interpolating means to weight each representative motion vector accordingly;
(6) A video decoding device as defined in item (5), characterized in that the weighting coefficient control means selects one of previously prepared patterns of weighting coefficients for the representative motion vectors and instructs the vector interpolating means accordingly; and
(7) A video decoding device as defined in item (5) or (6), characterized in that motion vector converting means is provided for determining a motion vector for a skipped frame from the representative motion vectors input per unit area of an encoded frame, so that an interpolated video signal is prepared for a frame that was thinned out without being encoded.
(8) A video coding device for encoding prediction error information representing a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, whose motion-compensated interframe-predicting portion is provided with a variable-area predicted-frame generating portion for dividing a processable area of a video frame into suitable areas according to motion vectors and a reference video frame and generating a predicted frame by affine transformation, and an area-dividing pattern deciding portion for controlling the division of a processable area and outputting a predicted video frame and side-information such as motion vector information and area-dividing information;
(9) A video coding device for encoding prediction error information representing a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, whose motion-compensated interframe-predicting portion is provided with a variable-area predicted-frame generating portion for generating a predicted video frame by translational displacement of a processable area and for dividing a processable area of a video frame into suitable areas according to motion vectors and a reference video frame and generating a predicted frame by affine transformation, and an area-dividing pattern deciding portion for controlling the division of a processable area and outputting a predicted video frame and side-information such as motion vector information and area-dividing information;
(10) A video coding device for encoding prediction error information determined as a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, whose motion-compensated interframe-predicting portion is provided with an effective-area selecting portion for selecting a valid processing mask or an invalid processing mask depending on the location of an objective area in a video frame when searching for a motion vector;
(11) A video coding device having the construction defined in item (8) or (9) above and used for encoding prediction error information determined as a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, whose motion-compensated interframe-predicting portion is further provided with an effective-area selecting portion for selecting a valid processing mask or an invalid processing mask depending on the location of an objective area in a video frame when searching for a motion vector;
(12) A video coding device for encoding prediction error information determined as a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, whose motion-vector coding portion is provided with a side-information coding portion for encoding an additional motion vector as a difference from the mean of four basic motion vectors;
(13) A video coding device having the construction defined in item (8) or (9) above, wherein the motion-vector coding portion of the area-dividing pattern deciding portion is further provided with a side-information coding portion for encoding an additional motion vector as a difference from the mean of four basic motion vectors;
(14) A video coding device for encoding prediction error information determined as a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, wherein a motion-vector coding portion is provided with a side-information coding portion for encoding an additional motion vector positioned between two basic motion vectors as a difference from the mean of those two basic motion vectors, and encoding a center additional motion vector as a difference from the mean of the four basic motion vectors;
(15) A video coding device having the construction defined in item (8) or (9) above, wherein the motion-vector coding portion of the area-dividing pattern deciding portion is further provided with a side-information coding portion for encoding an additional motion vector positioned between two basic motion vectors as a difference from the mean of those two basic motion vectors, and encoding a center additional motion vector as a difference from the mean of the four basic motion vectors;
(16) A video coding device for encoding prediction error information determined as a difference between an input video frame and a predicted video frame obtained by motion-compensated interframe prediction, whose motion-vector coding portion is provided with a side-information coding portion for encoding the value of an objective motion vector by prediction from the values of three already encoded basic or additional motion vectors located to the left of, directly above and above-right of the objective motion vector;
(17) A video coding device having the construction defined in item (16) above, whose motion-vector coding portion determines the mean of the three motion vectors as the predicted value of the objective motion vector and encodes the difference between the objective motion vector and the predicted value;
(18) A video coding device having the construction defined in item (16) above, whose motion-vector coding portion determines the median of the three motion vectors as the predicted value of the objective motion vector and encodes the difference between the objective motion vector and the predicted value;
(19) A video coding device having the construction defined in item (8) or (9) above, whose area-dividing pattern deciding portion is further provided with the motion-vector coding portion defined in any one of items (16), (17) and (18) above;
(20) A video coding device having the construction defined in any one of items (8), (9), (11), (13), (15) and (19) above, whose motion-compensated interframe-predicting portion is further provided with an area-dividing pattern deciding portion for indicating the kinds of dividing patterns for all areas to the variable-area predicted-frame generating portion, adopting the area-dividing pattern that minimizes the prediction error (error information value), outputting motion vectors and area-dividing information to the side-information coding portion and outputting a predicted video frame;
(21) A video coding device having the construction defined in any one of items (8), (9), (11), (13), (15) and (19) above, whose motion-compensated interframe-predicting portion is further provided with an area-dividing deciding portion which instructs the variable-area predicted-frame generating portion, as an initial setting, to divide an area into two small areas and/or to make a prediction by translational displacement; instructs it to divide the area more finely if the prediction error (error information value) of the predicted frame generated with the initial setting exceeds a preset threshold value; outputs motion vectors and area-dividing information to the side-information coding portion once the prediction error (error information value) becomes smaller than the preset threshold value; and also outputs a predicted video frame;
(22) A video coding device having the construction defined in any one of items (8), (9), (11), (20) and (21) above, further provided with a subtracting portion to which the input video frame and the predicted video frame from the motion-compensated interframe-predicting portion are transferred, a video coding portion for encoding the error (differential) video frame from the subtracting portion according to instructions from a coding control portion, a video decoding portion for decoding the coded video frame from the video coding portion, an adder portion for adding the predicted video frame from the motion-compensated interframe-predicting portion to the decoded video frame from the video decoding portion, and a frame memory for storing the reference video frame from the adder portion and outputting the stored information to the motion-compensated interframe-predicting portion;
(23) A video decoding device for reproducing a video frame from coded side-information input from a video coding device and a reference video frame input from a frame memory, wherein a motion-compensated interframe-predicting portion for generating and outputting a predicted video frame while changing the size of the codable area is provided with a side-information decoding portion for decoding the coded side-information from the video coding device to obtain motion vectors and area-dividing information, and a variable-area predicted-frame generating portion for generating a predicted video frame by using the motion vectors and area-dividing information from the side-information decoding portion and a reference video frame from the frame memory;
(24) A video decoding device for reproducing a video frame from coded side-information input from a video coding device and a reference video frame input from a frame memory, further provided with a side-information decoding portion for first decoding the four basic motion vectors contained in the side-information and then decoding additional motion vectors represented as a difference from the mean of the basic motion vectors;
(25) A video decoding device having the construction defined in item (16) above, whose side-information decoding portion first decodes the four basic motion vectors contained in the side-information and then decodes additional motion vectors represented as a difference from the mean of the basic motion vectors;
(26) A video decoding device for reproducing a video frame from coded side-information input from a video coding device and a reference video frame input from a frame memory, wherein a side-information decoding portion is provided for first decoding the four basic motion vectors contained in the side-information, then decoding an additional motion vector positioned between two basic motion vectors as a difference from the mean of those two basic motion vectors, and decoding a center additional motion vector as a difference from the mean of the four basic motion vectors; and
(27) A video decoding device having the construction defined in item (16) above, wherein a side-information decoding portion first decodes the four basic motion vectors contained in the side-information, then decodes an additional motion vector positioned between two basic motion vectors as a difference from the mean of those two basic motion vectors, and decodes a center additional motion vector as a difference from the mean of the four basic motion vectors.
PA1 (28) A video decoding device, which is used for reproducing a video-frame from a coded side-information inputted from a video-coding device and a reference video-frame inputted from a frame memory, and which is provided with a side-information decoding portion which decodes a objective motion-vector value by prediction from values of three motion-vectors which are already decoded basic or additional motion-vectors existing at the left-side, the just upper-side and the upper right-side of the object motion vector; PA1 (29) A video decoding device, which has the same construction as defined in (28) above, and whose side-information decoding portion determines a mean of the three motion-vectors as a predicted value of the objective motion-vector and obtains a motion-vector value of the decoded-object by adding the decoded difference value to the predicted value; PA1 (30) A video decoding device, which has the same construction as defined in (28) above, and whose side-information decoding portion determines a median of the three motion vectors as a predicted value of the objective motion-vector and obtains a decoded objective motion-vector value by adding the decoded difference value to the predicted value; PA1 (31) A video decoding device, which as defined in (23), and whose the side-information decoding portion is further provided with the motion-vector decoding system as defined in any one of (28), (29) and (30) above; PA1 (32) A video decoding device, which has the same construction as defined in any one of (23), (25), (27) and (31) above, and which is further provided with a video decoding portion for decoding a coded video-information inputted from a video coding device and outputting the decoded video-information, an adder portion for adding a decoded video-frame inputted from the video decoding portion to a predicted video-frame inputted from the motion-compensated interframe predicting portion and outputting an obtained video-frame as an output video-frame 
and a memory frame for storing a video-frame from the adder portion and for outputting said stored video-frame as a reference video-frame to the frame inputted from the motion-compensated interframe predicting portion. PA1 (33) a video coding device using adaptive motion-compensated interframe prediction, which comprises a predicting portion for generating a plurality of predictive images of small variable-size areas by applying different predicting methods to each of the small variable-size areas in the process of the motion-compensated interframe prediction and outputting the generated predictive images, an area prediction deciding portion for determining an adaptive area size and an adaptive prediction method according to a plurality of the predictive images received from the predicting portion and outputting side-information composed of area-information, prediction-mode information, motion-vectors and so on, and a side-information coding potion for encoding the side-information outputted from the area prediction deciding portion; PA1 (34) a video coding device using adaptive motion-compensated interframe prediction, which comprises a predicting portion for generating a plurality of predictive images of respective unit areas or further-divided subareas by applying different predicting methods to each of the unit areas or each of the subareas in the process of the motion-compensated interframe prediction and then outputting the the generated predictive images, an area prediction deciding portion for determining an area-information indicating whether the predicted images received from the predicting portion are unit-area images or subarea images and prediction methods applied to the respective predicted images and outputting side-information including area-information, prediction mode information, motion-vectors and the like, and a side-information coding portion for encoding the side-information received from the area prediction deciding portion; PA1 (35) a video 
coding device using adaptive motion-compensated interframe prediction, which comprises a predicting portion for generating a plurality of predictive images of respective variable-size small areas by using a block-displacement (overlapped motion-compensative) predicting method, an affine transform predicting method, bilinear transform predicting method and a background predicting method, etc. in the process of the motion-compensated interframe prediction and then outputting the generated predictive images, an area prediction deciding portion for determining an adaptive area area-size and an adaptive prediction method from the predictive images received from the predicting portion and outputting side-information including area-information, prediction mode information and motion-vectors, etc., and a side-information coding portion for encoding the side-information received from the area prediction deciding portion; PA1 (36) a video coding device using adaptive motion-compensated interframe prediction, which is provided with a predicting portion for diagonally dividing each encodable unit-area into two or four subareas, producing a predictive image for each subarea by affine transformation in the process of the motion-compensated interframe prediction and outputting the produced predicted images; PA1 (37) a video coding device using adaptive motion-compensated interframe prediction, which has the same functions as defined in any one of items (33), (34) and (35), and is provided with a predicting portion for diagonally dividing each encodable unit-area into two or four subareas, producing a predictive image for each subarea by affine transformation in the process of the motion-compensated interframe prediction and outputting the produced predicted images; PA1 (38) a video decoding device using adaptive motion-compensated interframe prediction, which comprises a side-information decoding portion for decoding coded side-information including area-information, prediction 
mode information and motion-vectors, etc. and outputting the decoded side-information, a predicting portion for generating a plurality of predictive images of small variable-size areas by applying different predicting methods to each of the small variable-size areas in the process of the motion-compensated interframe prediction and outputting the generated predictive images; and an area-prediction-mode selecting portion for generating an adaptive predictive image from a plurality of the predictive images received from the predicting portion according to the information received from the side-information decoding portion; PA1 (39) a video decoding device using adaptive motion-compensated interframe prediction, which comprises a side-information decoding potion for decoding coded side-information including area-information, prediction mode information and motion-vectors, etc. and outputting the decoded side-information, a predicting portion for generating a plurality of predictive images of respective decodable unit-areas or further-divided subareas by applying different predicting methods to each of the unit areas or each of the subareas in the process of the motion-compensated interframe prediction and then outputting the generated predictive images, and an area-prediction-mode selecting portion for generating an adaptive predictive image from a plurality of the predictive images received from the predicting portion according to the information received from the side-information decoding portion; PA1 (40) a video decoding device using adaptive motion-compensated interframe prediction, which comprises a side-information decoding potion for decoding coded side-information including area-information, prediction mode information and motion-vectors, etc. 
and outputting the decoded side-information; a predicting portion for generating a plurality of predictive images of respective decodable unit-area or further divided subareas by using a block-displacement (overlapped motion-compensative) predicting method, an affine transform predicting method, bilinear transform predicting method and a background predicting method, etc. in the process of the motion-compensated interframe compensated interframe prediction and then outputting the the generated predictive images, and an area-prediction-mode selecting portion for generating an adaptive predictive image from a plurality of the predictive images received from the predicting portion according to the information received from the side-information decoding portion; PA1 (41) a video decoding device using adaptive motion-compensated interframe prediction, which is provided with a predicting portion for diagonally dividing each decodable unit-area into two or four subareas, producing a predictive image for each subarea by affine transformation in the process of the motion-compensated interframe prediction and outputting the produced predicted images; and PA1 (42) a video decoding device using adaptive motion-compensated interframe prediction, which has the same functions as defined in any one of items (39) and (40) and is provided with a predicting portion for diagonally dividing each decodable unit-area into two or four subareas, producing a predictive image for each subarea by affine transformation in the process of the motion-compensated interframe prediction and then out a predictive image and outputting the produced predicted images.
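The motion-vector prediction rules in items (15), (16) to (18), and (26) to (30) can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation. It makes several assumptions: vectors are integer (x, y) pairs, the mean is rounded with `(sum + n // 2) // n`, and the corner labels A (top-left), B (top-right), C (bottom-left) and D (bottom-right) with midpoint names `"AB"`, `"AC"`, `"BD"`, `"CD"` and centre `"M"` are hypothetical. The names `mean_pred`, `median_pred`, `encode_additional`, `decode_additional` and `median_residual` are also hypothetical. The rounding rule itself does not matter for correctness, provided the encoder and decoder use the same one.

```python
from statistics import median

Vector = tuple[int, int]  # (x, y) in integer units (pixel or sub-pixel, assumed)

# Hypothetical labelling of one square area: basic vectors at corners A, B, C, D;
# additional vectors at edge midpoints (named by their two corners) and centre "M".
EDGE_PAIRS = {"AB": ("A", "B"), "AC": ("A", "C"), "BD": ("B", "D"), "CD": ("C", "D")}


def mean_pred(vectors: list[Vector]) -> Vector:
    """Component-wise mean, rounded to an integer (rounding rule is an assumption)."""
    n = len(vectors)
    return (
        (sum(v[0] for v in vectors) + n // 2) // n,
        (sum(v[1] for v in vectors) + n // 2) // n,
    )


def median_pred(left: Vector, up: Vector, up_right: Vector) -> Vector:
    """Component-wise median of the left, upper and upper-right neighbours."""
    return (
        median([left[0], up[0], up_right[0]]),
        median([left[1], up[1], up_right[1]]),
    )


def _predictors(basic: dict[str, Vector], names) -> dict[str, Vector]:
    """Prediction for each additional vector, using only the basic vectors,
    which the decoder has already decoded."""
    preds = {}
    for name in names:
        if name == "M":  # centre: mean of all four basic vectors
            preds[name] = mean_pred([basic[k] for k in "ABCD"])
        else:  # edge midpoint: mean of the two basic vectors it lies between
            p, q = EDGE_PAIRS[name]
            preds[name] = mean_pred([basic[p], basic[q]])
    return preds


def encode_additional(basic: dict[str, Vector], additional: dict[str, Vector]) -> dict[str, Vector]:
    """Differences transmitted for the additional vectors present in this area."""
    preds = _predictors(basic, additional)
    return {k: (v[0] - preds[k][0], v[1] - preds[k][1]) for k, v in additional.items()}


def decode_additional(basic: dict[str, Vector], diffs: dict[str, Vector]) -> dict[str, Vector]:
    """Inverse of encode_additional: add each difference back to its prediction."""
    preds = _predictors(basic, diffs)
    return {k: (d[0] + preds[k][0], d[1] + preds[k][1]) for k, d in diffs.items()}


def median_residual(v: Vector, left: Vector, up: Vector, up_right: Vector) -> Vector:
    """Difference coded for an objective vector under the median rule of item (18)."""
    p = median_pred(left, up, up_right)
    return (v[0] - p[0], v[1] - p[1])
```

As a usage example, take basic vectors A=(2, 0), B=(4, 0), C=(2, 2) and D=(4, 3). An additional vector AB=(3, 1) is then sent as (0, 1), and a centre vector M=(3, 1) is sent as (0, 0). Small residuals like these are what make the difference coding pay off.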
X- and Y-components of each vector are both set to 0.
The X-component is set to the X-component of the motion vector detected at the control grid point one position inside the objective control grid point, and the Y-component is set to 0.
The X-component of each motion vector is set to 0, and the Y-component is set to the Y-component of the motion vector detected at the control grid point one position inside the objective control grid point.
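The three alternatives above can be written as a small Python function. This is a sketch only. The rule names `"zero"`, `"x_only"` and `"y_only"` and the function `peripheral_vector` are hypothetical. This passage does not say which rule applies to which peripheral grid point, so the caller must choose the rule.

```python
Vector = tuple[int, int]


def peripheral_vector(rule: str, inner: Vector) -> Vector:
    """Motion vector for a peripheral control grid point.

    inner: vector detected at the control grid point one position inside.
    rule:  "zero"   -> both components 0
           "x_only" -> X taken from inner, Y set to 0
           "y_only" -> X set to 0, Y taken from inner
    """
    if rule == "zero":
        return (0, 0)
    if rule == "x_only":
        return (inner[0], 0)
    if rule == "y_only":
        return (0, inner[1])
    raise ValueError(f"unknown rule: {rule}")
```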
The affine transformation is described as follows:
The affine transformation represents the mapping from one video-frame to another by six parameters.
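In the usual notation the six-parameter mapping can be written as follows (the symbols a_1 to a_6 are our own naming). Each of three point correspondences (x_k, y_k) to (x'_k, y'_k) contributes two equations, so three non-collinear points determine all six parameters. This is why a triangle is the natural unit.

```latex
\begin{aligned}
x' &= a_1 x + a_2 y + a_3, \qquad y' = a_4 x + a_5 y + a_6, \\[4pt]
\begin{pmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{pmatrix}
\begin{pmatrix} a_1 & a_4 \\ a_2 & a_5 \\ a_3 & a_6 \end{pmatrix}
&= \begin{pmatrix} x'_1 & y'_1 \\ x'_2 & y'_2 \\ x'_3 & y'_3 \end{pmatrix}
\end{aligned}
```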
To simplify the calculation of the affine parameters, the affine transformation is usually applied to triangular areas.
The motion vectors of control grid points A, B, C and D of the current video-frame are detected as the displacements to the corresponding points A', B', C' and D' of a reference video-frame.
To determine the affine parameters, the area is first divided into triangles, each formed by three of the four control grid points. For example, the area on the current video-frame is divided into two triangles ABC and BCD, and the corresponding area on the reference video-frame is divided into two triangles A'B'C' and B'C'D'.
For each triangle, the affine parameters are determined from the vertex positions of the triangle and of its counterpart in the reference video-frame. Equivalently, the vertex positions of one triangle together with the motion vectors of its vertices may be used.
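The parameter computation for one triangle can be sketched in Python. The sketch assumes the convention x' = a1·x + a2·y + a3 and y' = a4·x + a5·y + a6, with source vertices in the current frame and destination vertices in the reference frame. The function names are hypothetical. The 3x3 systems are solved with Cramer's rule, which avoids any external dependency; a real coder might instead use a closed-form expression or a linear-algebra library.

```python
Point = tuple[float, float]
Params = tuple[float, float, float, float, float, float]


def _det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def _solve3(m: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve m @ u = rhs for a 3x3 system with Cramer's rule."""
    d = _det3(m)
    if d == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]  # copy, then replace one column by rhs
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(_det3(mc) / d)
    return sol


def affine_params(src: list[Point], dst: list[Point]) -> Params:
    """Parameters (a1..a6) mapping triangle src (current frame) onto dst (reference)."""
    m = [[float(x), float(y), 1.0] for x, y in src]
    a1, a2, a3 = _solve3(m, [p[0] for p in dst])
    a4, a5, a6 = _solve3(m, [p[1] for p in dst])
    return (a1, a2, a3, a4, a5, a6)


def affine_params_from_mvs(src: list[Point], mvs: list[Point]) -> Params:
    """Same, given vertex positions and the motion vector detected at each vertex."""
    dst = [(x + vx, y + vy) for (x, y), (vx, vy) in zip(src, mvs)]
    return affine_params(src, dst)
```

For a pure translation, for example every vertex moving by (2, -1), the result is the identity part (1, 0, 0, 1) with offsets a3 = 2 and a6 = -1, as expected.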
A predicted video-frame is generated by mapping every pixel of every triangular sub-area according to the obtained affine parameters.
If the referenced position in the reference video-frame does not fall on integer pixel coordinates, the pixel value of the predicted video-frame is determined by bilinear interpolation.
A predicted video-frame is generated by performing the above-mentioned processing operations.
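The mapping and interpolation steps can be sketched in Python for a single triangle. The sketch makes these assumptions:

- Frames are lists of rows.
- The parameters follow the convention x' = a1·x + a2·y + a3 and y' = a4·x + a5·y + a6.
- References that fall outside the frame are clamped to the border. The text does not specify this.
- Pixels on an edge shared by two triangles are simply written by whichever triangle is processed last.

The names `bilinear`, `inside` and `predict_triangle` are hypothetical.

```python
import math

Params = tuple[float, float, float, float, float, float]


def bilinear(frame: list[list[float]], x: float, y: float) -> float:
    """Sample frame at a possibly non-integer position (border-clamped)."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom


def _edge(a, b, p) -> float:
    """Sign tells on which side of the directed edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p) -> bool:
    """True if p lies inside the triangle or on its boundary."""
    d = [_edge(tri[0], tri[1], p), _edge(tri[1], tri[2], p), _edge(tri[2], tri[0], p)]
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))


def predict_triangle(ref, params: Params, tri, out):
    """Fill the pixels of triangle tri in out by affine mapping into ref; returns out."""
    a1, a2, a3, a4, a5, a6 = params
    h, w = len(out), len(out[0])
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for y in range(max(0, math.floor(min(ys))), min(h - 1, math.ceil(max(ys))) + 1):
        for x in range(max(0, math.floor(min(xs))), min(w - 1, math.ceil(max(xs))) + 1):
            if inside(tri, (x, y)):
                # Position of this pixel in the reference frame, generally non-integer.
                out[y][x] = bilinear(ref, a1 * x + a2 * y + a3, a4 * x + a5 * y + a6)
    return out
```

Calling `predict_triangle` once for ABC and once for BCD, each with its own parameters, fills the whole area.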
The motion-compensated interframe prediction using affine transformation of a fixed area-size, according to the above-mentioned prior art, has a drawback. The motion-prediction efficiency decreases when the object to be mapped is small compared with the area, or when its periphery is split among several areas so that the object edges differ considerably from the area-dividing lines.
The motion-compensated interframe prediction with affine transformation can introduce the geometrical distortion peculiar to the affine transformation, which lowers the prediction efficiency and degrades the quality of the coded video-frames. Prediction efficiency may also decrease considerably if the input video contains a translational displacement of an object area that the affine mapping cannot represent correctly.
Motion vectors of control grid points at the periphery of a video-frame cannot always be set accurately, because they are derived from the motion vectors of the adjacent inner control grid points rather than detected directly.
When the affine transformation is applied to variable-size areas, more control grid points are needed. The number of motion vectors therefore increases compared with the conventional affine transformation of fixed-size areas, and these motion vectors must be encoded efficiently.
With variable-size areas, the number of control grid points, and hence the number of motion vectors, depends on the sizes of the selected areas. Motion vectors are consequently distributed irregularly over the video-frame. This makes it difficult to apply the conventional motion-vector coding method as it is, and it lowers the coding efficiency. A highly efficient motion-vector coding system adapted to variable-area-size affine transformation is therefore necessary.
Another configuration and operation of the motion-compensated interframe-predicting portion of a video coding device and a video decoding device, and the methods applied therein, are described below.
An example structure of the motion-compensated interframe-predicting portion of a video coding device, and its operation, are described first:
The motion-compensated interframe-predicting portion comprises a motion-vector searching portion, a predicted-frame generating portion and a motion-vector coding portion.
The motion-vector searching portion searches for motion vectors between an input video-frame and a reference video-frame read from the frame memory portion, and outputs the detected motion vectors to the predicted-frame generating portion.
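The text does not say how the motion-vector searching portion finds its vectors. One common choice is exhaustive block matching with a sum-of-absolute-differences (SAD) criterion, and the Python sketch below assumes that method. It uses a square n x n block, a search range of ±r and integer-pixel accuracy. The names `sad` and `full_search` are hypothetical, and the vector (dx, dy) points from the current block to its match in the reference frame.

```python
Frame = list[list[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int, r: int) -> tuple[int, int]:
    """Try every displacement in [-r, r]^2 and return the (dx, dy) with minimum SAD."""
    h, w = len(ref), len(ref[0])
    best = None  # (cost, dx, dy)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("no valid candidate inside the reference frame")
    return best[1], best[2]
```

For affine prediction, a similar search would be run around each control grid point instead of each block; that variant is not shown here.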
The predicted-frame generating portion generates a predictive image by using one of several prediction methods, such as the block-displacement method, the affine transformation method or the bilinear transformation method.
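For comparison with the six-parameter affine model, the other two mappings are commonly written as follows (notation assumed here). Block displacement uses a single vector (v_x, v_y) per block. The bilinear transformation has eight parameters, so four point correspondences, such as the four control grid points of a rectangular area, determine it without splitting the area into triangles.

```latex
\begin{aligned}
\text{block displacement:}\quad & x' = x + v_x, && y' = y + v_y \\
\text{bilinear:}\quad & x' = b_1 x y + b_2 x + b_3 y + b_4, && y' = b_5 x y + b_6 x + b_7 y + b_8
\end{aligned}
```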
Side-information such as area information and motion vectors is obtained in the process of generating the predicted video-frame.
The generated predicted video-frame is transferred to the subtracting portion and the adder portion while the side-information is transferred to the side-information coding portion.
The side-information coding portion encodes the side-information received from the predicted-frame generating portion and outputs the coded side-information.
Another structure and operation of the motion-compensated interframe-predicting portion, in this case of a conventional video decoding device, will now be described:
The motion-compensated interframe-predicting portion is composed of a predicted-frame generating portion and a side-information decoding portion.
Coded side-information inputted into the motion-compensated interframe predicting portion is transferred to the side-information decoding portion.
The side-information decoding portion decodes the received coded side-information, obtains decoded side-information including area-information and motion vectors, etc., and outputs the decoded side-information to the predicted-frame generating portion.
The predicted-frame generating portion performs interframe-prediction processing by using the side-information received from the side-information decoding portion and the reference video-frame received from the frame memory portion and outputs the produced predicted video-frame to the adder portion.
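The decoder-side steps can be sketched in Python for the simplest case, block-displacement prediction. The sketch first forms the predicted video-frame from the decoded vectors and the reference frame. It then adds the decoded prediction error, as the adder portion does. It assumes 8-bit samples, square n x n blocks and vectors that stay inside the reference frame. The names `predict_blocks` and `add_residual` are hypothetical, and affine or bilinear prediction would replace the block copy with a per-pixel mapping.

```python
Frame = list[list[int]]


def predict_blocks(ref: Frame, mvs: dict[tuple[int, int], tuple[int, int]], n: int) -> Frame:
    """Block-displacement prediction from decoded side-information.

    mvs maps the top-left corner (bx, by) of each n x n block to its decoded
    motion vector (dx, dy); vectors are assumed to stay inside ref.
    """
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in mvs.items():
        for j in range(n):
            for i in range(n):
                pred[by + j][bx + i] = ref[by + dy + j][bx + dx + i]
    return pred


def add_residual(pred: Frame, residual: Frame, max_val: int = 255) -> Frame:
    """Adder portion: prediction plus decoded prediction error, clipped to [0, max_val]."""
    return [
        [min(max(p + e, 0), max_val) for p, e in zip(prow, erow)]
        for prow, erow in zip(pred, residual)
    ]
```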
The prior art uses a single, fixed method for motion-compensated interframe prediction. A conventional device using affine transformation of a fixed area-size therefore cannot adapt to cases in which the total amount of code could be reduced further by using block-displacement prediction.
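An adaptive device would instead compare the candidate methods area by area and keep the cheaper one. The text only asks that the total amount of code be reduced. The Python sketch below assumes a Lagrangian cost D + λR, a common way to express that trade-off. Here D is a prediction-error measure, R is the side-information bits, and λ is a weighting factor. The function name `choose_mode` and the numbers in the usage example are hypothetical.

```python
def choose_mode(candidates: dict[str, tuple[float, float]], lam: float) -> str:
    """Pick the prediction mode with the smallest cost D + lam * R.

    candidates maps a mode name to (distortion, side_info_bits) measured for one area.
    """
    return min(candidates, key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

Take, for example, an area where block displacement gives (1200, 12) and affine gives (900, 40). With λ = 5 the costs are 1260 and 1100, so affine is chosen. With λ = 20 they are 1440 and 1700, so block displacement wins.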
Depending on the shape, size and position of the object to be mapped, motion-compensated interframe prediction with affine transformation does not always achieve sufficient prediction efficiency.