The present invention relates to methods of encoding and decoding digital picture data for storage or transmission; more specifically, to a method of encoding and decoding the motion information used to produce predicted pictures, a method of producing an accurate predicted picture, and an apparatus using these methods.
Data compression (=encoding) is required for efficient storing and transmitting of a digital picture.
Several methods of encoding are available as prior art, such as “discrete cosine transform” (DCT) based methods including JPEG and MPEG, and other wave-form encoding methods such as “subband”, “wavelet”, “fractal” and the like. Further, in order to remove redundant signals between pictures, a prediction method between pictures is employed, and then the differential signal is encoded by a wave-form encoding method.
A method of MPEG based on DCT using motion compensation is described here. First, an input picture is resolved into macro blocks of 16×16 pixels. Each macro block is further resolved into blocks of 8×8 pixels, and the 8×8 blocks undergo DCT and are then quantized. This process is called “Intra-frame coding.” Motion detection means, including a block matching method, detects from a temporally adjacent frame the prediction macro block having the least error with respect to a target macro block. Based on the detected motion, an optimal predicted block is obtained by performing motion compensation on the previous picture. The signal indicating the predicted macro block having the least error is a motion vector. Next, the difference between the target block and its corresponding predicted block is found, the difference undergoes DCT, and the obtained DCT coefficients are quantized and then transmitted or stored together with the motion information. This process is called “Inter-frame coding.”
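The block matching step described above can be illustrated with a minimal sketch. It assumes pictures are plain lists of rows of integer luminance values, uses the sum of absolute differences (SAD) as the error measure, and performs an exhaustive search; the function names `sad` and `block_match`, the search range and the error measure are illustrative choices, not details given in this description.

```python
def sad(target, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between a target block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(target[by + y][bx + x] - reference[by + dy + y][bx + dx + x])
    return total


def block_match(target, reference, bx, by, size=16, search=4):
    """Return the displacement (dx, dy) with the least SAD within +/- search pixels.

    (bx, by) is the top-left corner of the target block. The search range and
    the SAD error measure are assumptions made for this sketch.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            err = sad(target, reference, bx, by, dx, dy, size)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    if best is None:
        raise ValueError("no candidate block lies inside the reference picture")
    return best[1], best[2]
```

The returned pair plays the role of the motion vector for the block; the difference between the target block and the block it points to is what would then undergo DCT and quantization.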
At the data receiving side, first, the quantized DCT coefficients are decoded into the original differential signals, next, a predicted block is restored based on the motion vector, then, the differential signal is added to the predicted block, and finally, the picture is reproduced.
A predicted picture is formed on a block-by-block basis; however, an entire picture sometimes moves by panning or zooming, and in this case the entire picture undergoes motion compensation. The motion compensation, or predicted picture formation, involves not only a simple parallel translation but also other deformations such as enlargement, reduction and rotation.
The following equations (1)-(4) express movement and deformation, where (x, y) represents the coordinates of a pixel, and (u, v) represents the transformed coordinates, which also express a motion vector at (x, y). The other variables are the transformation parameters, which indicate a movement or a deformation.
(u, v) = (x + e, y + f)    (1)
(u, v) = (ax + e, dy + f)    (2)
(u, v) = (ax + by + e, cx + dy + f)    (3)
(u, v) = (gx^2 + pxy + ry^2 + ax + by + e, hx^2 + qxy + sy^2 + cx + dy + f)    (4)
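As an illustration only, equations (1)-(4) can be written as four small functions that map a pixel coordinate (x, y) to the transformed coordinate (u, v). The function names are hypothetical; the parameter names follow the equations.

```python
def translate(x, y, e, f):
    """Equation (1): parallel translation, 2 parameters."""
    return (x + e, y + f)


def scale_translate(x, y, a, d, e, f):
    """Equation (2): independent scaling along each axis plus translation, 4 parameters."""
    return (a * x + e, d * y + f)


def affine(x, y, a, b, c, d, e, f):
    """Equation (3): Affine transform, 6 parameters."""
    return (a * x + b * y + e, c * x + d * y + f)


def quadratic(x, y, g, p, r, a, b, e, h, q, s, c, d, f):
    """Equation (4): second-order polynomial transform, 12 parameters."""
    u = g * x * x + p * x * y + r * y * y + a * x + b * y + e
    v = h * x * x + q * x * y + s * y * y + c * x + d * y + f
    return (u, v)
```

Each model contains the previous one as a special case: setting b = c = 0 in `affine` gives `scale_translate`, and additionally setting a = d = 1 gives `translate`.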
Equation (3) is the so-called Affine transform, and this Affine transform is described here as an example. The parameters of the Affine transform are found through the following steps:
First, resolve a picture into a plurality of blocks, e.g., 2×2, 4×4, 8×8, etc., then find a motion vector of each block through a block matching method. Next, select at least the three most reliable motion vectors from the detected motion vectors. Substitute these three vectors into equation (3) and solve the six simultaneous equations to find the Affine parameters. In general, the error decreases as more motion vectors are selected, in which case the Affine parameters are found by the least squares method. The Affine parameters thus obtained are utilized to form a predicted picture. The Affine parameters shall be transmitted to the data receiving side for producing the identical predicted picture.
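The parameter estimation described above can be sketched as follows, assuming the selected motion vectors are given as pairs of block positions (x, y) and their matched positions (u, v). The sketch forms the least squares normal equations for equation (3) and solves the two resulting 3×3 systems with Cramer's rule; with exactly three non-collinear points this reduces to solving the six simultaneous equations. The function names and the choice of solver are assumptions made for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * z = rhs by Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("points are collinear or too few; parameters are not unique")
    solution = []
    for col in range(3):
        # Replace one column of m by the right-hand side.
        mc = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(mc) / d)
    return solution


def estimate_affine(points, transformed):
    """Least squares fit of (a, b, c, d, e, f) in equation (3).

    points      -- list of (x, y) pixel coordinates
    transformed -- list of matching (u, v) coordinates
    """
    m = [[0.0] * 3 for _ in range(3)]
    rhs_u = [0.0] * 3
    rhs_v = [0.0] * 3
    for (x, y), (u, v) in zip(points, transformed):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]   # accumulate A^T A
            rhs_u[i] += row[i] * u           # accumulate A^T u
            rhs_v[i] += row[i] * v           # accumulate A^T v
    a, b, e = solve3(m, rhs_u)
    c, d, f = solve3(m, rhs_v)
    return a, b, c, d, e, f
```

Passing more than three point pairs gives the least squares solution over all of them.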
However, when a conventional inter-frame coding is used, a target picture and a reference picture should be of the same size, and the conventional inter-frame coding method is not well prepared for dealing with pictures of different sizes.
Size variations between two adjoining pictures largely depend on motions of an object in these pictures. For instance, when a person standing with his arms down (FIG. 7A) raises the arms, the size of the rectangle enclosing the person changes (FIG. 7B). When encoding efficiency is considered, the target picture and reference picture should be transformed into the same coordinates space in order to decrease the coded quantity of the motion vectors. Also, the arrangement of macro blocks resolved from a picture varies depending on the picture size variation. For instance, when the image changes from FIG. 7A to FIG. 7B, a macro block 701 is resolved into macro blocks 703 and 704, which are subsequently compressed. Due to this compression, a vertical distortion resulting from the quantization appears on the person's face in the reproduced picture (FIG. 7B), whereby the visual picture quality is degraded.
Because the Affine transform requires high accuracy, the Affine parameters (a, b, c, d, e, f, etc.) are, in general, real numbers with many decimal places. A considerable number of bits is needed to transmit the parameters at high accuracy. Conventionally, the Affine parameters are quantized and transmitted as fixed length codes or variable length codes, which lowers the accuracy of the parameters, and thus a highly accurate Affine transform cannot be realized. As a result, a desirable predicted picture cannot be produced.
As equations (1)-(4) show, the number of transformation parameters ranges from 2 to 10 or more. When the transformation parameters are transmitted with a fixed number of bits sufficient for the maximum number of parameters, a problem occurs, i.e., redundant bits are transmitted.
The present invention aims to, firstly, provide an encoder and a decoder of digital picture data for transmitting non-integer transformation parameters having many digits, such as those of the Affine transform, at high accuracy with a smaller amount of coded data. In order to achieve the above objective, a predicted picture encoder comprising the following elements is prepared:
(a) picture compression means for encoding an input picture and compressing the data,
(b) coordinates transform means for outputting a coordinates data which is obtained by decoding the compressed data and transforming the decoded data into a coordinates system,
(c) transformation parameter producing means for producing transformation parameters from the coordinates data,
(d) predicted picture producing means for producing a predicted picture from the input picture by the transformation parameters, and
(e) transmission means for transmitting the compressed picture and the coordinates data.
Also a digital picture decoder comprising the following elements is prepared:
(f) variable length decoding means for decoding an input compressed picture data and an input coordinates data,
(g) transformation parameter producing means for producing transformation parameters from the decoded coordinates data,
(h) predicted picture producing means for producing a predicted picture data using the transformation parameters,
(i) addition means for producing a decoded picture by adding the predicted picture and the compressed picture data.
To be more specific, the transformation parameter producing means of the above digital encoder and decoder produces the transformation parameters using “N” (a natural number) pieces of pixels coordinates-points and the corresponding “N” pieces of transformed coordinates-point obtained by applying a predetermined linear polynomial function to the N pieces of pixels coordinates-points. Further, the transformation parameter producing means of the above digital encoder and decoder outputs transformation parameters produced through the following steps: first, input target pictures having different sizes and numbered “1” through “N”, second, set a common spatial coordinates for the above target pictures, third, compress the target pictures to produce compressed pictures thereof, then, decode the compressed pictures and transform them into the common spatial coordinates, next, produce expanded (decompressed) pictures thereof and store them, and at the same time, transform the expanded pictures into the common spatial coordinates.
The present invention aims to, secondly, provide a digital picture encoder and decoder. To be more specific, when pictures of different sizes are encoded to form a predicted picture, the target picture and reference picture are transformed into the same coordinates space, and the coordinates data thereof is transmitted, thereby increasing the accuracy of motion detection and, at the same time, reducing the coded quantity and improving picture quality.
In order to achieve the above objective, the predicted picture encoder according to the present invention performs the following steps: first, input target pictures having different sizes and numbered “1” through “N”, second, set a common spatial coordinates for the above target pictures, third, compress the target pictures to produce compressed pictures thereof, then, decode the compressed pictures and transform them into the common spatial coordinates, next, produce expanded pictures thereof and store them, and at the same time, transform the expanded pictures into the common spatial coordinates, thereby producing a first off-set signal (coordinates data), then encode this off-set signal, and transmit it together with the first compressed picture.
The predicted picture encoder according to the present invention further performs the following steps with regard to the “n” th (n=2, 3, . . . N) target picture after the above steps: first, transform the target picture into the common spatial coordinates, second, produce a predicted picture by referring to an expanded picture of the (n-1)th picture, third, produce a differential picture between the “n” th target picture and the predicted picture, and then compress and encode it, thereby forming the “n” th compressed picture, then, decode the “n” th compressed picture, next, transform it into the common spatial coordinates to produce the “n” th expanded picture, and store it, at the same time, encode the “n” th off-set signal (coordinates data) which is produced by transforming the “n” th target picture into the common spatial coordinates, finally transmit it together with the “n” th compressed picture.
The predicted picture decoder of the present invention comprises the following elements: input terminal, data analyzer (parser), decoder, adder, coordinates transformer, motion compensator and frame memory. The predicted picture decoder of the present invention performs the following steps: first, input compressed picture data to the input terminal, the compressed picture data being numbered from 1 to N including the “n” th off-set signal which is produced by encoding the target pictures having respective different sizes and being numbered 1 to N, and transforming the “n” th (n=1, 2, 3, . . . N) target picture into the common spatial coordinates, second, analyze the first compressed picture data, and output the first compressed picture signal together with the first off-set signal, then input the first compressed picture signal to the decoder to decode it into the first reproduced picture, and then, transform the first reproduced picture in the coordinates transformer using the first off-set signal, and store the transformed first reproduced picture in the frame memory.
With regard to the “n” th (n=2, 3, 4, . . . N) compressed picture data, first, analyze the “n” th compressed picture data in the data analyzer, second, output the “n” th compressed picture signal, the “n” th off-set signal and the “n” th motion signal, third, input the “n” th compressed picture signal to the decoder to decode it into the “n” th expanded differential picture, next, input the “n” th off-set signal and “n” th motion signal to the motion compensator, then, obtain the “n” th predicted picture from the (n-1)th reproduced picture stored in the frame memory based on the “n” th off-set signal and “n” th motion signal, after that, in the adder, add the “n” th expanded differential picture to the “n” th predicted picture to restore the “n” th reproduced picture, and at the same time, the “n” th reproduced picture is transformed in the coordinates transformer based on the “n” th off-set signal and is stored in the frame memory.
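The role of the off-set signal can be illustrated with a minimal sketch. It assumes, as an interpretation and not as a statement of this description, that each off-set signal is the position (ox, oy) of a picture's top-left corner in the common spatial coordinates, that pictures are lists of rows, and that a motion signal is a single whole-picture displacement. All function names are hypothetical.

```python
def to_common(x, y, offset):
    """Map a picture-local pixel position into the common spatial coordinates."""
    ox, oy = offset
    return (x + ox, y + oy)


def fetch_from_reference(reference, ref_offset, cx, cy, fill=0):
    """Read the reference pixel located at common coordinates (cx, cy).

    Positions not covered by the reference picture return `fill`.
    """
    rx, ry = cx - ref_offset[0], cy - ref_offset[1]
    if 0 <= ry < len(reference) and 0 <= rx < len(reference[0]):
        return reference[ry][rx]
    return fill


def predict(reference, ref_offset, size, offset, motion=(0, 0)):
    """Build a predicted picture of the given (width, height) for a target at `offset`.

    The reference picture may have a different size and a different off-set;
    both pictures are addressed through the common spatial coordinates.
    """
    width, height = size
    predicted = []
    for y in range(height):
        row = []
        for x in range(width):
            cx, cy = to_common(x, y, offset)
            row.append(fetch_from_reference(reference, ref_offset,
                                            cx + motion[0], cy + motion[1]))
        predicted.append(row)
    return predicted
```

Because both pictures are addressed in the same coordinate space, a stationary object yields a zero motion signal even when the enclosing rectangle changes size between pictures.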
The present invention aims to, thirdly, provide a digital picture encoder and decoder which can accurately transmit the coordinates data including the transformation parameters having the Affine parameter for the Affine transform, and can produce an accurate predicted picture.
A digital picture decoder according to the present invention comprises the following elements: variable length decoder, differential picture expander, adder, transformation parameter generator, predicted picture generator and frame memory.
The above digital picture decoder performs the following steps: first, input data to the variable length decoder, second, separate a differential picture data and transmit it to the differential picture expander, at the same time, separate the coordinates data and send it to the transformation parameter generator, thirdly, in the differential picture expander, expand differential picture data, and transmit it to the adder, next, in the transformation parameter generator, produce the transformation parameters from the coordinates data, and transmit it to the predicted picture generator, then, in the predicted picture generator, produce the predicted picture using the transformation parameters and the picture input from the frame memory, and transmit the predicted picture to the adder, where the predicted picture is added to the expanded differential picture, finally, produce the picture to output, at the same time, store the picture in the frame memory.
The above coordinates data represent either one of the following cases:
(a) the coordinates points of N pieces of pixels and the corresponding N pieces of transformed coordinates points obtained by applying the predetermined linear polynomial function to the coordinates points of N pieces of pixels, or
(b) a differential value between the coordinates points of N pieces of pixels and the corresponding N pieces of transformed coordinates points obtained by applying the predetermined linear polynomial to the coordinates points of the N pieces of pixels, or
(c) N pieces of transformed coordinates points obtained by applying a predetermined linear polynomial to each of predetermined N pieces of coordinates points, or
(d) differential values between the N pieces of transformed coordinates points obtained by applying the predetermined linear polynomial function to predetermined N pieces of coordinates point and predicted values. These predicted values represent the predetermined N pieces coordinates points, or N pieces transformed coordinates points of the previous frame.
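A minimal sketch of the differential form of the coordinates data, in the manner of case (b), is given below. It assumes the linear polynomial is the Affine transform of equation (3) and that both sides know the N pixel coordinate points; the function names are hypothetical.

```python
def transformed_points(params, points):
    """Apply equation (3) with params = (a, b, c, d, e, f) to each (x, y)."""
    a, b, c, d, e, f = params
    return [(a * x + b * y + e, c * x + d * y + f) for x, y in points]


def encode_differences(params, points):
    """Coordinates data as differences: transformed point minus original point."""
    moved = transformed_points(params, points)
    return [(u - x, v - y) for (x, y), (u, v) in zip(points, moved)]


def decode_differences(diffs, points):
    """Recover the transformed points from the differences and the known points."""
    return [(x + du, y + dv) for (x, y), (du, dv) in zip(points, diffs)]
```

When the motion between pictures is small, the differences are small values even though the coordinates themselves may be large, which is the usual reason for preferring a differential representation. The decoder would then derive the transformation parameters from the recovered point pairs.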
A digital picture encoder according to the present invention comprises the following elements: transformation parameter estimator, predicted picture generator, first adder, differential picture compressor, differential picture expander, second adder, frame memory and transmitter.
The above digital picture encoder performs the following steps: first, input a digital picture, second, in the transformation parameter estimator, estimate each of the transformation parameters using the picture stored in the frame memory and the digital picture, third, input the estimated transformation parameters together with the picture stored in the frame memory to the predicted picture generator, next, produce a predicted picture based on the estimated transformation parameters, then in the first adder, find a difference between the digital picture and the predicted picture, after that, in the differential picture compressor, compress the difference into compressed differential data, then transmit the data to the transmitter, at the same time, in the differential picture expander, expand the compressed differential data into expanded differential data, then, in the second adder, add the predicted picture to the expanded differential data, next, store the added result in the frame memory. To be more specific, the coordinates data is transmitted from the transformation parameter estimator to the transmitter, and is transmitted together with the compressed differential data.
The above coordinates data comprises either one of the following cases:
(a) the coordinates points of N pieces of pixels and the corresponding N pieces of transformed coordinates points obtained by applying transformation using the transformation parameters, or
(b) the coordinates points of N pieces of pixels as well as each of the differential values between the coordinates points of N pieces of pixels and the N pieces of transformed coordinates points, or
(c) N pieces of coordinates points transformed from each of the predetermined N pieces coordinates points of pixels, or
(d) each of the differential values between the N pieces of coordinates points transformed from the predetermined N pieces of coordinates points of pixels and the predetermined N pieces of coordinates points, or
(e) each of the differential values between N pieces transformed coordinates points and those of a previous frame.
A digital picture decoder according to the present invention comprises the following elements: variable length decoder, differential picture expander, adder, transformation parameter generator, predicted picture generator and frame memory.
The above digital picture decoder performs the following steps: first, input data to the variable length decoder, second, separate a differential picture data and transmit it to the differential picture expander, at the same time, input the number of coordinates data together with the coordinates data to the transformation parameter generator, thirdly, in the differential picture expander, expand differential picture data, and transmit it to the adder, next, in the transformation parameter generator, change transformation parameter generation methods depending on the number of the transformation parameters, then, produce the transformation parameters from the coordinates data, and transmit them to the predicted picture generator, then, in the predicted picture generator, produce the predicted picture using the transformation parameters and the picture input from the frame memory, and transmit the predicted picture to the adder, where the predicted picture is added to the expanded differential picture, finally, produce the picture to output, at the same time, store the picture in the frame memory.
The above coordinates data represent either one of the following cases:
(a) the coordinates points of N pieces of pixels and the corresponding N pieces of transformed coordinates points obtained by transforming the coordinates points of N pieces of pixels by using the predetermined linear polynomial function, or
(b) the coordinates points of N pieces of pixels and each of the differential values between the coordinates points of N pieces of pixels and the corresponding N pieces of transformed coordinates points obtained by transforming the coordinates points of N pieces of pixels by using the predetermined linear polynomial function, or
(c) the N pieces of coordinates points transformed from the predetermined N pieces of coordinates points by the predetermined linear polynomial, or
(d) differential values between the coordinates points of N pixels and the coordinates points of N pieces of pixels of the previous frame, and differential values of the N pieces of transformed coordinates points obtained by the predetermined linear polynomial and the N pieces transformed coordinates points in the previous frame, or
(e) N pieces of coordinates points transformed from the predetermined N pieces coordinates points by the predetermined linear polynomial, or
(f) differential values between the N pieces of coordinates points transformed from the predetermined N pieces of coordinates points by the predetermined linear polynomial and the predetermined N pieces coordinates points, or
(g) differential values between the N pieces of coordinates points transformed from the predetermined N pieces coordinates points by the predetermined linear polynomial and those in the previous frame.
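One way a transformation parameter generator might change its generation method with the number of coordinates data is sketched below. The mapping used here (one point pair for equation (1), two for equation (2), three for equation (3)) is an assumption based on counting unknowns, since each point pair supplies two equations; it is not stated in this description. The function name is hypothetical, and the result is always returned in the six-parameter form (a, b, c, d, e, f).

```python
def parameters_from_points(points, transformed):
    """Derive (a, b, c, d, e, f) from N point pairs, choosing the model by N."""
    n = len(points)
    if n != len(transformed):
        raise ValueError("each point needs a transformed counterpart")
    if n == 1:
        # Equation (1): translation only.
        (x, y), (u, v) = points[0], transformed[0]
        return (1.0, 0.0, 0.0, 1.0, u - x, v - y)
    if n == 2:
        # Equation (2): per-axis scaling plus translation.
        (x0, y0), (x1, y1) = points
        (u0, v0), (u1, v1) = transformed
        if x0 == x1 or y0 == y1:
            raise ValueError("the two points must differ in both x and y")
        a = (u1 - u0) / (x1 - x0)
        d = (v1 - v0) / (y1 - y0)
        return (a, 0.0, 0.0, d, u0 - a * x0, v0 - d * y0)
    if n == 3:
        # Equation (3): Affine transform, solved relative to the first point.
        (x0, y0), (x1, y1), (x2, y2) = points
        (u0, v0), (u1, v1), (u2, v2) = transformed
        dx1, dy1, dx2, dy2 = x1 - x0, y1 - y0, x2 - x0, y2 - y0
        det = dx1 * dy2 - dx2 * dy1
        if det == 0:
            raise ValueError("the three points must not be collinear")
        du1, du2, dv1, dv2 = u1 - u0, u2 - u0, v1 - v0, v2 - v0
        a = (du1 * dy2 - du2 * dy1) / det
        b = (dx1 * du2 - dx2 * du1) / det
        c = (dv1 * dy2 - dv2 * dy1) / det
        d = (dx1 * dv2 - dx2 * dv1) / det
        return (a, b, c, d, u0 - a * x0 - b * y0, v0 - c * x0 - d * y0)
    raise ValueError("this sketch handles 1, 2 or 3 point pairs only")
```

With this arrangement only as many coordinates are transmitted as the chosen model needs, rather than a fixed allocation sized for the largest model.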
When the transformation parameters are transmitted, either the transformation parameters are multiplied by the picture size and then quantized before being encoded, or an exponent of the maximum value of the transformation parameters is found, the parameters are normalized by the exponent, and the normalized transformation parameters are transmitted together with the exponent.
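Both transmission options can be sketched as follows. The quantization step, the use of a base-2 exponent, the mantissa width `bits`, and all function names are assumptions made for illustration; the description does not specify them.

```python
import math


def quantize_scaled(params, picture_size, step=1.0):
    """Option 1: multiply each parameter by the picture size, then quantize."""
    return [round(p * picture_size / step) for p in params]


def dequantize_scaled(levels, picture_size, step=1.0):
    """Inverse of quantize_scaled, up to the quantization error."""
    return [q * step / picture_size for q in levels]


def normalize_by_exponent(params, bits=12):
    """Option 2: normalize all parameters by the exponent of the largest magnitude.

    Returns (exponent, mantissas); every |p| / 2**exponent is below 1 and is
    stored as an integer with `bits` fractional bits.
    """
    peak = max(abs(p) for p in params)
    exponent = math.frexp(peak)[1] if peak > 0 else 0   # peak < 2**exponent
    scale = 1 << bits
    mantissas = [round(p / 2.0 ** exponent * scale) for p in params]
    return exponent, mantissas


def denormalize(exponent, mantissas, bits=12):
    """Inverse of normalize_by_exponent, up to the rounding error."""
    scale = 1 << bits
    return [m / scale * 2.0 ** exponent for m in mantissas]
```

Scaling by the picture size ties the quantization error of a coefficient to the largest displacement it can cause across the picture, while the shared exponent lets small and large parameter sets use the same mantissa width.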