The present invention relates to a picture encoding and decoding technique, a picture processing technique, a recording technique, and a recording medium and, more particularly, to such techniques and recording medium for use in recording moving picture data onto a recording medium, such as a magneto-optical disc or a magnetic tape, reproducing the recorded data for display on a display system, or transmitting the moving picture data over a transmission channel from a transmitter to a receiver and receiving and displaying the transmitted data by the receiver or editing the received data for recording, as in a teleconferencing system, video telephone system, broadcast equipment, or in a multi-media database retrieving system.
In a system for transmitting moving picture data to a remote place, as in a teleconferencing system or video telephone system, picture data may be encoded (compressed) by exploiting line correlation and inter-frame correlation. A high-efficiency encoding system for moving pictures has been proposed by the Moving Picture Experts Group (MPEG). This system was proposed as a draft standard after discussions in ISO-IEC/JTC1/SC2/WG11, and is a hybrid system combining motion compensation predictive coding and the discrete cosine transform (DCT).
In MPEG, several profiles and levels are defined for coping with various types of applications and functions. The most basic is the main profile at main level (MP@ML).
FIG. 1 illustrates an MP@ML encoding unit in an MPEG system. In such encoding unit, picture data to be encoded is supplied to a frame memory 31 for transient storage therein. A motion vector detector 32 reads out picture data stored in the frame memory 31 on a 16×16 pixel macro-block basis so as to detect its motion vector. The motion vector detector 32 processes picture data of each frame as an I-picture, a P-picture, or a B-picture. Each of the pictures of the sequentially entered frames is processed as one of the I-, P- or B-pictures in a pre-set manner, such as in a sequence of I, B, P, B, P, . . . , B, P. That is, the motion vector detector 32 refers to a pre-set reference frame in a series of pictures stored in the frame memory 31 and detects the motion vector of a macro-block, that is, a small block of 16 pixels by 16 lines of the frame being encoded, by pattern matching (block matching) between the macro-block and the reference frame.
In MPEG, there are four picture prediction modes, that is, intra-coding (intra-frame coding), forward predictive coding, backward predictive coding, and bidirectional predictive coding. An I-picture is an intra-coded picture, a P-picture is an intra-coded or forward predictive coded picture, and a B-picture is an intra-coded, forward predictive coded, backward predictive coded, or bidirectional predictive coded picture.
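By way of illustration only, the block matching performed by the motion vector detector 32 may be sketched as a full search that minimizes the sum of absolute differences. The function names `sad` and `block_match`, the search range, and the convention that the vector points from the macro-block to its best match in the reference frame are assumptions made for this sketch and are not taken from the MPEG system itself.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at (cx, cy)
    and the n x n block of ref at (rx, ry). Frames are lists of rows."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def block_match(cur, ref, cx, cy, n=16, search=4):
    """Full search within +/-search pixels.

    Returns ((dx, dy), error) for the candidate with the smallest SAD.
    The zero vector is used as the starting candidate."""
    height, width = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            err = sad(cur, ref, cx, cy, rx, ry, n)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

The returned error may serve as the prediction error that is compared against the macro-block variance when the prediction mode is selected.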
Returning to FIG. 1, the motion vector detector 32 performs forward prediction on a P-picture to detect its motion vector. The motion vector detector 32 compares the prediction error produced by performing forward prediction to, for example, the variance of the macro-block being encoded (macro-block of the P-picture). If the variance of the macro-block is smaller than the prediction error, the intra-coding mode is set as the prediction mode and outputted to a variable length coding (VLC) unit 36 and to a motion compensator 42. On the other hand, if the prediction error generated by the forward prediction coding is smaller, the motion vector detector 32 sets the forward predictive coding mode as the prediction mode and outputs the set mode to the VLC unit 36 and the motion compensator 42 along with the detected motion vector. Additionally, the motion vector detector 32 performs forward prediction, backward prediction, and bidirectional prediction for a B-picture to detect the respective motion vectors. The motion vector detector 32 detects the smallest prediction error of forward prediction, backward prediction, and bidirectional prediction (referred to herein as the minimum prediction error) and compares the minimum prediction error to, for example, the variance of the macro-block being encoded (macro-block of the B-picture). If, as a result of such comparison, the variance of the macro-block is smaller than the minimum prediction error, the motion vector detector 32 sets the intra-coding mode as the prediction mode, and outputs the set mode to the VLC unit 36 and the motion compensator 42. If, on the other hand, the minimum prediction error is smaller, the motion vector detector 32 sets the prediction mode for which the minimum prediction error has been obtained, and outputs the prediction mode thus set to the VLC unit 36 and the motion compensator 42 along with the associated motion vector.
Upon receiving the prediction mode and the motion vector from the motion vector detector 32, the motion compensator 42 may read out encoded and already locally decoded picture data stored in the frame memory 41 in accordance with the prediction mode and the motion vector and may supply the read-out data as a prediction picture to arithmetic units 33 and 40. The arithmetic unit 33 also receives the same macro-block as the picture data read out by the motion vector detector 32 from the frame memory 31 and calculates the difference between the macro-block and the prediction picture from the motion compensator 42. Such difference value is supplied to a discrete cosine transform (DCT) unit 34.
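The mode decision described above may be sketched as follows. This is a minimal sketch under stated assumptions: the "variance" is taken here to be the sum of absolute deviations from the macro-block mean, which is only one possible measure, and the names `variance_measure` and `choose_mode` and the mode labels are hypothetical.

```python
def variance_measure(block):
    """Sum of absolute deviations from the block mean.

    Assumption: this stands in for the 'variance' of the macro-block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def choose_mode(block, prediction_errors):
    """prediction_errors maps a mode name to its prediction error.

    Returns "intra" when the variance measure is smaller than the smallest
    prediction error; otherwise returns the mode with the smallest error."""
    best_mode = min(prediction_errors, key=prediction_errors.get)
    if variance_measure(block) < prediction_errors[best_mode]:
        return "intra"
    return best_mode
```

For a P-picture the dictionary would hold only a forward prediction error, whereas for a B-picture it would hold the forward, backward, and bidirectional prediction errors.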
If only the prediction mode is received from the motion vector detector 32, that is, if the prediction mode is the intra-coding mode, the motion compensator 42 may not output a prediction picture. In such situation, the arithmetic unit 33 may not perform the above-described processing, but instead may directly output the macro-block read out from the frame memory 31 to the DCT unit 34. Also, in such situation, the arithmetic unit 40 may perform in a similar manner.
The DCT unit 34 performs DCT processing on the output signal from the arithmetic unit 33 so as to obtain DCT coefficients which are supplied to a quantizer 35. The quantizer 35 sets a quantization step (quantization scale) in accordance with the data storage quantity in a buffer 37 (data volume stored in the buffer 37) received as a buffer feedback and quantizes the DCT coefficients from the DCT unit 34 using the quantization step. The quantized DCT coefficients (sometimes referred to herein as quantization coefficients) are supplied to the VLC unit 36 along with the set quantization step.
The VLC unit 36 converts the quantization coefficients supplied from the quantizer 35 into a variable length code, such as a Huffman code, in accordance with the quantization step supplied from the quantizer 35. The resulting converted quantization coefficients are outputted to the buffer 37. The VLC unit 36 also variable length encodes the quantization step from the quantizer 35, the prediction mode from the motion vector detector 32, and the motion vector from the motion vector detector 32, and outputs the encoded data to the buffer 37. It should be noted that the prediction mode is a mode specifying which of the intra-coding, forward predictive coding, backward predictive coding, or bidirectional predictive coding has been set.
The buffer 37 transiently stores data from the VLC unit 36 and smooths out the data volume so as to enable smoothed data to be outputted therefrom and supplied to a transmission channel or to be recorded on a recording medium or the like. The buffer 37 may also supply the stored data volume to the quantizer 35 which sets the quantization step in accordance therewith. As such, in the case of impending overflow of the buffer 37, the quantizer 35 increases the quantization step size so as to decrease the data volume of the quantization coefficients. Conversely, in the case of impending underflow of the buffer 37, the quantizer 35 decreases the quantization step size so as to increase the data volume of the quantization coefficients. As is to be appreciated, this procedure may prevent overflow and underflow of the buffer 37.
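The buffer feedback described above may be sketched as a simple threshold rule. The function name `next_quant_step`, the occupancy thresholds, the unit increment, and the step range of 1 to 31 are assumptions chosen for illustration, not values prescribed by the encoder of FIG. 1.

```python
def next_quant_step(step, fullness, capacity, low=0.25, high=0.75,
                    min_step=1, max_step=31):
    """Return an adjusted quantization step from the buffer occupancy.

    fullness / capacity above `high` is treated as impending overflow and
    below `low` as impending underflow (both thresholds are assumptions)."""
    ratio = fullness / capacity
    if ratio > high:
        step += 1   # coarser quantization, fewer bits generated
    elif ratio < low:
        step -= 1   # finer quantization, more bits generated
    # Keep the step inside the permitted range.
    return max(min_step, min(max_step, step))
```

A practical rate controller would typically adjust the step more gradually and per macro-block, but the direction of the adjustment is as shown.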
The quantization coefficients and the quantization step outputted by the quantizer 35 are supplied not only to the VLC unit 36, but also to a dequantizer 38 which dequantizes the quantization coefficients in accordance with the quantization step so as to convert the same to DCT coefficients. Such DCT coefficients are supplied to an IDCT (inverse DCT) unit 39 which performs inverse DCT on the DCT coefficients. The obtained inverse DCTed coefficients are supplied to the arithmetic unit 40.
The arithmetic unit 40 receives the inverse DCT coefficients from the IDCT unit 39 and data from the motion compensator 42 which are the same as the prediction picture sent to the arithmetic unit 33. The arithmetic unit 40 adds the signal (prediction residuals) from the IDCT unit 39 to the prediction picture from the motion compensator 42 to locally decode the original picture. However, if the prediction mode indicates intra-coding, the output of the IDCT unit 39 may be fed directly to the frame memory 41. The decoded picture (locally decoded picture) obtained by the arithmetic unit 40 is sent to and stored in the frame memory 41 so as to be used later as a reference picture for an inter-coded picture, that is, a forward predictive coded picture, a backward predictive coded picture, or a bidirectional predictive coded picture.
The decoded picture obtained from the arithmetic unit 40 is the same as that which may be obtained from a receiver or decoding unit (not shown in FIG. 1).
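The forward path (arithmetic unit 33, DCT unit 34, quantizer 35) and the local decoding path (dequantizer 38, IDCT unit 39, arithmetic unit 40) may be sketched for a single square block as follows. This is an illustrative sketch only: it assumes an orthonormal DCT and a uniform quantizer with a single step, and the function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    return [
        [
            math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """Two-dimensional DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Two-dimensional inverse DCT: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def encode_block(cur, pred, step):
    """Difference, DCT, and uniform quantization of one block."""
    residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]
    return [[round(v / step) for v in row] for row in dct2(residual)]


def local_decode(levels, pred, step):
    """Dequantization, inverse DCT, and addition of the prediction."""
    coeffs = [[q * step for q in row] for row in levels]
    residual = idct2(coeffs)
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, residual)]
```

Because `local_decode` uses only the quantized levels, the step, and the prediction, a decoder given the same inputs reconstructs the same block, which is why the locally decoded picture can serve as a reference on both sides.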
FIG. 2 illustrates an MP@ML decoder in an MPEG system for decoding encoded data such as that outputted by the encoder of FIG. 1. In such decoder, encoded data transmitted via a transmission path may be received by a receiver (not shown) or encoded data recorded on a recording medium may be reproduced by a reproducing device (not shown) and supplied to a buffer 101 and stored thereat. An IVLC unit (inverse VLC unit) 102 reads out encoded data stored in the buffer 101 and variable length decodes the same so as to separate the encoded data into a motion vector, prediction mode, quantization step and quantization coefficients. Of these, the motion vector and the prediction mode are supplied to a motion compensator 107, while the quantization step and quantization coefficients are supplied to a dequantizer 103. The dequantizer 103 dequantizes the quantization coefficients in accordance with the quantization step so as to obtain DCT coefficients which are supplied to an IDCT (inverse DCT) unit 104. The IDCT unit 104 performs an inverse DCT operation on the received DCT coefficients and supplies the resulting signal to an arithmetic unit 105. In addition to the output of the IDCT unit 104, the arithmetic unit 105 also receives an output from the motion compensator 107. That is, the motion compensator 107 reads out a previously decoded picture stored in a frame memory 106 in accordance with the prediction mode and the motion vector from the IVLC unit 102 in a manner similar to that of the motion compensator 42 of FIG. 1 and supplies the read-out decoded picture as a prediction picture to the arithmetic unit 105. The arithmetic unit 105 adds the signal from the IDCT unit 104 (prediction residuals) to the prediction picture from the motion compensator 107 so as to decode the original picture. If the output of the IDCT unit 104 is intra-coded, such output may be directly supplied to and stored in the frame memory 106.
The decoded picture stored in the frame memory 106 may be used as a reference picture for subsequently decoded pictures, and also may be read out and supplied to a display (not shown) so as to be displayed thereon. However, if the decoded picture is a B-picture, such B-picture is not stored in the frame memories 41 (FIG. 1) or 106 (FIG. 2) in the encoding unit or decoder, since a B-picture is not used as a reference picture in MPEG1 and MPEG2.
In MPEG, a variety of profiles and levels as well as a variety of tools are defined in addition to the above-described MP@ML. An example of a MPEG tool is scalability. More specifically, MPEG adopts a scalable encoding system for coping with different picture sizes or different frame sizes. In spatial scalability, if only a lower-layer bitstream is decoded, for example, only a picture with a small picture size is obtained, whereas, if both lower-layer and upper-layer bitstreams are decoded, a picture with a large picture size is obtained.
FIG. 3 illustrates an encoding unit for providing spatial scalability. In spatial scalability, the lower and upper layers are associated with picture signals of a small picture size and those with a large picture size, respectively. The upper-layer encoding unit 201 may receive an upper-layer picture for encoding, whereas, the lower-layer encoding unit 202 may receive a picture resulting from a thinning out process for reducing the number of pixels (hence a picture lowered in resolution for diminishing its size) as a lower-layer picture. The lower-layer encoding unit 202 predictively encodes a lower-layer picture in a manner similar to that of FIG. 1 so as to form and output a lower-layer bitstream. The lower-layer encoding unit 202 also generates a picture corresponding to the locally decoded lower-layer picture enlarged to the same size as the upper-layer picture size (occasionally referred to herein as an enlarged picture). This enlarged picture is supplied to the upper-layer encoding unit 201. The upper-layer encoding unit 201 predictively encodes an upper-layer picture in a manner similar to that of FIG. 1 so as to form and output an upper-layer bitstream. The upper layer encoding unit 201 also uses the enlarged picture received from the lower-layer encoding unit 202 as a reference picture for executing predictive coding. The upper layer bitstream and the lower layer bitstream are multiplexed to form encoded data which is outputted.
FIG. 4 illustrates an example of the lower layer encoding unit 202 of FIG. 3. Such lower layer encoding unit 202 is similarly constructed to the encoder of FIG. 1 except for an upsampling unit 211. Accordingly, in FIG. 4, parts or components corresponding to those shown in FIG. 1 are depicted by the same reference numerals. The upsampling unit 211 upsamples (interpolates) a locally decoded lower-layer picture outputted by the arithmetic unit 40 so as to enlarge the picture to the same size as the upper layer picture size and supplies the resulting enlarged picture to the upper layer encoding unit 201.
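The enlargement performed by the upsampling unit 211 may be illustrated with the simplest possible interpolation, namely pixel repetition by an integer factor. This is an assumption made for brevity; the interpolation filter actually used by such a unit is not specified here, and the name `upsample` is hypothetical.

```python
def upsample(picture, factor):
    """Enlarge a picture (list of rows) by repeating each pixel
    `factor` times horizontally and each row `factor` times vertically."""
    return [
        [p for p in row for _ in range(factor)]
        for row in picture
        for _ in range(factor)
    ]
```

With a factor of 2, a lower-layer picture of half the width and height is brought to the upper-layer picture size.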
FIG. 5 illustrates an example of the upper layer encoding unit 201 of FIG. 3. Such upper layer encoding unit 201 is similarly constructed to the encoder of FIG. 1 except for weighing addition units 221, 222 and an arithmetic unit 223. Accordingly, in FIG. 5, parts or components corresponding to those of FIG. 1 are denoted by the same reference numerals. The weighing addition unit 221 multiplies a prediction picture outputted by the motion compensator 42 by a weight W and outputs the resulting signal to the arithmetic unit 223. The weighing addition unit 222 multiplies the enlarged picture supplied from the lower layer encoding unit 202 by a weight (1−W) and supplies the resulting product to the arithmetic unit 223. The arithmetic unit 223 sums the received outputs from the weighing addition units 221, 222 and outputs the resulting sum to the arithmetic units 33, 40 as a predicted picture. The weighing W used in the weighing addition unit 221 is pre-set, as is the weighing (1−W) used in the weighing addition unit 222. The weighing W is supplied to the VLC unit 36 for variable length encoding. The upper layer encoding unit 201 otherwise performs processing similar to that of FIG. 1.
Thus the upper layer encoding unit 201 performs predictive encoding using not only the upper layer picture, but also the enlarged picture from the lower layer encoding unit 202, that is, a lower layer picture, as a reference picture.
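The combination performed by the weighing addition units 221, 222 and the arithmetic unit 223 amounts to a per-pixel weighted sum, which may be sketched as follows. The function name `weighted_prediction` and the representation of pictures as lists of rows are assumptions of this sketch.

```python
def weighted_prediction(mc_prediction, enlarged, w):
    """Per-pixel W * (motion-compensated prediction) + (1 - W) * (enlarged picture).

    Both pictures are lists of rows and must have the same size."""
    return [
        [w * a + (1 - w) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mc_prediction, enlarged)
    ]
```

With W equal to 1 only the upper-layer prediction is used, and with W equal to 0 only the enlarged lower-layer picture is used.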
FIG. 6 illustrates an example of a decoder for implementing spatial scalability. Output encoded data from the encoder of FIG. 3 is separated into an upper layer bitstream and a lower layer bitstream which are supplied to an upper layer decoding unit 231 and to a lower layer decoding unit 232, respectively. The lower layer decoding unit 232 decodes the lower layer bitstream as in FIG. 2 and outputs the resulting decoded picture of the lower layer. In addition, the lower layer decoding unit 232 enlarges the lower layer decoded picture to the same size as the upper layer picture to generate an enlarged picture and supplies the same to the upper layer decoding unit 231. The upper layer decoding unit 231 similarly decodes the upper layer bitstream, as in FIG. 2. However, the upper layer decoding unit 231 decodes the bitstream using the enlarged picture from the lower layer decoding unit 232 as a reference picture.
FIG. 7 illustrates an example of the lower layer decoding unit 232. The lower layer decoding unit 232 is similarly constructed to the decoder of FIG. 2 except for an upsampling unit 241. Accordingly, in FIG. 7, parts or components corresponding to those of FIG. 2 are depicted by the same reference numerals. The upsampling unit 241 upsamples (interpolates) the decoded lower layer picture outputted by the arithmetic unit 105 so as to enlarge the lower layer picture to the same size as the upper layer picture size and outputs the enlarged picture to the upper layer decoding unit 231.
FIG. 8 illustrates an example of the upper layer decoding unit 231 of FIG. 6. The upper layer decoding unit 231 is similarly constructed to the decoder of FIG. 2 except for weighing addition units 251, 252 and an arithmetic unit 253. Accordingly, in FIG. 8, parts or components corresponding to those of FIG. 2 are depicted by the same reference numerals. In addition to performing the processing explained with reference to FIG. 2, the IVLC unit 102 extracts the weighing W from the encoded data and outputs the extracted weighing W to the weighing addition units 251, 252. The weighing addition unit 251 multiplies the prediction picture outputted by the motion compensator 107 by the weighing W and outputs the resulting product to the arithmetic unit 253. The arithmetic unit 253 also receives an output from the weighing addition unit 252. Such output is obtained by multiplying the enlarged picture supplied from the lower layer decoding unit 232 by the weighing (1−W). The arithmetic unit 253 sums the outputs of the weighing addition units 251, 252 and supplies the summed output as a prediction picture to the arithmetic unit 105. Therefore, the arithmetic unit 253 uses the upper layer picture and the enlarged picture from the lower layer decoding unit 232, that is, the lower layer picture, as reference pictures, for decoding. Such processing is performed on both luminance signals and chroma signals. The motion vector for the chroma signals may be one-half as large as the motion vector for the luminance signals.
In addition to the above-described MPEG system, a variety of high-efficiency encoding systems have been standardized for moving pictures. In ITU-T, for example, systems such as H.261 or H.263 have been prescribed mainly as encoding systems for communication. Similar to the MPEG system, these H.261 and H.263 systems basically involve a combination of motion compensation prediction encoding and DCT encoding. Specifically, the H.261 and H.263 systems may be basically similar in structure to the encoder or the decoder of the MPEG system, although differences in the structure thereof or in the details such as header information may exist.
In a picture synthesis system for constituting a picture by synthesizing plural pictures, a so-called chroma key technique may be used. This technique photographs an object in front of a background of a specified uniform color, such as blue, extracts an area other than the blue therefrom, and synthesizes the extracted area to another picture. The signal specifying the extracted area is termed a key signal.
FIG. 9 illustrates a method for synthesizing a picture where F1 is a background picture and F2 is a foreground picture. The picture F2 is obtained by photographing an object, herein a person, in front of a background of a specified color and extracting an area other than this color. The key signal K1 specifies the extracted area. In the picture synthesis system, the background picture F1 and the foreground picture F2 are synthesized in accordance with the key signal K1 to generate a synthesized picture F3. This synthesized picture is encoded, such as by an MPEG technique, and transmitted.
If the synthesized picture F3 is encoded and transmitted as described above, only the encoded data on the synthesized picture F3 is transmitted, so that the information such as the key signal K1 may be lost. As such, picture re-editing or re-synthesis for keeping the foreground F2 intact and changing only the background F1 becomes difficult to perform on the receiving side.
Consider a method in which the pictures F1, F2 and the key signal K1 are separately encoded and the resulting respective bitstreams are multiplexed as shown, for example, in FIG. 10. In such case, the receiving side demultiplexes the multiplexed data to decode the respective bitstreams and produce the pictures F1, F2 and the key signal K1. The decoded results of the pictures F1, F2 and the key signal K1 may be synthesized so as to generate the synthesized picture F3. In such case, the receiving side may perform picture re-editing or re-synthesis such that the foreground F2 is kept intact and only the background F1 is changed.
Therefore, the synthesized picture F3 is made up of the pictures F1 and F2. In a similar manner, any picture may be thought of as being made up of plural pictures or objects. If units that go to make up a picture are termed video objects (VOs), an operation for standardizing a VO based encoding system is underway in ISO-IEC/JTC1/SC29/WG11 as MPEG 4. However, at present, a method for efficiently encoding a VO or encoding key signals has not yet been established and is in a pending state. In any event, although MPEG 4 prescribes the function of scalability, there has not been proposed a specific technique for realization of scalability for a VO in which the position and size thereof change with time. As an example, if the VO is a person approaching from a distant place, the position and the size of the VO change with time. Therefore, if a picture of a lower layer is used as a reference picture in predictive encoding of the upper layer picture, it may be necessary to clarify the relative position between the picture of the upper layer and the lower layer picture used as a reference picture. On the other hand, in using VO-based scalability, the condition for a skip macro-block of the lower layer is not necessarily directly applicable to that for a skip macro-block of the upper layer.
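The synthesis on the receiving side may be sketched as follows, assuming a binary key signal that is 1 inside the extracted area and 0 elsewhere. The names `extract_key` and `synthesize`, the exact-match test against the backdrop color, and the representation of pixels as tuples are assumptions of this sketch; a practical chroma key would typically tolerate small color deviations.

```python
def extract_key(picture, key_color):
    """Key signal: 1 where the pixel differs from the backdrop color, else 0."""
    return [[0 if p == key_color else 1 for p in row] for row in picture]


def synthesize(background, foreground, key):
    """Take the foreground pixel where the key is 1, else the background pixel."""
    return [
        [f if k else b for b, f, k in zip(b_row, f_row, k_row)]
        for b_row, f_row, k_row in zip(background, foreground, key)
    ]
```

Because the foreground and the key signal are retained separately, calling `synthesize` again with a different background leaves the foreground intact, which corresponds to the re-synthesis that becomes difficult once only the synthesized picture is transmitted.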
It is therefore an object of the present invention to provide a technique which enables VO-based encoding to be easily achieved.
In accordance with an aspect of the present invention, a picture encoding device is provided which includes enlarging/contracting means for enlarging or contracting a second picture based on the difference in resolution between first and second pictures (such as a resolution converter 24 shown in FIG. 15), first picture encoding means for predictive coding the first picture using an output of the enlarging/contracting means as a reference picture (such as an upper layer encoding unit 23 shown in FIG. 15), second picture encoding means for encoding the second picture (such as a lower layer encoding unit 25), position setting means for setting the positions of the first picture and the second picture in a pre-set absolute coordinate system and outputting first or second position information on the position of the first or second picture, respectively (such as a picture layering unit 21 shown in FIG. 15), and multiplexing means for multiplexing outputs of the first picture encoding means, second picture encoding means, and the position setting means (such as a multiplexer 26 shown in FIG. 15). The first picture encoding means recognizes the position of the first picture based on the first position information and converts the second position information in response to an enlarging ratio or a contracting ratio by which the enlarging/contracting means has enlarged or contracted the second picture. The first picture encoding means also recognizes the position corresponding to the results of conversion as the position of the reference picture in order to perform predictive coding.
In accordance with another aspect of the present invention, a picture encoding device for encoding is provided which includes enlarging/contracting means for enlarging or contracting a second picture based on the difference in resolution between first and second pictures (such as the resolution converter 24 shown in FIG. 15), first picture encoding means for predictive coding the first picture using an output of the enlarging/contracting means as a reference picture (such as the upper layer encoding unit 23 shown in FIG. 15), second picture encoding means for encoding the second picture (such as the lower layer encoding unit 25), position setting means for setting the positions of the first picture and the second picture in a pre-set absolute coordinate system and outputting first or second position information on the position of the first or second picture, respectively (such as the picture layering unit 21 shown in FIG. 15), and multiplexing means for multiplexing outputs of the first picture encoding means, second picture encoding means, and the position setting means (such as the multiplexer 26 shown in FIG. 15). The first picture encoding means is caused to recognize the position of the first picture based on the first position information and to convert the second position information in response to an enlarging ratio or a contracting ratio by which the enlarging/contracting means has enlarged or contracted the second picture. The first picture encoding means recognizes the position corresponding to the results of conversion as the position of the reference picture in order to perform predictive coding.
In accordance with the above picture encoding device and a picture encoding method, the enlarging/contracting means enlarges or contracts the second picture based on the difference in resolution between the first and second pictures, while the first picture encoding means predictively encodes the first picture using an output of the enlarging/contracting means as a reference picture. The position setting means sets the positions of the first picture and the second picture in a pre-set absolute coordinate system and outputs the first position information or the second position information on the position of the first or second picture, respectively. The first picture encoding means recognizes the position of the first picture, based on the first position information, and converts the second position information responsive to an enlarging ratio or a contracting ratio by which the enlarging/contracting means has enlarged or contracted the second picture. The first picture encoding means recognizes the position corresponding to the results of conversion as the position of the reference picture in order to perform predictive coding.
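One plausible reading of the position conversion described above is that the second position information, expressed in the absolute coordinate system, is multiplied by the enlarging or contracting ratio so as to give the position of the reference picture. The following sketch rests on that assumption; the names `convert_position` and `relative_offset` are hypothetical, and a single ratio is applied to both axes for simplicity.

```python
def convert_position(second_pos, ratio):
    """Scale the (x, y) position of the second picture by the enlarging
    or contracting ratio to obtain the reference picture position."""
    x, y = second_pos
    return (x * ratio, y * ratio)


def relative_offset(first_pos, second_pos, ratio):
    """Offset of the reference picture relative to the first picture,
    both expressed in the same absolute coordinate system."""
    ref_x, ref_y = convert_position(second_pos, ratio)
    first_x, first_y = first_pos
    return (ref_x - first_x, ref_y - first_y)
```

For example, a second picture located at (8, 4) and enlarged by a ratio of 2 would be treated as a reference picture located at (16, 8), and a first picture located at (10, 6) would then see that reference picture at an offset of (6, 2).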
In accordance with another aspect of the present invention, a picture decoding device is provided which includes second picture decoding means for decoding a second picture (such as a lower layer decoding unit 95), enlarging/contracting means for enlarging/contracting the second picture decoded by the second picture decoding means based on the difference in resolution between first and second pictures (such as a resolution converter 94 shown in FIG. 29), and first picture decoding means for decoding the first picture using an output of the enlarging/contracting means as a reference picture (such as an upper layer decoding unit 93 shown in FIG. 29). The encoded data includes first or second position information on the position of the first and second picture, respectively, in a pre-set absolute coordinate system. The first picture decoding means recognizes the position of the first picture based on the first position information and converts the second position information in response to an enlarging ratio or a contracting ratio by which the enlarging/contracting means has enlarged or contracted the second picture. The first picture decoding means also recognizes the position corresponding to the results of conversion as the position of the reference picture in order to decode the first picture.
The above picture decoding device may include a display for displaying decoding results of the first picture decoding means (such as a monitor 74 shown in FIG. 27).
In accordance with another aspect of the present invention, a picture decoding device is provided which includes second picture decoding means for decoding a second picture (such as a lower layer decoding unit 95 shown in FIG. 29), enlarging/contracting means for enlarging/contracting the second picture decoded by the second picture decoding means based on the difference in resolution between first and second pictures (such as a resolution converter 94 shown in FIG. 29), and first picture decoding means for decoding the first picture using an output of the enlarging/contracting means as a reference picture (such as an upper layer decoding unit 93). The encoded data includes first and second position information on the position of the first and the second picture, respectively, in a pre-set absolute coordinate system. The first picture decoding means is caused to recognize the position of the first picture based on the first position information and to convert the second position information in response to an enlarging ratio or a contracting ratio by which the enlarging/contracting means has enlarged or contracted the second picture. The first picture decoding means recognizes the position corresponding to the results of conversion as the position of the reference picture in order to decode the first picture.
In accordance with the above picture decoding device and a picture decoding method, the enlarging/contracting means enlarges or contracts the second picture decoded by the second picture decoding means based on the difference in resolution between the first and second pictures. The first picture decoding means decodes the first picture using an output of the enlarging/contracting means as a reference picture. If the encoded data includes the first position information or the second position information on the position of the first picture and on the position of the second picture, respectively, in a pre-set absolute coordinate system, the first picture decoding means recognizes the position of the first picture, based on the first position information, and converts the second position information responsive to an enlarging ratio or a contracting ratio by which the enlarging/contracting means has enlarged or contracted the second picture. The first picture decoding means recognizes the position corresponding to the results of conversion as the position of the reference picture, in order to decode the first picture.
In accordance with another aspect of the present invention, a recording medium is provided which has recorded thereon encoded data including first data obtained on predictive encoding a first picture using, as a reference picture, the enlarged or contracted results obtained on enlarging or contracting a second picture based on the difference in resolution between the first and second pictures, second data obtained on encoding the second picture, and first position information or second position information obtained on setting the positions of the first and second pictures in a pre-set absolute coordinate system. The first data is obtained on recognizing the position of the first picture based on the first position information, converting the second position information in response to the enlarging ratio or contracting ratio by which the second picture has been enlarged or contracted, and on recognizing the position corresponding to the results of conversion as the position of the reference picture in order to perform predictive coding.
In accordance with another aspect of the present invention, a method for recording encoded data is provided wherein the encoded data includes first data obtained on predictive encoding a first picture using, as a reference picture, the enlarged or contracted results obtained on enlarging or contracting a second picture based on the difference in resolution between the first and second pictures, second data obtained on encoding the second picture, and first position information or second position information obtained on setting the positions of the first and second pictures in a pre-set absolute coordinate system. The first data is obtained on recognizing the position of the first picture based on the first position information, converting the second position information in response to the enlarging ratio or contracting ratio by which the second picture has been enlarged or contracted, and on recognizing the position corresponding to the results of conversion as the position of the reference picture in order to perform predictive coding.
In accordance with another aspect of the present invention, a picture encoding device is provided which includes enlarging/contracting means for enlarging or contracting a second picture based on the difference in resolution between first and second pictures (such as the resolution converter 24 shown in FIG. 15), first picture encoding means for predictive coding the first picture using an output of the enlarging/contracting means as a reference picture (such as the upper layer encoding unit 23 shown in FIG. 15), second picture encoding means for encoding the second picture (such as the lower layer encoding unit 25 shown in FIG. 15), position setting means for setting the positions of the first picture and the second picture in a pre-set absolute coordinate system and outputting the first position information or the second position information on the position of the first or second picture, respectively (such as a picture layering unit 21 shown in FIG. 15), and multiplexing means for multiplexing outputs of the first picture encoding means, second picture encoding means, and the position setting means (such as the multiplexer 26 shown in FIG. 15). The position setting means sets the positions of the first and second pictures so that the position of the reference picture in a pre-set absolute coordinate system will be coincident with a pre-set position. The first picture encoding means recognizes the position of the first picture based on the first position information and also recognizes the pre-set position as the position of the reference picture in order to perform predictive coding.
In accordance with another aspect of the present invention, a picture encoding device for performing picture encoding is provided which includes enlarging/contracting means for enlarging or contracting a second picture based on the difference in resolution between first and second pictures (such as the resolution converter 24 shown in FIG. 15), first picture encoding means for predictive coding of the first picture using an output of the enlarging/contracting means as a reference picture (such as the upper layer encoding unit 23 shown in FIG. 15), second picture encoding means for encoding the second picture (such as the lower layer encoding unit 25 shown in FIG. 15), position setting means for setting the positions of the first picture and the second picture in a pre-set absolute coordinate system and outputting first position information or second position information on the position of the first or second picture, respectively (such as a picture layering unit 21 shown in FIG. 15), and multiplexing means for multiplexing outputs of the first picture encoding means, second picture encoding means, and the position setting means (such as the multiplexer 26 shown in FIG. 15). The position setting means causes the positions of the first and second pictures to be set so that the position of the reference picture in a pre-set absolute coordinate system will be coincident with a pre-set position. The first picture encoding means is caused to recognize the position of the first picture based on the first position information and to recognize the pre-set position as the position of the reference picture in order to perform predictive coding.
In accordance with the above picture encoding device and picture encoding method, the enlarging/contracting means enlarges or contracts the second picture based on the difference in resolution between the first and second pictures, while the first picture encoding means predictively encodes the first picture using an output of the enlarging/contracting means as a reference picture. The position setting means sets the positions of the first picture and the second picture in a pre-set absolute coordinate system and outputs the first position information or the second position information on the position of the first or second picture, respectively. The position setting means sets the positions of the first and second pictures so that the position of the reference picture in the pre-set absolute coordinate system will be coincident with a pre-set position. The first picture encoding means recognizes the position of the first picture based on the first position information and recognizes the pre-set position as the position of the reference picture in order to perform predictive coding.
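One way of picturing the position setting described above is to translate both pictures in the absolute coordinate system until the enlarged or contracted second picture lands on the pre-set position. The following is a minimal sketch under stated assumptions: the pre-set position defaults to the origin, the ratio is a single factor for both axes, and the function name set_positions is hypothetical. None of these details is taken from the disclosure.

```python
from fractions import Fraction

def set_positions(first_pos, second_pos, ratio, preset=(0, 0)):
    """Translate both pictures so the scaled second picture lands on `preset`."""
    ratio = Fraction(ratio)
    # Position the reference picture would have without any adjustment.
    ref = (second_pos[0] * ratio, second_pos[1] * ratio)
    # Translation that moves the reference picture onto the pre-set position.
    dx, dy = preset[0] - ref[0], preset[1] - ref[1]
    new_first = (first_pos[0] + dx, first_pos[1] + dy)
    # The second picture's new position must scale back to the pre-set position.
    new_second = (Fraction(preset[0]) / ratio, Fraction(preset[1]) / ratio)
    return new_first, new_second

# First picture at (24, 16), second picture at (10, 6), enlarging ratio 2.
new_first, new_second = set_positions((24, 16), (10, 6), 2)
```

Because the same translation is applied to the first picture and to the reference picture, their relative placement is unchanged, and the encoder can use the pre-set position directly instead of converting the second position information.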
In accordance with another aspect of the present invention, a picture decoding device for decoding encoded data is provided which includes second picture decoding means for decoding a second picture (such as an upper layer decoding unit 93 shown in FIG. 29), enlarging/contracting means for enlarging/contracting the second picture decoded by the second picture decoding means based on the difference in resolution between the first and second pictures (such as the resolution converter 94 shown in FIG. 29), and first picture decoding means for decoding the first picture using an output of the enlarging/contracting means as a reference picture (such as a lower layer decoding unit 95 shown in FIG. 29). The encoded data includes first position information or second position information on the position of the first picture or the position of the second picture, respectively, in a pre-set absolute coordinate system, in which the position of the reference picture in the pre-set absolute coordinate system has been set so as to be coincident with a pre-set position. The first picture decoding means recognizes the position of the first picture based on the first position information and recognizes the pre-set position as the position of the reference picture in order to decode the first picture.
The above picture decoding device may include a display for displaying decoding results of the first picture decoding means (such as the monitor 74 shown in FIG. 27).
In accordance with another aspect of the present invention, a picture decoding device is provided which includes second picture decoding means for decoding a second picture (such as the upper layer decoding unit 93 shown in FIG. 29), enlarging/contracting means for enlarging/contracting the second picture decoded by the second picture decoding means based on the difference in resolution between first and second pictures (such as the resolution converter 94 shown in FIG. 29), and first picture decoding means for decoding the first picture using an output of the enlarging/contracting means as a reference picture (such as the lower layer decoding unit 95 shown in FIG. 29). The encoded data includes first position information or second position information on the position of the first picture or the position of the second picture in a pre-set absolute coordinate system in which the position of the reference picture in the pre-set absolute coordinate system has been set so as to coincide with a pre-set position. The first picture decoding means is caused to recognize the position of the first picture based on the first position information and to recognize the pre-set position as the position of the reference picture in order to decode the first picture.
In accordance with the above picture decoding device and picture decoding method, the enlarging/contracting means enlarges or contracts the second picture decoded by the second picture decoding means based on the difference in resolution between the first and second pictures. If the encoded data includes the first position information or the second position information on the position of the first picture or on the position of the second picture, respectively, in a pre-set absolute coordinate system, in which the position of the reference picture in the pre-set absolute coordinate system has been set so as to be coincident with a pre-set position, the first picture decoding means recognizes the position of the first picture, based on the first position information, and recognizes the pre-set position as the position of the reference picture, in order to decode the first picture.
In accordance with another aspect of the present invention, a recording medium is provided which has recorded thereon encoded data including first data obtained on predictive encoding a first picture using, as a reference picture, enlarged or contracted results obtained on enlarging or contracting a second picture based on the difference in resolution between the first and second pictures, second data obtained on encoding the second picture, and first position information or second position information obtained on setting the positions of the first and second pictures in a pre-set absolute coordinate system. The first position information and the second position information have been set so that the position of the reference picture in the pre-set absolute coordinate system will be coincident with a pre-set position.
In accordance with another aspect of the present invention, a recording method is provided for recording encoded data in which the encoded data includes first data obtained on predictive encoding a first picture using, as a reference picture, enlarged or contracted results obtained on enlarging or contracting a second picture based on the difference in resolution between the first and second pictures, second data obtained on encoding the second picture, and first position information or second position information obtained on setting the positions of the first and second pictures in a pre-set absolute coordinate system. The first position information and the second position information have been set so that the position of the reference picture in the pre-set absolute coordinate system will be coincident with a pre-set position.
In accordance with another aspect of the present invention, a picture encoding device is provided which includes first predictive coding means for predictive coding a picture (such as the lower layer encoding unit 25 shown in FIG. 15), local decoding means for locally decoding the results of predictive coding by the first predictive coding means (such as the lower layer encoding unit 25), second predictive coding means for predictive coding the picture using a locally decoded picture outputted by the local decoding means as a reference picture (such as the upper layer encoding unit 23 shown in FIG. 15), and multiplexing means for multiplexing the results of predictive coding by the first and second predictive coding means with only the motion vector used by the first predictive coding means in performing predictive coding (such as the multiplexer 26 shown in FIG. 15).
In accordance with another aspect of the present invention, a picture encoding method is provided which includes predictive coding a picture for outputting first encoded data, locally decoding the first encoded data, predictive coding the picture using a locally decoded picture obtained as a result of local decoding to output second encoded data, and multiplexing the first encoded data and the second encoded data only with the motion vector used for obtaining the first encoded data.
In accordance with the above picture encoding device and picture encoding method, a picture is predictively encoded to output first encoded data, the first encoded data is locally decoded, and the picture is predictively encoded using, as a reference picture, a locally decoded picture obtained on local decoding to output second encoded data. The first and second encoded data are multiplexed only with the motion vector used for obtaining the first encoded data.
In accordance with another aspect of the present invention, a picture decoding device for decoding encoded data is provided which includes separating means for separating first and second data from the encoded data (such as a demultiplexer 91 shown in FIG. 29), first decoding means for decoding the first data (such as the lower layer decoding unit 95 shown in FIG. 29), and second decoding means for decoding the second data using an output of the first decoding means as a reference picture (such as the upper layer decoding unit 93 shown in FIG. 29). The encoded data includes only the motion vector used in predictive coding the first data. The second decoding means decodes the second data in accordance with the motion vector used in predictive coding the first data.
In accordance with another aspect of the present invention, a picture decoding device for decoding encoded data is provided which includes separating means for separating first and second data from the encoded data (such as the demultiplexer 91 shown in FIG. 29), first decoding means for decoding the first data (such as the lower layer decoding unit 95 shown in FIG. 29), and second decoding means for decoding the second data using an output of the first decoding means as a reference picture (such as the upper layer decoding unit 93 shown in FIG. 29). If the encoded data includes only the motion vector used in predictive coding the first data, the second decoding means is caused to decode the second data in accordance with the motion vector used in predictive coding the first data.
In accordance with the above picture decoding device and picture decoding method, the first decoding means decodes the first data and the second decoding means decodes the second data using an output of the first decoding means as a reference picture. If the encoded data includes only the motion vector used in predictive coding the first data, the second decoding means decodes the second data in accordance with the motion vector used in predictive coding the first data.
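The sharing of a single motion vector between the two sets of encoded data may be sketched as follows. The container MultiplexedData and the functions multiplex and demultiplex are hypothetical names introduced only for illustration, and the sketch assumes that both layers have the same resolution so that the vector can be reused without scaling.

```python
from dataclasses import dataclass
from typing import Tuple

MotionVector = Tuple[int, int]

@dataclass(frozen=True)
class MultiplexedData:
    first_data: bytes            # first encoded data
    second_data: bytes           # second encoded data
    motion_vector: MotionVector  # vector used in predictive coding the first data only

def multiplex(first_data, first_mv, second_data, second_mv=None):
    """Combine both sets of data; any vector of the second data is not transmitted."""
    return MultiplexedData(first_data, second_data, first_mv)

def demultiplex(stream):
    """Return (data, motion vector) pairs for the first and second decoding means."""
    return ((stream.first_data, stream.motion_vector),
            (stream.second_data, stream.motion_vector))

stream = multiplex(b"lower", (3, -1), b"upper", second_mv=(3, -1))
first_input, second_input = demultiplex(stream)
```

Since the multiplexed data carries one vector field rather than two, the second decoding means necessarily receives the vector that was used in predictive coding the first data.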
In accordance with another aspect of the present invention, a recording medium is provided which has recorded thereon encoded data which is obtained on predictive coding a picture for outputting first encoded data, locally decoding the first encoded data, predictive coding the picture using a locally decoded picture obtained as a result of local decoding to output second encoded data, and multiplexing the first encoded data and the second encoded data only with the motion vector used for obtaining the first encoded data.
In accordance with another aspect of the present invention, a method for recording encoded data is provided in which the encoded data is obtained on predictive coding a picture and outputting first encoded data, locally decoding the first encoded data, predictive coding the picture using a locally decoded picture obtained as a result of local decoding to output second encoded data, and multiplexing the first encoded data and the second encoded data only with the motion vector used for obtaining the first encoded data.
In accordance with another aspect of the present invention, a picture encoding device is provided wherein whether or not a macro-block is a skip macro-block is determined based on reference picture information specifying a reference picture used in encoding a macro-block of a B-picture by one of forward predictive coding, backward predictive coding or bidirectionally predictive coding.
In accordance with another aspect of the present invention, a picture encoding method is provided wherein whether or not a macro-block is a skip macro-block is determined based on reference picture information specifying a reference picture used in encoding a macro-block of a B-picture by one of forward predictive coding, backward predictive coding or bidirectionally predictive coding.
In accordance with another aspect of the present invention, a picture decoding device is provided wherein whether or not a macro-block is a skip macro-block is determined based on reference picture information specifying a reference picture used in encoding a macro-block of a B-picture by one of the forward predictive coding, backward predictive coding, or bidirectionally predictive coding.
In accordance with another aspect of the present invention, a picture decoding method is provided wherein whether or not a macro-block is a skip macro-block is determined based on reference picture information specifying a reference picture used in encoding a macro-block of a B-picture by one of the forward predictive coding, backward predictive coding, or bidirectionally predictive coding.
In accordance with another aspect of the present invention, a recording medium having recorded thereon encoded data is provided wherein a macro-block is a skip macro-block based on reference picture information specifying a reference picture used in encoding a macro-block of a B-picture by one of forward predictive coding, backward predictive coding, or bidirectionally predictive coding.
In accordance with another aspect of the present invention, a recording method for recording encoded data is provided in which a macro-block is a skip macro-block based on reference picture information specifying a reference picture used in encoding a macro-block of a B-picture by one of forward predictive coding, backward predictive coding or bidirectionally predictive coding.
In accordance with another aspect of the present invention, a picture processing device is provided in which a pre-set table used for variable length encoding or variable length decoding is modified in keeping with changes in size of a picture.
In accordance with another aspect of the present invention, a picture processing method is provided in which it is judged whether or not a picture is changed in size and a pre-set table used for variable length encoding or variable length decoding is modified in keeping with changes in size of the picture.
In accordance with another aspect of the present invention, a picture processing device is provided in which a pre-set table used for variable length encoding or variable length decoding is modified according to whether or not a picture belonging to a layer different from that of a picture being encoded, and having the same timing as the picture being encoded, has been used as a reference picture.
In accordance with another aspect of the present invention, a picture processing method is provided in which a pre-set table used for variable length encoding or variable length decoding is modified according to whether or not a picture belonging to a layer different from that of a picture being encoded, and having the same timing as the picture being encoded, has been used as a reference picture.
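The switching of variable length tables described in the preceding aspects may be sketched as below. The two tables, their symbols and their bit strings are hypothetical and chosen only so that each table is prefix-free; combining the two switching conditions in one function is likewise an assumption made for brevity, since the aspects above present them separately.

```python
# Hypothetical prefix-free code tables; symbols and bit strings are illustrative only.
DEFAULT_TABLE = {"skip": "0", "no_data": "10", "data": "11"}
MODIFIED_TABLE = {"no_data": "0", "data": "1"}

def select_table(size_changed, uses_other_layer_same_time_ref):
    """Pick the modified table when either switching condition holds."""
    if size_changed or uses_other_layer_same_time_ref:
        return MODIFIED_TABLE
    return DEFAULT_TABLE

def encode(symbols, table):
    """Variable length encode a list of symbols with the given table."""
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    """Variable length decode a bit string with the same table."""
    inverse = {code: sym for sym, code in table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return symbols

symbols = ["data", "no_data", "data"]
size_changed = (176, 144) != (352, 288)   # picture size differs from the previous one
table = select_table(size_changed, False)
bits = encode(symbols, table)
default_bits = encode(symbols, select_table(False, False))
```

In this sketch the encoder and decoder must evaluate the same condition so that both select the same table; the modified table here simply omits one symbol, which shortens the remaining codes.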
In accordance with another aspect of the present invention, a picture encoding device is provided in which a pre-set quantization step is quantized only if the results of quantization of pixel values in a pre-set block of a picture are not all of the same value.
The picture encoding device above for at least quantizing a picture by a pre-set quantization step includes multiplexing means for multiplexing the results of quantization of the picture and the pre-set quantization step (such as VLC unit 11 shown in FIGS. 22 and 23).
In accordance with another aspect of the present invention, a picture encoding method is provided in which a pre-set quantization step is quantized only if the results of quantization of pixel values in a pre-set block of a picture are not all of the same value.
In accordance with another aspect of the present invention, a picture decoding device for decoding encoded data is provided in which the encoded data contains a pre-set quantization step only if the results of quantization of pixel values in a pre-set block of a picture are not all of the same value.
In accordance with another aspect of the present invention, a picture decoding method for decoding encoded data is provided in which the encoded data contains a pre-set quantization step only if the results of quantization of pixel values in a pre-set block of a picture are not all of the same value.
In accordance with another aspect of the present invention, a recording medium having encoded data recorded thereon is provided in which the encoded data contains a pre-set quantization step only if the results of quantization of pixel values in a pre-set block of a picture are not all of the same value.
In accordance with another aspect of the present invention, a recording method for recording encoded data is provided in which the encoded data contains a pre-set quantization step only if the results of quantization of pixel values in a pre-set block of a picture are not all of the same value.
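The conditional inclusion of the quantization step may be illustrated by the following minimal sketch. It reads the aspects above as meaning that the quantization step accompanies a block in the encoded data only when the quantized values of that block are not all of the same value. The uniform quantizer, the dictionary layout, and the decoder's fallback to a previously known step are all assumptions made for illustration.

```python
def quantize(pixels, step):
    """Uniform quantization by integer division (assumed quantizer)."""
    return [p // step for p in pixels]

def encode_block(pixels, step):
    """Return the quantized block, attaching the step only when the values differ."""
    levels = quantize(pixels, step)
    all_same = len(set(levels)) == 1
    return {"levels": levels, "step": None if all_same else step}

def decode_block(block, previous_step):
    """Dequantize, falling back to the previously known step when none was sent."""
    step = block["step"] if block["step"] is not None else previous_step
    return [level * step for level in block["levels"]], step

flat = encode_block([3, 2, 1, 0], 8)        # every value quantizes to 0
textured = encode_block([16, 40, 8, 0], 8)  # quantized values differ

flat_pixels, flat_step = decode_block(flat, previous_step=4)
textured_pixels, textured_step = decode_block(textured, previous_step=4)
```

When all quantized values coincide, as in the flat block, the step is omitted in this sketch, which saves the bits that would otherwise be spent on it.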