This disclosure is based on Korean Patent Application No. 98-42434, filed on Oct. 10, 1998, herein incorporated by reference.
1. Field of the Invention
The present invention relates to data coding and decoding, and more particularly, to a moving picture coding/decoding method and apparatus, each of which includes both a spatially scalable architecture and a signal-to-noise ratio (SNR) scalable architecture, for efficiently coding and transmitting video in which an arbitrarily shaped object moves continuously.
2. Description of the Related Art
Most coding/decoding methods developed to date relate substantially to the coding/decoding of a quadrilateral picture of a predetermined size, such as a television screen. Examples of these methods are the Moving Picture Experts Group (MPEG)-1 and MPEG-2 standards, H.261, and H.263.
Since most conventional coding methods provide only a quite limited hierarchical architecture, they cannot readily be adopted in environments where the state of the transmission line changes frequently, such as the Internet/intranets or wireless networks. MPEG-2 video (ISO/IEC JTC1/SC29/WG11 13818-2), a representative conventional coding method, proposes spatially scalable coding, in which two spatially scalable architectures are provided, and SNR scalable coding, in which two or three scalable architectures are provided, with respect to a moving picture of a quadrilateral screen shape. However, the number of scalable layers is so limited that it is difficult to build a real application area on this method. Moreover, MPEG-4 video (ISO/IEC JTC1/SC29/WG11 14496-2), which provides efficient compression, also proposes a coding method having a spatially scalable architecture and a temporally scalable architecture. However, a method for providing a SNR scalable architecture for a bitstream in the same spatial domain has not yet been proposed, thereby limiting the quality of service.
To solve the above problems, an object of the present invention is to provide a moving picture coding/decoding method and apparatus for providing a SNR scalable coding function, which can variably determine picture quality in a predetermined space, as well as a spatially scalable coding function, so as to transmit data in different ways depending on the limitations of a transmission line or the receiving performance of a receiving terminal. The method and apparatus also provide scalable coding of an arbitrary shaped object as well as a quadrilateral picture, thereby providing various qualities of service.
To achieve the above object, in one embodiment, the present invention provides a method of constructing spatially and SNR scalable architectures with respect to input video data composed of the shape information and inner texture information of an object and then coding the input video data, the method including the steps of (a) down sampling the shape information and the texture information by a predetermined ratio to construct a spatially scalable architecture including a single base layer and at least one enhancement layer; (b) coding the shape and texture information of the base layer to generate a base layer bitstream, frequency transform coding the difference between decoded texture information and original texture information, and constructing a SNR scalable architecture based on frequency bands; and (c) with respect to each of the at least one enhancement layer, coding the difference between shape information upsampled from the base layer and the shape information of the enhancement layer to generate an enhancement layer bitstream, frequency transform coding the difference between the decoded texture information obtained in the step (b) and the texture information of the enhancement layer, and constructing a SNR scalable architecture based on frequency bands.
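The layered construction of step (a) can be sketched as follows. This is a minimal NumPy illustration, not the disclosed implementation: the block-mean rule for texture and the any-pixel-inside rule for shape are assumptions, since the disclosure only requires down sampling by a predetermined ratio.

```python
import numpy as np

def downsample_texture(tex, ratio):
    """Downsample texture by block averaging (assumed rule)."""
    h, w = tex.shape
    return tex.reshape(h // ratio, ratio, w // ratio, ratio).mean(axis=(1, 3))

def downsample_shape(shape, ratio):
    """A low-resolution pixel is inside the object if any pixel of the
    corresponding block is inside (a conservative, hypothetical rule)."""
    h, w = shape.shape
    return shape.reshape(h // ratio, ratio, w // ratio, ratio).any(axis=(1, 3))

def build_spatial_pyramid(shape, tex, n_layers=2, ratio=2):
    """Step (a): a single base layer (largest down sampling ratio) plus
    enhancement layers obtained with smaller ratios."""
    layers = []
    for level in range(n_layers - 1, -1, -1):
        r = ratio ** level
        layers.append((downsample_shape(shape, r), downsample_texture(tex, r)))
    return layers  # layers[0] = base layer, layers[1:] = enhancement layers
```

For an 8x8 input with two layers, this yields a 4x4 base layer and the full-resolution 8x8 enhancement layer.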
In another embodiment, the present invention provides a method of constructing spatially and SNR scalable architectures with respect to input video data composed of the shape information and inner texture information of an object and then coding the input video data, the method including the steps of (a) down sampling the shape information and the texture information to construct a spatially scalable architecture including a single base layer, which is obtained by down sampling by a first ratio, and at least one enhancement layer, which is obtained by down sampling by a second ratio that is smaller than that adopted for the base layer; (b) with respect to the shape information and texture information of the base layer, (b1) shape coding the shape information of the base layer; (b2) padding, frequency transform coding and quantizing the texture information of the base layer; (b3) collecting and variable length coding the data generated in the steps (b1) and (b2) to generate a base layer bitstream; (b4) obtaining the difference between texture information reproduced by dequantizing and inverse frequency transforming the data generated in the step (b2) and the texture information of the base layer; (b5) frequency transform coding the difference obtained in the step (b4) and classifying the results of the frequency transform coding by frequency to generate bitstreams based on frequency bands; (c) with respect to the shape and texture information of each enhancement layer, (c1) shape coding and variable length coding the difference between the shape information of the enhancement layer and shape information obtained by upsampling the shape information of the base layer to the enhancement layer, to generate an enhancement layer bitstream; (c2) obtaining the difference between the texture information of the enhancement layer and texture information obtained by upsampling the texture information reproduced in the step (b4) to the enhancement layer and padding the result of 
the upsampling; and (c3) frequency transform coding the difference obtained in the step (c2) and classifying the results of the frequency transform coding by frequency to generate bitstreams based on frequency bands.
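Steps (b5) and (c3) can be sketched as follows: frequency transform coding a residual and classifying its coefficients by frequency band so that each band can be emitted as its own bitstream of the SNR scalable architecture. The 8x8 DCT-II and the diagonal (u + v) band partition are illustrative assumptions; the disclosure only requires a frequency transform and classification by frequency.

```python
import numpy as np

def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (illustrative
    frequency transform)."""
    n = block.shape[0]
    basis = np.array([[np.cos(np.pi * (2 * x + 1) * u / (2 * n))
                       for x in range(n)] for u in range(n)])
    scale = np.sqrt(2.0 / n) * np.where(np.arange(n) == 0, 1 / np.sqrt(2), 1.0)
    c = basis * scale[:, None]
    return c @ block @ c.T

def split_by_band(coeffs, n_bands=3):
    """Classify transform coefficients into frequency bands by their
    diagonal index u + v (assumed partition), one bitstream per band."""
    n = coeffs.shape[0]
    u, v = np.indices(coeffs.shape)
    band = np.minimum((u + v) * n_bands // (2 * n - 1), n_bands - 1)
    return [coeffs[band == b] for b in range(n_bands)]
```

A decoder can then refine the texture progressively by consuming only as many band bitstreams as the transmission line allows.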
To achieve the above object, the present invention also provides a method of decoding a bitstream, which has been coded in spatially and SNR scalable architectures. The method includes the steps of (a) variable length decoding the bitstream to divide it into a base layer bitstream and at least one enhancement layer bitstream; (b) shape decoding coded shape information contained in the base layer bitstream to generate base layer shape information; (c) dequantizing and inverse frequency transforming coded texture information contained in the base layer bitstream to generate base layer texture information; (d) sequentially inverse frequency transforming bitstreams selected from the SNR scalable architecture of the base layer bitstream and adding the results to the base layer texture information; and (e) with respect to at least one selected enhancement layer, sequentially repeating the steps of: (e1) upsampling the shape information of a spatial reference layer to the enhancement layer; (e2) upsampling the texture information of a SNR reference layer which falls under a spatial reference layer; (e3) shape decoding enhancement layer shape information contained in the enhancement layer bitstream and adding the result to the upsampled shape information of the lower layer; and (e4) sequentially inverse frequency transforming bitstreams selected from the SNR scalable architecture of the enhancement layer bitstream and adding the results to the upsampled texture information of the lower layer.
The method may further include decoding a spatial reference layer identifier and a SNR reference layer identifier before the step (e1). The spatial reference layer is the layer immediately below the enhancement layer. In one aspect, the SNR reference layer is the SNR base layer which falls under the spatial reference layer; in another aspect, it is the highest SNR layer which falls under the spatial reference layer.
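The refinement order of decoding steps (d) through (e4) can be sketched as follows, for the texture path only. All inputs are assumed to be already inverse-frequency-transformed, pixel-domain residuals, and nearest-neighbour upsampling is an illustrative assumption.

```python
import numpy as np

def upsample(tex, ratio=2):
    """Nearest-neighbour upsampling of the reference layer (assumed rule)."""
    return np.repeat(np.repeat(tex, ratio, axis=0), ratio, axis=1)

def decode_texture(base_tex, snr_residuals, enh_residual=None, ratio=2):
    """Step (d): sequentially add selected SNR-band residuals to the base
    layer texture; steps (e2)/(e4): upsample the refined reference layer
    and add the enhancement-layer residual."""
    tex = base_tex.astype(float)
    for res in snr_residuals:
        tex = tex + res
    if enh_residual is not None:
        tex = upsample(tex, ratio) + enh_residual
    return tex
```

Selecting fewer SNR residuals, or stopping before an enhancement layer, yields a lower-quality or lower-resolution picture from the same bitstream.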
Further, the present invention provides an apparatus for constructing spatially and SNR scalable architectures with respect to input video data composed of the shape information and inner texture information of an object and then coding the input video data. The apparatus includes a down sampling unit for down sampling the shape information and the texture information to construct a spatially scalable architecture including a single base layer, which is obtained by down sampling by a first ratio, and at least one enhancement layer, which is obtained by down sampling by a second ratio that is smaller than that adopted for the base layer; a base layer coder comprising a first shape coder for shape coding the shape information of the base layer; a texture coder for padding, frequency transform coding and quantizing the texture information of the base layer; a first variable length coder for collecting and variable length coding the data output from the first shape coder and the texture coder, to generate a base layer bitstream; a texture decoder for dequantizing and inverse frequency transforming the data output from the texture coder, to reproduce texture information; a first difference image generator for generating the difference between the reproduced texture information from the texture decoder and the texture information of the base layer; and a first SNR scalable architecture generator for frequency transform coding the difference generated by the first difference image generator and classifying the results of the frequency transform coding by frequency, to generate bitstreams based on frequency bands; and at least one enhancement layer coder comprising an upsampling unit for upsampling the shape information of the base layer to the enhancement layer and upsampling the texture information reproduced by the texture decoder to the enhancement layer; a second shape coder for shape coding the difference between the upsampled shape information and the shape 
information of the enhancement layer; a second variable length coder for variable length coding the output data of the second shape coder to generate an enhancement layer bitstream; a second difference image generator for obtaining the difference between the texture information of the enhancement layer and texture information obtained by padding the output data of the upsampling unit; and a second SNR scalable architecture generator for frequency transform coding the difference generated by the second difference image generator and classifying the results of the frequency transform coding by frequency to generate bitstreams based on frequency bands.
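The closed loop formed by the texture coder, texture decoder, and first difference image generator can be sketched as follows. This is a pixel-domain simplification with a hypothetical scalar quantizer; in the disclosure the texture is frequency transform coded before quantization.

```python
import numpy as np

def quantize(x, q=8):
    """Hypothetical scalar quantizer standing in for the texture coder."""
    return np.round(x / q).astype(int)

def dequantize(xq, q=8):
    """Texture decoder: reproduce what the receiver will reconstruct."""
    return xq * q

def base_layer_closed_loop(tex, q=8):
    """The texture decoder reproduces the base-layer texture, and the
    first difference image generator forms the residual that feeds the
    first SNR scalable architecture generator."""
    coded = quantize(tex, q)
    reproduced = dequantize(coded, q).astype(float)
    residual = tex - reproduced
    return coded, reproduced, residual
```

Coding the residual against the *reproduced* texture, rather than the original, keeps the encoder and decoder in step, so the SNR refinements correct exactly the error the receiver actually sees.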
To achieve the above object, the present invention provides an apparatus for decoding a bitstream, which has been coded in spatially and SNR scalable architectures. The apparatus includes a variable length decoder for variable length decoding the bitstream to divide it into a base layer bitstream and at least one enhancement layer bitstream; a base layer decoder comprising a first shape decoder for shape decoding coded shape information contained in the base layer bitstream to generate base layer shape information; a texture decoder for dequantizing and inverse frequency transforming coded texture information contained in the base layer bitstream to generate base layer texture information; and a first SNR scalable architecture decoder for sequentially inverse frequency transforming selected bitstreams in the SNR scalable architecture of the base layer bitstream and adding the results to the base layer texture information; and at least one enhancement layer decoder comprising an upsampling unit for upsampling the shape and texture information of a layer immediately below the enhancement layer in a spatially scalable architecture, to the enhancement layer; a second shape decoder for shape decoding enhancement layer shape information contained in the enhancement layer bitstream and adding the result to the upsampled shape information of the lower layer; and a second SNR scalable architecture decoder for sequentially inverse frequency transforming bitstreams selected from the SNR scalable architecture of the enhancement layer bitstream and adding the results to the upsampled texture information of the lower layer.