Since a video signal has a large information volume, it is common practice to compression-encode the video signal when it is transmitted or stored. In order to encode a video signal with high efficiency, an image in units of frames is divided into blocks each having a predetermined number of pixels (for example, M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction)), each divided block is orthogonally transformed to separate the image into its spatial frequency components, and these frequency components are acquired as transform coefficients and are encoded.
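The block division and orthogonal transform described above can be illustrated by the following minimal sketch. It assumes the discrete cosine transform (DCT) as the orthogonal transform, which the text does not specify, and assumes that the frame dimensions are multiples of M and N. The function names `split_into_blocks`, `dct_1d`, and `dct_2d` are hypothetical and introduced only for illustration.

```python
import math


def split_into_blocks(frame, m, n):
    """Split a frame (a list of pixel rows) into blocks of m columns by n rows.

    Assumption: the frame width is a multiple of m and the height a multiple of n.
    """
    height, width = len(frame), len(frame[0])
    blocks = []
    for top in range(0, height, n):
        for left in range(0, width, m):
            blocks.append([row[left:left + m] for row in frame[top:top + n]])
    return blocks


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional sequence."""
    size = len(values)
    out = []
    for k in range(size):
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * size))
            for i, v in enumerate(values)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column.

    The result is indexed as [vertical frequency][horizontal frequency].
    """
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A flat 8x4 frame split into blocks of M=4 by N=2 pixels.
frame = [[10] * 8 for _ in range(4)]
blocks = split_into_blocks(frame, 4, 2)
coefficients = dct_2d(blocks[0])
```

For a flat block such as this one, all of the energy falls into the single lowest-frequency (DC) coefficient, which is why transform coefficients can usually be encoded with fewer bits than the raw pixel values.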
As one video encoding method, a method that belongs to the category called mid-level encoding has been proposed by J. Y. A. Wang et al. in “Applying Mid-level Vision Techniques for Video Data Compression and Manipulation”, M.I.T. Media Lab. Tech. Report No. 263, February 1994.
In this method, if an image including a background and a subject (the subject will be referred to as an object hereinafter) is present, the background and object are separately encoded.
In order to separately encode the background and object in this way, an alpha-map signal, for example, is required as binary subsidiary video information that expresses the shape of the object and its position in a frame. Note that the alpha-map signal of the background is uniquely obtained from that of the object.
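The relationship between the two alpha-maps can be sketched as follows. The sketch assumes that the alpha-map is stored as rows of 0/1 values with 1 marking object pixels, a representation chosen here for illustration only; the names `background_alpha` and `extract` are hypothetical.

```python
def background_alpha(object_alpha):
    """Derive the background alpha-map as the complement of the object alpha-map."""
    return [[1 - a for a in row] for row in object_alpha]


def extract(frame, alpha, fill=0):
    """Keep the pixels where alpha is 1 and replace all others with `fill`."""
    return [
        [pixel if a else fill for pixel, a in zip(frame_row, alpha_row)]
        for frame_row, alpha_row in zip(frame, alpha)
    ]


frame = [
    [5, 5, 5, 5],
    [5, 9, 9, 5],
    [5, 9, 9, 5],
]
object_alpha = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

object_pixels = extract(frame, object_alpha)
background_pixels = extract(frame, background_alpha(object_alpha))
```

Because the background map is simply the complement, only the alpha-map of the object needs to be encoded.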
As a method of efficiently encoding this alpha-map signal, binary image encoding (e.g., MMR (Modified Modified READ) encoding or the like) or line figure encoding (chain encoding or the like) is used.
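As an illustration of line figure encoding, the following minimal sketch applies an 8-direction chain code to an object contour. It assumes that the contour is already available as an ordered list of 8-connected (x, y) points; the direction numbering (0 = right, proceeding counterclockwise, with y increasing downward) is one common convention assumed here and is not taken from the text. The names `chain_encode` and `chain_decode` are hypothetical.

```python
# Assumed direction table: index = chain code, value = (dx, dy) step.
# Image coordinates are used, so y increases downward.
DIRECTIONS = [
    (1, 0), (1, -1), (0, -1), (-1, -1),
    (-1, 0), (-1, 1), (0, 1), (1, 1),
]


def chain_encode(points):
    """Encode ordered 8-connected contour points as (start point, direction codes).

    Raises ValueError if two consecutive points are not 8-neighbours.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return points[0], codes


def chain_decode(start, codes):
    """Rebuild the contour points from the start point and the direction codes."""
    points = [start]
    x, y = start
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


contour = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
start, codes = chain_encode(contour)  # codes == [0, 6, 4, 2]
```

Each contour step is represented by one of eight symbols (3 bits before any entropy coding) instead of a full coordinate pair, which is the source of the saving.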
Furthermore, in order to reduce the number of encoded bits of the alpha-map, several methods are known, including a method of approximating the contour of a given shape by polygons and smoothing it by spline curves (J. Ostermann, “Object-based analysis-synthesis coding based on the source model of moving rigid 3D objects”, Signal Process.: Image Comm., Vol. 6, No. 2, pp. 143–161, 1994), and a method of down-sampling and encoding an alpha-map and approximating the encoded alpha-map by curves when it is up-sampled (see Japanese Patent Application No. 5-297133).
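The down-sampling approach can be sketched roughly as follows. This is not the method of the cited application: the curve approximation applied at up-sampling is replaced here by plain pixel repetition, and the majority-vote rule used for down-sampling is an assumption made for illustration. The names `downsample` and `upsample` are hypothetical, and the alpha-map dimensions are assumed to be multiples of the factor.

```python
def downsample(alpha, factor):
    """Reduce a binary alpha-map by majority vote over factor x factor cells.

    Assumption: a cell becomes 1 when at least half of its pixels are 1.
    """
    height, width = len(alpha), len(alpha[0])
    small = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            ones = sum(
                alpha[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            row.append(1 if 2 * ones >= factor * factor else 0)
        small.append(row)
    return small


def upsample(alpha, factor):
    """Enlarge a binary alpha-map by repeating every pixel factor x factor times."""
    enlarged = []
    for row in alpha:
        wide = [a for a in row for _ in range(factor)]
        enlarged.extend(list(wide) for _ in range(factor))
    return enlarged


alpha = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
small = downsample(alpha, 2)    # 2x2 map: a quarter of the samples to encode
restored = upsample(small, 2)   # same size as the original alpha-map
```

With a factor of 2, only a quarter of the alpha-map samples are encoded. Shapes aligned to the sampling grid, as in this example, are restored exactly, whereas finer contour detail is lost, which is why the cited method smooths the up-sampled contour with curves.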
When an image in a frame is broken up into a background and object upon encoding the image, as described above, an alpha-map signal that expresses the shape of the object and its position in the frame is required to extract the background and object. For this reason, this alpha-map information is encoded to form a bit stream together with encoded information of an image, and the bit stream is subjected to transmission and storage.
However, in the method of dividing an image in the frame into a background and object, the alpha-map must also be encoded. The number of encoded bits therefore increases by the number of encoded bits of the alpha-map as compared to the conventional encoding method that encodes the image in the frame as a whole, and the encoding efficiency is lowered accordingly.