The present invention relates to a video encoding apparatus and video decoding apparatus, which encode, transmit, and store video signals with high efficiency, and decode the encoded signals.
Since a video signal has a large information volume, it is a common practice to compression-encode the video signal when it is transmitted or stored. In order to encode a video signal with high efficiency, an image in units of frames is divided into blocks in units of a predetermined number of pixels (for example, M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction)), each divided block is orthogonally transformed to separate the spatial frequency of the image into the respective frequency components, and these frequency components are acquired as transform coefficients and are encoded.
As one of video encoding methods, a video encoding method that belongs to the category called mid-level encoding is proposed in J. Y. A. Wang et al., "Applying Mid-level Vision Techniques for Video Data Compression and Manipulation", M.I.T. Media Lab. Tech. Report No. 263, February 1994.
In this method, if an image including a background and a subject (the subject will be referred to as an object hereinafter) is present, the background and object are separately encoded.
In order to separately encode the background and object in this way, for example, an alpha-map signal as binary subsidiary video information that expresses the shape of the object and its position in a frame, is required. Note that the alpha-map signal of the background is uniquely obtained based on that of the object.
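As a minimal illustration of the note above, and assuming the alpha-map is held as rows of 0/1 values with 1 marking the object (a representation chosen here only for illustration), the background alpha-map is simply the complement of the object alpha-map:

```python
def background_alpha(object_alpha):
    """Return the background alpha-map as the complement of the object alpha-map.

    Assumption: the map is a list of rows of 0/1 values, 1 = object pixel.
    """
    return [[1 - value for value in row] for row in object_alpha]
```

Because the background map follows from the object map in this way, only the object alpha-map needs to be encoded.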
As a method of efficiently encoding this alpha-map signal, binary image encoding (e.g., MMR (Modified Modified READ) encoding or the like) or line figure encoding (chain encoding or the like) is used.
Furthermore, in order to reduce the number of encoded bits of the alpha-map, a method of approximating the contour of a given shape by polygons and smoothing it by spline curves (J. Ostermann, "Object-based analysis-synthesis coding based on the source model of moving rigid 3D objects", Signal Process: Image Comm. Vol. 6 No. 2 pp. 143-161, 1994), a method of down-sampling and encoding an alpha-map, and approximating the encoded alpha-map by curves when it is up-sampled (see Japanese Patent Application No. 5-297133), and the like are known.
When an image in a frame is broken up into a background and object upon encoding the image, as described above, an alpha-map signal that expresses the shape of the object and its position in the frame is required to extract the background and object. For this reason, this alpha-map information is encoded to form a bit stream together with encoded information of an image, and the bit stream is subjected to transmission and storage.
However, the method of dividing an image in the frame into a background and an object must also encode the alpha-map. The number of encoded bits therefore increases as compared to the conventional encoding method that encodes the whole image in the frame at once, and the encoding efficiency drops by the number of encoded bits spent on the alpha-map.
It is an object of the present invention to provide a video encoding apparatus and video decoding apparatus, which can efficiently encode and decode alpha-map information as subsidiary video information that expresses the shape of the object and its position in a frame.
According to the present invention, there is provided a video encoding apparatus which encodes an image together with an alpha-map as information for discriminating the image into an object area and background area, and encodes the alpha-map using relative address encoding, comprising means for encoding a symbol that represents a position of the next change pixel to be encoded relative to a reference change pixel as the already encoded change pixel using a variable-length coding table, and means for holding not less than two variable-length coding tables equivalent to the variable-length coding table, and switching the variable-length coding tables in correspondence with a pattern of the already encoded alpha-map.
According to the present invention, there is provided a video decoding apparatus for decoding an encoded bit stream obtained by encoding of the encoding apparatus, comprising means for decoding the symbol using a variable-length coding table, and means for holding not less than two variable-length coding tables equivalent to the variable-length coding table, and switching the variable-length coding tables in correspondence with a pattern of the already decoded alpha-map.
Furthermore, the means for switching the variable-length coding tables is means for switching the tables with reference to a pattern near the reference change pixel.
The apparatus with the above arrangement is characterized in that a plurality of types of variable-length coding tables are prepared, and these variable-length coding tables are switched in correspondence with the pattern of the already encoded alpha-map, in encoding/decoding that reduces the number of encoded bits by encoding the symbol that specifies the position of a change pixel using the variable-length coding table. According to the present invention mentioned above, an effect of further reducing the number of encoded bits of the alpha-map can be obtained.
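The table switching described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the symbols are taken to be horizontal offsets (-1, 0, +1) of the next change pixel relative to the reference change pixel, the "pattern of the already encoded alpha-map" is reduced to the previously coded offset, and the three prefix-free tables in `TABLES` are hypothetical, not tables from the specification.

```python
# Hypothetical prefix-free tables, one per context (the previous offset).
# Each table gives the shortest code to the offset assumed most likely.
TABLES = {
    0:  {0: "1", 1: "01", -1: "00"},
    1:  {1: "1", 0: "01", -1: "00"},
    -1: {-1: "1", 0: "01", 1: "00"},
}


def select_table(history):
    """Choose a table from already coded data only, so the decoder can follow."""
    context = history[-1] if history else 0
    return TABLES[context]


def encode(offsets):
    """Encode a list of offsets, switching tables symbol by symbol."""
    bits, history = [], []
    for offset in offsets:
        bits.append(select_table(history)[offset])
        history.append(offset)
    return "".join(bits)


def decode(bitstring, count):
    """Decode `count` offsets, repeating the encoder's table selection."""
    decoded, pos = [], 0
    for _ in range(count):
        inverse = {code: sym for sym, code in select_table(decoded).items()}
        code = ""
        while code not in inverse:
            code += bitstring[pos]
            pos += 1
        decoded.append(inverse[code])
    return decoded
```

Since the table is selected only from symbols that have already been encoded or decoded, no side information is needed to keep the encoder and decoder in step. A contour that keeps the same slant is coded with one bit per change pixel in this sketch.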
According to the present invention, there is provided a binary image encoding apparatus which serves as an encoding circuit for a motion video encoding apparatus for encoding motion video signals for a plurality of frames obtained as time-series data in units of objects having arbitrary shapes, and has means for dividing a rectangle area including an object into blocks each consisting of M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction), and means for sequentially encoding the divided blocks in the rectangle area in accordance with a predetermined rule, having decoded value storage means for storing a decoded value near the block, image holding means (frame memory) for storing decoded signals of the already encoded frame (image frame), a motion estimation/compensation circuit for generating a motion estimation/compensation value using the decoded signals in the image holding means (frame memory), and means for detecting a change pixel as well as a decoded value near the block with reference to the decoded value storage means, whereby a reference change pixel for relative address encoding is obtained not from a pixel value in the block but from a motion estimation/compensation signal.
There is also provided an alpha-map decoder having means for sequentially decoding a rectangle area including an object in units of blocks each consisting of M×N pixels in accordance with a predetermined rule, means for storing a decoded value near the block, image holding means (frame memory) for storing decoded signals of the already encoded frame (image frame), a motion estimation/compensation circuit for generating a motion estimation/compensation value using the decoded signals in the image holding means (frame memory), and means for detecting a change pixel as well as a decoded value near the block with reference to the decoded value storage means, whereby a reference change pixel for relative address encoding is obtained not from a pixel value in the block but from a motion estimation/compensation signal.
With these circuits, the alpha-map information as subsidiary video information that represents the shape of an object and its position in a frame can be efficiently encoded and decoded.
Furthermore, there is provided a video encoding apparatus having means for storing a decoded value near a block, image holding means (frame memory) for storing decoded signals of the already encoded frame (image frame), a motion estimation/compensation circuit for generating a motion estimation/compensation value using the decoded signals in the image holding means (frame memory), means for detecting a change pixel as well as a decoded value near the block with reference to the decoded value storage means, and means for switching between a reference change pixel obtained from an interpolated pixel or decoded pixel value in the block and a reference change pixel for relative address encoding, whereby relative address encoded information is encoded together with switching information.
There is also provided an alpha-map decoder having means for sequentially decoding a rectangle area including an object in units of blocks each consisting of M×N pixels in accordance with a predetermined rule, means for storing a decoded value near the block, image holding means (frame memory) for storing decoded signals of the already encoded frame (image frame), a motion estimation/compensation circuit for generating a motion estimation/compensation value using the decoded signals in the image holding means (frame memory), and means for detecting a change pixel as well as a decoded value near the block with reference to the decoded value storage means, and also having means for switching between a reference change pixel obtained from an interpolated pixel or decoded pixel value in the block and a reference change pixel for relative address encoding, whereby a reference change pixel is obtained in accordance with switching information.
In this case, upon relative address encoding, a process is done while switching, in units of blocks, whether a reference change pixel b1 is detected from a "current block" as a block of the currently processed image or from a "compensated block" as a block of the previously processed image, and the encoding side also encodes this switching information. The decoding side decodes the switching information, and can switch whether a reference change pixel b1 is detected from the "current block" or the "compensated block" on the basis of the decoded switching information. In this fashion, an optimal process can be done based on the image contents in units of blocks, and encoding with higher efficiency can be attained.
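The block-by-block switching of the reference source can be sketched as follows. The cost function `row_cost` is a hypothetical stand-in for the number of bits produced by relative address encoding, and the function names are assumptions made for illustration; only the idea of choosing between the "current block" and the "compensated block" and signalling the choice with a one-bit flag comes from the description above.

```python
def change_pixels(row):
    """Positions where a pixel differs from its left neighbour (left edge = 0)."""
    previous, positions = 0, []
    for index, value in enumerate(row):
        if value != previous:
            positions.append(index)
        previous = value
    return positions


def row_cost(current_row, reference_row):
    """Hypothetical cost: distance between paired change pixels, plus a
    penalty of one row width for every change pixel left without a partner."""
    a = change_pixels(current_row)
    b = change_pixels(reference_row)
    paired = sum(abs(x - y) for x, y in zip(a, b))
    return paired + len(current_row) * abs(len(a) - len(b))


def choose_reference(current_block, compensated_block, row_above):
    """Return (flag, cost); flag 1 selects the compensated block, flag 0
    selects the current block (each row referring to the row above it)."""
    intra_refs = [row_above] + current_block[:-1]
    intra = sum(row_cost(c, r) for c, r in zip(current_block, intra_refs))
    inter = sum(row_cost(c, r) for c, r in zip(current_block, compensated_block))
    flag = 1 if inter < intra else 0
    return flag, (inter if flag else intra)
```

The encoder would transmit `flag` as the switching information for the block, and the decoder would read it before detecting b1 from the corresponding source.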
According to the present invention, a video encoding apparatus which divides a rectangle area including an object into blocks each consisting of M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction), and sequentially encodes the divided blocks in the rectangle area in accordance with a predetermined rule so as to encode motion video signals for a plurality of frames obtained as time-series data in units of objects having arbitrary shapes, comprises alpha-map encoding means including a frame memory for storing a decoded signal of the current frame including decoded signals near the block and a decoded signal of the encoded frame in association with an alpha-map signal representing the shape of the object, means for replacing pixel values in the block by one of binary values, motion estimation/compensation means for generating a motion estimation/compensation value using a decoded signal of the already encoded frame in the frame memory, means for size-converting (up-sampling/down-sampling) a binary image in units of blocks, means for encoding a size conversion ratio as side information, and binary image encoding means for encoding binary images down-sampled in units of blocks.
The alpha-map encoding means selects the decoded image of the block from decoded values obtained by replacing all the pixel values in the block by one of binary values, motion estimation/compensation values, and decoded values obtained upon size conversion in units of blocks. Hence, the alpha-map signal can be encoded with high quality and efficiency, and encoding can be done at a high compression ratio while maintaining high image quality.
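The selection among candidate decoded images can be sketched as follows. The mode names, the order in which candidates are tried, and the mismatch tolerance are assumptions made for illustration; the "coded" mode stands in for the size-converted binary image coding path, which is not modeled here.

```python
def mismatches(block_a, block_b):
    """Count pixels that differ between two equally sized binary blocks."""
    return sum(a != b
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def choose_mode(block, mc_block, tolerance=0):
    """Return (mode, decoded_block) for one block of the alpha-map.

    Candidates are tried from cheapest to most expensive (an assumed order):
    all pixels replaced by 0, all pixels replaced by 1, the motion
    estimation/compensation value, and finally explicit binary coding.
    """
    height, width = len(block), len(block[0])
    candidates = [
        ("all_0", [[0] * width for _ in range(height)]),
        ("all_1", [[1] * width for _ in range(height)]),
        ("mc", mc_block),
    ]
    for mode, decoded in candidates:
        if mismatches(block, decoded) <= tolerance:
            return mode, decoded
    return "coded", [row[:] for row in block]
```

With `tolerance=0` the reconstruction is lossless; a larger tolerance trades shape accuracy for fewer encoded bits.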
Also, a video decoding apparatus which sequentially decodes a rectangle area in units of blocks each consisting of M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction) including an object in accordance with a predetermined rule so as to decode motion video signals for a plurality of frames obtained as time-series data in units of objects having arbitrary shapes, comprises alpha-map decoding means including a frame memory for storing a decoded signal of the current frame including a decoded signal near the block, and a decoded signal of the encoded frame, means for replacing all pixel values in the block by one of binary values, motion estimation/compensation means for generating a motion estimation/compensation value using a decoded signal of the already encoded frame in the frame memory, means for size-converting a binary image in units of blocks, and binary image decoding means for decoding down-sampled binary images in units of blocks.
The alpha-map decoding means selects the decoded image of the block from decoded values obtained by replacing all the pixel values in the block by one of binary values, motion estimation/compensation values, and decoded values obtained upon size conversion in units of blocks. Hence, a high-quality image can be decoded.
A system for encoding shape modes in units of blocks upon encoding alpha-maps in units of blocks, has means for setting a video object plane (VOP) which includes an object and is expressed by a multiple of a block size, means for dividing the VOP into blocks, labeling means for assigning labels unique to the individual shape modes to the blocks, storage means for storing the labels in units of frames, determination means for determining a reference block of the previous frame corresponding to a block to be encoded of the current frame, prediction means for determining a prediction value on the basis of at least the labels of the previous frame held in the storage means, and the reference block, and encoding means for encoding label information of the block to be encoded using the prediction value.
A decoding apparatus for decoding shape modes of an alpha-map in units of blocks, comprises storage means for storing decoded labels in units of frames, determination means for determining a reference block of the previous frame corresponding to a block to be decoded of the current frame, prediction means for determining a prediction value on the basis of at least labels of the previous frame held in the storage means and the reference block, and decoding means for decoding label information of the block to be decoded using the prediction value.
With these apparatuses, upon encoding an alpha-map in units of macro blocks (divided unit image blocks obtained when an image is divided into units each consisting of a plurality of pixels, e.g., 16×16 pixels), unique labels are assigned to the shape modes of the blocks and are encoded, and original alpha-map data is decoded by decoding these labels, thus attaining efficient encoding.
According to the present invention, a video encoding apparatus which encodes shape modes in units of blocks upon encoding an alpha-map in units of blocks when an image is encoded together with an alpha-map as information for discriminating the image into an object area and background area, comprises means for setting a VOP which includes an object and is expressed by a multiple of a block size, means for dividing the VOP into blocks, labeling means for assigning labels unique to the individual shape modes to the blocks, storage means for storing the labels or alpha-maps in units of frames, determination means for determining a reference block of the previous frame corresponding to a block to be encoded of the current frame, prediction means for determining a prediction value on the basis of at least the labels or alpha-maps of the previous frame held in the storage means, and the reference block, and encoding means for encoding label information of the block to be encoded using the prediction value.
Furthermore, the apparatus comprises storage means for storing size conversion ratios in units of frames, the encoding means comprises means which can vary a size conversion ratio of a frame in units of frames and performs encoding in correspondence with the size conversion ratio, and the determination means comprises means for determining a reference block of the previous frame corresponding to a block to be encoded of the current frame using the size conversion ratio of the current frame, and the size conversion ratio of the previous frame obtained from the storage means.
Alternatively, the apparatus comprises storage means for storing size conversion ratios in units of frames, the encoding means comprises means which can vary a size conversion ratio of a frame in units of frames and performs encoding in correspondence with the size conversion ratio, the determination means comprises means for determining a reference block of the previous frame corresponding to a block to be encoded of the current frame using the size conversion ratio of the current frame, and the size conversion ratio of the previous frame obtained from the storage means, and the prediction means comprises means for, when there are a plurality of reference blocks, determining the majority label among the labels of the plurality of reference blocks as the prediction value.
Alternatively, the apparatus comprises storage means for storing size conversion ratios in units of frames, the encoding means comprises means which can vary a size conversion ratio of a frame in units of frames, performs encoding in correspondence with the size conversion ratio, and encodes the block to be encoded using one selected from a plurality of types of variable-length coding tables in accordance with one or both of the size conversion ratios of the previous and current frames, and the determination means comprises means for determining a reference block of the previous frame corresponding to a block to be encoded of the current frame using the size conversion ratio of the current frame, and the size conversion ratio of the previous frame obtained from the storage means.
A decoding apparatus for decoding shape modes of an alpha-map in units of blocks, comprises storage means for storing decoded labels or alpha-maps in units of frames, determination means for determining a reference block of the previous frame corresponding to a block to be decoded of the current frame, prediction means for determining a prediction value on the basis of at least the labels or alpha-maps of the previous frame stored in the storage means, and the reference block, and decoding means for decoding label information of the block to be decoded using the prediction value.
The apparatus further comprises means which can vary a size conversion ratio of a frame in units of frames, and decodes the size conversion ratio information, and storage means for holding the size conversion ratio information, and the determination means comprises a function of determining the reference block of the previous frame corresponding to the block to be decoded of the current frame using the size conversion ratio of the previous frame read out from the storage means.
Alternatively, the apparatus further comprises means which can vary a size conversion ratio of a frame in units of frames, and decodes the size conversion ratio information, and storage means for holding the size conversion ratio information, the determination means comprises a function of determining the reference block of the previous frame corresponding to the block to be decoded of the current frame using the size conversion ratio of the previous frame read out from the storage means, and the prediction means determines a majority label of those of a plurality of reference blocks as the prediction value if there are the plurality of reference blocks.
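The determination of reference blocks across frames with different size conversion ratios, and the majority-label prediction, can be sketched as follows. The mapping in `reference_blocks`, the label names, the tie-breaking rule, and the default label are all assumptions made for illustration; the specification states only that the ratios of the current and previous frames are used and that the majority label is taken when there are several reference blocks.

```python
from collections import Counter
from fractions import Fraction
from math import ceil, floor


def reference_blocks(bx, by, cur_ratio, prev_ratio):
    """Block coordinates (x, y) in the previous frame's label grid that cover
    the same image area as block (bx, by) of the current frame.

    Ratios are size conversion ratios such as Fraction(1) or Fraction(1, 2).
    """
    scale = Fraction(prev_ratio) / Fraction(cur_ratio)
    xs = range(floor(bx * scale), ceil((bx + 1) * scale))
    ys = range(floor(by * scale), ceil((by + 1) * scale))
    return [(x, y) for y in ys for x in xs]


def predict_label(prev_labels, bx, by, cur_ratio, prev_ratio,
                  default="transparent"):
    """Predict the shape-mode label of block (bx, by) from the previous frame.

    Ties are resolved in favour of the label seen first (an assumption).
    """
    rows, cols = len(prev_labels), len(prev_labels[0])
    labels = [prev_labels[y][x]
              for x, y in reference_blocks(bx, by, cur_ratio, prev_ratio)
              if 0 <= y < rows and 0 <= x < cols]
    if not labels:
        return default
    return Counter(labels).most_common(1)[0][0]
```

For example, when the current frame is coded at ratio 1/2 and the previous frame at ratio 1, each current block corresponds to a 2×2 group of previous-frame blocks in this sketch, and the majority label of that group becomes the prediction value.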
An up-sampling circuit for up-sampling a block of a binary image which is down-sampled to 1/2^N (N = 1, 2, 3, ...) in both the horizontal and vertical directions, comprises a memory for holding a decoded value near the block, means for obtaining a reference pixel value by down-sampling the decoded value held in the memory to 1/2^N in accordance with a down-sampling ratio of the block, and means for up-sampling the block to an original size by repeating a process for up-sampling the block by a factor of 2 in both the horizontal and vertical directions N times, and is characterized in that the up-sampling means always uses a reference pixel value down-sampled to 1/2^N.
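The repeated factor-of-2 up-sampling can be sketched as follows. The single step `upsample_2x` uses plain pixel repetition as a stand-in; the filter actually used, and its use of the down-sampled reference pixel values near the block, are not modeled here and are outside what this sketch claims.

```python
def upsample_2x(block):
    """Double a binary block in both directions by pixel repetition
    (a placeholder for the actual up-sampling filter)."""
    doubled = []
    for row in block:
        wide = [value for value in row for _ in range(2)]
        doubled.append(wide)
        doubled.append(wide[:])
    return doubled


def upsample(block, n):
    """Restore a block down-sampled to 1/2^n by applying the 2x step n times."""
    for _ in range(n):
        block = upsample_2x(block)
    return block
```

A block down-sampled to 1/4 in each direction (N = 2) thus returns to its original size after two passes of the same 2x step.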
A video encoding apparatus which divides a rectangle area including an object into blocks each consisting of M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction), and sequentially encodes the rectangle areas in units of the divided blocks in accordance with a predetermined rule so as to encode motion video signals for a plurality of frames obtained as time-series data in units of objects having arbitrary shapes, and which has setting means for setting an area which includes an object and is expressed by a multiple of a block size, division means for dividing the area set by the setting means into blocks, and means for prediction-encoding motion vectors required for making motion estimation/compensation in the divided blocks, comprises a memory for holding a first position vector representing a position, in the frame, of the area in the reference frame, encoding means for encoding a second position vector representing a position, in the frame, of the area in the reference frame, a motion vector memory for holding motion vectors of decoded blocks near the block to be encoded, and means for predicting a motion vector of the block to be encoded using the motion vectors stored in the motion vector memory, and is characterized in that when the motion vector memory does not store any motion vectors used in the prediction means, a default motion vector is used as a prediction value, and one of a difference vector between the first and second position vectors and a zero vector is selectively used as the default motion vector.
A video decoding apparatus which decodes motion video signals for a plurality of frames obtained as time-series data in units of objects having arbitrary shapes, and sequentially decodes a rectangle area in units of blocks each consisting of M×N pixels (M: the number of pixels in the horizontal direction, N: the number of pixels in the vertical direction) in accordance with a predetermined rule, comprises means for decoding a prediction-encoded motion vector required for performing motion estimation/compensation in the block, means for decoding a prediction-encoded motion vector required for performing motion estimation/compensation in a reference frame, a memory for holding a first position vector representing a position, in a frame, of the area in the reference frame, means for decoding a second position vector representing a position, in a frame, of the area in the frame, a motion vector memory for holding motion vectors of corrected blocks near a block to be decoded, and prediction means for predicting a motion vector of the block to be decoded using the motion vectors held in the motion vector memory, and is characterized in that when the motion vector memory does not store any motion vectors used in the prediction means, a default motion vector is used as a prediction value, and one of a difference vector between the first and second position vectors and a zero vector is selectively used as the default motion vector.
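The fallback to a default motion vector can be sketched as follows. The component-wise median predictor, the sign convention of the difference vector (first minus second), and the flag name `use_position_difference` are assumptions made for illustration; the description above states only that, when no neighbouring motion vectors are stored, either the difference between the two position vectors or the zero vector is selected as the default.

```python
from statistics import median_low


def predict_motion_vector(neighbour_mvs, first_pos, second_pos,
                          use_position_difference):
    """Return the prediction value (x, y) for the motion vector of a block.

    neighbour_mvs: motion vectors of nearby blocks held in the motion vector
    memory (possibly empty). first_pos / second_pos: position vectors of the
    object area in the reference frame and in the current frame.
    """
    if neighbour_mvs:
        # Assumed predictor: component-wise median of the stored vectors.
        return (median_low(x for x, _ in neighbour_mvs),
                median_low(y for _, y in neighbour_mvs))
    if use_position_difference:
        # Assumed sign convention: first position vector minus second.
        return (first_pos[0] - second_pos[0], first_pos[1] - second_pos[1])
    return (0, 0)
```

Using the position difference as the default compensates for a shift of the rectangle area between frames, which would otherwise appear as a large residual on the first motion vector coded in the area.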