1. Field of the Invention
The present invention relates to a shape information coding device for interlaced scanning video and a method therefor, and more particularly, to a shape information coding device and method which can detect the amount of motion of object video when coding interlaced scanning video and encode the object video in a frame type block unit or in a field type block unit in accordance with the detected result.
2. Discussion of Related Art
With recent progress in research on next-generation video/audio coding technology and system construction, efforts are being made to support video/audio applications that are not supported by well-known standards such as H.263 and H.261 of ITU-T and MPEG-1 and MPEG-2 of ISO/IEC. Examples of the desired functions are object-based interactive functionality and object-based manipulation. To provide such new functions, shape information must be transmitted. The shape information divides video into an object area and a non-object area (that is, background), allows signal processing in the transmitting/receiving terminals to be performed on the object area rather than on the whole video, and thereby makes the above new functions possible.
General binary shape information takes the form of a binary mask, in which the pixels corresponding to an object have a different value from the pixels corresponding to the non-object area. For instance, a pixel corresponding to the object has a logic value of "1", and a pixel corresponding to the non-object area has a logic value of "0". A method for coding moving video by using the shape information is called "object-based video coding".
In other words, the divided object video can be coded and compressed independently of the background video. Where an object to be coded exists in the video screen, an operation for coding the shape information corresponding to the object or area is necessary. The shape information is coded by context-based arithmetic encoding (hereinafter referred to as "CAE"), that is, arithmetic encoding that uses a probability table of the context configuration.
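As a minimal sketch of the binary mask described above, the following Python fragment marks object pixels with 1 and background pixels with 0 and then restricts processing to the object area. The mask contents, the `texture` samples and the helper name `object_pixels` are illustrative assumptions, not values taken from this specification.

```python
# Hypothetical 4x6 binary alpha mask: 1 = object pixel, 0 = background pixel.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

# Hypothetical texture (luminance) samples of the same size as the mask.
texture = [[10 * r + c for c in range(6)] for r in range(4)]


def object_pixels(mask, texture):
    """Return (row, col, value) for every pixel the mask marks as object."""
    return [
        (r, c, texture[r][c])
        for r, row in enumerate(mask)
        for c, bit in enumerate(row)
        if bit == 1
    ]


# Only the object area is handed on for further processing.
selected = object_pixels(mask, texture)
total = sum(len(row) for row in mask)
print(len(selected), "of", total, "pixels belong to the object")
```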
FIG. 1 is a block diagram illustrating the configuration of a representative object-based video coder. First, each video signal is divided into shape information and texture information. The shape information is inputted to a shape information coding unit 11 and the texture information to a motion estimating unit 12.
In the shape information coding unit 11, a lossy coding or a lossless coding process is performed for the shape information of the corresponding video, and a reconstructed shape information as an output of the shape information coding unit 11 is inputted to the motion estimating unit 12, a motion compensating unit 13 and a texture information coding unit 17, respectively, in an object unit.
Meanwhile, shape information bitstream as another output of the shape information coding unit 11 is inputted to a multiplexer 18. The motion estimating unit 12 serves to estimate motion information for the texture information of current video, by using the texture information inputted in current frame and the texture information of the previous video stored in a previous reconstructed frame memory 14. The estimated motion information is inputted to the motion compensating unit 13 and the coded motion information is also inputted to the multiplexer 18. The motion compensating unit 13 serves to execute motion compensation prediction by using the motion information obtained by the motion estimating unit 12 and the previous reconstructed frame stored in the previous reconstructed frame memory 14. A subtractor 15 obtains a prediction error between the inputted texture information and the motion compensation texture information obtained by the motion compensating unit 13. A texture information coding unit 17 functions to code the prediction error obtained by the subtractor 15. The texture information bitstream generated from the texture information coding unit 17 is inputted to the multiplexer 18, and the error signal of the reconstructed texture information is inputted to an adder 16. The previous reconstructed frame memory 14 stores the previous reconstructed frame signal which is outputted from the adder 16 in which the motion compensation prediction signal and the reconstructed error signal are added.
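The prediction loop formed by the subtractor 15, the texture information coding unit 17 and the adder 16 can be sketched as follows. This is a simplified illustration under stated assumptions: the motion-compensated prediction is supplied directly, frames are flat lists of samples, and the hypothetical `quantize` function stands in for the lossy texture coder; none of these details come from the specification.

```python
def quantize(error, step=4):
    """Hypothetical stand-in for the texture coder: lossy coding of the error."""
    return [step * round(e / step) for e in error]


def encode_frame(current, prediction, step=4):
    """One pass of the prediction loop for a single frame.

    current    -- input texture samples
    prediction -- motion-compensated samples from the previous reconstructed frame
    """
    # Subtractor 15: prediction error between input and motion-compensated texture.
    error = [c - p for c, p in zip(current, prediction)]
    # Texture information coding unit 17: code the error (lossy in this sketch).
    recon_error = quantize(error, step)
    # Adder 16: prediction plus reconstructed error gives the frame that is
    # stored in the previous reconstructed frame memory 14.
    reconstructed = [p + e for p, e in zip(prediction, recon_error)]
    return recon_error, reconstructed


current = [101, 104, 97, 90]
prediction = [98, 100, 100, 91]
recon_error, frame_memory = encode_frame(current, prediction)
print(recon_error, frame_memory)
```

Storing the reconstructed frame rather than the original input keeps the coder's reference identical to what a decoder can rebuild from the transmitted data.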
Digital video is classified as progressive scanning video or interlaced scanning video according to the structure of its frames. In progressive scanning video, the lines of each frame are arranged in order, one line after another, and the video is coded, transmitted and displayed frame by frame. In interlaced scanning video, by contrast, each frame is formed from two fields whose lines are interleaved line by line, so that the height (the number of lines) of each field is half the height of the frame.
Examples of the progressive scanning frame and interlaced scanning frame are shown in FIGS. 2A and 2B.
In more detail, FIG. 2A shows the progressive scanning frame and FIG. 2B shows the interlaced scanning frame, in which two fields (a top field and a bottom field) are interleaved line by line. The lines of the top field (indicated by solid line arrows) and the lines of the bottom field (indicated by dotted line arrows) alternate within the interlaced scanning frame. As shown in FIG. 2B, a time difference exists between the top and bottom fields; in this case, the top field precedes the bottom field, although the bottom field may instead precede the top field. Because of this time difference, the signal characteristics of adjacent lines within the interlaced scanning frame can differ. The difference is particularly apparent in video having a large amount of motion.
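The relationship between an interlaced frame and its two fields can be illustrated with a short sketch. It assumes that the frame is a list of lines and that the top field holds the even-numbered lines (starting from line 0); the function names `split_fields` and `merge_fields` are hypothetical.

```python
def split_fields(frame):
    """Separate an interlaced frame into its top and bottom fields.

    Assumption: the top field occupies lines 0, 2, 4, ... and the bottom
    field occupies lines 1, 3, 5, ...
    """
    top = frame[0::2]
    bottom = frame[1::2]
    return top, bottom


def merge_fields(top, bottom):
    """Interleave two fields line by line to rebuild the frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


# Hypothetical 8-line frame; each line is filled with its own line number.
frame = [[line] * 4 for line in range(8)]
top, bottom = split_fields(frame)
print(len(frame), len(top), len(bottom))
```

Each field has half the number of lines of the frame, and interleaving the two fields restores the original frame.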
FIGS. 3A and 3B are views each illustrating a method for determining a motion vector predictor for shape (hereinafter referred to as "MVPs") in a conventional shape information coding method. FIG. 3A shows a current shape binary alpha block (hereinafter referred to as "BAB"), the BABs on the left side, top side and right top side of the current shape BAB, and the motion vector of each of the BABs adjacent to the current shape BAB. Here, the size of the BAB is assumed to be 16×16. FIG. 3B shows a texture information macroblock (hereinafter referred to as "MB") corresponding to the current shape BAB, the MBs on the left side, top side and right top side of that MB, and the motion vector of each of the adjacent MBs. Here, the size of the MB is assumed to be 16×16, the same as the BAB. Each of the motion vectors MV1, MV2 and MV3 of the blocks adjacent to the texture information MB indicates the motion vector of the corresponding block. If the corresponding MB estimates and compensates one motion vector per 16×16 MB, that motion vector is the motion vector of the MB. If, however, the corresponding MB estimates and compensates one motion vector per 8×8 block, the motion vector of the 8×8 block at the position shown is indicated.
The motion vector MVs of the current shape BAB is given as follows: MVs = MVDs (motion vector difference value for shape) + MVPs. In other words, the MVPs is determined first, and the MVDs is obtained from the determined value. The MVDs is the information transmitted from the transmitting terminal to the receiving terminal. The receiving terminal determines the MVs from the MVDs transmitted from the transmitting terminal and the MVPs obtained in the same manner as in the transmitting terminal.
Therefore, since the MVPs must be obtained from the same information in both the transmitting and receiving terminals, the MVPs should be determined by using only information which has already been decoded and stored in the receiving terminal. As shown in FIGS. 3A and 3B, a method of determining the MVPs comprises the steps of checking whether the corresponding motion vector exists, in the order of MVs1, MVs2 and MVs3 for the shape BAB and in the order of MV1, MV2 and MV3 for the texture information MB, and determining the first existing motion vector in that priority order as the MVPs. For instance, where a shape motion vector exists only for the top side block (MVs2) and the right top side block (MVs3), the MVPs is determined as MVs2.
The existence or non-existence of the motion vector is determined in consideration of the following two cases.
First, where the corresponding BAB or MB is in the intra-video mode, the motion vector does not exist. Since motion compensation is not performed in the intra-video mode, the motion vector is not transmitted to the receiving terminal. Second, where no object pixel exists within the corresponding BAB or MB, the motion vector does not exist. In this case as well, since motion compensation is not performed, the motion vector is not transmitted to the receiving terminal. As is known, the above-discussed method is disclosed in the MPEG-4 (Moving Picture Experts Group-4) Visual CD (Committee Draft).
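The conventional MVPs determination and the relation MVs = MVDs + MVPs can be sketched as follows. The sketch rests on assumptions that should be read as such: a missing motion vector (intra mode, or no object pixel in the block) is represented by `None`, the shape candidates are checked before the texture candidates, and a zero vector is used when no candidate exists. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) displacement


def select_mvps(shape_mvs: Sequence[Optional[MV]],
                texture_mvs: Sequence[Optional[MV]]) -> MV:
    """Pick the first existing candidate in priority order.

    shape_mvs   -- [MVs1, MVs2, MVs3] of the adjacent shape BABs
    texture_mvs -- [MV1, MV2, MV3] of the adjacent texture MBs
    None marks a motion vector that does not exist.
    """
    for mv in list(shape_mvs) + list(texture_mvs):
        if mv is not None:
            return mv
    return (0, 0)  # assumption: zero predictor when nothing is available


def encode_mvds(mvs: MV, mvps: MV) -> MV:
    """Transmitting side: MVDs = MVs - MVPs."""
    return (mvs[0] - mvps[0], mvs[1] - mvps[1])


def decode_mvs(mvds: MV, mvps: MV) -> MV:
    """Receiving side: MVs = MVDs + MVPs."""
    return (mvds[0] + mvps[0], mvds[1] + mvps[1])


# Example from the text: only MVs2 and MVs3 exist, so MVPs becomes MVs2.
shape_neighbours = [None, (2, -1), (3, 0)]
texture_neighbours = [(4, 4), None, (1, 1)]
mvps = select_mvps(shape_neighbours, texture_neighbours)

mvs = (3, -1)                  # hypothetical estimated shape motion vector
mvds = encode_mvds(mvs, mvps)  # only this difference is transmitted
print(mvps, mvds, decode_mvs(mvds, mvps))
```

Because both terminals derive the MVPs from already decoded neighbours, the receiver recovers the MVs exactly from the transmitted MVDs.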
FIGS. 4A and 4B are views illustrating bordering in the coding of interlaced scanning video. FIG. 4A shows a bordered motion compensation (hereinafter referred to as "MC") BAB for a BAB of size 16×16. As shown, the frame type BAB includes the bordered MC BAB; the interior of the solid line indicates the 16×16 BAB to be coded. The field type BAB shown in FIG. 4A is divided into top and bottom fields, each of size 8×16. The 8×16 block of the top field takes its border data from the top field, and the 8×16 block of the bottom field takes its border data from the bottom field. Thus, in the field mode of FIG. 4A, the bordered MC BAB is divided into two fields. FIG. 4B is a view illustrating a bordered current BAB for a BAB of size 16×16. The field type BAB shown in FIG. 4B is likewise divided into top and bottom fields of size 8×16, and each field block takes its border data from its own field. Thus, in the field mode of FIG. 4B, the bordered current BAB is divided into two fields. That is, the field type block of FIG. 4B takes only the border data corresponding to the top border and left border from each of the top and bottom fields.
As is known, the above-discussed method is disclosed in the MPEG-4 Visual CD.
FIG. 5 is a block diagram illustrating the configuration of a representative object-based video decoder. The shape information bitstream, motion information bitstream and texture information bitstream outputted from a demultiplexer 41 are respectively inputted to a shape information decoding unit 42, a motion information decoding unit 43 and a texture information decoding unit 47, and are thereby decoded into the shape and motion forming an object area and into the texture information of the interior of the object, respectively. The signal decoded in the motion information decoding unit 43 is inputted to a motion compensating unit 44, in which a motion compensating operation is performed by using the decoded signal. An object area signal is represented in an object area representing unit 45. The represented object area signal is inputted to a synthesizer 46, in which a plurality of the represented object area signals are synthesized to thereby represent the original video.
FIGS. 6A and 6B are views illustrating the comparison of construction in a frame unit and in a field unit.
White lines indicate the top fields and gray lines indicate the bottom fields. In the case of progressive scanning video, coding efficiency does not deteriorate greatly when coding is executed in the frame type block unit. In the case of interlaced scanning video, however, since one frame is divided into two fields, the shape becomes complicated, as shown in FIG. 8A, and the coding efficiency decreases if the frame type BAB is coded as a single BAB unit.
FIGS. 7A to 8B show the problems caused in interlaced scanning video. First, FIGS. 7A and 7B are views illustrating video having a small amount of motion in the presence of binary shape information, in which FIG. 7A shows a frame type MB and FIG. 7B shows a field type MB. Such video is mainly a still image or video having a small amount of motion. The video shape changes little between the two fields because the variation over time is very small. In this case, the shape information is preferably coded in the frame unit.
FIGS. 8A and 8B are views illustrating video having a large amount of motion in the presence of binary shape information, in which FIG. 8A shows a frame type MB and FIG. 8B shows a field type MB. Such video is mainly a picture having a large amount of motion. The video shape changes considerably between the two fields because the variation over time is large. In this case, the shape information is preferably coded in the field unit. As shown in FIG. 8A, the variation of the BAB is not serious in the field unit block. In interlaced scanning, since one frame is divided into two fields, when the frame type BAB is coded as a single BAB unit, a difference between the two fields corresponding to the size of the motion arises, as shown in FIG. 8A. Where the motion between the two fields is large, the conventional frame coding method generates a large number of shape information coding bits. If the number of shape information coding bits is large, the data compression efficiency decreases, since a large amount of data must be transmitted.
As noted above, the determination of the MVPs is important for the following two reasons: first, with an accurate determination of the MVPs, the size of the MVDs can be reduced and the number of bits generated can also be decreased; and second, with an accurate determination of the MVPs, the MVs can be accurately obtained, and accordingly, with motion compensation of the BAB, the size of the shape information bitstream can be decreased. In conclusion, if the motion vector is accurately estimated and compensated upon coding of the shape information, the MVPs can be accurately obtained, which is advantageous in terms of coding gain. However, the conventional MVPs determining method mentioned above does not reflect the characteristics of interlaced scanning video.
In other words, the conventional MVPs determining method cannot be applied where the BAB or MB performs motion vector estimation, compensation and encoding in a field unit.
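The frame-versus-field problem described for FIGS. 7A to 8B can be made concrete with a small sketch. It uses a purely illustrative measure, the number of pixels that differ between vertically adjacent lines, as a rough indicator of how complicated a block looks in each arrangement. This measure, the 8×8 block size and the helper names are assumptions for illustration and are not the mode decision criterion of the invention.

```python
def line_difference(block):
    """Count pixels that differ between each pair of vertically adjacent lines."""
    return sum(
        a != b
        for upper, lower in zip(block, block[1:])
        for a, b in zip(upper, lower)
    )


def frame_vs_field_cost(bab):
    """Return (frame cost, field cost) for one binary alpha block.

    Assumption: even lines form the top field and odd lines the bottom field.
    """
    top, bottom = bab[0::2], bab[1::2]
    return line_difference(bab), line_difference(top) + line_difference(bottom)


def make_bab(shift, size=8):
    """Hypothetical BAB: the object's left edge moves by `shift` pixels
    between the top field and the bottom field."""
    rows = []
    for r in range(size):
        edge = 2 + (shift if r % 2 else 0)
        rows.append([1 if c >= edge else 0 for c in range(size)])
    return rows


still = make_bab(0)   # little motion between the two fields
moving = make_bab(3)  # large motion between the two fields
print(frame_vs_field_cost(still), frame_vs_field_cost(moving))
```

With no motion, the two arrangements look equally simple, so frame unit coding loses nothing. With large motion, adjacent frame lines alternate between two different shapes while each field on its own stays regular, which is why field unit coding is preferable in that case.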
Accordingly, the present invention is directed to a shape information coding device for interlaced scanning video and method therefor that substantially obviates one or more of the problems due to limitations and disadvantages of the related art.
An object of the invention is to provide a shape information coding device for interlaced scanning video and method therefor which can detect an amount of motion of object video, on coding of interlaced scanning video, and encode the object video in a frame unit or in a field unit in accordance with the detected result.
Another object of the invention is to provide a shape information coding device for interlaced scanning video and method therefor which determines a motion prediction mode of a binary alpha block (BAB) and separates a field predicted BAB and a frame predicted BAB in accordance with the determined motion prediction mode to thereby perform motion vector prediction, upon motion estimation for coding moving video.
According to an aspect of the present invention, a shape information coding method for interlaced scanning video includes the steps of: determining a motion prediction mode of a BAB in a coding mode obtained by using a motion estimation value of the BAB; if the determined motion prediction mode corresponds to a field mode, performing motion compensation in a field type block unit and coding shape information in the field type block unit; and if the determined motion prediction mode corresponds to a frame mode, performing motion compensation in a frame type block unit and coding the shape information in the frame type block unit.
According to another aspect of the present invention, a shape information coding method for interlaced scanning video includes the steps of: determining a motion prediction mode of a BAB in a coding mode obtained by using a motion estimation value of the BAB; if the determined motion prediction mode corresponds to a frame mode, performing motion prediction in a frame type block unit and coding shape information in the frame type block unit; and if the determined motion prediction mode corresponds to a field mode, performing the motion prediction in the frame type block unit and in a field type block unit, re-determining the motion prediction mode, coding additional information for the prediction mode and coding the shape information in the field type block unit.
According to still another aspect of the present invention, a shape information coding device for interlaced scanning video includes: a shape information motion estimating means for determining a motion prediction mode from shape motion information estimated from inputted binary shape information, comparing the motion information with the motion information of shape information or texture adjacent thereto to thereby determine an MVPs, and operating on the determined MVPs and the estimated shape motion information to thereby calculate an MVDs; a shape information type determining means for determining a type of a BAB from the motion information obtained from the shape information motion estimating means; a shape information coding mode determining means for determining a coding mode of the shape information in accordance with an amount of motion variation of the BAB; a shape information motion compensating means for compensating motion of the inputted shape information in accordance with the determined motion prediction mode; a field additional information determining means for determining and coding additional information for a field block type and field discrimination in accordance with the coding mode information obtained in the shape information coding mode determining means and the type information of the BAB obtained in the shape information type determining means; and a shape information coding means for coding the binary shape information outputted from the shape information motion compensating means in a frame unit or in a field unit in accordance with the coding mode information obtained in the shape information coding mode determining means.
Preferably, the shape information coding means is comprised of: a frame coding means for coding the binary shape information in a frame mode, if the coding mode information obtained in the shape information coding mode determining means is frame mode information; and a field coding means for coding the binary shape information in a field mode, if the coding mode information obtained in the shape information coding mode determining means is field mode information.
Preferably, the shape information motion estimating means is comprised of: a shape information frame/field prediction mode determining unit for inputting frame/field prediction flag and motion information of an adjacent shape information BAB to thereby determine whether the prediction mode of the shape BAB adjacent to the current BAB is a frame predicted mode or a field predicted mode; an adjacent shape information motion vector extracting unit for outputting motion vector of the shape BAB adjacent to the current BAB to the shape information frame/field prediction mode determining unit; an MVPs order determining unit for inputting the information on whether the shape BAB adjacent to the current BAB is in the frame predicted mode or the field predicted mode from the shape information frame/field prediction mode determining unit to thereby determine an order of the MVPs; a final MVPs determining unit for determining a final MVPs with the MVPs order determining unit and a texture motion vector predictor obtained in a texture motion vector predictor order determining unit; and an MVDs determining unit for determining an MVDs from a difference value between the final MVPs and the estimated shape information motion information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.