1. Field of the Invention
The invention relates to video processing. More particularly, the invention relates to the detection of occlusion regions in video data.
2. Description of the Related Art
Multimedia processing systems, such as video encoders, may encode multimedia data using encoding methods based on international standards such as the MPEG-x and H.26x standards. Such encoding methods are generally directed to compressing the multimedia data for transmission and/or storage. Compression is, broadly, the process of removing redundancy from the data. In addition, video display systems may transcode or transform multimedia data for various purposes, for example to ensure compatibility with display standards such as NTSC, HDTV, or PAL; to increase the frame rate in order to reduce perceived motion blur; and to achieve smooth motion portrayal of content whose frame rate differs from that of the display device. These transcoding methods may perform functions similar to those of the encoding methods in order to carry out frame rate conversion, de-interlacing, and the like.
A video signal may be described in terms of a sequence of pictures, which include frames (an entire picture) or fields (e.g., an interlaced video stream comprises fields of alternating odd or even lines of a picture). The term “frame” is used generally herein to refer to a picture, a frame, or a field. Multimedia processors, such as video encoders, may encode a frame by partitioning it into blocks or “macroblocks” of, for example, 16×16 pixels. The encoder may further partition each macroblock into subblocks, and each subblock may itself be partitioned into further subblocks. For example, the subblocks of a macroblock may include 16×8 and 8×16 subblocks, the subblocks of an 8×16 subblock may include 8×8 subblocks, and so forth. Depending on context, a block may refer to a macroblock, a subblock, or even a single pixel.
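To make the partitioning concrete, the following Python sketch tiles a frame into 16×16 macroblocks and splits a block into two halves or four quarters. It is illustrative only: the names `Block`, `macroblocks`, and `split` are hypothetical, sizes are assumed to be written width×height (so a 16×8 subblock is 16 pixels wide and 8 pixels tall), and frame dimensions are assumed to be multiples of the macroblock size.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    x: int  # column of the top-left pixel
    y: int  # row of the top-left pixel
    w: int  # width in pixels
    h: int  # height in pixels


def macroblocks(frame_w: int, frame_h: int, size: int = 16) -> List[Block]:
    """Tile a frame into size x size macroblocks in raster order."""
    if frame_w % size or frame_h % size:
        raise ValueError("frame dimensions must be multiples of the macroblock size")
    return [Block(x, y, size, size)
            for y in range(0, frame_h, size)
            for x in range(0, frame_w, size)]


def split(block: Block, mode: str) -> List[Block]:
    """Split a block into subblocks.

    mode 'h': two full-width halves stacked vertically (e.g. 16x16 -> two 16x8)
    mode 'v': two full-height halves side by side (e.g. 16x16 -> two 8x16)
    mode 'q': four quarters (e.g. 16x16 -> four 8x8)
    """
    hw, hh = block.w // 2, block.h // 2
    if mode == "h":
        return [Block(block.x, block.y, block.w, hh),
                Block(block.x, block.y + hh, block.w, hh)]
    if mode == "v":
        return [Block(block.x, block.y, hw, block.h),
                Block(block.x + hw, block.y, hw, block.h)]
    if mode == "q":
        return [Block(block.x + dx, block.y + dy, hw, hh)
                for dy in (0, hh) for dx in (0, hw)]
    raise ValueError(f"unknown split mode: {mode!r}")
```

For example, `split(Block(0, 0, 16, 16), "v")` yields the two 8×16 subblocks of a macroblock, and splitting either of those with mode `"h"` yields two 8×8 subblocks.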
Video sequences may be received by a receiving device in a compressed format and subsequently decompressed by a decoder in the receiving device. Video sequences may also be received in an uncompressed state. In either case, the video sequence is characterized at least by a frame rate and by horizontal and vertical pixel resolutions. Often, a display device associated with the receiving device may require a different frame rate and/or pixel resolution, in which case reconstruction of one or more video frames may be performed. Reconstruction of video frames may comprise estimating a video frame between two or more already received (or received and decompressed) video frames, using techniques known as motion estimation and motion compensation. In motion estimation, matching portions of two or more already received (or received and decompressed) frames are identified, together with a motion vector that describes the relative locations of the matching blocks. In motion compensation, these matching blocks and motion vectors are then used to reconstruct portions of the intermediate frame. Frame rate conversion, de-interlacing, and transcoding are examples of processes in which decoder devices create new video data based on already reconstructed video data. In addition, these motion compensation techniques can use encoded data, such as motion vectors and residual error, as well as the reconstructed video data, to estimate the newly created frames.
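The motion estimation and motion compensation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a method prescribed by any standard or by the present invention: frames are hypothetical lists of integer luma rows, `estimate_motion` is a plain full-search block matcher that uses the sum of absolute differences (SAD) as its matching criterion, and `interpolate_block` blends samples fetched along the motion trajectory from both frames at a temporal phase `alpha` (0 at the earlier frame, 1 at the later one), rounding displacements to whole pixels.

```python
from typing import List, Tuple

Frame = List[List[int]]  # frame[row][col] luma samples (assumed representation)


def sad(src: Frame, dst: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in src and the block displaced by (dx, dy) in dst."""
    return sum(abs(src[by + r][bx + c] - dst[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))


def estimate_motion(src: Frame, dst: Frame, bx: int, by: int,
                    n: int = 8, search: int = 4) -> Tuple[int, int]:
    """Full-search block matching: return the displacement (dx, dy) within
    +/-search that minimizes SAD, i.e. where the src block moved to in dst.
    Ties are broken in favour of the shorter vector."""
    h, w = len(dst), len(dst[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue  # candidate block would fall outside dst
            cost = sad(src, dst, bx, by, dx, dy, n)
            shorter = abs(dx) + abs(dy) < abs(best[0]) + abs(best[1])
            if best_cost is None or cost < best_cost or (cost == best_cost and shorter):
                best, best_cost = (dx, dy), cost
    return best


def interpolate_block(prev: Frame, nxt: Frame, bx: int, by: int, mv: Tuple[int, int],
                      alpha: float = 0.5, n: int = 8) -> List[List[int]]:
    """Motion-compensated interpolation of the n x n block at (bx, by) in a frame
    at phase alpha, assuming mv = (vx, vy) is the motion from prev to nxt.
    Fetch positions are rounded to whole pixels and clamped to the frame."""
    vx, vy = mv
    h, w = len(prev), len(prev[0])

    def clamp(v: int, size: int) -> int:
        return min(max(v, 0), size - 1)

    out = []
    for r in range(n):
        row = []
        for c in range(n):
            y, x = by + r, bx + c
            # Step back along the trajectory into prev, forward into nxt.
            p = prev[clamp(y - round(alpha * vy), h)][clamp(x - round(alpha * vx), w)]
            q = nxt[clamp(y + round((1 - alpha) * vy), h)][clamp(x + round((1 - alpha) * vx), w)]
            # Weight each frame by its temporal proximity to the phase.
            row.append(round((1 - alpha) * p + alpha * q))
        out.append(row)
    return out
```

For a block at `(bx, by)`, calling `mv = estimate_motion(prev, nxt, bx, by)` and then `interpolate_block(prev, nxt, bx, by, mv)` gives a midway estimate. Assigning the vector of the co-located block in the earlier frame to the intermediate block is itself a simplification, and it is exactly the kind of assumption that breaks down near occlusions.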
Occlusions occurring in a video sequence present a problem for any motion estimation/compensation algorithm. Occlusions include, for example, the covering of one object (the background is considered an object) by another, and the uncovering of one object due to the motion of another. Typically, the motion vectors estimated in the vicinity of occlusion areas are incorrect, and using these motion vectors directly for motion compensation causes visual artifacts. One important step in solving this problem is identifying the occlusion areas in a video frame and then classifying them into covering and uncovering areas. Many approaches that have been suggested for solving this problem suffer from one or more drawbacks, including high computational complexity, poor accuracy and localization, and insensitivity to the actual desired interpolation phase between two frames.
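One generic way to flag and classify occlusion areas, shown here only to illustrate the covering/uncovering distinction and not as the approach of the present invention, is a forward-backward consistency check on two dense motion fields. The Python sketch below assumes a hypothetical forward field `fwd` (motion from the earlier frame to the later one, defined on the earlier frame's pixel grid) and backward field `bwd` (the reverse, on the later frame's grid), both in whole pixels. A pixel whose vector is not undone, within tolerance `tol`, by the vector found at its target is marked as covering if it belongs to the earlier frame and as uncovering if it belongs to the later one.

```python
from typing import List, Tuple

Vec = Tuple[int, int]
Field = List[List[Vec]]  # field[y][x] = (vx, vy) in whole pixels (assumed representation)
Mask = List[List[bool]]


def inconsistent(src: Field, dst: Field, tol: int = 1) -> Mask:
    """Mark pixels whose vector in src is not undone by the vector at its target in dst."""
    h, w = len(src), len(src[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vx, vy = src[y][x]
            tx, ty = x + vx, y + vy
            if not (0 <= tx < w and 0 <= ty < h):
                mask[y][x] = True  # trajectory leaves the frame; flagged for simplicity
                continue
            bx, by = dst[ty][tx]
            # A consistent round trip satisfies src(p) + dst(p + src(p)) ~ 0.
            mask[y][x] = abs(vx + bx) + abs(vy + by) > tol
    return mask


def classify_occlusions(fwd: Field, bwd: Field, tol: int = 1) -> Tuple[Mask, Mask]:
    """Return (covering, uncovering).

    covering: pixels of the earlier frame whose forward vectors fail the check
              (content that is about to be hidden).
    uncovering: pixels of the later frame whose backward vectors fail the check
                (content that is newly revealed).
    """
    return inconsistent(fwd, bwd, tol), inconsistent(bwd, fwd, tol)
```

In a one-row scene where a two-pixel object moves two pixels to the right over a static background, the check marks the background pixels the object moves onto as covering and the pixels it vacates as uncovering. Like many of the prior approaches criticized in the passage above, this sketch ignores the interpolation phase between the two frames.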