Automatic region-of-interest (ROI) video object segmentation may be useful for a wide range of multimedia applications that utilize video sequences. An ROI object may be referred to as a “foreground” object within a video frame and non-ROI areas may be referred to as “background” areas within the video frame. ROI object segmentation enables selected foreground objects of a video sequence that may be of interest to a viewer to be extracted from the background of the video sequence. Multimedia applications may then preferentially utilize the ROI object segmented from the video sequence. Typical examples of an ROI object are a human face or a head and shoulder area of a human body.
In video surveillance applications, for example, an ROI object segmented from a captured video sequence can be input into a facial database system. The facial database system may use the segmented ROI object, e.g., a human face, to accurately match it against target face objects stored within the database. Law enforcement agencies may utilize this application of ROI object segmentation to identify suspects from surveillance video sequences.
As another example, in video telephony (VT) applications, an ROI object segmented from a captured video sequence can be input into a video sequence encoder. The video sequence encoder may allocate more resources to the segmented ROI object to code the ROI object with higher quality for transmission to a recipient. VT applications permit users to share video and audio information to support applications such as video conferencing. In a VT system, users may send and receive video information, only receive video information, or only send video information. A recipient generally views received video information in the form in which it is transmitted from a sender. With preferential encoding of the segmented ROI object, a recipient is able to view the ROI object more clearly than non-ROI areas of the video sequence.
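One way an encoder might allocate more resources to a segmented ROI object is to quantize ROI blocks more finely than background blocks. The following is a minimal sketch under stated assumptions, not a description of any particular encoder: the function name `roi_qp_map`, the 16x16 block size, the QP range of 0 to 51 (as used in H.264), and the default offset of -6 are all illustrative choices that do not come from the text above.

```python
def roi_qp_map(mask, base_qp=30, roi_delta=-6, mb_size=16, qp_min=0, qp_max=51):
    """Build a per-block quantization parameter (QP) grid from a binary ROI mask.

    mask is a list of rows of 0/1 pixel values (1 = ROI / foreground).
    Blocks that contain any ROI pixel get base_qp + roi_delta (a lower QP,
    i.e. finer quantization); all other blocks keep base_qp.
    All parameter names and defaults are hypothetical.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    qp_rows = []
    for top in range(0, height, mb_size):
        row = []
        for left in range(0, width, mb_size):
            # Treat the block as ROI if at least one of its pixels is foreground.
            is_roi = any(
                mask[y][x]
                for y in range(top, min(top + mb_size, height))
                for x in range(left, min(left + mb_size, width))
            )
            qp = base_qp + (roi_delta if is_roi else 0)
            # Clamp to the assumed valid QP range.
            row.append(max(qp_min, min(qp_max, qp)))
        qp_rows.append(row)
    return qp_rows
```

A rate controller in a real encoder would typically also rebalance the background QP so that the total bit budget is respected; that step is omitted here for brevity.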
Other examples include video broadcasting applications in which a person presents informational video such as a live or prerecorded news or entertainment broadcast. In such applications, it may be desirable to preferentially encode an ROI object corresponding to the face of a human presenter, such as a news reporter or talk show host.
Conventionally, automatic ROI object segmentation focuses on motion analysis, motion segmentation and region segmentation. In one case, a statistical model-based object segmentation algorithm abstracts an ROI object into a blob-based statistical region model and a shape model. Thus, the ROI object segmentation problem may be converted into a model detection and tracking problem. In another case, a foreground object may be extracted from a video frame based on disparity estimation between two views from a stereo camera setup. A further case proposes an ROI object segmentation algorithm that includes both region-based and feature-based segmentation approaches. The algorithm uses region descriptors to represent the object regions, which are homogeneous with respect to the motion, color and texture features, and tracks them across the video sequence.
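As a simplified illustration of the disparity-based case, the sketch below assumes a disparity map has already been estimated from a rectified stereo pair. In that setting, objects nearer the camera have larger disparity, so a threshold can separate a nearby foreground object from a distant background. The function name `foreground_from_disparity` and the threshold parameter are hypothetical, and the cited approach may differ substantially in its details.

```python
def foreground_from_disparity(disparity, threshold):
    """Return a binary foreground mask from a per-pixel disparity map.

    disparity is a list of rows of non-negative disparity values from a
    rectified stereo pair (assumed to be computed elsewhere). Pixels whose
    disparity is at least `threshold` are marked 1 (foreground, near the
    camera); all others are marked 0 (background).
    """
    return [[1 if d >= threshold else 0 for d in row] for row in disparity]
```

In practice the raw mask would usually need cleanup, such as removing isolated pixels and filling holes, before it could serve as an ROI object.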