Automatic region-of-interest (ROI) detection within video frames of a video sequence may be used in ROI video processing systems for a wide range of multimedia applications, such as video surveillance, video broadcasting, and video telephony (VT). In some cases, a ROI video processing system may be a ROI video coding system. In other cases, a ROI video processing system may comprise a ROI video enhancing system or another type of video processing system. A ROI may be referred to as a “foreground” area within a video frame, and non-ROI areas may be referred to as “background” areas within the video frame. A typical example of a ROI is a human face. A ROI video processing system may preferentially utilize a ROI detected from a video frame of a video sequence relative to non-ROI areas within the video frame.
In the case of a ROI video coding system, preferential encoding of a selected portion within a video frame of a video sequence has been proposed. For example, an automatically detected ROI within the video frame may be encoded with higher quality for transmission to a recipient in a VT application. In very low bit-rate applications, such as mobile VT, ROI preferential encoding may improve the subjective quality of the encoded video sequence. With preferential encoding of the ROI, a recipient is able to view the ROI more clearly than non-ROI regions. A ROI of a video frame may be preferentially encoded by allocating a greater proportion of encoding bits to the ROI than to non-ROI, or background, areas of the video frame. Skipping the encoding of a non-ROI area of a video frame conserves encoding bits for allocation to the ROI. The encoded non-ROI area of a preceding frame can be substituted for the skipped non-ROI area in the current frame.
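The bit allocation and non-ROI skipping described above can be illustrated with a minimal sketch. It assumes a frame divided into blocks with a per-block ROI flag. The function names, the `roi_weight` parameter, and the proportional weighting rule are hypothetical choices made for illustration, not a scheme specified in this document.

```python
def allocate_bits(roi_mask, frame_budget, roi_weight=4.0, skip_background=False):
    """Split a frame's bit budget across blocks, favoring ROI blocks.

    roi_mask: list of bools, one per block (True = ROI block).
    roi_weight: hypothetical ratio of bits per ROI block to bits per
        non-ROI block (an assumption, not a value from the source).
    skip_background: when True, non-ROI blocks receive zero bits.
    """
    bg_weight = 0.0 if skip_background else 1.0
    weights = [roi_weight if is_roi else bg_weight for is_roi in roi_mask]
    total = sum(weights)
    if total == 0:
        return [0] * len(roi_mask)
    # Each block receives a share of the budget proportional to its weight.
    return [int(frame_budget * w / total) for w in weights]


def substitute_skipped(prev_blocks, cur_blocks, roi_mask):
    """Reuse the preceding frame's block wherever a non-ROI block was skipped."""
    return [
        cur if is_roi else prev
        for prev, cur, is_roi in zip(prev_blocks, cur_blocks, roi_mask)
    ]


mask = [True, False, False, True]
normal = allocate_bits(mask, 1000)
skipped = allocate_bits(mask, 1000, skip_background=True)
```

With non-ROI skipping enabled, the bits that would have gone to background blocks are redistributed to the ROI blocks, and `substitute_skipped` fills the skipped positions from the preceding frame.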
Video frames received from a video capture device are typically processed before being applied to a ROI-enabled video encoder, a ROI-enabled video enhancer, or a similar multimedia device. For example, a video processing scheme may automatically detect a ROI within the video frames. A major hurdle to rapid progress and wide deployment of ROI-enabled video communication systems has been the limited robustness of automatic ROI detection. Some automatic ROI detection schemes propose a simple skin-tone-based approach for face detection that detects pixels having skin-color appearances based on skin-tone maps derived from the chrominance component of an input video image. Other schemes propose a lighting compensation model to correct color bias for face detection. Additionally, automatic ROI detection schemes may construct eye, mouth, and boundary maps to verify the face candidates, or use eigenmasks that have large magnitudes at important facial features of a human face, to improve ROI detection accuracy.
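As a rough illustration of the skin-tone-based approach mentioned above, the following sketch classifies pixels by thresholding their chrominance (Cb, Cr) values. The function name and the threshold ranges are illustrative assumptions, not values taken from this document. A practical scheme would derive its skin-tone map from training data and would likely add the lighting compensation and feature verification steps described above.

```python
def skin_tone_map(cb_plane, cr_plane, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a boolean map marking pixels whose chrominance looks skin-like.

    cb_plane, cr_plane: equally sized 2-D lists of 8-bit chrominance samples.
    cb_range, cr_range: inclusive thresholds (illustrative assumptions only).
    """
    return [
        [
            cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
            for cb, cr in zip(cb_row, cr_row)
        ]
        for cb_row, cr_row in zip(cb_plane, cr_plane)
    ]


# Tiny 2x2 example: only the top-left pixel falls inside both ranges.
cb = [[100, 60], [120, 128]]
cr = [[150, 150], [180, 140]]
skin = skin_tone_map(cb, cr)
```

Because the test uses only chrominance, it ignores luminance entirely, which is one reason such simple detectors can be sensitive to lighting and color bias.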