High-definition (HD) resolution video content has become the trend for emerging streaming video systems. Videoconferencing systems have recently incorporated HD video, and for such applications, low latency is particularly important. Furthermore, for streaming video systems such as videoconferencing, real-time detection of human faces in the video sequence is desirable to improve the application quality. With detected face regions, for example, the video coder can assign smaller, that is, finer quantization step sizes to frame blocks that are within a face region, while larger, that is, coarser quantization step sizes are assigned to the remaining portion of the frame. This is expected to provide higher visual quality of the scene at the same bit rate.
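By way of illustration only, the region-dependent quantization described above can be sketched as follows. The function name `build_step_map`, the 16-pixel block size, the rectangle format (x, y, width, height in pixels), and the two step-size values are hypothetical choices made for this sketch and are not taken from any particular coder.

```python
def build_step_map(width, height, face_rects, block=16,
                   step_face=8, step_background=16):
    """Return a per-block map (list of rows) of quantization step sizes.

    Blocks overlapping any face rectangle get the finer step_face;
    all other blocks get the coarser step_background.
    All parameter values here are illustrative assumptions.
    """
    cols = (width + block - 1) // block   # blocks per row, rounded up
    rows = (height + block - 1) // block  # block rows, rounded up
    step_map = [[step_background] * cols for _ in range(rows)]

    for (x, y, w, h) in face_rects:
        # Range of block indices overlapped by the rectangle,
        # clipped to the frame boundaries.
        c0 = max(x // block, 0)
        r0 = max(y // block, 0)
        c1 = min((x + w - 1) // block, cols - 1)
        r1 = min((y + h - 1) // block, rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                step_map[r][c] = step_face
    return step_map


# A 64x48 frame with one hypothetical face rectangle.
example_map = build_step_map(64, 48, [(16, 16, 32, 16)])
```

In this sketch only the blocks that overlap the face rectangle receive the finer step size, so bits are concentrated where viewers are assumed to be most sensitive to coding artifacts.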
While methods for face detection are known that require knowledge of the entire picture at the time of processing, a distributed architecture is preferred for low latency coding. In a distributed coding architecture, different parts of an image may be processed simultaneously in different distributed elements, so that the entire picture may not be available.
Above-referenced U.S. Provisional Patent Application No. 60/908,070 describes an apparatus for and a method of real-time face detection operative on a distributed video coding apparatus that includes a video divider to divide an input picture into parts for respective ones of a plurality of interconnected coding processors. Computationally demanding tasks are distributed among the multiple processors, which generate, for the blocks in the parts, block-level edge features and block-level skin-tone color-segmented features. These features are processed at the granularity of blocks to detect head regions, e.g., face regions, in the input picture. The inventors have found that the method described in U.S. 60/908,070 works successfully when the scene background is relatively clean and uniform. The method provides a tradeoff between detection performance and execution speed.
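As a rough illustration of how block-level skin-tone features might be computed independently on each part of a divided picture, consider the following sketch. It is not the method of U.S. 60/908,070: the function names, the division of the picture into contiguous ranges of block rows, the 8-sample chroma block size, and the fixed chrominance thresholds (Cb in 77 to 127, Cr in 133 to 173, a commonly cited skin-tone range) are all assumptions made for this example.

```python
def split_rows(num_block_rows, num_parts):
    """Split block rows into contiguous (start, end) ranges, one per processor."""
    base, extra = divmod(num_block_rows, num_parts)
    ranges, start = [], 0
    for i in range(num_parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def is_skin(cb, cr):
    """Assumed fixed-threshold skin-tone test on chroma samples."""
    return 77 <= cb <= 127 and 133 <= cr <= 173


def part_skin_features(cb, cr, row_range, block=8):
    """Fraction of skin-tone samples per block, for one part of the picture.

    cb and cr are chroma planes given as lists of rows whose dimensions
    are assumed to be multiples of `block`. Only the block rows in
    row_range are touched, so each part can be processed without access
    to the rest of the picture.
    """
    cols = len(cb[0]) // block
    features = []
    for br in range(*row_range):
        row = []
        for bc in range(cols):
            skin = sum(
                is_skin(cb[y][x], cr[y][x])
                for y in range(br * block, (br + 1) * block)
                for x in range(bc * block, (bc + 1) * block)
            )
            row.append(skin / (block * block))
        features.append(row)
    return features
```

Each processor would call `part_skin_features` on its own range from `split_rows`; concatenating the per-part results in order yields a block-level feature map for the whole picture, which a later stage could then examine at block granularity.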
As the scene background becomes more complex, the performance of face detection methods typically degrades, because these methods must deal with the background portions of each frame of the video.