In many videoconferencing applications, bandwidth is at a premium, so it is important to encode each video frame, as well as the video stream as a whole, as efficiently and intelligently as possible. Video compression algorithms typically operate on macroblocks of data, that is, square groups of neighboring pixels within the video frame. Macroblocks are typically 16 pixels by 16 pixels in size, but, depending on the codec used to encode the video data, e.g., H.263 or H.264, the frame may also be broken down into smaller macroblocks, e.g., macroblocks that are 4, 8, 12, or 16 pixels to a side. Of course, the video frame may also be broken down into smaller or larger macroblocks of any size.
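As an illustration of how a frame is tiled into macroblocks, the following minimal sketch lists the top-left corner of every macroblock in a frame. The function name `macroblock_grid` and the round-up handling of frame edges are assumptions made for this example and are not taken from any particular codec.

```python
def macroblock_grid(width, height, size=16):
    """Return the (x, y) top-left corners of the macroblocks covering a frame.

    Assumption: when the frame dimensions are not a multiple of `size`,
    the count is rounded up, as if the frame were padded on the right
    and bottom edges.
    """
    cols = (width + size - 1) // size   # ceiling division
    rows = (height + size - 1) // size
    return [(col * size, row * size) for row in range(rows) for col in range(cols)]


blocks = macroblock_grid(640, 480)
print(len(blocks))  # 40 columns x 30 rows = 1200 macroblocks
```

Passing a smaller `size`, e.g., 8 or 4, yields the finer subdivisions mentioned above.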
In videoconferencing applications, it is paramount to ensure the highest possible video quality at the lowest possible bit rate. Existing video codec standards employ both inter-coding and intra-coding techniques. Intra-coding techniques are performed relative to information that is contained only within the current video frame and not relative to any other frame in the video sequence. Intra-coding exploits the redundancy among neighboring pixels within a frame, and codecs also take advantage of the fact that the human eye does not perceive very small differences in color as easily as it perceives changes in brightness. Inter-coding techniques, on the other hand, involve temporal processing, that is, rather than resending all the information for a subsequent video frame, the codec encodes and sends only the changes in pixel location and pixel values from one video frame to the next. This is an effective technique because, often, a large number of the pixels will not change from one video frame to the next, and it would be redundant to resend all the image information with each video frame.
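The temporal redundancy that inter-coding relies on can be illustrated with a simple block-differencing sketch. This is not the motion-compensated prediction used by H.263 or H.264; it only flags the macroblocks whose pixel values changed between two frames. The function name `changed_blocks`, the sum-of-absolute-differences measure, and the `threshold` parameter are assumptions made for this example. Frames are assumed to be lists of rows of integer luma values.

```python
def changed_blocks(prev, curr, size=16, threshold=0):
    """Return top-left corners of macroblocks that differ between two frames.

    A block counts as changed when its sum of absolute pixel differences
    (SAD) exceeds `threshold` (a hypothetical tuning parameter).
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, size):
        for bx in range(0, width, size):
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + size, height))
                for x in range(bx, min(bx + size, width))
            )
            if sad > threshold:
                changed.append((bx, by))
    return changed


# Two 32x32 frames that differ in a single pixel.
prev_frame = [[0] * 32 for _ in range(32)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[20][5] = 50
print(changed_blocks(prev_frame, curr_frame))  # [(0, 16)]
```

In this toy case only one of the four macroblocks would need to be resent, which is the saving the paragraph above describes.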
During encoding, video compression codecs have to make critical decisions regarding where to “spend” their limited number of bits for each video frame, i.e., the codec must determine which macroblocks need the greatest amount of image detail. It is often desirable that more bits be spent encoding the most important parts of the video frame, whereas the less important parts of the video frame can be compressed more heavily and still produce satisfactory video quality. In videoconferencing applications in particular, the most important parts of the video frame are usually where the human face or hands are located and, more specifically, where the facial features involved in communication, such as the eyes and mouth, are located.
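One way to picture this bit-spending decision is a proportional split of a per-frame bit budget across macroblocks. The sketch below is illustrative only: the function name `allocate_bits`, the integer importance weights, and the proportional rule are assumptions for this example, and real encoders use more elaborate rate control.

```python
def allocate_bits(importance, frame_budget):
    """Split `frame_budget` bits across macroblocks in proportion to
    their non-negative integer importance weights (hypothetical inputs)."""
    total = sum(importance)
    if total <= 0:
        raise ValueError("at least one macroblock needs a positive weight")
    shares = [frame_budget * w // total for w in importance]
    # Integer division can leave a few bits unassigned; give them to the
    # most important macroblocks first so the whole budget is used.
    leftover = frame_budget - sum(shares)
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Four macroblocks; the first (e.g., containing the eyes) is weighted highest.
print(allocate_bits([4, 1, 1, 2], 800))  # [400, 100, 100, 200]
```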
Thus, there is a need for an apparatus, computer-readable medium, processor, and method for intelligent skin tone and facial feature aware videoconferencing compression that can “suggest” macroblock compression ratios to the video encoder. The suggested compression ratios can be based at least in part on which macroblocks in a given video frame are likely to be the most important, and thus deserving of a disproportionately greater number of bits in their encoding than the less important macroblocks in the video frame.
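The idea of suggesting per-macroblock compression can be sketched as follows. Everything specific here is an assumption made for illustration: the binary `skin_mask` input (1 for a pixel classified as skin tone by some detector not shown, 0 otherwise), the function name `suggest_compression`, and the 1-to-10 score scale, where a higher score means the encoder may compress that macroblock more heavily.

```python
def suggest_compression(skin_mask, size=16, min_score=1, max_score=10):
    """Map each macroblock's fraction of skin-tone pixels to a hypothetical
    compression score: min_score for all skin, max_score for no skin."""
    height, width = len(skin_mask), len(skin_mask[0])
    scores = {}
    for by in range(0, height, size):
        for bx in range(0, width, size):
            pixels = [
                skin_mask[y][x]
                for y in range(by, min(by + size, height))
                for x in range(bx, min(bx + size, width))
            ]
            fraction = sum(pixels) / len(pixels)
            # Linear mapping: more skin pixels give a lower (gentler) score.
            scores[(bx, by)] = round(max_score - fraction * (max_score - min_score))
    return scores


# A 32x16 mask: the left macroblock is all skin, the right one has none.
mask = [[1] * 16 + [0] * 16 for _ in range(16)]
print(suggest_compression(mask))  # {(0, 0): 1, (16, 0): 10}
```

A linear mapping is only one possible choice; an encoder could equally treat the scores as hints and combine them with its own rate control.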