This specification relates generally to image processing, and particularly to video image processing.
The Internet provides access to a wide variety of resources, such as video files, audio files, pictures, business and personnel contact information, product information, maps, and news articles. Although textual data were prevalent during the early days of the Internet, video file transfers and video communications are becoming more popular as available bandwidth increases.
In video chat and other digital video applications, a computer must process significant quantities of data in real time to create a high-quality and temporally accurate image. For instance, if a person or object in a video conference call is moving substantially and is positioned in front of a variety of articles that create a complex background image, both the computer transmitting the video frame data and the computer receiving the data are confronted with enormous data-processing tasks.
As the image most likely of interest in a video application is often the person or object that is exhibiting the most movement, it is desirable that this person or object appear at as high a resolution as is possible given the computing constraints. Accordingly, it is often desirable to segment the video into background and foreground layers. Applications of segmented layers include subtracting the background layer to present only the foreground layer, person, or object for analysis (including for video surveillance, for example), and/or replacing the background layer with an alternative background image or layer, among many other applications. For convenience to the user, it is desirable that the selection and subtraction of the foreground or background layer be performed automatically and updated in real time without user intervention.
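By way of illustration only, the following minimal sketch shows one simple way a frame could be segmented into foreground and background layers and the background layer then replaced. It assumes grayscale frames represented as nested lists of pixel intensities, a known reference background frame, and a fixed difference threshold. The function names `foreground_mask` and `replace_background` and the default threshold of 30 are hypothetical and are not taken from this specification; practical systems typically use considerably more robust segmentation techniques.

```python
from typing import List

# Assumption: a grayscale frame is a list of rows of 0-255 pixel intensities.
Frame = List[List[int]]
Mask = List[List[bool]]


def foreground_mask(frame: Frame, background: Frame, threshold: int = 30) -> Mask:
    """Mark a pixel as foreground when it differs from the reference
    background by more than `threshold` (simple frame differencing)."""
    return [
        [abs(pixel - ref) > threshold for pixel, ref in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def replace_background(frame: Frame, mask: Mask, new_background: Frame) -> Frame:
    """Keep foreground pixels from `frame` and take every other pixel
    from `new_background`."""
    return [
        [pixel if is_fg else new for pixel, is_fg, new in zip(f_row, m_row, n_row)]
        for f_row, m_row, n_row in zip(frame, mask, new_background)
    ]


if __name__ == "__main__":
    background = [[10, 10, 10], [10, 10, 10]]   # static scene
    frame = [[10, 200, 10], [12, 210, 10]]      # bright object in the middle column
    mask = foreground_mask(frame, background)
    print(replace_background(frame, mask, [[0, 0, 0], [0, 0, 0]]))
```

Subtracting the background layer, rather than replacing it, corresponds to passing an all-zero frame as `new_background`, as in the usage example above.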
It is similarly desirable that gesture recognition be performed automatically and updated in real time without user intervention. Gesture recognition often involves a multi-step process, including the real-time processing of video frames, segmentation of each frame into foreground components (e.g., hands, arms, etc.), tracking of those foreground components, and inferring an action from a sequence of foreground motions. Accordingly, gesture recognition is usually a computation-heavy and error-prone process. It is therefore desirable that gesture recognition also be performed in a computationally efficient manner.
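By way of illustration only, the multi-step process described above can be sketched as follows. The sketch assumes that segmentation has already produced one boolean foreground mask per frame, tracks the foreground by its centroid, and infers a horizontal swipe from the net centroid displacement across the sequence. The function names `centroid` and `infer_swipe`, the gesture labels, and the `min_shift` parameter are hypothetical and are not taken from this specification.

```python
from typing import List, Optional, Tuple

# Assumption: a mask is a list of rows of booleans, True marking foreground.
Mask = List[List[bool]]


def centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Return the (row, column) centroid of the foreground pixels,
    or None when the mask contains no foreground."""
    points = [
        (r, c)
        for r, row in enumerate(mask)
        for c, is_fg in enumerate(row)
        if is_fg
    ]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def infer_swipe(masks: List[Mask], min_shift: float = 2.0) -> str:
    """Classify a mask sequence as 'swipe_right', 'swipe_left', or 'none'
    from the net horizontal shift of the tracked centroid."""
    # Tracking step: one centroid per frame, skipping frames with no foreground.
    track = [c for c in (centroid(m) for m in masks) if c is not None]
    if len(track) < 2:
        return "none"
    # Inference step: compare the first and last tracked column positions.
    shift = track[-1][1] - track[0][1]
    if shift >= min_shift:
        return "swipe_right"
    if shift <= -min_shift:
        return "swipe_left"
    return "none"


if __name__ == "__main__":
    # A single foreground pixel moving from column 0 to column 3.
    sequence = [[[col == c for col in range(5)]] for c in range(4)]
    print(infer_swipe(sequence))
```

Even this reduced sketch visits every pixel of every frame, which suggests why the full process is computation-heavy when performed on real video in real time.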