When an image or scene is captured by a camera, or otherwise provided as a digital image on an electronic device or computer, it can be desirable to modify the image in ways that require the device to first segment the foreground of the image from the background. For example, a user may want to change the background of the image for entertainment reasons, for artistic reasons, or for practical reasons, such as replacing the background behind a person speaking in a video conference with a background that is more appropriate or less distracting for business purposes. Background-foreground segmentation also may be used for computer vision, object recognition, medical imaging, video coding efficiency, and other applications.
One conventional way to segment the foreground from the background in an image is to use only the color data of the pixels in the image. These methods, however, are often computationally heavy and time consuming because they are performed on a pixel-by-pixel basis, at very large resolutions, over the entire area of the image that is to be separated into background and foreground. Color-based segmentation also can be inaccurate when the background and foreground have the same colors, or when a color pattern has strong color differences that are mistakenly and undesirably split into different segments.
Other conventional background-foreground segmentation systems use depth data provided by a camera to take advantage of the depth resolution, which is smaller than the color resolution. Specifically, many image capture devices also have 3D or depth-sensing cameras (such as RGBD cameras) that can form a 3D space of a scene. This is typically performed by using multiple cameras and triangulation algorithms, or by using other known methods to generate a depth image from a single camera. The depth image or depth map has a much smaller resolution than the color data of the pixels. The conventional background-foreground segmentation uses a weighted combination of the color and depth data of the pixels to determine whether the pixels are part of the background or the foreground. This combination, however, can be inconsistent and inaccurate because it is an artificial combination in which a clear relationship between color and depth does not necessarily exist.
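By way of illustration only, the weighted color-and-depth combination described above may be sketched as follows. The function name, the background reference values, the weights, the threshold, and the normalization are all hypothetical and are assumed here for explanation; they are not taken from any particular conventional system.

```python
def classify_pixel(color, depth, bg_color, bg_depth,
                   w_color=0.5, w_depth=0.5, threshold=0.3, max_depth=4.0):
    """Return True if the pixel is labeled foreground (illustrative sketch).

    color, bg_color: (R, G, B) tuples with channel values in 0..255.
    depth, bg_depth: distances in meters; max_depth is an assumed sensor range.
    """
    # Mean absolute channel difference, normalized to the range 0..1.
    color_diff = sum(abs(c - b) for c, b in zip(color, bg_color)) / (3 * 255.0)
    # Absolute depth difference, normalized to 0..1 and clipped at 1.
    depth_diff = min(abs(depth - bg_depth) / max_depth, 1.0)
    # Artificial weighted combination of two otherwise unrelated quantities.
    score = w_color * color_diff + w_depth * depth_diff
    return score > threshold


# A near object with the same color as the background is labeled background.
same_color_near = classify_pixel((10, 10, 10), 1.0, (10, 10, 10), 3.0)
# A high-contrast patch at the background depth is labeled foreground.
contrast_at_bg_depth = classify_pixel((255, 255, 255), 3.0, (0, 0, 0), 3.0)
```

With these assumed values, both example pixels receive a label opposite to the one their depth alone would suggest, which is one way the inconsistency noted above can arise when a fixed weighting is applied to color and depth.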