In several use cases it is desirable to capture a person with a camera and visualize him or her as part of a completely different scene. One example of such a use case is a meteorologist captured in front of a green screen and a digital weather map, both of which are composited together in real time for TV broadcasting. Current solutions for compositing different image sources together in real time only work well when the perspective and viewpoint of the different image sources are nearly identical. This significantly restricts the types of visual material that can be mixed together, thus limiting the use cases and experiences that can be provided.
Current solutions rely on a chroma keying approach: the elements to be composited are captured in front of a distinctively colored background, such as green or blue cloth, which is then easy to segment and remove from the captured image. An alternative approach uses direct background subtraction, under the assumption that the elements are captured in front of a static background. The static background and dynamic elements can then be separated according to the temporal behavior of the pixels: in background subtraction, pixels that represent objects of interest change over time, whereas pixels representing the background remain static. Both chroma keying and background subtraction only work well for composition with other image sources that have a very similar perspective and viewpoint, and occlusions between image sources are ignored.
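The two segmentation approaches described above can be sketched as follows. This is a minimal illustration, not the method of any particular product: the function names, the per-pixel color-distance test, and the tolerance thresholds are assumptions chosen for clarity.

```python
import numpy as np

def chroma_key_mask(rgb, key=(0, 255, 0), tol=80.0):
    """Foreground mask: True where a pixel's color distance from the
    key color (e.g. green-screen green) exceeds the tolerance."""
    diff = rgb.astype(np.float32) - np.asarray(key, np.float32)
    return np.linalg.norm(diff, axis=-1) > tol

def background_subtract_mask(frame, background, tol=30.0):
    """Foreground mask: True where a pixel deviates from a static
    background model (e.g. the temporal median of earlier frames)."""
    diff = frame.astype(np.float32) - background.astype(np.float32)
    return np.linalg.norm(diff, axis=-1) > tol

def composite(fg, mask, bg):
    """Paste masked foreground pixels onto a new background image."""
    out = bg.copy()
    out[mask] = fg[mask]
    return out
```

Note that both masks are computed per pixel with no notion of scene geometry, which is why these techniques cannot account for perspective differences or occlusions between the sources being mixed.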
There have been attempts to improve real-time composition with the use of RGB-D sensors. The approach in these examples is quite different: the solution is based on capturing RGB-D sensor data, using it to reconstruct a full 3D model of the elements seen by the sensor, and then rendering the reconstructed 3D elements from a different viewpoint. These solutions tend to suffer from technical complexity, as several RGB-D sensors are needed to achieve a sufficiently complete 3D reconstruction to allow the viewpoint to be changed. In addition, the image quality resulting from rendering the reconstructed 3D model tends to be sub-optimal compared with the image quality achieved by using the data captured by the RGB camera of the RGB-D sensor alone.