Subtraction of the Predetermined (Completely) Static Background Image
A common approach to video segmentation is to subtract a predetermined, completely static background (BG) image from each newly captured image. The pixels that remain after subtraction, i.e., those that differ noticeably from the background, are labeled as foreground. The predetermined static BG image is generated at the very beginning of the segmentation process by capturing several images of the background scene and averaging them. Because anything present in those captures is averaged into the background, the user must make sure that no moving objects (including the user) are in the scene while the static BG image is being generated. In addition, whenever the camera is displaced from its original position, the static BG image must be generated again. Because the foreground is identified only from differences in pixel color, this is a color-based method.
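The two steps described above, averaging captures of the empty scene and then thresholding the per-pixel difference, can be sketched as follows. This is a minimal illustration, not any particular product's implementation. It assumes images are plain nested lists of color tuples and that a pixel is foreground when its Euclidean color distance from the background exceeds a fixed, hand-chosen `threshold`. The function names and the threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, ...]      # one value per color channel, e.g. (R, G, B)
Image = List[List[Pixel]]      # rows of pixels


def build_background(frames: Sequence[Image]) -> Image:
    """Average several captures of the empty scene, pixel by pixel and channel by channel."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            tuple(
                sum(f[y][x][c] for f in frames) / n
                for c in range(len(frames[0][y][x]))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def segment_foreground(frame: Image, background: Image, threshold: float) -> List[List[bool]]:
    """Mark a pixel as foreground when its color distance from the background exceeds threshold."""
    return [
        [math.dist(p, b) > threshold for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


if __name__ == "__main__":
    # Two captures of a tiny 1x2 empty scene; their average becomes the static BG image.
    empty_a = [[(10, 10, 10), (10, 10, 10)]]
    empty_b = [[(12, 12, 12), (12, 12, 12)]]
    bg = build_background([empty_a, empty_b])
    print(bg)  # [[(11.0, 11.0, 11.0), (11.0, 11.0, 11.0)]]

    # A new frame in which an object covers the right-hand pixel.
    frame = [[(11, 11, 11), (200, 50, 40)]]
    print(segment_foreground(frame, bg, threshold=30.0))  # [[False, True]]
```

The sketch also shows the method's weaknesses. Any object present during `build_background` is baked into the averaged background, and moving the camera invalidates the stored pixel positions, so the background must be rebuilt.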
Off-line Learning-Based Approach with Depth Information
A state-of-the-art real-time video segmentation method is the one provided by Microsoft®, implemented for Microsoft Kinect® for Windows® version 2, which captures a color image and its depth field simultaneously. Microsoft first creates a database of human body masks for thousands of different postures. In each mask, the body parts (such as head, neck, shoulder, chest, arm, elbow, hand, stomach, hip, leg, knee, and foot) are labeled by name, and the depth information of the body is stored. Every time a depth field is captured, the method scans all local regions of the captured depth field and checks whether any of them closely matches a human posture stored in the database. Because each stored posture comes with its depth information, a good match tells the method roughly the range of depths in which the human body is present. The human object mask is then obtained simply by binarizing the depth field with this depth range. This is therefore a depth-based method.
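The final binarization step described above can be illustrated with a short sketch. The posture-matching stage is not shown. The sketch assumes a match has already produced an estimated body depth, and it turns that depth into a `[near, far]` interval using a fixed half-extent. `depth_range_from_match` and `binarize_depth` are hypothetical helpers for illustration only. They are not part of any Microsoft or Kinect SDK API, and the depth units are arbitrary.

```python
from typing import List, Tuple

DepthMap = List[List[float]]   # one depth value per pixel (units are an assumption)


def depth_range_from_match(body_depth: float, half_extent: float) -> Tuple[float, float]:
    """Hypothetical: convert a matched posture's depth into a [near, far] interval."""
    return body_depth - half_extent, body_depth + half_extent


def binarize_depth(depth: DepthMap, near: float, far: float) -> List[List[int]]:
    """Return the human-object mask: 1 where depth lies within [near, far], else 0."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]


if __name__ == "__main__":
    # A tiny 2x3 depth field: a person around depth 1500 in front of a wall near 3000.
    depth = [
        [0.0, 1500.0, 1520.0],
        [3000.0, 1480.0, 2900.0],
    ]
    near, far = depth_range_from_match(body_depth=1500.0, half_extent=100.0)
    print((near, far))                        # (1400.0, 1600.0)
    print(binarize_depth(depth, near, far))   # [[0, 1, 1], [0, 1, 0]]
```

Once the depth range is known, producing the mask costs one comparison per pixel. The expensive part of this approach is the off-line database construction and the matching stage that supplies the range.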