An interactive imaging experience includes an environment in which an interactive display is affected by the motion of human bodies, objects, or the like. A camera, or set of cameras, detects a number of features of the human bodies before the camera, such as their silhouettes, hands, head, and direction of motion, and determines how these features geometrically or photometrically relate to the visual display. For example, a user interacting before a front-projected display casts a shadow on an optional display medium such as a projection screen, or the like. The interactive imaging system is capable of aligning the camera's detection of the silhouette of the human body with the shadow of the human body. This geometric or photometric alignment creates a natural mapping for controlling elements in the visual display. Persons of all ages can likely recall an experience of playing with their shadows and can thus understand that their motion in front of a source of bright light will produce a shadow whose motion behaves exactly as expected. This experience is capitalized upon in an interactive imaging experience.
For an interactive imaging system to function properly, it must first be accurately calibrated and optimized. Procedures exist under which the motion of the human body, or the like, is geometrically or photometrically aligned to the actual visual display, creating a natural mapping for use in an interactive imaging system. However, these interactive imaging devices and systems require an extensive period of time, often many hours, for calibration and initialization. Upon setup, the interactive imaging system therefore cannot be used until the calibration period is completed. This is equivalent to powering on a personal computer, expecting to use it immediately, yet waiting for hours before actual use can begin. Thus, such methods of calibration in an interactive imaging system are neither automatic nor nearly instantaneous, as is desired.
Calibration in an interactive imaging system refers to the initialization and setting of various setup parameter values. These parameter values, once initialized, are used in various segmentation algorithms. Segmentation is an image processing technique that splits an image, or visual display, into segments or regions, each segment or region holding properties distinct from the areas adjacent to it. The result is often expressed as a binary mask that represents the presence of a foreground object in front of the visual display surface.
A conceptual example of this definition of segmentation is the image formed on an all-white front-projected visual display when a person, or the like, is placed in front of the visual display and casts a shadow upon it. In this example, only the black or shadowed region of the visual display, as viewed on a wall, projection screen, or the like, denotes the presence of a foreground element, a body or similar object, and the white color in the visual display denotes background or non-presence of a foreground object. Normally, however, this segmentation is a binary image representation that is computed using a monochrome camera input.
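The shadow example above can be illustrated with a minimal sketch in Python using NumPy. The function name shadow_mask, the threshold value of 128, and the synthetic frame are assumptions made only for illustration; they are not taken from any particular system.

```python
import numpy as np

def shadow_mask(frame, threshold):
    """Return a binary mask: 1 where a pixel is darker than `threshold`
    (taken here to mean a foreground shadow), 0 elsewhere."""
    return (frame < threshold).astype(np.uint8)

# Synthetic monochrome view of an all-white display (intensity 255).
frame = np.full((4, 6), 255, dtype=np.uint8)
# Hypothetical shadow cast by a foreground object.
frame[1:3, 2:4] = 20

mask = shadow_mask(frame, threshold=128)
print(mask)
```

In this sketch the ones in the mask mark the shadowed (foreground) pixels and the zeros mark the white background, which corresponds to the binary image representation described above.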
There are a number of segmentation techniques, or algorithms, which are already well-known in the art. Two such techniques are background subtraction and stereo disparity-based foreground detection, either of which may be employed for generating a segmentation image.
All of these algorithms share the need to set parameters which affect the quality of the segmentation as defined by its similarity to ground truth and as defined by its speed of execution. Calibration is the process of setting these parameters in order to achieve high quality in a visual display while operating at an acceptable execution speed. Unfortunately, existing calibration methods in interactive imaging systems require too much time for actual calibration and optimization. Such time requirements produce unsuitable delays.
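As a hedged illustration of what setting such a parameter may look like, the following Python sketch sweeps a single threshold for a simple background subtraction and scores each candidate by its similarity to a ground-truth mask. The intersection-over-union score, the function names, the candidate list, and the synthetic data are all assumptions chosen for this example. Execution speed, the second quality criterion named above, is not measured here.

```python
import numpy as np

def segment(frame, background, threshold):
    """Simple background subtraction: foreground where the absolute
    difference from the background exceeds `threshold`."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

def iou(mask, truth):
    """Similarity to ground truth as intersection over union (assumed metric)."""
    union = np.logical_or(mask, truth).sum()
    if union == 0:
        return 1.0
    return np.logical_and(mask, truth).sum() / union

def calibrate_threshold(frame, background, truth, candidates):
    """Return the candidate threshold with the highest score, and that score."""
    scores = [iou(segment(frame, background, t), truth) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Synthetic data: a noisy static background plus a brighter foreground block.
rng = np.random.default_rng(0)
background = rng.integers(90, 110, size=(32, 32)).astype(np.uint8)
truth = np.zeros((32, 32), dtype=bool)
truth[8:24, 10:20] = True
frame = background.astype(np.int16) + rng.integers(-5, 6, size=(32, 32))
frame[truth] += 60
frame = np.clip(frame, 0, 255).astype(np.uint8)

best_t, best_score = calibrate_threshold(
    frame, background, truth, candidates=list(range(0, 100, 5)))
print(best_t, best_score)
```

An exhaustive sweep of this kind is only practical for one or two parameters. With more parameters the search space grows quickly, which is one reason calibration can become time-consuming.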
A common approach for generating segmentation images from a camera that faces a visual display is to filter the camera to observe only near-infrared light while ensuring that the display only emits visible, non-infrared light. By separating the sensing spectrum from the display spectrum, the problem is reduced from detecting foreground elements in a dynamic environment created by a changing display to the problem of detecting foreground elements in a static environment, similar to chroma-key compositing systems with green or blue screens.
Background subtraction is the most popular means of detecting foreground elements (segmentation) for real-time computer vision applications. A model of the background, B, is maintained over time and is usually represented as an image with no foreground elements. It is assumed that the camera can view the entire area covered by the visual display; however, it is not assumed that the boundaries of the camera image align exactly with the boundaries of the visual display. Therefore, any image captured by the camera, including the background model, must be warped such that the boundaries of the visual display and the warped image do align. Warping is performed by defining four coordinates in the camera image, C1, C2, C3, and C4, and bilinearly interpolating the pixel values that are enclosed by the quadrilateral whose corners are defined by C1, C2, C3, and C4. As a result, the warped camera image geometrically corresponds to the display. A method for automatically computing these coordinates in the camera image using homographies was presented in R. Sukthankar, R. Stockton, and M. Mullin, Smarter Presentations: Exploiting Homography in Camera-Projector Systems, Proceedings of International Conference on Computer Vision, 2001. (A homography is a 2D perspective transformation, represented by a 3×3 matrix, that maps each pixel on a plane, such as a camera's image plane, to another plane, such as a projector's image plane, through an intermediate plane, such as the display surface.) This method, however, assumes that the display may be viewed by the camera, whereas the camera whose image needs to be warped is infrared-pass filtered, which eliminates the visibility of the display. Additionally, an automatic camera-camera homography estimation method was disclosed by M. Brown and D. G. Lowe in Recognising Panoramas, Proceedings of the 9th International Conference on Computer Vision (ICCV2003), pages 1218-1225, Nice, France, October 2003.
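The warping and background subtraction steps described above can be sketched as follows in Python using NumPy. This is a minimal illustration under stated assumptions: the function name warp_quad, the corner ordering (top-left, top-right, bottom-right, bottom-left), the output size, and the threshold of 50 are hypothetical. The sketch blends the four corners bilinearly, as the description states, and does not apply a full perspective (homography) warp.

```python
import numpy as np

def warp_quad(image, corners, out_h, out_w):
    """Resample the quadrilateral C1..C4 of `image` onto an out_h x out_w grid.
    `corners` holds (x, y) pairs in the assumed order TL, TR, BR, BL."""
    c1, c2, c3, c4 = [np.asarray(c, dtype=float) for c in corners]
    v, u = np.meshgrid(np.linspace(0, 1, out_h),
                       np.linspace(0, 1, out_w), indexing="ij")
    u, v = u[..., None], v[..., None]
    # Bilinear blend of the corners: source (x, y) for every output pixel.
    src = (1 - u) * (1 - v) * c1 + u * (1 - v) * c2 + u * v * c3 + (1 - u) * v * c4
    x, y = src[..., 0], src[..., 1]
    # Bilinear sampling of the camera image at the fractional position (x, y).
    x0 = np.clip(np.floor(x).astype(int), 0, image.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, image.shape[0] - 2)
    fx, fy = np.clip(x - x0, 0, 1), np.clip(y - y0, 0, 1)
    img = image.astype(float)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bottom = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom

# Synthetic camera images: a background and a frame with a foreground object.
camera_bg = np.fromfunction(lambda y, x: x + 10 * y, (8, 10))
camera_frame = camera_bg.copy()
camera_frame[2:4, 3:5] += 100  # hypothetical foreground object

# Hypothetical corners of the display as seen in the camera image.
corners = [(2, 1), (6, 1), (6, 4), (2, 4)]
B = warp_quad(camera_bg, corners, out_h=4, out_w=5)     # warped background model
I = warp_quad(camera_frame, corners, out_h=4, out_w=5)  # warped current frame

# Background subtraction on the warped images (threshold is an assumption).
mask = np.abs(I - B) > 50
print(mask.astype(int))
```

Because both the background model and the current frame pass through the same warp, the resulting mask is expressed in display coordinates rather than camera coordinates.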
While these patents and other previous systems and methods have attempted to solve the above-mentioned problems, none has provided an auto-calibrating interactive imaging system and a method by which the interactive imaging system is initialized and automatically calibrated by optimizing the parameters of a segmentation algorithm using an objective function. Thus, a need exists for a system and methods of calibration and use in an interactive imaging system in which the calibration of parameters for segmentation algorithms is completed at an acceptable execution speed, and in which there is no deterioration in the quality of the visual display images.