The present invention generally relates to object acquisition and tracking techniques and, more particularly, to real-time head tracking techniques performed in a mutually-immersive environment.
Telepresence systems allow a local user to view a remote location (e.g., a conference room) as if they were present at the remote location. Mutually-immersive telepresence system environments allow the local user to interact with individuals present at the remote location. In a mutually-immersive environment, the local user sits inside a display area, with a projection surface surrounding the local user outside of the display area, thereby surrounding (or immersing) the local user. Cameras are positioned about the display area to collect images of the local user. In application, live color images of the local user are acquired by the cameras and subsequently transmitted to the remote location, concurrent with projection of live video from the remote location on the projection surfaces surrounding the local user. The local user is able to move about inside the display area; thus, algorithms are needed to track the head position of the local user.
Conventional head tracking methods include generating a representation of a user's head based on the detection of the user's eyes or other facial features. An example of such a method would be to use the retro-reflectivity property of the human eye, when illuminated by light, to detect and track head position. A drawback associated with such an approach is that the head of the person being tracked must always be facing a camera. If the user turns away from the camera, eye reflectivity can no longer be detected. Thus, head position tracking cannot be accurately maintained.
Another conventional head tracking method calls for the local user to wear tracking hardware, such as that used for motion capture in computer graphics, and for the position information obtained from the tracking hardware to be transformed into a bounding box image based on the lens focal length of the tracking hardware and the particular system geometry. A drawback associated with this approach is that the local user is burdened with wearing the oftentimes cumbersome tracking hardware. Also, the tracking hardware degrades final image quality, as the tracking hardware would be visible in any resulting image.
A drawback of specific relevance to mutually-immersive environments, as described above, is that the projected views that surround the local user often contain images of other people from the remote location. Consequently, differentiating the local user's head from the heads projected from the remote location becomes difficult. A known approach used to distinguish the head of the local user from the projected heads is to subtract the projected video images from still images acquired from the local cameras; this is commonly referred to as difference keying. The synchronization between the projected video images and the acquired still images can be difficult to maintain, however, due to delays caused by various system components. In addition, difference keying is computationally expensive since the video images are large (on average 720×480) and must be warped and manipulated (e.g., subtracted) in real time.
The aforementioned and related drawbacks associated with conventional head tracking methods are substantially reduced or eliminated by the head tracking technique of the present invention. The present invention is directed to using luminance keying as a head tracking technique for use in conjunction with a mutually-immersive telepresence environment. The head of the local user is tracked in real time by uniformly illuminating a rear projection screen that surrounds a display cube with light having a wavelength in the near-infrared spectrum. A near-infrared image of the head of the local user is acquired by a near-infrared camera equipped with visible-cut near-infrared pass filters that discern the difference between the illuminated rear projection screen, representing the background, and any foreground illumination. A color image of the head of the local user, together with any color images on the rear projection screen, is acquired by a color camera. A bounding box is then provided around the head of the local user in the near-infrared image. This bounding box image is then translated to the view space of the color camera. The translated image is then used to crop the color image, which is then transmitted to a remote location.
In application, the local user is placed within a display cube. Each side of the display cube is covered with a projection screen. Thus, the projection screen is always positioned substantially behind the local user. A plurality of near-infrared illuminators are positioned behind the projection screen. The near-infrared illuminators provide near-infrared light uniformly against the projection screen. A projector, which is adapted not to emit light in the near-infrared spectrum, is positioned about the near-infrared illuminators behind the projection screen. The projector provides a video image of the remote location on the projection screen. A camera unit, including a stacked color camera and a near-infrared camera, is positioned at the corners of the display cube. In an alternate embodiment, the camera unit is located in front of the local user. The near-infrared camera detects any luminance differences between an object located within the display cube relative to the luminance value of the projection screen. According to the present invention, such object is considered to be the head of the local user. This is referred to as luminance keying.
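The luminance keying step described above, together with the bounding box that is later placed around the keyed pixels, can be illustrated with a minimal sketch. It assumes the near-infrared image is available as a grid of 8-bit luminance values and that the uniformly illuminated screen produces a roughly constant background level. The function names, the `background_level` and `tolerance` parameters, and the sample values are hypothetical and are not taken from the specification.

```python
def luminance_key(nir_image, background_level, tolerance):
    """Mark pixels whose luminance differs from the lit screen.

    nir_image is a list of rows of 0-255 luminance values (an assumed
    representation). A pixel is treated as foreground when it differs
    from background_level by more than tolerance.
    """
    return [[abs(p - background_level) > tolerance for p in row]
            for row in nir_image]


def bounding_box(mask):
    """Return (left, top, right, bottom), inclusive, or None if no foreground."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, is_fg in enumerate(row) if is_fg]
    return (min(cols), rows[0], max(cols), rows[-1])


# Illustrative 5x6 near-infrared frame: screen near 200, head near 40.
nir = [
    [200, 200, 200, 200, 200, 200],
    [200, 200,  40,  40, 200, 200],
    [200, 200,  35,  42, 205, 200],
    [200, 198,  50, 200, 200, 200],
    [200, 200, 200, 200, 200, 200],
]
head_box = bounding_box(luminance_key(nir, background_level=200, tolerance=30))
print(head_box)  # (2, 1, 3, 3)
```

Because the projector is adapted not to emit near-infrared light, the projected remote imagery does not appear in the near-infrared frame, so a simple comparison against the screen level is sufficient in this sketch; a practical system would likely also need to handle noise and non-uniform illumination.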
The color camera detects the color image of the remote location projected onto the projection screen by the projector and the local user located in front of the screen. The images detected by the near-infrared and color cameras are then transferred to a processor. The processor performs a bounding box process on the pixels that represent the local user's head in the near-infrared image. The processor then translates the bounding box to the view space of the color camera, then crops the color images based on the translated bounding box. This cropped, color version of the local user's head in front of the projected image is then transmitted to the remote location.
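The translation and cropping steps performed by the processor can be sketched as follows. The specification does not state the form of the mapping between the two view spaces, so this sketch assumes, purely for illustration, a per-axis scale and offset between the near-infrared and color cameras of a stacked camera unit. The function names, the parameters, and the sample values are hypothetical.

```python
def translate_box(box, scale_x, scale_y, offset_x, offset_y):
    """Map an inclusive (left, top, right, bottom) box into color-camera pixels.

    The scale-and-offset model is an assumption of this sketch, not the
    mapping defined by the specification.
    """
    left, top, right, bottom = box
    return (round(left * scale_x + offset_x),
            round(top * scale_y + offset_y),
            round(right * scale_x + offset_x),
            round(bottom * scale_y + offset_y))


def crop(image, box):
    """Return the sub-image inside box, clamped to the image bounds.

    image is a non-empty list of rows of pixels.
    """
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    left, top = max(0, left), max(0, top)
    right, bottom = min(width - 1, right), min(height - 1, bottom)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Illustrative 8-row by 12-column color frame; each pixel records its (x, y).
color = [[(x, y) for x in range(12)] for y in range(8)]

# Hypothetical calibration: color camera has twice the resolution and is
# shifted one pixel horizontally relative to the near-infrared camera.
color_box = translate_box((2, 1, 3, 3), scale_x=2, scale_y=2,
                          offset_x=1, offset_y=0)
cropped = crop(color, color_box)
print(color_box)                      # (5, 2, 7, 6)
print(len(cropped), len(cropped[0]))  # 5 3
```

Only the cropped region would then be transmitted to the remote location, which keeps the per-frame work on the large color image limited to a single sub-image copy.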
An advantage of the present invention is that it provides the ability to distinguish a locally present object from projected images of objects at remote locations.
Another advantage of the present invention is that distinguishing between local objects and remote objects is performed in real time.
A feature of the present invention is that it is economical and straightforward to implement.