The term “telepresence” generally refers to technologies that enable activities such as remote manipulation, communication, and collaboration. More specifically, telepresence refers to commercial video teleconferencing systems and immersive collaboration between one or more participants located at multiple sites. In a collaborative telepresence system, each user needs some way to perceive remote sites, and in turn be perceived by participants at those sites. The subject matter described herein focuses on how a user is seen by remote participants.
There are numerous approaches to visually simulating the presence of a remote person. The most common is to use 2D video imagery, for example capturing imagery of a subject using a single video camera and displaying the imagery on a 2D surface. However, 2D imagery presented in this way lacks a number of spatial and perceptual cues. These cues can be used to identify the intended recipient of a statement, to convey interest or attention (or the lack thereof), or to direct facial expressions and other non-verbal communication. In order to convey this information to specific individuals, each participant must see the remote person from his or her own viewpoint.
Providing distinct, view-dependent imagery of a person to multiple observers poses several challenges. One approach is to provide separately tracked and multiplexed views to each observer, such that the remote person appears in one common location. However, approaches involving head-worn displays or stereo glasses are usually unacceptable, given the importance of eye contact among all (local and remote) participants. Another approach is to use multi-view displays. These displays can be realized with various technologies and approaches; however, each has limitations that restrict its utility, as illustrated in the following list.
- “Personal” (per-user) projectors combined with retroreflective surfaces at the locations corresponding to the remote users [16, 17]. Limitations: no stereo; each projector needs to remain physically very close to its observer.
- Wide-angle lenticular sheets placed over conventional displays to assign a subset of the display pixels to each observer [13, 21]. Limitations: difficulty separating distinct images; noticeable blurring between views; the approach sometimes trades a limited range of stereo for a wider range of individual views.
- High-speed projectors combined with spinning mirrors to create 360-degree light field displays [11]. Limitations: small physical size due to the spinning mechanism; binary or few colors due to dividing the imagery over 360 degrees; no appropriate image change as the viewer moves his or her head vertically or radially.
One example domain to consider is Mixed/Augmented Reality-based live-virtual training for the military. Two-dimensional (2D) digital projectors have been used for presenting humans in these environments, and it is possible to use such projectors for stereo imagery (to give the appearance of 3D shape from 2D imagery). However, there are difficulties related to stereo projection. Time-, phase-, or wavelength-multiplexed glasses are feasible from a technology standpoint; they could perhaps be incorporated into the goggles worn to protect against Special Effects Small Arms Marking System (SESAMS) rounds. However, it is currently not technologically possible to generate more than two or three independent images on the same display surface. The result is that multiple trainees looking at the same virtual role player (for example) from different perspectives would see exactly the same stereo imagery, making it impossible to determine the true direction of gaze (and weapon aiming) of a virtual character.
In fact, there are two gaze-related issues with the current 2D technology used to present humans. In situations with multiple trainees, for example, if a virtual role player appearing in a room is supposed to be making eye contact with one particular trainee, then when that trainee looks at the image of the virtual role player it should seem as if the two are making eye contact. In addition, the other trainees in the room should perceive that the virtual role player is looking at the designated trainee. This second gaze issue requires that each trainee see a different view of the virtual role player. For example, if the designated trainee (the intended gaze target of the virtual role player) has other trainees on his left and right, the left trainee should see the right side of the virtual role player, while the right trainee should see the left side of the virtual role player.
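The view-dependent geometry above can be sketched as a simple floor-plane computation: for each observer, the angle between the character's gaze direction and that observer's line of sight determines which profile of the character that observer should see. The function name and coordinate convention below are illustrative assumptions, not part of any described system.

```python
import math

def apparent_head_yaw(character, gaze_target, observer):
    """Angle (degrees) between the character's gaze direction and the
    line of sight from a given observer, in the horizontal plane.
    0 means the character appears to look straight at this observer;
    nonzero values expose one of the character's profiles instead.
    All positions are (x, y) floor-plane coordinates."""
    gaze = math.atan2(gaze_target[1] - character[1],
                      gaze_target[0] - character[0])
    to_observer = math.atan2(observer[1] - character[1],
                             observer[0] - character[0])
    # Wrap the difference into (-180, 180].
    diff = math.degrees(gaze - to_observer)
    return (diff + 180.0) % 360.0 - 180.0

# Character at the origin gazing at the middle of three trainees in a row.
character, target = (0.0, 0.0), (0.0, 3.0)
left, right = (-2.0, 3.0), (2.0, 3.0)
print(apparent_head_yaw(character, target, target))  # 0.0: eye contact
print(apparent_head_yaw(character, target, left))    # nonzero: sees a profile
print(apparent_head_yaw(character, target, right))   # opposite-signed profile
```

A per-observer rendering of the character would use this angle to present each trainee with the correct side of the face, which is exactly what a single shared 2D or stereo image cannot do.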
Perhaps the most visible work in the area of telepresence has been in theme park entertainment, which has been making use of projectively illuminated puppets for many years. The early concepts consisted of rigid statue-like devices with external film-based projection. Recent systems include animatronic devices with internal (rear) projection, such as the animatronic Buzz Lightyear that greets guests as they enter the Buzz Lightyear Space Ranger Spin attraction in the Walt Disney World Magic Kingdom.
In the academic realm, shader lamps, introduced by Raskar et al. [20], use projected imagery to illuminate physical objects, dynamically changing their appearance. The authors demonstrated changing surface characteristics such as texture and specular reflectance, as well as dynamic lighting conditions, simulating cast shadows that change with the time of day. The concept was extended to dynamic shader lamps [3], whose projected imagery can be interactively modified, allowing users to paint synthetic surface characteristics on physical objects.
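The core operation behind shader lamps, mapping a point on a known physical surface to the projector pixel that illuminates it, is a standard pinhole projection: to "paint" synthetic appearance onto an object, the desired surface color is rendered into the pixel each surface point maps to. The following is a minimal sketch of that mapping; the parameter names and conventions are assumptions for illustration, not drawn from the cited systems.

```python
def project_to_projector(point_world, K, R, t):
    """Pinhole mapping of a 3D surface point (world frame) to a
    projector pixel (u, v).  K = (fx, fy, cx, cy) are the projector's
    intrinsics; R (3x3 rotation, row-major) and t (translation) take
    world coordinates into the projector's frame."""
    # World -> projector frame: p = R @ point_world + t
    x, y, z = (sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
               for i in range(3))
    fx, fy, cx, cy = K
    # Perspective divide, then scale/offset into pixel coordinates.
    return (fx * x / z + cx, fy * y / z + cy)

# Projector at the origin looking down +z; a surface point 2 m away,
# on the optical axis, lands at the principal point (cx, cy).
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project_to_projector((0.0, 0.0, 2.0), (800, 800, 640, 360), I,
                           (0.0, 0.0, 0.0)))  # (640.0, 360.0)
```

Dynamic shader lamps [3] rely on the same mapping; the projected content, rather than the geometry, is what changes interactively.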
Hypermask [26] is a system that dynamically synthesizes views of a talking, expressive character, based on voice and keypad input from an actor wearing a mask onto which the synthesized views are projected.
Future versions of the technology described herein may benefit from advances in humanoid animatronics (robots) as “display carriers.” For example, in addition to the well-known Honda ASIMO robot [6], which looks like a fully suited and helmeted astronaut with child-like proportions, more recent work led by Shuuji Kajita at Japan's National Institute of Advanced Industrial Science and Technology [2] has demonstrated a robot with the proportions and weight of an adult female, capable of human-like gait and equipped with an expressive human-like face. Other researchers have focused on the subtle, continuous body movements that help portray lifelike appearance, on facial movement, on convincing speech delivery, and on response to touch. The work led by Hiroshi Ishiguro [9] at Osaka University's Intelligent Robotics Laboratory stands out, in particular the lifelike Repliee android series [5] and the Geminoid device. These are highly detailed animatronic units equipped with numerous actuators and designed to appear as human-like as possible, in part through skin-embedded sensors that induce a realistic response to touch. The Geminoid is a replica of principal investigator Hiroshi Ishiguro himself, complete with facial skin folds, moving eyes, and implanted hair, though still not at the level of detail of the “hyper-realistic” sculptures and life castings of sculptor John De Andrea [4], which induce a tremendous sense of presence despite their rigidity. Because the Geminoid is teleoperated, it can take the PI's place in interactions with remote participants. While each of the aforementioned robots takes on the appearance of a single synthetic person, the Takanishi Laboratory's WD-2 robot [12] is capable of changing shape in order to produce multiple expressions and identities. The WD-2 also uses rear projection to texture a real user's face onto the robot's display surface.
The Geminoid's creators are interested in behavioral issues and plan to investigate topics in human-Geminoid interaction and the sense of presence.
When building animatronic avatars, the avatar's range of motion, as well as its acceleration and speed characteristics, will generally differ from a human's. With the current state of the art in animatronics, these capabilities are a subset of human capabilities. Hence, one has to map the human motion into the avatar's available capability envelope, while striving to maintain the appearance and meaning of gestures and body language, as well as the overall perception of resemblance to the imaged person. Previous work has addressed the issue of motion mapping (“retargeting”) as applied to synthetic puppets. Shin et al. [23] describe on-line determination of the importance of measured motion, with the goal of deciding to what extent it should be mapped to the puppet. The authors use an inverse kinematics solver to calculate the retargeted motion.
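The envelope-mapping idea can be sketched for a single joint as a range and rate clamp. This is a deliberately crude stand-in for the importance-based, inverse-kinematics retargeting of Shin et al.; the function name, units, and limits below are hypothetical.

```python
def retarget_joint(theta_human, theta_prev, dt, lo, hi, max_rate):
    """Map a measured human joint angle into an avatar's capability
    envelope: clamp to the avatar's joint range [lo, hi] (radians) and
    limit the per-step change so the commanded angular rate never
    exceeds max_rate (rad/s).  The gesture's direction is preserved
    even when its extent or speed cannot be reproduced."""
    target = min(max(theta_human, lo), hi)   # range limit
    max_step = max_rate * dt                 # rate limit for this step
    delta = min(max(target - theta_prev, -max_step), max_step)
    return theta_prev + delta

# A human elbow swings to 1.5 rad, but the avatar's joint stops at 1.0 rad
# and can move at most 2.0 rad/s; over a 0.1 s step it advances only 0.2 rad.
print(retarget_joint(1.5, 0.0, 0.1, -1.0, 1.0, 2.0))  # 0.2
```

A full retargeting system would apply such limits jointly across the kinematic chain (e.g. via an inverse kinematics solver, as in [23]) rather than per joint, but the per-joint clamp conveys the essential constraint.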
The TELESAR 2 project led by Susumu Tachi [25, 24] integrates animatronic avatars with the display of a person. The researchers created a roughly humanoid robot equipped with remote manipulators as arms, and retro-reflective surfaces on face and torso, onto which imagery of the person “inhabiting” the robot is projected. In contrast to the subject matter described herein, these robot-mounted display surfaces do not mimic human face or body shapes. Instead, the three-dimensional appearance of the human is recreated through stereoscopic projection.
Accordingly, in light of these difficulties, a need exists for improved methods, systems, and computer readable media for conveying 3D audiovisual information that includes a fuller spectrum of spatial and perceptual cues.