1. Field of the Invention
The present invention generally relates to systems for creating a virtual collaboration environment and, more particularly, to electronically mediated systems for enhancing effective collaboration among participants.
2. Description of the Related Art
Co-located people collaborate by talking and showing documents to each other. Collaboration tools are used to facilitate a similar interaction between people separated by long distances. Existing collaboration tools are not very effective at achieving this goal.
Altom et al., U.S. Pat. No. 5,627,978, issued May 6, 1997, discloses a graphical user interface for multimedia call setup and call handling in a virtual conferencing system. This patent describes a system and method for presenting participants in an electronically mediated collaboration session: it shows users appearing to come and go from the conference and indicates the number of participants. The system is limited, however, because it neither couples the audio component to visual indicators of which participant is speaking nor implements shared applications and cursors.
Existing tools that link users over the Internet TCP/IP protocol suffer from inadequate audio fidelity, an absence of spatialized audio, limited application sharing and a limited number of allowable participants. Additionally, existing related art lacks a significant feature: the ability to effectively facilitate eye contact behavior.
Eye contact, or gaze behavior, is such an important component of human nonverbal communication that it is the subject of countless studies by scientists in disciplines ranging from psychology to anthropology. Mutual gaze is an established metric of communication quality in experimental practice across these disciplines. Clearly, a communication system that degrades the participants' ability to engage in natural mutual-gaze behavior will be less effective than face-to-face communication. Any virtual collaboration system capable of supporting high-quality human interaction, approaching (or even surpassing) that of ordinary co-located interaction, must be able to support natural mutual-gaze behavior. Video-based teleconferencing technologies, in general, fail to support eye contact between participants. When a video camera produces images from a fixed location and transmits these images to other participants, the images are displayed from the point of view of the camera. To appear to make eye contact with a viewer who receives an image from a given camera, the participant being recorded would need to look directly at the camera rather than at the eyes of the video representation of the remote viewer. Thus, the only way to support eye contact in such a system would be to make the camera coincident with the eyes of the video representation of the remote participant.
Conventional video teleconferencing, network-based desktop video teleconferencing and more recent video-based telecollaboration systems all suffer from the limitation that a participant gazing at the eyes of the video representation of another participant appears, to that other participant, to be gazing away to a point in space. This effect corresponds to the geometric mismatch between the camera position and the position of the eyes in the displayed video representation. In a typical desktop setup, the camera is mounted on the top edge of the monitor, so the remote viewer sees the participant apparently looking down, as if at the desktop or keyboard. In fact, the participant is making eye contact with the video representation of the remote viewer displayed on the participant's monitor.
Previous attempts to develop systems that support eye contact were based on creating a virtual camera using optical components (half-silvered mirrors, etc.) configured at a predetermined gaze location. A major limitation of such systems is that the virtual camera location is fixed by the geometric arrangement of the participants' viewing stations. Even slight movement by the participants causes the mutual-gaze effect to break down in these systems. Further, such systems do not easily scale to multiple remote participants. Even if these approaches were extended by mechanizing the cameras and tracking the participants' heads so that the cameras could be repositioned as the heads move, the result would be inferior to the approach described here in many respects. Most importantly, lags in camera motion may be discernible, and the system would not scale well to multiple users.
With respect to eye contact collaboration, related art includes video teleconferencing technology, picture telephone technology, desktop video telecollaboration technology and video collaboration technology. Of these, only the approximately 20-year-old picture phone, which supported only one-to-one conversation, produced some level of natural eye contact.
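The size of this mismatch can be estimated with simple geometry. Assuming a pinhole camera and a viewer looking straight at the on-screen eyes, the apparent gaze error is the angle subtended at the viewer's eyes by the offset between the camera and the displayed eyes. The 0.15 m offset and 0.6 m viewing distance below are hypothetical values chosen for illustration only.

```python
import math


def apparent_gaze_offset_deg(camera_offset_m: float, viewing_distance_m: float) -> float:
    """Angle (degrees) between the viewer's line of sight to the on-screen eyes
    and the line from the viewer's eyes to the camera (simple pinhole model)."""
    if viewing_distance_m <= 0:
        raise ValueError("viewing distance must be positive")
    return math.degrees(math.atan2(camera_offset_m, viewing_distance_m))


# Hypothetical desktop setup: camera 0.15 m above the displayed eyes, viewer 0.6 m away.
angle = apparent_gaze_offset_deg(0.15, 0.6)
```

Under these assumptions the participant appears to look roughly 14 degrees below the remote viewer's eyes; the error falls to zero only when the offset does, that is, when the camera coincides with the displayed eyes.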
The deleterious effects of noise due to limited bandwidth on the recognition of speech sounds have long been known. See Miller, G. A. and Nicely, P. E. (1955), "An analysis of perceptual confusions among some English consonants," Journal of the Acoustical Society of America 27, 338-352. More recently, improved error measures for the recognition of natural speech over toll-quality telephone lines have provided a method of testing high-fidelity speech transmission. See Spiegel, M. F. et al. (1990), "Comprehensive assessment of the telephone intelligibility of synthesized and natural speech," Speech Communication 9, 279-291. These studies demonstrate the need for sound quality beyond that provided by toll-quality telephone service to enhance communication.
The present invention enables people at geographically dispersed sites to collaborate more effectively (for certain types of sessions) than even a face-to-face meeting, by organizing and effectively presenting the audio and visual content of a collaboration session. In addition, the system may optionally include eye contact collaboration to further facilitate effective communication.
The features of this invention represent a novel and effective collaboration environment. The invention allows users running coordinating applications on their computers to collaborate in various modes, for example, by conversing via high fidelity audio, or by sharing personal data and documents, or by using shared applications.
The invention includes an array of audio and visual collaboration enhancements. Each participant's speech is captured at high fidelity and multicast to the other participants. Participants may also hold conversations on participant-selectable channels, including private channels. Audio can optionally be "spatialized" so that each participant's voice appears to come from the location of that speaker's image. Visual representations of each participant also include indicators that identify when that participant is talking. In addition, participant data or other information may optionally be displayed. The application can support multiple users, limited only by the audio processing hardware and the space available on the monitor.
The invention also encompasses a group of application-sharing features. With shared applications, such as a shared whiteboard or browser, when one user makes a change to the application, all users see the change. The shared browser is particularly useful, as it can display any data available on the World Wide Web and is innately capable of presenting data and controls from a wide variety of applications. Further, a shared cursor behaves like a shared application, with an indicator of which user is currently controlling it. Alternatively, a number of cursors may be presented, optionally displayed by each user.
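The spatialization and talking-indicator features can be illustrated with a minimal sketch. It assumes stereo output and substitutes plain constant-power panning, driven by the horizontal position of the speaker's image, for whatever spatialization method an implementation would actually use (full spatial audio would typically rely on head-related transfer functions). The talking indicator is likewise an assumed, crude RMS-energy threshold. All function names and the 0.02 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_from_image_x(image_center_x: float, screen_width: float) -> float:
    """Map the horizontal centre of a speaker's on-screen image to a pan value in [-1, 1]."""
    if screen_width <= 0:
        raise ValueError("screen_width must be positive")
    x = min(max(image_center_x / screen_width, 0.0), 1.0)  # clamp to the screen
    return 2.0 * x - 1.0  # -1 = far left, 0 = centre, +1 = far right


def constant_power_gains(pan: float) -> Tuple[float, float]:
    """Left/right gains whose squares sum to 1, keeping loudness constant across pans."""
    theta = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def spatialize(mono: Sequence[float], pan: float) -> List[Tuple[float, float]]:
    """Turn a mono speech frame into (left, right) sample pairs placed at `pan`."""
    left, right = constant_power_gains(pan)
    return [(s * left, s * right) for s in mono]


def is_talking(frame: Sequence[float], threshold: float = 0.02) -> bool:
    """Hypothetical talking indicator: frame RMS energy above a fixed threshold."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold


# Usage: a speaker whose image is centred at x = 480 on a 1920-pixel-wide display.
pan = pan_from_image_x(480, 1920)  # left of centre
stereo = spatialize([0.1, -0.1, 0.05], pan)
talking = is_talking([0.1, -0.1, 0.05])
```

Constant-power panning is chosen here only because it is short and keeps perceived loudness steady as a voice moves across the screen; it conveys left/right placement but not elevation or distance.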
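The shared-application and shared-cursor behavior can be sketched as a simple publish/subscribe model. This is an in-process illustration under assumed names (`SharedSession`, `apply_change`, `take_cursor`, `move_cursor` are all hypothetical); a networked system would carry the same events over the network rather than through local callbacks.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple

Listener = Callable[[str, Dict[str, Any]], None]


@dataclass
class SharedSession:
    """Hypothetical in-process model of a shared application with a shared cursor."""
    listeners: Dict[str, Listener] = field(default_factory=dict)
    state: Dict[str, Any] = field(default_factory=dict)
    cursor_owner: Optional[str] = None
    cursor_pos: Tuple[int, int] = (0, 0)

    def join(self, user: str, listener: Listener) -> None:
        self.listeners[user] = listener

    def apply_change(self, user: str, key: str, value: Any) -> None:
        # One user's change updates the common state and is sent to every participant.
        self.state[key] = value
        self._broadcast("change", {"by": user, "key": key, "value": value})

    def take_cursor(self, user: str) -> None:
        # Record who controls the shared cursor so each display can show an indicator.
        self.cursor_owner = user
        self._broadcast("cursor_owner", {"owner": user})

    def move_cursor(self, user: str, x: int, y: int) -> None:
        if user != self.cursor_owner:
            raise PermissionError(f"{user} does not control the shared cursor")
        self.cursor_pos = (x, y)
        self._broadcast("cursor_move", {"owner": user, "pos": (x, y)})

    def _broadcast(self, kind: str, payload: Dict[str, Any]) -> None:
        for listener in self.listeners.values():
            listener(kind, payload)


# Usage: two hypothetical participants record the events they receive.
received: Dict[str, list] = {"alice": [], "bob": []}
session = SharedSession()
for name in received:
    session.join(name, lambda kind, payload, name=name: received[name].append((kind, payload)))

session.apply_change("alice", "url", "http://example.com/")
session.take_cursor("bob")
session.move_cursor("bob", 120, 45)
```

Every participant receives the same event sequence, so all displays converge on the same state, and the `cursor_owner` field provides the "which user is controlling it" indicator.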
One embodiment of the present invention enables participants in a virtual collaboration session who are represented by video to make eye contact with other participants. This part of the invention consists of a system of display devices, cameras, optical components, computers, optional sensors, and image processing and control software. Each participant or viewer uses a special display device that presents video images of the other participants arranged in a viewer-selectable pattern within the displayed field of view; this pattern determines each participant's virtual location as seen by every viewer. Multiple video cameras at each participant's physical location capture discrete images of the participant at that location. Each viewer's physical position with respect to the display components and cameras is determined by geometric analysis of the physical arrangement of the space and/or by optional tracking sensors that determine head position.
The present eye contact collaboratory subsystem determines the location of each participant's eyes in the displayed video images with respect to every other participant and computes an appropriate image transformation to place a virtual camera coincident with the location of each participant's eyes in the displayed representation. Then, using the discrete video images from multiple cameras, the system synthesizes an appropriate image for each virtual camera and transmits the synthesized images over a communication network to the corresponding participants, where each image is presented on that participant's display system.
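How a viewer-selectable pattern fixes each participant's virtual location can be sketched with a simple grid layout. The grid arrangement, the `Tile` type, and the rule that the eyes sit horizontally centred and 40% of the way down each tile are all hypothetical assumptions for illustration; a real system would locate the eyes in the video image rather than assume their position.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tile:
    """Screen rectangle (pixels) holding one remote participant's video image."""
    x: float
    y: float
    w: float
    h: float

    def eye_point(self, eye_height_frac: float = 0.4) -> Tuple[float, float]:
        # Assumed eye position: horizontally centred, eye_height_frac of the way down.
        return (self.x + self.w / 2.0, self.y + self.h * eye_height_frac)


def grid_layout(order: List[str], cols: int, screen_w: float, screen_h: float) -> Dict[str, Tile]:
    """Place participants, in the viewer's chosen order, on a grid `cols` tiles wide."""
    if cols < 1 or not order:
        raise ValueError("need at least one column and one participant")
    rows = math.ceil(len(order) / cols)
    w, h = screen_w / cols, screen_h / rows
    return {
        name: Tile(x=(i % cols) * w, y=(i // cols) * h, w=w, h=h)
        for i, name in enumerate(order)
    }


# Usage: one viewer's arrangement of three remote participants on a 1920x1080 display.
tiles = grid_layout(["bob", "carol", "dave"], cols=3, screen_w=1920, screen_h=1080)
eyes = {name: tile.eye_point() for name, tile in tiles.items()}
```

Because each viewer chooses their own order and column count, the same remote participant can occupy a different on-screen location, and therefore a different virtual location, for every viewer.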
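The per-pair idea can be illustrated with a deliberately crude sketch. For the image that viewer V sends to participant P, the virtual camera is placed at P's displayed eye location on V's screen, expressed in the display plane in metres, and V's real camera images are combined according to their distance from that point. The inverse-distance weighting and per-pixel blending used here are a substituted simplification, not the image transformation the text refers to: genuine view synthesis would warp each image using depth or disparity, and an unwarped blend produces ghosting. Camera names and positions are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]        # position in the display plane, metres
Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def idw_weights(virtual_cam: Point, cameras: Dict[str, Point], power: float = 2.0) -> Dict[str, float]:
    """Inverse-distance weights of the real cameras relative to the virtual camera."""
    dists = {name: math.dist(virtual_cam, pos) for name, pos in cameras.items()}
    for name, d in dists.items():
        if d < 1e-9:  # virtual camera coincides with a real one: use that camera alone
            return {n: (1.0 if n == name else 0.0) for n in cameras}
    raw = {name: 1.0 / d ** power for name, d in dists.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def blend(images: Dict[str, Image], weights: Dict[str, float]) -> List[List[float]]:
    """Per-pixel weighted sum of equally sized camera images (no warping)."""
    first = next(iter(images.values()))
    return [
        [sum(weights[n] * images[n][r][c] for n in images) for c in range(len(first[0]))]
        for r in range(len(first))
    ]


# Usage: two hypothetical cameras at the left and right display edges, at eye height;
# the remote participant's displayed eyes sit 0.15 m from the left camera.
cameras = {"left": (0.0, 0.2), "right": (0.6, 0.2)}
weights = idw_weights((0.15, 0.2), cameras)
frame = blend({"left": [[100.0, 100.0]], "right": [[200.0, 200.0]]}, weights)
```

The sketch shows only the control flow (locate the eyes, place a virtual camera there, combine the real cameras' views); the quality of the synthesized image depends entirely on the view-synthesis method that replaces `blend`.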