The demand for improved communications between people that are separated by distance and time has greatly increased in the last twenty or thirty years. The introduction of the telephone facilitated audio communication between people that were physically separated. Voice mail extended this, allowing people separated by both time and distance to exchange audio communication.
While these innovations satisfy many communications needs, the amount of information that can be conveyed solely by audio communication between people separated by distance falls far short of the total amount of information transferred between people in a face-to-face meeting. This shortfall was addressed by the introduction of video conferencing systems. In a typical video conferencing system, the goal is to provide a connection between remote sites such that a plurality of users or "conferees" can communicate with each other as easily as if they were in the same room and sitting at the same conference table.
One characteristic of human behavior, however, is that less than fifty percent of the average conferee's time in a normal conference situation is actually spent observing either the presenter of the information or the material they are presenting. The majority of the conferee's time is actually spent examining their own material, speaking to others or observing other participants, building silent consensus or disagreement.
Prior art video systems were not capable of displaying more than a single video stream during video conferencing. As a corollary, since multiple sessions were not supported, prior art video conferencing systems did not allow users to participate simultaneously in separate conferences, nor did they offer users the control needed to manage such conferences.
When surrounded by a barrage of sensory input, people selectively direct their attention to individual perceptual events, choosing to focus on individual components of their visual and aural fields. In a real conference room setting, conferees can easily direct and focus their attention toward different inputs with a simple turn of the head or movement of the eyes. A satisfactory remote conferencing experience must allow each participant to focus their visual and aural attention in a manner that closely follows natural (in-person) usage. This factor is particularly significant when there is more than one source of information.
Where multiple conferees or conferences were involved, prior art video conferencing systems did not provide any level of control to the participants, but either (a) merged multiple video streams into a single one or (b) switched or routed the data in a predetermined configuration. Typically, this conference configuration was determined by a server (or "master") that controlled the switching network. The server was the ultimate determiner of the video that was viewed by each conferee and was responsible for generating a data stream tailored to the characteristics of each participating entity. When only a few conferees were involved, the approach worked tolerably, but when many video sources were involved, the situation proved to be inherently unsatisfactory for most users.
No algorithm for determining a selection between multiple video sources has been found to be universally acceptable. Examples of such algorithms include fixed view (no switching), time-based switching (in which each participant is displayed in succession) and even an algorithm that determines the loudest speaker and switches the video source such that everyone views the loudest speaker. The latter, of course, introduces problems into the conference environment because people become aware of how the algorithm works and raise their voices to force their image to appear on the screens of the other conferees, causing the conference to devolve into a shouting match.
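The three server-side selection policies named above can be sketched as follows. This is a minimal illustration only, not a description of any particular prior art system: the function names, the dwell time parameter and the representation of audio levels as a list of numbers are all assumptions introduced for the example.

```python
from typing import Sequence


def fixed_view(sources: Sequence[str], pinned: int = 0) -> str:
    """Fixed view: always show the same source (no switching)."""
    return sources[pinned]


def time_based(sources: Sequence[str], elapsed_s: float, dwell_s: float = 5.0) -> str:
    """Time-based switching: show each source in succession for dwell_s seconds."""
    slot = int(elapsed_s // dwell_s)
    return sources[slot % len(sources)]


def loudest_speaker(sources: Sequence[str], levels: Sequence[float]) -> str:
    """Loudest-speaker switching: show the source with the highest audio level.

    levels[i] is a hypothetical audio level for sources[i]; ties go to the
    earliest source in the list.
    """
    loudest = max(range(len(sources)), key=lambda i: levels[i])
    return sources[loudest]


if __name__ == "__main__":
    conferees = ["A", "B", "C"]
    print(fixed_view(conferees))
    print(time_based(conferees, elapsed_s=12.0))
    print(loudest_speaker(conferees, [0.2, 0.9, 0.4]))
    # Conferee C raises their voice and takes over every screen.
    print(loudest_speaker(conferees, [0.2, 0.9, 1.0]))
```

In every case the choice is made once, centrally, for all conferees. The last call shows the weakness of the loudest-speaker policy described above: any participant can capture the display simply by being louder than the others.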
In general, prior art did not allow the client conferees to tailor either the logical or graphical structure of the conference to their needs, and no universally acceptable algorithm has been implemented at the server level for determining what a conferee would want to see. This significantly affected the attractiveness and, correspondingly, the acceptance of such systems.
A second problem with conventional video conferencing was that it did not allow the conferees to bridge the limitations of time as well as of distance. Conventional communications tools have solved this problem with the introduction of automated answering attendants, call forwarding, voice mail and electronic written mail. Parallel features were not included in prior art video conferencing systems, although communication across time as well as across distance is a critical factor to many users, particularly those who are so geographically separated as to be in different time zones.
Another problem arises from the fact that most prior art video conferencing systems were designed to support conferencing between parties connected only by the global telephone network. Even when compressed, the number of bits (and, correspondingly, bandwidth) required to represent video data significantly exceeds that which is required to represent audio data. The global telephone system was originally designed to transmit analog signals and is still in the process of being converted to handle digital data. While the amount of bandwidth available for normal use has steadily increased, conventional telephone lines still do not have the capacity to support the transmission of the large amount of data necessary for video conferencing. Because of this fact, most prior art video conferencing systems operated over high bandwidth point-to-point lines and, since these can be terminated only at a single point, implicitly required a dedicated video conferencing center at the termination point.
A few video conferencing systems have been designed that utilize data networks for information transfer. However, as described earlier, in virtually all such systems, information is routed through a server or host which controls and tailors the individual data streams. Because of the lack of control that it affords them, this architecture has proved to be unsatisfactory to most end users. An additional (and more subtle) problem arises with this design in that such systems are vulnerable to single-point failures; if the host becomes unavailable or leaves the conference, there is no provision for the conference to continue.
A further significant problem exists in the addressing structure of most data communications networks, inasmuch as each node must have a unique address. With a separate address for each conferee, most video conferencing systems must operate in a manner wherein dedicated information is structured and addressed to a single user. The methodologies employed vary from systems in which the originator acts as the server, generating multiple video and audio streams, each uniquely addressed to an end user, to centrally arranged switching systems and special-purpose multi-party conference units (MCUs). In all these architectures, however, individual message streams are required for each end user. This significantly increases the bandwidth requirements of a given network as the number of participants per conference increases. The total bandwidth required in this architecture is nb(n-1), where n is the number of participants and b is the bandwidth required per participant; this corresponds to one stream of bandwidth b from each of the n participants to each of the other n-1 participants. Thus, an eight-party conference in which each user needs 10 megabits per second (Mbps) requires a data service capable of 560 Mbps [8*10Mbps*(8-1)]. The amount of bandwidth mandated by this architecture is so high that it has effectively precluded the implementation and deployment of any video conferencing system with more than four nodes.
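The growth of this requirement can be checked with a short calculation. The sketch below simply evaluates nb(n-1) for a range of conference sizes; the function name and the choice of 10 Mbps per participant (taken from the example above) are illustrative assumptions.

```python
def unicast_bandwidth_mbps(participants: int, per_stream_mbps: float) -> float:
    """Total bandwidth n * b * (n - 1) when every participant sends a
    separately addressed stream to every other participant."""
    if participants < 2:
        return 0.0  # a single node has no one to send to
    return participants * per_stream_mbps * (participants - 1)


if __name__ == "__main__":
    # Tabulate the requirement for conferences of 2 to 8 parties at 10 Mbps each.
    for n in range(2, 9):
        print(f"{n} parties: {unicast_bandwidth_mbps(n, 10):.0f} Mbps")
```

For eight parties this reproduces the 560 Mbps figure given above, and the table shows that the requirement grows roughly with the square of the number of participants.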
One application of a video conferencing system that has not been realized to any viable extent is that of transmitting voice and video information for multi-party conferencing in a real-time mode over the global communications network (or a subset thereof) utilizing the communications protocols native to that network (Transmission Control Protocol/Internet Protocol ("TCP/IP"), User Datagram Protocol ("UDP") or IP Multicast). Even if such methods were available, the bandwidth available on most of these networks could not support real-time audio and video transmission. However, it is anticipated that the bandwidth of these networks will increase in the future to support such an application.
In summary, prior art video communications systems have proved to be unsatisfactory to most users in a number of ways:
1. prior art video communications systems did not allow conferees to control their view of the conference in any manner that approaches natural experience;
2. prior art video communications systems did not allow conferees to communicate unless all conferees are present at the same time;
3. prior art video communications systems did not allow the user to participate in more than one conference at a time;
4. prior art video communications systems did not allow every video and audio data stream in a multi-party conference to have different characteristics;
5. prior art video communications systems that utilize the global telephone network for transmission required point-to-point communications lines, which limit conferences to certain pre-defined physical locations;
6. prior art video communications systems that utilize data networks for transmission had a client/server or host/slave architecture, which renders them vulnerable to failure;
7. prior art video communications systems generally required a large amount of bandwidth as a function of their addressing structures; and
8. prior art video communications systems did not utilize the developing global communications network.