Today's computing and network architectures readily support the transfer of text and still graphics or images. However, support for real-time media processing and networking has, until very recently, been realised entirely with overlay networks and service-specific terminal equipment for displaying such media to the user. Voice telephony, for example, is the most pervasive media-specific network. Broadcast video and cable TV also use a dedicated transmission and switching infrastructure. In the same vein, high-quality video conferencing requires leased lines and expensive dedicated equipment.
The most commercially successful segment of the video conferencing market has been for so-called "P×64" systems based on the ITU's H.320 series standards. Such systems aggregate from 2 to 30 DS0 (64 kbps) channels over switched or leased-line Time Division Multiplexing (TDM) networks into a wideband channel (128 kbps to 2 Mbps) to transport audio, video, data and control in a point-to-point manner. Multipoint conferencing is achieved through a centralised Multicast Control Unit (MCU), as shown in FIG. 1, which typically mixes audio and multicasts the single current speaker to all sites. An alternative, a distributed switching and mixing arrangement, will be described with reference to FIG. 2.
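The channel arithmetic behind the wideband rates quoted above can be checked with a short sketch. The function name and the range check are illustrative assumptions and are not taken from the H.320 standards.

```python
DS0_KBPS = 64  # one DS0 channel carries 64 kbps


def aggregate_rate_kbps(p: int) -> int:
    """Return the wideband rate, in kbps, of p aggregated DS0 channels.

    Hypothetical helper: the 2..30 limit mirrors the range quoted in the text.
    """
    if not 2 <= p <= 30:
        raise ValueError("expected between 2 and 30 DS0 channels")
    return p * DS0_KBPS


print(aggregate_rate_kbps(2))   # 128
print(aggregate_rate_kbps(30))  # 1920
```

The upper figure of 1920 kbps corresponds to the approximately 2 Mbps end of the quoted range.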
More rapid growth of this market has been hampered by both high equipment costs and high service costs. In addition to the cost of video COder/DECoder (CODEC) hardware, equipment costs are exacerbated in current networks by the need for Inverse Multiplexers (I-MUXs) to aggregate switched DS0 circuits, due to the absence of wideband channel switching, and for MCUs, due to the absence of multicast switching. Service costs have been kept high by the bandwidth-based tariffs needed to protect revenues from voice telephony, and so high-bandwidth, high-quality video conferencing is still a luxury rarely afforded.
Further limitations exist at the user's terminal. Real-time media imposes high processing requirements, particularly if the media stream needs to be decompressed for display. Usually the resolution and size of display monitors are restricted.
FIG. 1 shows in schematic form a known video conferencing arrangement. Using a multicast control unit (MCU) 1 for multiplexing or mixing and distributing all video streams transmitted by the network 2 to and from users 3 gives a centralised topology. This is suitable for use with a point-to-point network such as the telephone network. The expensive dedicated video mixing or selecting equipment need be provided in only one place, while use is made of the switching capability already provided in the network 2. One of the users 3, a chairman, has facilities to control the MCU 1.
In operation, each user sends its own video and audio to the MCU. The chairman controls the MCU to select one of the incoming video streams, or to add the video streams together in separate windows of a single output video stream. The input audio streams would all be mixed together, or the audio streams with most activity could be selected for mixing and outputting. The MCU duplicates its output video and audio streams and sends them to each of the users. Such arrangements may be limited in bandwidth or number of users by the capabilities of the MCU, or by the bandwidth of the telephone network connections.
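The operation just described might be sketched as follows, for the case where the chairman selects a single video stream and the most active audio streams are mixed. All names, the activity measure (sum of absolute sample values) and the `max_mixed` parameter are assumptions made for illustration; no particular MCU implementation is implied.

```python
from dataclasses import dataclass


@dataclass
class UserInput:
    user_id: str
    video_frame: str      # placeholder standing in for a video frame
    audio_samples: list   # hypothetical integer audio samples


def audio_activity(samples):
    """Crude activity measure (an assumption): sum of absolute sample values."""
    return sum(abs(s) for s in samples)


def mcu_step(inputs, chairman_choice, max_mixed=2):
    """Select one video stream, mix the most active audio, duplicate to all users."""
    by_id = {u.user_id: u for u in inputs}
    # The chairman selects which incoming video stream is output.
    video_out = by_id[chairman_choice].video_frame
    # Pick the audio streams with most activity and mix them by summation.
    most_active = sorted(
        inputs, key=lambda u: audio_activity(u.audio_samples), reverse=True
    )[:max_mixed]
    length = min(len(u.audio_samples) for u in most_active)
    audio_out = [sum(u.audio_samples[i] for u in most_active) for i in range(length)]
    # The same output video and audio are duplicated and sent to every user.
    return {u.user_id: (video_out, audio_out) for u in inputs}


inputs = [
    UserInput("A", "frame-A", [1, 1]),
    UserInput("B", "frame-B", [5, -5]),
    UserInput("C", "frame-C", [3, 3]),
]
outputs = mcu_step(inputs, chairman_choice="B")
print(outputs["A"])
```

Because every user's output passes through this one function, the sketch also shows why the MCU's capacity bounds the size of the conference.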
In another known conferencing arrangement illustrated in FIG. 2, a LAN network 4 with multicast capability connects users 3. This obviates the need for a dedicated MCU. Users 3 can control which other users they see.
FIG. 3 illustrates the information which may pass between a user 10 and a network 11, such as a packet network, connecting other users 3 for video conferencing. Awareness information indicating which of the users 3 are connected to the network is passed from the network to the user 10. In response to this information, the user can choose manually which other users he wants to see. Video selection request information is then passed from the user to the network. The network has the capability to take the request and switch appropriate video streams from other users 3 to the user 10.
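The exchange of awareness information and video selection requests might be modelled as in the minimal sketch below. The class and method names are hypothetical, video streams are represented by placeholder strings, and the handling of requests for unconnected users is an assumption.

```python
class ConferenceNetwork:
    """Toy model of a network that switches video streams on request."""

    def __init__(self):
        self.connected = {}   # user id -> that user's video stream (placeholder)
        self.selections = {}  # viewer id -> list of user ids the viewer asked to see

    def connect(self, user_id, stream):
        self.connected[user_id] = stream

    def awareness_info(self, viewer_id):
        """Network -> user: which other users are currently connected."""
        return sorted(u for u in self.connected if u != viewer_id)

    def request_video(self, viewer_id, wanted):
        """User -> network: video selection request.

        Assumption: requests for users who are not connected are ignored.
        """
        self.selections[viewer_id] = [
            u for u in wanted if u in self.connected and u != viewer_id
        ]

    def streams_for(self, viewer_id):
        """Network -> user: the streams switched to this viewer."""
        return {u: self.connected[u] for u in self.selections.get(viewer_id, [])}


net = ConferenceNetwork()
for uid in ("a", "b", "c"):
    net.connect(uid, f"video-{uid}")

print(net.awareness_info("a"))
net.request_video("a", ["c", "z"])
print(net.streams_for("a"))
```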
Network restrictions have often precluded sending streams from all users in a video conference to all other users. Accordingly, the centralised switching approach shown in FIG. 1 involves either selecting one of the streams from users 1, 2, 3, for display, or creating a single image stream comprising a composite display of two or more reduced size images or windows.
U.S. Pat. No. 4,531,024 (Colton) describes a way of resolving how to select a single video stream to be transmitted to all other conference locations. The selection is made automatically by centralised detection of either one and only one "talker" or one and only one video graphics transmission request. Manual override is possible at each location, to select manually the video to be viewed.
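A selection rule of this general kind can be illustrated with the sketch below. It is not taken from the cited patent: the function names, the precedence of a graphics request over a talker, and the fallback of keeping the current selection when the condition is not met are all assumptions.

```python
def automatic_selection(talkers, graphics_requests, current):
    """Pick the single stream to send to all locations.

    talkers: set of locations where speech is currently detected.
    graphics_requests: set of locations requesting a graphics transmission.
    current: the location whose stream is currently selected.
    """
    # Assumption: a lone graphics request takes precedence over a lone talker.
    if len(graphics_requests) == 1:
        return next(iter(graphics_requests))
    # Switch only when one and only one talker is detected.
    if len(talkers) == 1:
        return next(iter(talkers))
    # Assumption: otherwise the current selection is kept.
    return current


def view_at_location(automatic, manual_override=None):
    """Each location may manually override the automatic selection."""
    return manual_override if manual_override is not None else automatic


selected = automatic_selection({"A"}, set(), current="B")
print(view_at_location(selected))        # A
print(view_at_location(selected, "D"))   # D
```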
U.S. Pat. No. 5,003,532 (Ashida) shows a video conference system having a centralised image selector operating according to requests from users or according to speaker detection.
U.S. Pat. No. 5,382,972 (Karres) describes a conferencing system which creates a composite signal with voice sensitive switching means for moving the component streams to different regions on the screen, and different sizes of picture, according to who is talking. A master user has an override control.
A further development is shown in U.S. Pat. No. 5,473,367, in which any conferee can assume the chairing role, and manipulate manually the picture which will be viewed by all. Additionally, each conferee can choose their own picture content, or take the chair view.
U.S. Pat. No. 5,615,338 shows a system in which a central controller controls the transmission from each user terminal directly, and selects two users to transmit to all other users according to user requests and a predetermined priority scheme.
U.S. Pat. No. 5,392,223 shows a communications processor for linking a group of workstations to a network for video conferencing. A workstation initiates a request for service, including type of service and destination. Bandwidth, resolution and transmission rate are adjustable. Artificial intelligence software is used in the processor, which reacts to the instantaneous loadings and indicates to the user what is possible if the request for service cannot be fulfilled.
Another example of a decentralised videoconferencing network is shown in U.S. Pat. No. 5,374,952, using a broadband LAN. Television signals from each user are transmitted simultaneously at different frequencies. Each user's computer monitors the status of channel allocations and generates the channel selecting control signals. Such dedicated LANs have inherent broadcasting capability which implies the ability to multicast, i.e. send to a select group of users.
None of the above systems is scaleable to handle large conferences, because of human cognitive limitations in viewing a screen with too many windows displayed simultaneously, or in manually selecting between too many available windows. Additionally, the user's terminal may have limited processing power and limited display area, and the network resources may in any case limit how many streams can be sent to the user.