A number of technological systems can be used to have a meeting among participants not located in the same area.
The most realistic substitute for real meetings is a high-end conventional video conferencing system. Conventional video conferencing systems comprise a number of end-points communicating real-time video, audio and/or data streams over WAN, LAN and/or circuit-switched networks. Each end-point includes one or more monitors, cameras, microphones and/or data capture devices, and a codec, which encodes outgoing streams and decodes incoming streams. In addition, a centralized source, known as a Multipoint Control Unit (MCU), is needed to link multiple end-points together. The MCU does so by receiving the multimedia signals (audio, video and/or data) from end-point terminals over point-to-point connections, processing the received signals, and retransmitting the processed signals to selected end-point terminals in the conference.
Considerable effort has been directed to minimizing the extra capabilities that must be provided to videoconferencing terminal equipment. This can be appreciated by considering Recommendation H.323 of the International Telecommunication Union's Telecommunication Standardization Sector, which is incorporated herein by reference in its entirety. H.323 is an umbrella standard for video conferencing on packet-switched networks, including IP networks.
Most systems for implementing the H.323 approach also include a gatekeeper. A gatekeeper manages the videoconferencing activities of various endpoints and other equipment within a “zone” of such equipment that has registered with the gatekeeper in accordance with procedures that H.323 sets forth. The gatekeeper's responsibilities vary from implementation to implementation, but they typically include granting videoconference access to the network on the basis of whatever policies the administrator has imposed, allocating network bandwidth among videoconferences, and providing address translation.
Giving the gatekeeper the address-translation task is one way in which system designers minimize endpoint-capability requirements: they thereby relieve the endpoints of the need to keep track of various potential participants' network addresses. To designate a called party, for example, a user may enter an easily remembered alias such as “doe.john@marketing.com.” For actual signal transmission to the other party, though, that alias must be translated into a network address such as “130.239.67.2.” Rather than maintaining a translation table, which would ordinarily require frequent updating, the endpoint simply sends a message to the gatekeeper asking for address resolution.
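The address-translation arrangement described above can be illustrated with a minimal sketch. The class and method names (Gatekeeper, Endpoint, register, resolve, dial) are hypothetical and chosen only for illustration; the sketch does not model the actual H.323 message exchange between an endpoint and a gatekeeper. It shows only that the endpoint holds no translation table of its own and obtains the network address by asking the gatekeeper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Gatekeeper:
    """Toy gatekeeper holding the alias-to-address table for one zone."""
    registrations: Dict[str, str] = field(default_factory=dict)

    def register(self, alias: str, address: str) -> None:
        # A registering endpoint announces its alias and current address.
        self.registrations[alias] = address

    def resolve(self, alias: str) -> Optional[str]:
        # Return the registered address, or None if the alias is unknown.
        return self.registrations.get(alias)


@dataclass
class Endpoint:
    """Endpoint that keeps no translation table; it asks the gatekeeper."""
    gatekeeper: Gatekeeper

    def dial(self, alias: str) -> str:
        address = self.gatekeeper.resolve(alias)
        if address is None:
            raise LookupError(f"alias not registered: {alias}")
        return address


gk = Gatekeeper()
gk.register("doe.john@marketing.com", "130.239.67.2")

caller = Endpoint(gk)
resolved = caller.dial("doe.john@marketing.com")
```

If the called party's address changes, only the gatekeeper's table is updated; the calling endpoint continues to dial the same alias.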
In a typical IP-based videoconference, endpoints connect by first setting up the call through a gatekeeper, which resolves the dialled address. Media is then either sent directly between the parties to the call or routed through the gatekeeper or a PBX. This is the case for both point-to-point and multipoint calls.
In some cases, the content of a video conference or call must be recorded, normally including both the audio and the video. There are several occasions where streaming or archiving is needed. A stored call could be used as a substitute or a supplement for the minutes of a meeting, as evidence of an oral agreement, or as evidence submitted to a court. Archiving calls is particularly important for certain financial institutions, which require that all telephone calls be recorded and archived for regulatory compliance. Archiving video calls for regulatory compliance may also become a requirement.
Further, streaming a conference is useful in cases where only a limited number of participants are active, while the remaining participants are spectators.
One way of archiving video calls is simply to record the analog video and audio outputs from a codec. Another commonly used method is to record a call by connecting a specialized H.323 or SIP endpoint into the call, and turning the call into a multipoint call. The specialized H.323 or SIP endpoint will then record the entire session on behalf of the other parties connected.
One problem with prior-art solutions is that they are typically able to record only a mix of the audio, and not the video from both parties separately. To illustrate, two scenarios of a point-to-point call according to the prior art are described below.
In the first scenario, video from a first party is displayed, and therefore recorded, in full screen, while video from the second party appears, and is recorded, as a picture-in-picture (i.e. a small window) within the larger window. This is not an optimal recording, since the resolution of the picture-in-picture window is lower than the resolution of the video actually sent by the second party. The lower resolution of the picture-in-picture makes it difficult to view details, and the picture-in-picture overlays part of the large image and hides information.
In the second scenario, video is recorded in a voice-switched mode. This means that video from the current talker is recorded. When the other party starts talking, video is recorded from that party. The problem here is that video is recorded in a “half duplex” mode, i.e. both parties cannot be seen at the same time. This means that non-auditory information may be lost. Also, consider the case of deaf participants using sign language: voice switching clearly does not work for this mode of operation.
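The “half duplex” limitation of voice-switched recording can be illustrated with a minimal sketch. The function name, the per-interval audio levels, the threshold value and the initial choice of party A are all assumptions made for illustration and do not describe any particular product. For each interval, the sketch keeps video only from the party whose audio level is highest, and keeps the previous selection when nobody is speaking.

```python
from typing import List, Tuple


def voice_switched_source(
    levels: List[Tuple[float, float]],
    threshold: float = 0.1,  # assumed minimum level that counts as speech
) -> List[str]:
    """For each interval, return the party whose video is recorded."""
    current = "A"  # assumed selection before anyone has spoken
    selected = []
    for level_a, level_b in levels:
        # Switch only when at least one party is actually speaking.
        if max(level_a, level_b) >= threshold:
            current = "A" if level_a >= level_b else "B"
        selected.append(current)
    return selected


# Audio levels (party A, party B) for five consecutive intervals.
# In the last two intervals both parties are silent, for example
# because party A is replying in sign language.
levels = [(0.8, 0.0), (0.7, 0.05), (0.0, 0.9), (0.0, 0.0), (0.0, 0.0)]
selected = voice_switched_source(levels)
print(selected)  # ['A', 'A', 'B', 'B', 'B']
```

In every interval exactly one party's video is recorded. In the last two intervals the recording stays on party B, so anything party A conveys visually in that time is lost.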