1. Field of the Invention
The present invention relates to a mixed video delivering apparatus, a mixed video delivery method, and a program storage medium for generating a mixed video for each terminal from videos transmitted from a number of terminals participating in a videoconference, for example, and delivering the mixed video to those terminals.
2. Related Art
With the development of communication technology, inexpensive broadband network infrastructures have become widely available. Communication services such as Asymmetric Digital Subscriber Line (ADSL) and Fiber To The Home (FTTH) have been developed, making it easy to build a broadband networking environment not only in businesses but also in general households.
By utilizing a broadband network, communication services that involve not only audio but video, which requires a large amount of data, can be provided. One example of provision of such services is a videoconference system. A multipoint videoconference system enables communication not only between two parties but among multiple participants by connecting a number of locations via a network.
To build a multipoint videoconference system for three or more parties, two methods are available: a method in which conference terminals exchange videos with each other and a method which employs a Multipoint Control Unit or MCU. In the latter method, the MCU receives videos from conference terminals, applies processing such as scaling or clipping to the videos, for example, and combines or mixes them into one video, which is delivered to the conference terminals. Since each conference terminal has to transmit and receive videos to and from only one MCU, this method can reduce processing load involved in video transmission/reception on each conference terminal as compared to the former method.
For mixing of videos delivered by an MCU, various types of layouts (or screen splitting) are available. For instance, such layouts include a 4- or 9-part split screen and a picture-in-picture screen utilizing overlaying, and these layouts can also be changed from a conference terminal.
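As an illustration of the split-screen layouts mentioned above, the following minimal Python sketch computes the tile rectangles for a square split screen. The function name `grid_layout`, the 640x480 canvas, and the (x, y, width, height) tuple format are assumptions made for this example and do not describe any particular MCU.

```python
import math

def grid_layout(width, height, parts):
    """Return (x, y, w, h) tiles for a square split screen such as 4 or 9 parts."""
    side = math.isqrt(parts)
    if side * side != parts:
        raise ValueError("parts must be a perfect square, e.g. 4 or 9")
    # Integer division: any leftover pixels at the right/bottom edge stay unused.
    tile_w, tile_h = width // side, height // side
    # Tiles are listed row by row, left to right.
    return [(col * tile_w, row * tile_h, tile_w, tile_h)
            for row in range(side) for col in range(side)]

# Hypothetical 640x480 mixed frame split into four parts.
tiles = grid_layout(640, 480, 4)
```

Each source video would be scaled or clipped to its tile before being placed in the mixed frame; a picture-in-picture layout would instead use one full-size tile and one smaller overlaid tile.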
At present, it is general practice to transmit videos as compressed video data when transmitting and receiving videos over a network, in order to reduce the amount of data. Each conference terminal and the MCU establish a communication session prior to transmission and reception of videos. When they utilize Session Initiation Protocol (SIP) as the protocol for the communication session, for instance, they utilize Session Description Protocol (SDP), defined by RFC 2327, to exchange an encoding method and/or encoding parameters as information about compression. When the MCU establishes an independent communication session with each of the conference terminals, the MCU can also suit the capability of the respective conference terminals, such that it receives videos that are encoded with encoding methods and encoding parameters that differ among the conference terminals and transmits mixed videos that are encoded with encoding methods and encoding parameters that differ among the conference terminals.
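To make the SDP exchange concrete, the sketch below reads the video-related parameters out of a session description. The SDP text is a made-up example (the addresses, port, and origin values are placeholders), and the helper `video_parameters` is a hypothetical name; only the `m=`, `b=AS:`, `a=rtpmap:`, and `a=framerate:` line types defined in RFC 2327 are interpreted.

```python
# Hypothetical session description offered by one conference terminal.
SDP_OFFER = """\
v=0
o=terminalA 2890844526 2890844526 IN IP4 192.0.2.10
s=Conference
c=IN IP4 192.0.2.10
t=0 0
m=video 49170 RTP/AVP 31
b=AS:1500
a=rtpmap:31 H261/90000
a=framerate:30
"""

def video_parameters(sdp):
    """Collect codec, frame rate, and bandwidth from the video media section."""
    params = {}
    in_video = False
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "m":
            # A new media section starts; track whether it is the video one.
            in_video = value.startswith("video ")
        elif not in_video:
            continue
        elif kind == "b" and value.startswith("AS:"):
            params["bandwidth_kbps"] = int(value[3:])
        elif kind == "a" and value.startswith("rtpmap:"):
            # "31 H261/90000" -> "H261"
            params["codec"] = value.split(" ", 1)[1].split("/")[0]
        elif kind == "a" and value.startswith("framerate:"):
            params["framerate"] = float(value[len("framerate:"):])
    return params

params = video_parameters(SDP_OFFER)
```

Because each terminal negotiates its own session, running such a parser on each terminal's description could yield a different set of parameters per terminal.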
By suiting the capability and the like of each conference terminal, the MCU can receive video data that are encoded or compressed with encoding parameters that vary from one conference terminal to another and transmit mixed video data that are encoded or compressed with encoding parameters that vary from one conference terminal to another.
Since encoding parameters are independently set between the MCU and each conference terminal, a mixed video generated by the MCU contains video data that are encoded with different encoding parameters. Here, consider a three-party conference and focus attention on frame rate as an encoding parameter. For example, suppose that the MCU is configured to receive video data from person A at 30 frames/second (fps) and transmit mixed video data at 30 fps to person A. The MCU is also configured to receive video data from person B at 10 fps and from person C at 5 fps. As the frame rate to and from person A is set to 30 fps, the MCU encodes and transmits a mixed video at 30 fps to person A. But when the mixed video being transmitted to person A contains only videos of persons B and C, for example, the video would be transmitted at a needlessly high frame rate if transmitted at 30 fps. Even if the transmission and reception frame rates between the MCU and person A are set asymmetrically, such that the MCU receives video data from person A at 30 fps and transmits mixed video data to person A at 25 fps, a video of a needlessly high frame rate will be transmitted in this case as well.
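The frame rate mismatch in this example can be expressed numerically. The sketch below uses the figures from the text; taking the highest source frame rate in the mix as the reference value is an assumption made for illustration, and the function name `excess_frame_rate` is hypothetical.

```python
def excess_frame_rate(configured_fps, source_fps_in_mix):
    """Frames per second sent beyond the fastest source contained in the mix.

    Assumption: the fastest source is used as a rough reference for how
    often the mixed picture can meaningfully change.
    """
    reference = max(source_fps_in_mix)
    return max(0, configured_fps - reference)

# Mixed video for person A contains only persons B (10 fps) and C (5 fps).
symmetric_excess = excess_frame_rate(30, [10, 5])   # configured 30 fps to A
asymmetric_excess = excess_frame_rate(25, [10, 5])  # configured 25 fps to A
```

In both the symmetric and the asymmetric configuration the result is positive, which matches the observation that the mixed video is sent at a higher frame rate than its contents call for.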
Likewise, consider a three-party conference and focus attention on bit rate as an encoding parameter. For example, suppose the MCU is configured to receive video data from person A at 1.5 Mbits/second (Mbps) and transmit mixed video data to person A at 1.5 Mbps. Likewise, the MCU is configured to receive video data from person B at 128 kbps and from person C at 768 kbps. Since the bit rate to and from person A is set to 1.5 Mbps, the MCU encodes and transmits a mixed video at 1.5 Mbps to person A. But when the mixed video being transmitted to person A contains only videos of persons B and C, for instance, the video would be transmitted at a needlessly high bit rate if transmitted at 1.5 Mbps.
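The same comparison can be sketched for bit rate. Here the combined bit rate of the sources in the mix is used as the reference; this is an assumption for illustration only, since scaling and re-encoding in the MCU change the bit rate a mixed video actually needs. The function name `excess_bit_rate_kbps` is hypothetical.

```python
def excess_bit_rate_kbps(configured_kbps, source_kbps_in_mix):
    """Bit rate configured beyond the combined bit rate of the mixed sources.

    Assumption: the sum of the source bit rates is a rough upper reference
    for the information carried by the mixed video.
    """
    reference = sum(source_kbps_in_mix)
    return max(0, configured_kbps - reference)

# Person A is configured for 1.5 Mbps (1500 kbps); the mix holds
# person B at 128 kbps and person C at 768 kbps.
excess_kbps = excess_bit_rate_kbps(1500, [128, 768])
```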