In recent years, conference communication services that implement multipoint conferences, in which three or more parties can join, have come into practical use. Audio conferences and multipoint video conferences are examples of such multipoint conferences.
A conference function compliant with the Session Initiation Protocol (SIP) may be used to provide this service. A conference call system that organizes a conference call using the SIP conference function includes multiple terminals, a conference server for controlling the connections among the terminals, and a mixer for synthesizing audio media data. The mixer synthesizes the audio media data transmitted from the terminals, and transmits the synthesized data to each of the terminals.
The following describes a conventional conference call system.
FIG. 32 illustrates the structure of the conventional conference call system using the SIP conference function. The conference call system 700 illustrated in FIG. 32 includes terminals 701 to 703, a server 704, a mixer 705, and a network 706.
The terminals 701 to 703 are IP telephone terminals, and the server 704 is a conference server. The network 706 is, for example, an internal network.
The terminals 701 to 703, the server 704, and the mixer 705 are connected through the network 706.
The terminals 701 to 703 transmit the audio media data to the mixer 705 through the internal network 706. The mixer 705 synthesizes the audio media data transmitted from the terminals 701 to 703. The mixer 705 transmits the synthesized audio media data to the terminals 701 to 703.
More specifically, the mixer 705 synthesizes the audio media data transmitted from the terminals 702 and 703, and transmits the synthesized audio media data to the terminal 701. Similarly, the mixer 705 synthesizes the audio media data transmitted from the terminals 701 and 703, and transmits the synthesized audio media data to the terminal 702. The mixer 705 further synthesizes the audio media data transmitted from the terminals 701 and 702, and transmits the synthesized audio media data to the terminal 703. As such, the mixer 705 in the conference call system 700 implements a three-party call.
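The per-terminal synthesis described above can be sketched as follows. This is only an illustrative sketch, not the mixer 705 itself: it assumes that each terminal delivers a frame of 16-bit PCM samples of equal length, and the function name `mix_for_each_terminal` and the sample values are hypothetical.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_for_each_terminal(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each terminal, the sum of the samples of all OTHER terminals."""
    lengths = {len(f) for f in frames.values()}
    if len(lengths) > 1:
        raise ValueError("all frames must have the same number of samples")
    n = lengths.pop() if lengths else 0
    # Sum every source once, then subtract the listener's own samples.
    total = [sum(f[i] for f in frames.values()) for i in range(n)]
    mixed = {}
    for listener, own in frames.items():
        # Clip to the 16-bit range so that loud inputs do not overflow.
        mixed[listener] = [
            max(INT16_MIN, min(INT16_MAX, total[i] - own[i])) for i in range(n)
        ]
    return mixed


# Hypothetical sample values for the terminals 701 to 703.
frames = {"701": [100, -200, 300], "702": [10, 20, 30], "703": [1, 2, 3]}
mixed = mix_for_each_terminal(frames)
print(mixed["701"])  # samples of 702 and 703 only
```

In this sketch each terminal never receives its own audio, which matches the three-party call described above.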
The following describes the operations of the conventional conference call system 700.
FIG. 33 is a sequence diagram illustrating a process flow in the conventional conference call system 700. FIG. 33 also illustrates the process flow for a conference call among the terminals 701 to 703 organized by the terminal 701. In FIG. 33, ACK, REFER response, NOTIFY, NOTIFY response, MESSAGE response, and others are omitted.
First, the terminal 701, the conference organizer, transmits an INVITE message to the server 704 (S701). The INVITE message describes the media information of the terminal 701 in the Session Description Protocol (SDP). Hereafter, the INVITE message is also simply referred to as “INVITE”, and the other messages, such as REFER and NOTIFY, are abbreviated in the same manner. More specifically, the media information includes the IP address, the receiving port number, and the available codecs of the terminal 701.
Next, the server 704 returns, to the terminal 701, a 200 response including the media information of the mixer 705, which the server 704 holds in advance (S702).
Furthermore, the server 704 notifies the mixer 705 of the IP address and the receiving port number of the terminal 701, the IP address and the receiving port number of the mixer 705, and the codec to be used (S703). For example, the server 704 notifies the mixer 705 of the information of the terminal 701 and the other items using MESSAGE in the SIP. Note that the server 704 may instead notify the mixer 705 of this information using HTTP or another protocol.
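The media information carried in the INVITE can be illustrated with a minimal SDP body. The following is a sketch under stated assumptions: the IP address, port number, and codec list are hypothetical values, the helper name `build_audio_offer` is not taken from the system described here, and only the SDP fields needed to show the address, receiving port, and codecs are filled in.

```python
from typing import Dict


def build_audio_offer(ip: str, port: int, codecs: Dict[int, str]) -> str:
    """Build a minimal SDP body with one audio stream.

    codecs maps an RTP payload type to its "name/clock-rate" string.
    """
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",  # IP address of the terminal
        "t=0 0",
        # Receiving port number and the offered payload types.
        f"m=audio {port} RTP/AVP " + " ".join(str(pt) for pt in codecs),
    ]
    lines += [f"a=rtpmap:{pt} {name}" for pt, name in codecs.items()]
    return "\r\n".join(lines) + "\r\n"


# Hypothetical values standing in for the terminal 701.
offer = build_audio_offer("192.0.2.1", 49170, {0: "PCMU/8000", 8: "PCMA/8000"})
print(offer)
```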
The terminal 701 transmits ACK in response to the 200 response in step S702, and subsequently starts transmitting the media data to the mixer 705. The mixer 705 starts transmitting the media data to the terminal 701 (S704).
The terminal 701 then transmits, to the server 704, REFER including the SIP URI of the terminal 702 in order to invite the terminal 702, which is to participate in the conference (S705). The server 704 transmits a 202 response to REFER to the terminal 701. Furthermore, the server 704 transmits NOTIFY to the terminal 701 to notify the terminal 701 of the invitation status. The terminal 701 that received NOTIFY transmits a 200 response to the server 704.
Next, the server 704 transmits INVITE including the media information of the mixer 705 to the terminal 702 (S706).
The terminal 702 that received the INVITE transmits a 200 response including the media information of the terminal 702 (S707).
The server 704 transmits ACK in response to the 200 response to the terminal 702, and subsequently notifies the mixer 705 of the necessary information using MESSAGE, in the same manner as the process for the terminal 701 (S708). The server 704 further transmits, to the terminal 701, NOTIFY for notifying that the invitation is completed. The terminal 701 transmits a 200 response in response to NOTIFY to the server 704.
The terminal 702 starts transmitting the media data to the mixer 705. The mixer 705 also starts transmitting the media data to the terminal 702 (S709).
Next, the same process as in steps S705 to S709 is performed for the terminal 703, and the terminal 703 and the mixer 705 start transmitting/receiving the media data to/from each other (S710 to S714).
As such, the terminals 701 to 703 transmit the audio media data to the mixer 705, the mixer 705 synthesizes the audio media data, and the mixer 705 transmits the synthesized audio media data to the terminals 701 to 703. This enables an audio conference among the terminals 701 to 703.
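The sequence in FIG. 33 can be summarized as a list of messages. The following is a simplified sketch that only records who sends what to whom. It omits the same messages that FIG. 33 omits (ACK, the REFER response, NOTIFY, and so on), and the function name `conference_flow` and the message labels are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, str, str, str]  # (step label, sender, receiver, message)


def conference_flow(organizer: str, invitees: List[str],
                    server: str = "server 704", mixer: str = "mixer 705",
                    first_step: int = 701) -> List[Step]:
    """List the main steps for an organizer followed by each invited terminal."""
    steps: List[Step] = []
    n = first_step

    def add(sender: str, receiver: str, message: str) -> None:
        nonlocal n
        steps.append((f"S{n}", sender, receiver, message))
        n += 1

    # The organizer joins the conference first.
    add(organizer, server, "INVITE (SDP: organizer media info)")
    add(server, organizer, "200 (SDP: mixer media info)")
    add(server, mixer, "MESSAGE (addresses, ports, codec)")
    add(organizer, mixer, "media (both directions)")
    # Each invited terminal is then added through REFER.
    for terminal in invitees:
        add(organizer, server, f"REFER ({terminal})")
        add(server, terminal, "INVITE (SDP: mixer media info)")
        add(terminal, server, "200 (SDP: terminal media info)")
        add(server, mixer, "MESSAGE (addresses, ports, codec)")
        add(terminal, mixer, "media (both directions)")
    return steps


flow = conference_flow("terminal 701", ["terminal 702", "terminal 703"])
for step in flow:
    print(*step, sep=" | ")
```

With one organizer and two invited terminals, the sketch yields fourteen steps labeled S701 to S714, in line with the step numbering used above.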
In addition to conference calls, the SIP conference function also implements multipoint video conferences. The following describes a conventional multipoint video conference system.
FIG. 34 illustrates the structure of the conventional multipoint video conference system using the SIP conference function. The video conference system 800 illustrated in FIG. 34 includes terminals 801 to 803, a server 804, a mixer 805, and a network 806.
The terminals 801 to 803, the server 804, and the mixer 805 are connected through the network 806.
The terminal 801 includes a camera 841, and monitors 821 and 831. The terminal 802 includes a camera 842, and monitors 822 and 832. The terminal 803 includes a camera 843, and monitors 823 and 833.
The terminals 801 to 803 transmit the video and audio media data captured by the cameras 841 to 843, respectively, to the mixer 805 via the network 806. The mixer 805 synthesizes the video and audio media data transmitted from the terminals 801 to 803. The mixer 805 also transmits the synthesized video and audio media data to the terminals 801 to 803.
More specifically, the mixer 805 transmits the media data transmitted from the terminal 802 and the media data transmitted from the terminal 803 to the terminal 801. The terminal 801 displays the received media data on the monitors 821 and 831. This allows a user 811 using the terminal 801 to talk with the user 812 using the terminal 802 and the user 813 using the terminal 803.
In the same manner, the mixer 805 transmits the media data transmitted from the terminal 801 and the media data transmitted from the terminal 803 to the terminal 802. Furthermore, the mixer 805 transmits the media data transmitted from the terminal 801 and the media data transmitted from the terminal 802 to the terminal 803. The terminal 802 displays the received media data on the monitors 822 and 832. Furthermore, the terminal 803 displays the received media data on the monitors 823 and 833.
The structure described above enables the multipoint video conference system 800 to hold a multipoint video conference.
The following describes the operations of the conventional video conference system 800.
FIG. 35 is a sequence diagram illustrating the process flow in the conventional video conference system 800. Note that FIG. 35 illustrates the process flow for a video conference among the terminals 801 to 803 organized by the terminal 801. In FIG. 35, ACK, REFER response, NOTIFY, NOTIFY response, MESSAGE response, and others are omitted.
The processes in steps S801 to S814 illustrated in FIG. 35 correspond to the processes in steps S701 to S714 illustrated in FIG. 33, respectively. Here, only the differences from the process illustrated in FIG. 33 shall be described.
First, the terminal 801, the conference organizer, transmits INVITE including the media information of the terminal 801 to the server 804 (S801). Here, the terminal 801 can transmit and receive two types of media, namely audio and video. Thus, the INVITE includes two receiving port numbers and two sets of available codecs, corresponding to the video and the audio, respectively.
Next, the server 804 returns a 200 response including the media information of the mixer 805 to the terminal 801 (S802). The 200 response includes two receiving port numbers and two sets of available codecs, corresponding to the video and the audio, respectively.
The server 804 further notifies the mixer 805 of the IP address and the receiving port number of the terminal 801, the IP address and the receiving port number of the mixer 805, and the codec to be used (S803). Here, the mixer 805 is notified of the receiving port numbers and available codecs corresponding to audio and video.
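An SDP body that carries both audio and video has one media line per stream, each with its own receiving port number and codec list. The following sketch extracts those values. The SDP text is a hypothetical example, the function name `media_ports_and_payloads` is an assumption for illustration, and the parser handles only the simple `m=<media> <port> <proto> <formats>` form.

```python
from typing import Dict, List, Tuple


def media_ports_and_payloads(sdp: str) -> Dict[str, Tuple[int, List[str]]]:
    """Map each media type in an SDP body to (receiving port, payload types)."""
    result: Dict[str, Tuple[int, List[str]]] = {}
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, _proto, *formats = line[2:].split()
            result[media] = (int(port), formats)
    return result


# Hypothetical media information standing in for the terminal 801.
sdp = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "m=video 51372 RTP/AVP 96",
    "a=rtpmap:96 H264/90000",
]) + "\r\n"

streams = media_ports_and_payloads(sdp)
print(streams)
```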
The messages in the following processes also include the media information with regard to the audio and video.
As such, the terminals 801 to 803 transmit the audio and video media data to the mixer 805. Furthermore, the mixer 805 synthesizes the audio and video media data and transmits the synthesized data to the terminals 801 to 803. This implements a video conference among the terminals 801 to 803.
However, the conventional video conference system 800 illustrated in FIG. 34 requires the mixer 805, which synthesizes the audio and video media data. This raises the cost of constructing the video conference system 800. Furthermore, because the media data is transmitted and received through the mixer 805, the delay in the media data increases by the processing time of the mixer 805, which is another problem to be solved.
In response to these problems, video conference systems that do not require the mixer 805 have been proposed. An example is a video conference system using the 3rd Party Call Control (3PCC), proposed in RFC 3725 by the Internet Engineering Task Force (IETF).
Furthermore, for further cost reduction, a video conference system that can hold video conferences with terminals only, without the server 804 or the mixer 805, has been proposed in, for example, Patent Literature 1.
The following describes a conventional video conference system using the 3PCC.
FIG. 36 illustrates the structure of the conventional video conference system using the 3PCC. The video conference system 900 illustrated in FIG. 36 includes terminals 801 to 803, a server 804, and a network 806. Note that the same reference numerals as in FIG. 34 are assigned to the components similar to the components in FIG. 34, and descriptions of these components are omitted.
The video conference system 900 illustrated in FIG. 36 differs from the video conference system 800 illustrated in FIG. 34 in that the mixer 805 is not included.
FIG. 37 is a sequence diagram illustrating the process flow in the conventional video conference system 900. FIG. 37 also illustrates the process flow for a video conference among the terminals 801 to 803, triggered by the origination from the server 804.
First, the server 804 transmits INVITE that does not include the SDP to the terminal 801 (S901). The terminal 801 that received INVITE transmits a 200 response including the media information of the terminal 801 to the server 804 (S902).
The server 804 transmits INVITE including the media information of the terminal 801 included in the received 200 response to the terminal 802 (S903). Next, the terminal 802 transmits a 200 response including the media information of the terminal 802 to the server 804 (S904). The server 804 that received the 200 response returns ACK to the terminal 802 (S905).
Furthermore, the server 804 transmits ACK including the media information of the terminal 802 included in the received 200 response to the terminal 801 (S906).
As such, the terminals 801 and 802 become ready for directly transmitting/receiving the media data to/from each other (S907).
In addition, the terminals 802 and 803 become ready for directly transmitting/receiving the media data to/from each other, with a process similar to the process in steps S901 to S907 (S908 to S914). Furthermore, the terminals 801 and 803 can directly transmit and receive the media data to/from each other with a process similar to the process in steps S901 to S907 (not illustrated).
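The exchange in steps S901 to S906 can be sketched as follows. This is a simplified model rather than a SIP implementation: the `Terminal` class, the `connect_via_server` function, and the media information strings are hypothetical, and the media information is passed as an opaque string in place of a real SDP body.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Terminal:
    """Hypothetical stand-in for a SIP terminal; media_info plays the role of its SDP."""
    name: str
    media_info: str
    peer_media: Optional[str] = None

    def on_invite(self, sdp: Optional[str]) -> str:
        # An INVITE may or may not carry the media information of the peer.
        if sdp is not None:
            self.peer_media = sdp
        return self.media_info  # returned in the 200 response

    def on_ack(self, sdp: Optional[str]) -> None:
        # The ACK carries the peer's media information when the INVITE had no SDP.
        if sdp is not None:
            self.peer_media = sdp


def connect_via_server(a: Terminal, b: Terminal) -> List[str]:
    """Relay media information between a and b the way the server does."""
    log: List[str] = []
    info_a = a.on_invite(None)
    log.append(f"server -> {a.name}: INVITE (no SDP)")
    log.append(f"{a.name} -> server: 200 ({info_a})")
    info_b = b.on_invite(info_a)
    log.append(f"server -> {b.name}: INVITE ({info_a})")
    log.append(f"{b.name} -> server: 200 ({info_b})")
    b.on_ack(None)
    log.append(f"server -> {b.name}: ACK")
    a.on_ack(info_b)
    log.append(f"server -> {a.name}: ACK ({info_b})")
    return log


t801 = Terminal("terminal 801", "192.0.2.1:5004")
t802 = Terminal("terminal 802", "192.0.2.2:5004")
log = connect_via_server(t801, t802)
print("\n".join(log))
```

After the six messages, each terminal in the sketch holds the media information of the other, so the media data can flow directly between them without passing through the server.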
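Because every pair of terminals establishes its own direct connection, the number of connections grows with the number of terminals. The following sketch enumerates the pairs, using a hypothetical helper name `pairwise_sessions`. For n terminals it yields n(n-1)/2 pairs.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def pairwise_sessions(terminals: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of terminals that needs a direct connection."""
    return list(combinations(terminals, 2))


pairs = pairwise_sessions(["801", "802", "803"])
print(pairs)
```

For the three terminals 801 to 803, this gives the three pairs described above.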
As described above, the video conference system 900 can establish connections among the terminals 801 to 803 that allow direct transmission/reception of the media data among the terminals 801 to 803.
[Patent Literature 1] Japanese Unexamined Patent Application Publication 2005-333446