As a means of remote communication between two or more parties, video conferencing is used more and more widely because it saves manpower and material costs and improves efficiency. Generally, during a video conference, the video and audio data collected on the local side are encoded, and the encoded data is transmitted to the opposite side for decoding, so that the opposite side can observe the conditions on the local side in real time. To enhance the users' sense of immediacy, a plurality of terminals are arranged at certain positions in the same conference room, and the multi-channel data collected from different angles in the conference room is transmitted to the opposite side, so that the opposite side can observe the local side from multiple angles. The opposite side therefore needs to correctly recover the positional relationships between the respective terminals on the local side after decoding.
In the prior art, to correctly recover the positional relationships between the respective terminals on the local side, one terminal in the conference room is generally designated as the call terminal, which establishes the call with the other conference room, while the other terminals in the room only exchange media streams with that call terminal. Assume that conference room Ta includes terminals Ta1, Ta2 and Ta3, conference room Tb includes terminals Tb1, Tb2 and Tb3, and terminals Ta2 and Tb2 serve as the call terminals of conference rooms Ta and Tb, respectively. Then, when a video conference is held between conference rooms Ta and Tb, terminals Ta1 and Ta3 need to transmit their media streams to terminal Ta2, terminals Tb1 and Tb3 need to transmit their media streams to terminal Tb2, and the call between the two conference rooms is established via Ta2-Tb2.
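The relay topology in this example can be sketched as a small routing function. This is a minimal illustrative model, not part of the described system: the names `CALL_TERMINAL`, `room_of` and `relay_path` are hypothetical, and it assumes that a stream destined for a non-call terminal in the remote room is forwarded to it by that room's call terminal.

```python
# Hypothetical model of the prior-art topology: each room has one call
# terminal, and all inter-room media passes through both call terminals.
CALL_TERMINAL = {"Ta": "Ta2", "Tb": "Tb2"}  # taken from the example above


def room_of(terminal: str) -> str:
    """Return the room name, assuming names like 'Ta1' = room 'Ta' + index '1'."""
    return terminal[:2]


def relay_path(src: str, dst: str) -> list[str]:
    """Return the terminals a media stream traverses from src to dst."""
    src_call = CALL_TERMINAL[room_of(src)]
    dst_call = CALL_TERMINAL[room_of(dst)]
    path = [src]
    if src != src_call:
        path.append(src_call)  # local terminal -> its room's call terminal
    path.append(dst_call)      # the single inter-room call, e.g. Ta2 -> Tb2
    if dst != dst_call:
        path.append(dst)       # remote call terminal -> target terminal
    return path


for src, dst in [("Ta1", "Tb1"), ("Ta2", "Tb2"), ("Ta3", "Tb3")]:
    path = relay_path(src, dst)
    print(" -> ".join(path), f"hops={len(path) - 1}")
# Ta1 -> Ta2 -> Tb2 -> Tb1 hops=3
# Ta2 -> Tb2 hops=1
# Ta3 -> Ta2 -> Tb2 -> Tb3 hops=3
```

Under these assumptions, only the call terminals' own streams travel in a single hop; every other pairing needs three.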
The inventors have found that the prior art has at least the following problems. First, a terminal (e.g., Ta1) in one conference room can transmit its media stream to the corresponding terminal (Tb1) in the other conference room only via the call terminals (Ta2 and Tb2), which leads to a large path delay. Second, the call terminal (Ta2) and the other terminals (Ta1 and Ta3) form a client/server structure, which requires the call terminal to have strong processing capability. Third, the connections between the respective terminals must be established in a proprietary manner, which hinders extension, and scalability deteriorates as the number of terminals in a conference room increases.
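The processing burden on the call terminal can be illustrated with a rough stream count. This sketch rests on assumptions not stated in the text: each of the n terminals in a room exchanges exactly one outgoing and one incoming stream with its counterpart in the other room, every such stream is relayed by the call terminal, and the helper `streams_handled` is hypothetical.

```python
# Hypothetical load model for one room with n terminals, one of which is the
# call terminal. Each terminal sends one stream to, and receives one stream
# from, its counterpart in the other room; the call terminal relays all of them.


def streams_handled(n: int) -> dict[str, int]:
    """Streams sent plus received by each kind of terminal in a room of n terminals."""
    if n < 1:
        raise ValueError("a room needs at least one terminal")
    received = (n - 1) + n  # n-1 streams from local peers + n from the remote call terminal
    sent = n + (n - 1)      # n streams to the remote call terminal + n-1 to local peers
    return {
        "call_terminal": received + sent,  # grows as 4n - 2
        "ordinary_terminal": 2,            # per ordinary terminal: one stream up, one back
    }


for n in (3, 6, 12):
    print(n, streams_handled(n))
# 3 {'call_terminal': 10, 'ordinary_terminal': 2}
# 6 {'call_terminal': 22, 'ordinary_terminal': 2}
# 12 {'call_terminal': 46, 'ordinary_terminal': 2}
```

In this model the load on ordinary terminals stays constant while the call terminal's load grows linearly with the number of terminals in the room, which is consistent with the scalability concern described above.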