A video conference system captures video data of a local conference site through multiple cameras and transmits the video data to a remote conference site through a video conference communication network. It also receives video data sent by the remote conference site and performs multi-screen output on a display of the local conference site, so as to provide video communication. The video conference system sends and receives multiple video streams, and each video stream corresponds to one conference terminal.
In a video conference, multiple different kinds of conference sites may exist, and a conference site may have one or more display screens for displaying videos of multiple remote conference sites. When there are many conference sites, a conference host at the main conference site needs to perform manipulation, such as view selection, on the conference sites in order to view the video states of all participants and perform appropriate manipulation. The manipulation control interface may be a web interface or a touch screen interface. List information of each conference site is displayed on the control interface as text or as snapshots, so that the conference host can browse and manipulate each conference site.
In a video conference system in the prior art, a multipoint control unit (hereinafter referred to as MCU) transfers a composite video signal to a conference terminal. The composite video signal includes a combined video bit stream formed from the video bit streams of at least one conference terminal, and further includes information relevant to the control interface, for example, information about the conference such as text data, graphic data, and text data in a graph. On the conference terminal side, a user inputs a control request through the control interface, and the request information is sent to the MCU. The MCU responds to the request by changing a parameter of the video communication, generates or updates the control interface together with the textual and graphical data relevant to the interface, combines the control interface information with the conference terminal video, and sends the combined result to the conference terminal.
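The request and refresh cycle described above can be sketched as follows. This is only an illustrative toy model, not the implementation of any actual MCU: the names `ControlRequest`, `PriorArtMcu`, `render_control_interface`, `compose`, and `handle_request` are hypothetical, video bit streams are stood in for by short byte strings, and the control interface is reduced to a few lines of text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControlRequest:
    """A control request from a conference terminal (hypothetical shape)."""
    terminal_id: str
    parameter: str  # name of the video communication parameter to change
    value: str


@dataclass
class PriorArtMcu:
    """Toy model of the MCU-driven control loop; all names are assumptions."""
    site_streams: dict[str, bytes]
    parameters: dict[str, str] = field(default_factory=dict)
    interface_bytes_sent: int = 0

    def render_control_interface(self) -> bytes:
        # Stand-in for the text and graphic data of the control interface.
        lines = [f"site:{name}" for name in sorted(self.site_streams)]
        lines += [f"{key}={val}" for key, val in sorted(self.parameters.items())]
        return "\n".join(lines).encode("utf-8")

    def compose(self) -> bytes:
        # Stand-in for combining the terminals' video bit streams.
        return b"".join(self.site_streams[name] for name in sorted(self.site_streams))

    def handle_request(self, request: ControlRequest) -> tuple[bytes, bytes]:
        # 1. Change the requested parameter of the video communication.
        self.parameters[request.parameter] = request.value
        # 2. Regenerate the whole control interface on the MCU side.
        interface = self.render_control_interface()
        # 3. Tally the interface data that must be re-sent to the terminal.
        self.interface_bytes_sent += len(interface)
        # 4. Return the combined video together with the refreshed interface.
        return self.compose(), interface
```

In this sketch every call to `handle_request` regenerates and re-sends the full interface, so `interface_bytes_sent` grows with each user operation regardless of how small the requested change is.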
In the video conference system in the prior art, the control interface is pushed to the conference terminal by the MCU, the conference terminal performs conference control operations on the relevant conference sites by using options on the interface, and the MCU refreshes the control interface according to each conference control operation so as to implement conference control. Because a conference site video has to be controlled by the user through the control interface, the user needs to divert attention to operating the control interface, which interferes with viewing the conference video. In addition, the control interface is generated by the MCU and needs to be constantly refreshed according to the operations of the user. The MCU therefore needs to continuously send the data bit stream of the control interface to the conference terminal, and the amount of data relevant to the control interface is usually large. This slows the response of the video conference system, delays the conference control operations performed by the user, and degrades the user experience.