With the development of video conferencing technologies, a user site has evolved from one camera, one moving video and one moving image to multiple cameras, multiple moving videos and multiple moving images. Within the same site, the multiple cameras, the multiple moving videos, and the multiple moving images can be associated with each other through a physical or a logical relationship.
For example, in the conference TV networking shown in FIG. 1, multiple cameras, multiple displays (screens) and multiple site terminals (or one site terminal with multiple code streams) can be deployed in a video conferencing site (for example, sites 1 and 2), and the multiple site terminals are interconnected through a multipoint control unit (MCU) to establish a conference. The technology that supports sending the images of multiple cameras to a remote site at the same time, and displaying multiple remote video images at the same time, has been widely applied in scenarios such as tele-education and telepresence.
In existing conferences with multi-screen sites, such as tele-education and telepresence conferences, each subordinate site may ask for the floor. In the existing video conferencing TV system, a site requests the floor through signaling, and the request is handled according to a processing policy. The chairman of the conference grants the floor request of the site, gives the floor to the site, and broadcasts the whole site. In the networking shown in FIG. 1, if site 2 requests the floor, a site terminal of site 2 sends a floor request to the MCU; the MCU forwards the floor request to the chair site (for example, site 1 is the chair site); if the chair site grants the floor request of site 2, a site terminal of the chair site sends the MCU a request to give the floor to site 2 and to broadcast site 2; the MCU processes this floor-giving and broadcasting request from the chair site, and broadcasts the multiple images of site 2 to the other sites; the other sites view the images of site 2, and meanwhile the MCU sends the voices picked up by all microphones (MICs) at site 2 to the other sites, so that the other sites can hear the voice of the whole of site 2.
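The conventional flow described above can be sketched in a few lines of Python. This is only an illustrative model, not an actual MCU implementation: the `Site` and `MCU` classes, the `request_floor` and `broadcast_whole_site` methods, the `chair_grants` callback, and all stream and microphone names are hypothetical and stand in for the real signaling between site terminals and the MCU.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of a site with several camera streams and MICs."""
    name: str
    video_streams: list
    mics: list
    received: list = field(default_factory=list)  # media delivered by the MCU


class MCU:
    """Toy multipoint control unit that relays floor signaling and media."""

    def __init__(self, sites, chair):
        self.sites = {s.name: s for s in sites}
        self.chair = chair  # name of the chair site

    def request_floor(self, requester, chair_grants):
        # Step 1: the requesting site's terminal sends a floor request to the MCU.
        # Step 2: the MCU forwards it to the chair site, modeled here by the
        # assumed callback chair_grants(site_name) -> bool.
        if not chair_grants(requester):
            return False
        # Step 3: the chair site asks the MCU to give the floor to the
        # requesting site and to broadcast it.
        self.broadcast_whole_site(requester)
        return True

    def broadcast_whole_site(self, speaker):
        src = self.sites[speaker]
        for name, site in self.sites.items():
            if name == speaker:
                continue
            # Conventional behavior: every video stream and every MIC of the
            # speaking site is sent to each other site.
            site.received = [(speaker, s) for s in src.video_streams + src.mics]


site1 = Site("site1", ["cam1"], ["mic1"])
site2 = Site("site2", ["camA", "camB", "camC"], ["micA", "micB"])
mcu = MCU([site1, site2], chair="site1")
mcu.request_floor("site2", chair_grants=lambda name: True)
print(len(site1.received))  # 3 video streams + 2 MICs = 5 items
```

In this sketch the granularity of the floor is the whole site: once the chair grants the request, the other sites receive all of site 2's streams, regardless of which camera the speaker is actually on.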
It is found in practice that, when a participant on a certain code stream or in front of a certain camera at a multi-screen site asks for the floor, not everyone at that site is asking for the floor, so people at the other sites may only need to view the speaker captured by that camera. If the conventional mechanism, in which the whole site is given the floor and broadcast, is adopted, all the video code streams of the whole site (inevitably including some unwanted video code streams) are broadcast to the other sites, which places an unnecessary load on the network and wastes a large amount of bandwidth resources.
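The scale of this waste can be illustrated with a rough calculation. The sketch below is not taken from any real system: the function `broadcast_bandwidth_kbps`, the stream names, the per-stream bit rate of 2000 kbit/s, and the count of four receiving sites are all assumed purely for illustration.

```python
def broadcast_bandwidth_kbps(stream_rates_kbps, receivers, selected=None):
    """Total downstream bandwidth needed to broadcast streams to `receivers` sites.

    stream_rates_kbps: dict mapping a stream name to its bit rate in kbit/s.
    selected: names of the streams to broadcast; None means all of them,
              which corresponds to the conventional whole-site broadcast.
    """
    if selected is None:
        rates = list(stream_rates_kbps.values())
    else:
        rates = [stream_rates_kbps[name] for name in selected]
    return sum(rates) * receivers


# Assumed figures: a three-camera site, 2000 kbit/s per stream, 4 receiving sites.
rates = {"camA": 2000, "camB": 2000, "camC": 2000}
whole_site = broadcast_bandwidth_kbps(rates, receivers=4)
speaker_only = broadcast_bandwidth_kbps(rates, receivers=4, selected=["camB"])
print(whole_site, speaker_only, whole_site - speaker_only)  # 24000 8000 16000
```

Under these assumed figures, two thirds of the broadcast bandwidth carries streams that the other sites do not need to view.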