As video conferencing and intelligent terminals have developed, a conference site has evolved from having a single camera, a single active video, and a display for a single active picture to having multiple cameras, multiple active videos, and displays for multiple active pictures. The intelligent terminal has likewise evolved from having no camera lens to having multiple camera lenses. As a result, traditional point-to-point and multipoint communication is changing from a single audio-video stream to multiple audio-video streams.
To enable each participant in multi-stream communication to select information about media streams from multiple perspectives, the Internet Engineering Task Force (IETF) has introduced the Controlling Multiple Streams for Telepresence (CLUE) protocol. The protocol describes location information of media content, information about a site, and information about the participants in a media capture area, and defines a set of media announcement and configuration information that is used to convey media information.
In a scenario in which multi-stream communication is implemented in a multi-party session, each participant uses a CLUE message to announce its media information to a central node. The central node constructs a new media announcement message from the media information received from all participants and sends the new message to each participant. Each participant therefore receives the media information announced by the other participants and, according to its own capability, selects which of the announced media to use.
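The announce, aggregate, and select flow described above can be illustrated with a minimal sketch. It is not the CLUE message syntax: the names `MediaCapture`, `CentralNode`, `receive_announcement`, `build_announcement_for`, and `select_by_capability`, as well as the `max_video_streams` parameter, are hypothetical and introduced only for illustration. The sketch assumes that a participant's capability can be reduced to a maximum number of video streams.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MediaCapture:
    """One announced media stream (hypothetical, simplified fields)."""
    capture_id: str
    kind: str          # "video" or "audio"
    description: str   # e.g. which camera or microphone produced it


@dataclass
class CentralNode:
    """Collects per-participant announcements and redistributes them."""
    announcements: dict[str, list[MediaCapture]] = field(default_factory=dict)

    def receive_announcement(self, participant: str,
                             captures: list[MediaCapture]) -> None:
        # Store (or replace) the media information announced by this participant.
        self.announcements[participant] = list(captures)

    def build_announcement_for(self, participant: str) -> dict[str, list[MediaCapture]]:
        # Build a new announcement holding every *other* participant's media.
        return {
            sender: list(captures)
            for sender, captures in self.announcements.items()
            if sender != participant
        }


def select_by_capability(announcement: dict[str, list[MediaCapture]],
                         max_video_streams: int) -> list[MediaCapture]:
    """Participant-side selection, limited only by the participant's own capability."""
    videos = [
        capture
        for captures in announcement.values()
        for capture in captures
        if capture.kind == "video"
    ]
    # Assumption: capability is modeled as a cap on the number of video streams.
    return videos[:max_video_streams]
```

In this sketch the central node only relays what was announced; the selection step runs entirely at the participant and takes no conference policy as input.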
As the quantity of participant terminals increases, the amount of information used to announce the media information increases accordingly. However, because each participant currently selects the media of the other participants according to its own capability, the central node cannot control, according to a conference policy, which media each participant selects. That is, in a large conference, each participant can select media as it wishes, and the conference center cannot centrally control the media of each participant. Consequently, control over media across the whole site is relatively disordered, and the central node's control over the media at the whole site is weakened.