In a conventional media session system (for example, an audio and video conferencing system based on a central node), a conferencing server located on the central node receives audio and video streams from the participating terminals (that is, media terminals). For each other participating terminal, the conferencing server selects a corresponding audio and video stream according to a local policy and the receiving capability of that terminal, and forwards the selected stream to it. In this way, normal audio and video communication is established between the participating terminals.
Some participating terminals may prefer an audio and video media stream of a specific version (for example, a particular quality version or format version). Therefore, some participating terminals encode a media source (such as a video source) into media streams of multiple versions (for example, a standard-definition version, a high-definition version, and a super-definition version) and simulcast these versions in the session. A participating terminal that receives the media streams can then select the version it requires from the simulcast versions for playback.
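Purely as an illustration of selecting one version from a set of simulcast versions according to a receiving capability, and not as the method of any particular system, the following Python sketch assumes that each version is characterized only by a bitrate and that a terminal's receiving capability is a maximum receivable bitrate. The names `StreamVersion`, `select_version`, and `SIMULCAST`, and all bitrate figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StreamVersion:
    """One simulcast version of a media source (hypothetical model)."""
    name: str          # e.g. "SD", "HD", "super-definition"
    bitrate_kbps: int  # assumed bandwidth needed to receive this version


def select_version(versions: Sequence[StreamVersion],
                   max_receive_kbps: int) -> Optional[StreamVersion]:
    """Return the highest-bitrate version that fits the receiver's capability.

    Assumed policy: prefer the highest quality the receiver can accept.
    Returns None when no simulcast version fits.
    """
    candidates = [v for v in versions if v.bitrate_kbps <= max_receive_kbps]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.bitrate_kbps)


# Hypothetical simulcast set published by one transmitting terminal.
SIMULCAST = [
    StreamVersion("SD", 500),
    StreamVersion("HD", 1500),
    StreamVersion("super-definition", 4000),
]
```

With these assumed figures, a receiver limited to 2000 kbit/s would be given the HD version, while one limited to 300 kbit/s would receive no version at all.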
In research and practice, it has been found that, in the prior art, the transmit end of a media stream usually simulcasts media streams of multiple versions of a media source without restraint. Consequently, in cases such as congestion on the transmission path, the quality of the entire media session may be severely degraded, which in turn greatly affects the user's experience of the product.