A video session system built on a conventional protocol stack supports the transmission of only one main video stream between terminals. It does not support the transmission of multiple main video streams, and it does not describe the capture features of each stream; for example, the spatial area that a camera captures for a video stream is not described. A telepresence system improves on the video session system by supporting the transmission of multiple media streams and by describing the capture features of each stream. However, the problem of how a telepresence session system should transmit the multiple media streams remains. If the traditional mode of transmitting one media stream per port is still adopted, the large number of ports becomes an obstacle to Network Address Translation (NAT) and firewall traversal. A preferable approach is to multiplex the multiple media streams on the same transmission address and, on the basis of supporting multiplexing transmission, to maintain compatibility between new devices and old devices. However, the related art provides no corresponding solution for how a telepresence system transmits multiple media streams.
Thus, the following problem exists in the related art: a telepresence system based on a conventional protocol architecture that still adopts the transmission mode of a traditional session system occupies too many ports and cannot perform multiplexing transmission smoothly.