A videoconferencing service is a multimedia service that combines information such as speech, images, and data for long-distance transmission. At present, a widely used form of this service is the video conference (VC), a communications manner in which audio and video communications technologies and devices are used to hold a conference between two or more locations over transmission channels. As shown in FIG. 1, a video conference system generally includes at least two video conference terminal subsystems, a transmission channel, and a multipoint control unit (MCU). As shown in FIG. 2, the video conference terminal subsystem located at each site includes a video conference terminal (VCT), a video input device (for example, a camera or a camera array) for collecting a video signal or an image signal, an audio input device (for example, a microphone or a microphone array) for collecting an audio signal, an audio output device (for example, a loudspeaker or a loudspeaker array) for playing a received audio signal, and a display device (for example, a display or a projector) for displaying a received video (or image) signal.
At present, a VCT located at a site sends a request message generated at the site to the MCU through the transmission channel corresponding to the VCT, and receives control instructions from the MCU through the same channel. The VCT also compresses and encodes the main stream signal and the presentation stream signal of the site that are collected by a camera or a microphone, multiplexes the compressed and encoded signals, and sends the multiplexed signals to the MCU through the corresponding transmission channel. A main stream signal is a signal of the site corresponding to a conference location that is collected in real time, for example, an image signal, a speech signal, or a video signal. A presentation stream signal is any signal of that site other than the main stream signal, for example, a shared video signal or a data signal corresponding to a shared document. The MCU performs processing such as mixing and adaptation on the received signals, and sends the processed signals to each other VCT through the transmission channel corresponding to that VCT, where the other VCTs are the VCTs managed by the MCU other than the sending VCT. In addition, the VCT classifies, decompresses, and decodes the signals that it receives from the MCU through its transmission channel, to obtain a decoded main stream signal and a decoded presentation stream signal, and controls the two decoded signals to be displayed on the display device of the site in a picture in picture (PIP) manner. That is, either the main stream signal is displayed in full screen and the presentation stream signal is displayed in an inset window, or the presentation stream signal is displayed in full screen and the main stream signal is displayed in an inset window.
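The multiplexing step on the sending side and the classification step on the receiving side can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the system described above: the one-byte stream tags `MAIN` and `PRESENTATION`, the function names, the simple interleaving order, and the use of `zlib` as a stand-in for the audio and video codecs that a real VCT would use.

```python
import zlib
from itertools import zip_longest

# Hypothetical one-byte tags that mark which stream a packet belongs to.
MAIN, PRESENTATION = 0, 1


def encode_and_multiplex(main_frames, presentation_frames):
    """Compress each frame, tag it with its stream type, and interleave
    the two streams into a single packet sequence (sending VCT)."""
    tagged_main = [bytes([MAIN]) + zlib.compress(f) for f in main_frames]
    tagged_pres = [bytes([PRESENTATION]) + zlib.compress(f)
                   for f in presentation_frames]
    packets = []
    for main_pkt, pres_pkt in zip_longest(tagged_main, tagged_pres):
        if main_pkt is not None:
            packets.append(main_pkt)
        if pres_pkt is not None:
            packets.append(pres_pkt)
    return packets


def classify_and_decode(packets):
    """Split packets by stream tag, then decompress them (receiving VCT)."""
    streams = {MAIN: [], PRESENTATION: []}
    for packet in packets:
        # Indexing bytes yields an int, which matches the tag constants.
        streams[packet[0]].append(zlib.decompress(packet[1:]))
    return streams[MAIN], streams[PRESENTATION]


main = [b"camera-frame-1", b"camera-frame-2"]
presentation = [b"shared-slide-1"]
packets = encode_and_multiplex(main, presentation)
decoded_main, decoded_presentation = classify_and_decode(packets)
```

In this sketch the MCU is omitted; in the system described above, the MCU would sit between the two functions and apply mixing and adaptation before forwarding the signals.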
At present, a signal displayed in an inset window is relatively small, and the inset window blocks part of the signal that is displayed in full screen. As a result, the display effects of both the main stream signal and the presentation stream signal are relatively poor.
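The two drawbacks of the PIP manner can be made concrete with a short Python sketch. The inset scale of 0.25, the 16-pixel margin, the bottom-right placement, and the names `Rect`, `pip_layout`, and `occluded_fraction` are all hypothetical values chosen for illustration; actual terminals may size and place the inset window differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def pip_layout(screen_w, screen_h, inset_scale=0.25, margin=16):
    """Return (full_screen, inset) rectangles for a PIP layout.

    The inset is placed in the bottom-right corner (an assumption).
    """
    full = Rect(0, 0, screen_w, screen_h)
    inset_w = int(screen_w * inset_scale)
    inset_h = int(screen_h * inset_scale)
    inset = Rect(screen_w - inset_w - margin,
                 screen_h - inset_h - margin,
                 inset_w, inset_h)
    return full, inset


def occluded_fraction(full, inset):
    """Fraction of the full-screen signal hidden behind the inset,
    assuming the inset lies entirely inside the screen."""
    return inset.area / full.area


full, inset = pip_layout(1920, 1080)
hidden = occluded_fraction(full, inset)
```

Under these assumed values on a 1920x1080 display, the inset signal is rendered at only 480x270 pixels, while one sixteenth of the full-screen signal is hidden behind it. Enlarging the inset makes its content more legible but blocks more of the full-screen signal, which is the trade-off described above.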
In conclusion, at present, a VCT controls the decoded main stream signal and the decoded presentation stream signal to be displayed in a PIP manner on the display device of the site corresponding to a conference location, which leads to relatively poor display effects for both the main stream signal and the presentation stream signal.