As the video conference field has developed, a user conference site has evolved from one camera, one active video, and one active image display to multiple cameras, multiple active videos, and multiple active image displays. The multiple cameras, multiple active videos, and multiple active image displays in a same site are associated with one another by a physical or logical relationship. For example, a site A is a three-screen site, a site B is a dual-screen site, and a site C is a single-screen site. A camera 1 of the site A can capture an image of an attendee at a position 1 in the site A, and the image is displayed on a screen 1 of the site A, the site B, or the site C.
In a conventional telepresence technology, a multi-screen and multi-display scenario is introduced, in which corresponding image content is allowed to be displayed according to a rule (for example, an activity level) in a conference. An image associated with a position is defined as a capture scene, and different image display manners of a same site are defined as different capture scene entries (CSEs). In the prior art, switching between images, namely, between different capture scene entries, can be provided only within a same site, for example, switching to an image with a high activity level in the same site.
A problem in the prior art is that switching between images can be provided only within a same capture scene.
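The limitation described above can be illustrated with a minimal sketch. The class names, field names, and activity values below are hypothetical and are introduced only for illustration; they are not taken from any telepresence standard or product. The sketch assumes that each capture scene entry carries a numeric activity level and that the prior-art rule selects the most active entry of one capture scene.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptureSceneEntry:
    """One image display manner of a capture scene (hypothetical model)."""
    entry_id: str
    activity_level: float  # assumed convention: higher means more active


@dataclass
class CaptureScene:
    """The images associated with one site (hypothetical model)."""
    scene_id: str
    site: str
    entries: List[CaptureSceneEntry] = field(default_factory=list)


def switch_within_scene(scene: CaptureScene) -> CaptureSceneEntry:
    """Prior-art style switching: pick the most active entry of ONE scene.

    The candidate set is limited to scene.entries, so an entry that belongs
    to another capture scene can never be selected.
    The scene is assumed to contain at least one entry.
    """
    return max(scene.entries, key=lambda entry: entry.activity_level)


# Illustrative values only.
scene_a = CaptureScene("scene-A", "A", [
    CaptureSceneEntry("A-cam1", 0.2),
    CaptureSceneEntry("A-cam2", 0.7),
    CaptureSceneEntry("A-cam3", 0.4),
])
scene_b = CaptureScene("scene-B", "B", [
    CaptureSceneEntry("B-cam1", 0.9),
])

# Selection for scene A stays inside scene A, even though the entry of
# scene B has a higher activity level.
print(switch_within_scene(scene_a).entry_id)  # A-cam2
```

In this sketch, the entry of the site B is never a candidate when the selection is made for the site A, which corresponds to the restriction that switching is provided only within a same capture scene.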