Technical Field
The present disclosure relates to a method, apparatus, and system for processing video.
Description of the Related Art
Remote communication systems such as video conference systems are now in widespread use, allowing users of different terminals at different locations to communicate through simultaneous two-way video and audio transmission. Some video conference systems support an audio source detection application or a facial recognition application that identifies, from among a plurality of participants, an active speaker who is currently speaking, and display an enlarged video of the active speaker to attract the other participants' attention.
However, with typical systems for displaying the enlarged video of the active speaker, the video displayed on a screen is sometimes switched too quickly in response to frequent changes in speaker. Such rapid changes in display are not suitable for viewing. To address this issue, it has been proposed to limit how often the enlarged video of the active speaker is switched. However, this delays the display change when the current speaker changes quickly.
Further, in addition to typical systems for displaying the enlarged video of the active speaker, systems for determining a main speaker in the conference, other than the current active speaker, have been desired. If the target to be displayed in enlarged form is limited to the current speaker, the displayed video may be switched too frequently, because it changes every time the active speaker changes. Such frequent changes in display are not suitable for viewing.