1. Field of the Invention
The present invention relates to a scene video switch system based on dynamic detection of an area of concern and a scene video switch method based on the dynamic detection of the area of concern.
2. Description of the Related Art
With the development of video compression and video enhancement techniques, remote video systems have come into wide use in recent years, particularly in the business field. Typical examples are a remote video conference system and a remote medical care system, in which a video capture unit and a video display unit are placed in each of at least two terminals, and the terminals communicate with each other via a wired or wireless communication unit so that users of the terminals may acquire real-time or off-line videos from each other. In an application of such a remote video system, dynamically detecting an area of concern and emphasizing the scene of that area can considerably improve the user-friendliness of interaction with the system.
Up to now, the following techniques have been proposed with regard to changing a displayed scene by dynamically detecting an area of concern.
According to the technique proposed in Cited Reference No. 1 below, an area of concern is dynamically detected and optimally displayed, whereas display of areas not of concern is omitted. In this technique, the size ratio of a display area may be automatically adjusted based on the content of the area of concern. However, in this reference the area of concern is limited to a human face; that is, the size of an image is proportional to the size of the human face, and scene states in a video conference are not classified. In addition, if the area of concern is relatively small, carrying out only equal proportional enlargement may degrade the video quality.
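By way of illustration only, the effect of equal proportional enlargement on a small area of concern can be sketched as follows. The function name `enlargement_factor` and the example dimensions are assumptions made for this sketch and do not appear in Cited Reference No. 1.

```python
def enlargement_factor(region_w, region_h, display_w, display_h):
    """Largest uniform scale that fits the region inside the display.

    Equal proportional enlargement uses one factor for both axes, so the
    tighter of the two axis ratios limits the result.
    """
    if region_w <= 0 or region_h <= 0:
        raise ValueError("region dimensions must be positive")
    return min(display_w / region_w, display_h / region_h)


# Hypothetical example: an 80x100 pixel face region shown on a 1280x720 display.
factor = enlargement_factor(80, 100, 1280, 720)
# Each captured pixel must cover roughly factor**2 display pixels,
# which is why a small region tends to look soft after enlargement.
pixels_per_source_pixel = factor ** 2
```

The smaller the area of concern relative to the display, the larger the factor, and the more interpolated pixels each captured pixel must supply.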
Cited Reference No. 2 below provides a technique that uses a full-angle camera head to capture a conference and is able to provide real-time and off-line video display for users. This technique includes an automatic camera head management system for controlling the camera head and an analysis module for positioning those who are present (i.e., attendees). However, in this reference it is necessary to use a full-angle camera head, or an array formed of plural general camera heads, to provide a video of each of the attendees; as a result, the hardware burden is very heavy. Furthermore, the users may only switch among videos that each show a single attendee, so that important information in the areas of concern of other attendees may be lost.
Cited Reference No. 3 below uses a video detection technique to detect attendees in a video captured by a camera head and then, based on the detected positions and size information of the attendees, automatically adjusts the orientation and zoom proportion of the camera head so that a best video including all of the attendees may be provided. However, this reference may only provide a video including all of the attendees; in other words, it is limited with regard to providing a video of one single attendee. Furthermore, although this reference may effectively detect the departure of some of the attendees, it relies on audio information positioning outside the detection area to handle the arrival of new attendees; as a result, there is a certain limitation in this respect too.
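By way of illustration only, the general idea of deriving a camera orientation target and a zoom proportion from detected attendee positions and sizes can be sketched as follows. This is not the method of Cited Reference No. 3; the function name `frame_all_attendees`, the `(x, y, w, h)` box format, and the `margin` parameter are assumptions made for this sketch.

```python
def frame_all_attendees(boxes, frame_w, frame_h, margin=0.1):
    """Return ((cx, cy), zoom) for a view containing every detected attendee.

    boxes  : list of (x, y, w, h) detections in pixels (assumed format)
    margin : fraction of the enclosing box added as padding on each side
    """
    if not boxes:
        raise ValueError("at least one detected attendee is required")

    # Smallest rectangle enclosing all detections.
    x0 = min(x for x, _, _, _ in boxes)
    y0 = min(y for _, y, _, _ in boxes)
    x1 = max(x + w for x, _, w, _ in boxes)
    y1 = max(y + h for _, y, _, h in boxes)

    # Pad the rectangle, then clamp it to the captured frame.
    pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
    x0, y0 = max(0, x0 - pad_x), max(0, y0 - pad_y)
    x1, y1 = min(frame_w, x1 + pad_x), min(frame_h, y1 + pad_y)

    # The center is the orientation target; the zoom is the largest
    # uniform factor that keeps the padded rectangle inside the view.
    center = ((x0 + x1) / 2, (y0 + y1) / 2)
    zoom = min(frame_w / (x1 - x0), frame_h / (y1 - y0))
    return center, zoom


# Hypothetical example: two attendees detected in a 640x480 frame.
center, zoom = frame_all_attendees(
    [(100, 100, 50, 50), (300, 150, 50, 50)], 640, 480, margin=0.0
)
```

Because a single enclosing rectangle is used, a view computed in this way always contains all of the attendees and cannot isolate one of them, which corresponds to the limitation noted above.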
Cited Reference No. 4 below provides a method of tracking plural attendees in a video conference. This method includes a step of monitoring the video conference; a step of creating video positioning information; a step of creating audio positioning information; and a step of adjusting parameters of a camera head based on the video positioning information and the audio positioning information. In this reference, switching may be carried out only between a scene video of a speaker and a scene video of all the attendees, and the video may be switched to the speaker only by detecting and positioning the speaker on the basis of the video and audio of the speaker. In addition, since this method cannot carry out dynamic detection of, and switching between, areas of concern, and cannot extend a scene, it cannot automatically adjust the video in a case where new attendees enter or present attendees leave the conference.
Since the area of concern of users changes during a video conference, none of the techniques described in the above references can provide a best video for the users throughout the whole conference. For example, Cited Reference No. 4 may only provide two selectable scene videos, and cannot carry out dynamic detection of an area of concern and the corresponding scene state switching. Cited Reference No. 2 may provide videos of different scenes, but this requires a dramatic increase in hardware; in addition, dynamic detection of an area of concern and the corresponding scene state switching cannot be carried out. Finally, although Cited Reference No. 1 mentions video display based on an area of concern, the defined area of concern is limited, dynamic detection of the area of concern and the corresponding scene state switching cannot be carried out, and the visual quality of the scene video is not ideal in some cases.
Cited Reference No. 1: US Patent Application Publication No. 2010/0103245 A1
Cited Reference No. 2: U.S. Pat. No. 7,580,054 B2
Cited Reference No. 3: US Patent Application Publication No. 2009/0015658 A1
Cited Reference No. 4: U.S. Pat. No. 6,611,281 B2