Acoustic sound source location technology is now widely adopted in video phone, telephone conference, and video conference systems to control a video camera so that it focuses on the person who is speaking.
For example, in a conference with a plurality of participants, a video camera based on acoustic sound source location technology can automatically focus on the person who is speaking, such as a representative or a reporter, based on the position of the acoustic sound source. However, if another person makes a sound at the same time, the video camera may turn to that person, which produces an undesirable effect.
As another example, in a conference with a plurality of participants who can speak and discuss, the video camera is generally expected to focus preferentially on the expert who is participating in the discussion. However, a video camera based on acoustic sound source location technology generally focuses on the participant who makes the sound with the highest intensity, so if the voice of the expert is not louder than the voices of the others during the discussion, the video camera cannot focus on the expert.
In view of the above, existing video cameras based on acoustic sound source location cannot satisfy the requirements of various application scenarios. To satisfy those requirements, the video camera has to be remotely controlled manually, which is inconvenient in both operation and maintenance.