1. Field of the Invention
The present invention generally relates to a method of tracking a target, and more particularly, to a method of tracking a vocal target by combining voice detection and image recognition.
2. Description of Related Art
With the popularization of the internet, the technologies used on the internet are updated day by day. In addition to basic functions such as browsing web pages and sending and receiving emails, novel functions for transmitting multimedia files through the internet, for example pictures, music or video frames, have been developed. Furthermore, thanks to increasing internet bandwidth, the internet connection speed has multiplied as well, so that real-time video delivery over the internet can be realized.
The recently launched video meeting system is one such remarkable application. A user disposes an image-capturing device and a sound-detecting device at the calling end and is then able to transmit video images and voice to the receiving end in real time through the internet, while also receiving video data from the receiving end, so that a bidirectional real-time conversation is achieved. For capturing video images, a newly provided method further includes detecting the position of a speaker in a meeting by using unidirectional microphones, and then adjusting the lens direction of the image-capturing device to capture a close-up image of the speaker, so that the image of the person is zoomed in and the resolution of the close-up image is enhanced. In this way, the expressions and actions of the speaker can be watched more clearly at the receiving end, which further advances the practicability of a video meeting.
FIG. 1 is a diagram where three unidirectional microphones are used for adjusting a lens to capture the image of a speaker in the prior art. Referring to FIG. 1, there are three meeting attendees 110, 120 and 130 in the video meeting, and a unidirectional microphone is disposed in front of each of the meeting attendees. Whenever a meeting attendee speaks, the unidirectional microphone in front of that attendee automatically reveals the position of the speaker and provides the position to an image-capturing device, which adjusts its lens direction and resolution to capture the image of the speaker. Although the position of the speaker can be precisely detected in the above-mentioned manner, a unidirectional microphone must be disposed for each meeting attendee; therefore, the cost of the conventional scheme is high.
FIG. 2 is a diagram where two unidirectional microphones are used for adjusting a lens to capture the image of a speaker in the prior art. Referring to FIG. 2, there are still three meeting attendees 110, 120 and 130 in the video meeting, while only two unidirectional microphones 140 and 150 are disposed to detect the positions of speakers. As shown in FIG. 2, the unidirectional microphone 140 has a certain detection range (between A and B), and the detection range covers the meeting attendees 110 and 120. Hence, when the attendee 110 is speaking, the unidirectional microphone 140 detects the speech and the lens direction of an image-capturing device 160 is accordingly adjusted to aim at the center point C of the detection range. However, the position of the center point C is not the real position of the attendee 110, which causes an error in the captured image, and all the user can do at this point to obtain the complete image of the attendee 110 is to zoom out the lens to get a larger captured image range. It can be seen therefrom that the above-mentioned scheme is unable to precisely detect the positions of meeting attendees, and that, to get a complete attendee image, the resolution must be reduced to cover a larger image, which limits the practicability of a video meeting.
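The aiming error of the scheme of FIG. 2 can be illustrated with a minimal sketch. The sketch is not part of the prior-art system; it assumes that the boundaries A and B of the detection range and the position of the speaker are expressed as horizontal angles in degrees as seen from the image-capturing device, that the speaker occupies a fixed angular half-width in the frame, and that the function names and all numeric values are hypothetical.

```python
def center_aim_error(a_deg, b_deg, speaker_deg):
    """Aim at the midpoint C of the detection range [A, B] and
    return (C, angular error between C and the real speaker position)."""
    low, high = min(a_deg, b_deg), max(a_deg, b_deg)
    if not low <= speaker_deg <= high:
        raise ValueError("speaker is outside the detection range")
    center = (a_deg + b_deg) / 2.0
    return center, abs(speaker_deg - center)


def required_field_of_view(error_deg, subject_half_width_deg):
    """Smallest horizontal field of view, centered on the aim point,
    that still contains the whole speaker (assumed angular half-width)."""
    return 2.0 * (error_deg + subject_half_width_deg)


# Hypothetical values: detection range from A = -40 to B = 0 degrees,
# speaker actually located at -30 degrees, half-width of 5 degrees.
center, error = center_aim_error(a_deg=-40.0, b_deg=0.0, speaker_deg=-30.0)
close_up_fov = required_field_of_view(0.0, 5.0)   # lens aimed exactly at the speaker
wide_fov = required_field_of_view(error, 5.0)     # lens aimed at center point C

print(center, error, close_up_fov, wide_fov)
```

Under these assumed values the lens is aimed 10 degrees away from the speaker, so the field of view must be widened from 10 degrees to 30 degrees to keep the speaker fully in frame, and the speaker then occupies only a third of the image width. This corresponds to the loss of resolution described above.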