1. Field of the Invention
The present invention relates to a teleconferencing system, a camera controller for a teleconferencing system, and a camera control method for a teleconferencing system, and more particularly to a controller that controls the camera imaging angle and picture angle so as to properly capture the picture of a speaker in a meeting.
2. Related Art
In the past, controllers have been developed in which a plurality of microphones is provided in a teleconferencing terminal, a speaker is automatically selected from among the participants in the teleconference, and a camera is aimed at this speaker.
For example, in a method disclosed in the Japanese unexamined patent publication (KOKAI) No.5-122689, a microphone is installed for each participant, and a camera is aimed at the microphone with the maximum sound level, so as to capture a picture of a speaker.
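The maximum-level selection described for this prior-art approach can be illustrated with a minimal sketch. The function names, the use of an RMS level measure, and the sample values are assumptions made for illustration and are not taken from the cited publication.

```python
import math

def rms_level(samples):
    """Root-mean-square level of one microphone's block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudest_microphone(blocks):
    """Return the index of the microphone whose block has the highest RMS level.

    `blocks` holds one sample sequence per microphone (hypothetical layout).
    """
    levels = [rms_level(block) for block in blocks]
    return max(range(len(levels)), key=levels.__getitem__)

# Three microphones; the second one carries the strongest signal.
blocks = [[0.01, -0.02, 0.01], [0.4, -0.5, 0.3], [0.05, 0.04, -0.06]]
selected = loudest_microphone(blocks)
```

A camera preset associated with the selected microphone would then be recalled, which is why this approach can only aim at microphone positions and not at a speaker located between them.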
In a method disclosed in the Japanese unexamined patent publication (KOKAI) No.7-140527, the direction from which a sound is heard is detected from the phase difference in voices input from a plurality of microphones.
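As an illustration of direction detection from a phase (arrival-time) difference, the following sketch estimates the delay between two microphone signals by cross-correlation and converts it to an arrival angle. The far-field model, the speed of sound, the microphone spacing, and all function names are assumptions for illustration; the cited publication may use a different procedure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def estimate_delay_samples(left, right, max_lag):
    """Lag (in samples) of `right` relative to `left` that maximizes their cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle_deg(delay_samples, sample_rate, mic_spacing):
    """Far-field arrival angle from broadside, in degrees, for a two-microphone pair.

    The angle is positive when the sound reaches the `left` microphone first.
    """
    path_difference = SPEED_OF_SOUND * delay_samples / sample_rate
    # Clamp so that measurement error cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.degrees(math.asin(ratio))

# An impulse that reaches the right microphone three samples after the left one.
left = [0.0] * 32
right = [0.0] * 32
left[10] = 1.0
right[13] = 1.0
delay = estimate_delay_samples(left, right, max_lag=5)
angle = arrival_angle_deg(delay, sample_rate=16000, mic_spacing=0.2)
```

Because the delay is quantized to whole samples, a one-sample error changes the estimated angle by several degrees at this spacing and sampling rate, which is consistent with the large error attributed to this approach.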
Another method that has been developed is a method whereby a person is detected from an image picked up by a camera, and the camera is pointed in that direction. For example, in the Japanese unexamined patent publication (KOKAI) No.4-234284, frame differencing between sequential images is used to measure the distribution of movement in the horizontal direction, thereby detecting the direction of a person in the picture.
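Frame differencing and the horizontal movement distribution can be sketched as follows. Frames are represented here as lists of rows of grey levels; the threshold value and the function names are assumptions for illustration and are not taken from the cited publication.

```python
def movement_pixels(prev_frame, curr_frame, threshold=20):
    """Binary mask: 1 where the absolute grey-level change exceeds `threshold`."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]

def horizontal_distribution(mask):
    """Number of movement pixels in each image column."""
    return [sum(column) for column in zip(*mask)]

def peak_column(distribution):
    """Column with the most movement, or None if nothing moved."""
    if not distribution or max(distribution) == 0:
        return None
    return distribution.index(max(distribution))

# A 4x6 frame pair in which only columns 3 and 4 change between frames.
prev_frame = [[0] * 6 for _ in range(4)]
curr_frame = [
    [0, 0, 0, 100, 0, 0],
    [0, 0, 0, 100, 100, 0],
    [0, 0, 0, 100, 100, 0],
    [0, 0, 0, 100, 0, 0],
]
mask = movement_pixels(prev_frame, curr_frame)
distribution = horizontal_distribution(mask)
direction_column = peak_column(distribution)
```

The column index of the peak gives the horizontal direction of the moving person within the picture.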
In the Japanese unexamined patent publication (KOKAI) No.5-268599, a contour is detected from an image, and matching is performed with the shape of a person, so as to detect a person.
A method for reliably imaging a speaker is disclosed in the Japanese unexamined patent publication (KOKAI) No.5-244587, for example, whereby a separate wide-angle camera is provided.
In the Japanese unexamined patent publication (KOKAI) No.8-29652, there is a proposed method whereby, after detection of a person by his or her voice, the contour of the person is detected from an image, and the aiming of the camera is corrected accordingly.
In the above-noted prior art in which a plurality of microphones is used and the camera is aimed only at the one with the maximum sound level, however, the speaker can be captured only in the case in which the directions of the microphone and the speaker coincide; if the direction of the speaker is shifted from the direction of the microphone, it is not possible to capture the image of the speaker in the center.
Additionally, in the method using the phase difference, there is a large error, and it is difficult to position the speaker in the center of the camera.
Methods that use an image to capture the speaker in the center of the camera are not suitable for cases in which the speaker is not already somewhere within the image, and in a method using a separately provided wide-angle camera to capture the speaker, the overall cost of the teleconferencing system becomes high.
In a proposed method whereby correction is done by an image after a rough detection is done by voices, a contour is determined within an image, and a person is detected therefrom. However, the method of using a contour has the problem of requiring a large amount of calculation if a large number of objects are present within the meeting room.
Additionally, this method does not consider a means for capturing a person at a proper picture angle.
Accordingly, it is an object of the present invention to improve on the drawbacks of the prior art as noted above, by providing a teleconferencing system which uses a simple configuration to aim a camera, which serves as an imaging means, reliably and accurately at a speaker, based on voice information of the speech of the speaker, and which also enables capture of a picture of the speaker at a proper picture angle.
To achieve the above-noted object, the present invention adopts the following basic technical constitution.
Specifically, a first aspect of the present invention is a teleconferencing system having a plurality of sound-collection means, at least one speaker-imaging means, an image-display means, and an imaging control means, which, based on voice direction information of a speaker obtained from the sound-collection means, changes the imaging direction of the imaging means that images the speaker, wherein the imaging control means is controlled so as to direct the imaging direction of the speaker-imaging means toward the direction of a speaker predicted by the sound-collection means, and wherein the imaging control means is configured so that movement pixels are extracted from the captured image, and a distribution of the movement pixels is determined, so as to identify the direction of the speaker within the image, and so that, based on the direction information of the speaker, the speaker is displayed in a prescribed position within the image area.
A second aspect of the present invention is a teleconferencing system having a plurality of sound-collection means, a speaker-imaging means, which images a speaker, a speaker direction detection means, which, based on information from the sound-collection means, predicts the direction of a speaker, a first imaging control means, which, based on information of the speaker direction detection means, changes the facing direction of the speaker-imaging means, an image-display means, which, in response to a control signal of the first imaging control means, displays a captured image of the speaker-imaging means caused to be faced to a prescribed direction, a movement pixel detection means, which detects movement pixels from the captured image, a movement distribution measurement means, which measures the distribution of movement from the movement pixels detected by the movement pixel detection means, a speaker position establishing means, which, based on the measurement results from the movement distribution measurement means, establishes the position of a speaker in the image, and a second imaging control means, which, based on information of the speaker position establishing means, performs further control of the facing direction of the imaging means.
A third aspect of the present invention is a control method for a speaker-imaging means in a teleconferencing system having a plurality of sound-collection means, at least one speaker-imaging means, and an imaging control means, which, based on voice direction information of a speaker obtained from the sound-collection means, changes the imaging direction of the speaker-imaging means that images the speaker, this method having
a first step of predicting the direction of a speaker, from speaker voice sound information collected from each of the sound-collection means,
a second step of, based on the speaker direction information predicted by the first step, causing the first imaging control means to drive the speaker-imaging means, so as to direct the imaging direction axis of the speaker-imaging means toward the predicted direction of the speaker,
a third step of displaying an image captured by the speaker-imaging means on an image display apparatus,
a fourth step of extracting movement pixel information from the captured image information,
a fifth step of calculating a movement distribution from the extracted movement pixel information,
a sixth step of establishing the position of a speaker in the captured image, from the movement distribution information, and
a seventh 7′ step of, from the position information of the speaker within the captured image, the second first imaging control means adjusting a zoom mechanism of the speaker-imaging means so as to adjust the size of the speaker in the captured image.
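Once the position and apparent size of the speaker in the captured image are known, the direction and zoom corrections can be computed as in the following minimal sketch. The pinhole-camera model, the horizontal field of view, the target fraction of the image width, and the function names are all assumptions for illustration; the specification does not prescribe these formulas.

```python
import math

def pan_correction_deg(speaker_x, image_width, horizontal_fov_deg):
    """Pan angle that would bring image column `speaker_x` to the image center.

    Uses a pinhole-camera model (an assumption); positive means pan toward larger x.
    """
    half_width = image_width / 2.0
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((speaker_x - half_width) / focal_px))

def zoom_factor(speaker_width, image_width, target_fraction=0.4):
    """Multiplicative zoom change so that the speaker spans `target_fraction` of the width."""
    if speaker_width <= 0:
        raise ValueError("speaker_width must be positive")
    return target_fraction * image_width / speaker_width

# Speaker found at the right edge of a 640-pixel-wide image, 128 pixels wide,
# with an assumed 60-degree horizontal field of view.
pan = pan_correction_deg(speaker_x=640, image_width=640, horizontal_fov_deg=60.0)
zoom = zoom_factor(speaker_width=128, image_width=640)
```

A zoom factor above 1 indicates that the camera should zoom in, and a factor below 1 that it should zoom out.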
By adopting the above-noted technical constitution, a teleconferencing system, a camera controller, and a camera control method for a teleconferencing system according to the present invention have a speaker direction detection means, which detects the direction of a speaker from a phase difference in the voices input into each one of a plurality of microphones or from the voice levels detected by the plurality of microphones, and an imaging control means, which directs a camera to the detected direction. A moving part of a picture picked up by the camera directed at a speaker by means of his or her voice is detected, a movement distribution measurement means measures the movement distributions thereof in the horizontal and vertical directions, and a speaker position establishing means detects the position and size of a person from the horizontal-direction and vertical-direction movement distributions. Even if the speaker is not captured within the image, the imaging control means can be moved so that the speaker is captured in the center part thereof, with a proper size.
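One way the position and size of a person could be derived from the horizontal-direction and vertical-direction movement distributions is sketched below. The movement-pixel mask is taken as given; the `min_count` threshold, the extent-based size estimate, and the function names are assumptions for illustration, not the procedure defined by the invention.

```python
def projections(mask):
    """Horizontal (per-column) and vertical (per-row) movement distributions."""
    horizontal = [sum(column) for column in zip(*mask)]
    vertical = [sum(row) for row in mask]
    return horizontal, vertical

def active_extent(distribution, min_count=1):
    """First and last index whose count reaches `min_count`, or None if there is none."""
    active = [i for i, count in enumerate(distribution) if count >= min_count]
    if not active:
        return None
    return active[0], active[-1]

def locate_person(mask, min_count=1):
    """Center and size of the moving region, or None if no movement is found."""
    horizontal, vertical = projections(mask)
    x_extent = active_extent(horizontal, min_count)
    y_extent = active_extent(vertical, min_count)
    if x_extent is None or y_extent is None:
        return None
    (x0, x1), (y0, y1) = x_extent, y_extent
    return {
        "center_x": (x0 + x1) / 2.0,
        "center_y": (y0 + y1) / 2.0,
        "width": x1 - x0 + 1,
        "height": y1 - y0 + 1,
    }

# A 5x6 movement-pixel mask with a small moving region.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
person = locate_person(mask)
```

Raising `min_count` suppresses isolated movement pixels caused by noise, at the cost of trimming the edges of the detected region.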