The present invention relates to a video conference system for properly imaging a speaker in a conference by controlling the imaging direction and field angle of the camera view.
A conventional video conference uses a motor-driven camera whose imaging direction and field angle of the camera view can be adjusted to capture a picture of a participant. Conventionally, in a video conference in which plural persons participate, the imaging direction and field angle of the camera view are selected manually to transmit a picture or the like of the speaker to remote participants. With a conventional camera control interface, an operator operates buttons to designate the amounts by which the imaging direction and field angle of the camera view are to be changed. Since the operator cannot perform this operation intuitively, it takes considerable time to direct the camera correctly toward a speaker, which interferes with the progress of the conference.
In order to solve this problem, controllers for automatically detecting a speaker and directing a camera toward the speaker have been proposed. For example, Japanese Patent Laid-Open No. 5-122689 (reference 1) discloses a video conference system which detects, among plural microphones, the microphone exhibiting the highest voice level and directs a camera toward the detected microphone. Japanese Patent Laid-Open No. 7-140527 (reference 2) discloses a camera imaging controller which detects the direction from which voices are heard on the basis of the differences in phase between voices input to a microphone. According to references 1 and 2, a speaker is detected on the basis of the direction of speech, and the direction and the like of a camera are controlled to capture a picture of the speaker.
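The two speech-based approaches can be illustrated with a minimal sketch. It is not taken from references 1 or 2; the function names, the use of an RMS level as the "voice level", the two-microphone far-field model, and the speed-of-sound constant are all assumptions made for illustration.

```python
import math

# Assumed speed of sound in air at roughly 20 degrees C (illustrative value).
SPEED_OF_SOUND_M_S = 343.0


def rms(samples):
    """Root-mean-square level of one microphone's block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_microphone(frames):
    """Return the index of the microphone with the highest RMS level.

    `frames` holds one list of samples per microphone (hypothetical input
    format). This mirrors the idea in reference 1 of picking the microphone
    that exhibits the highest voice level.
    """
    levels = [rms(frame) for frame in frames]
    return max(range(len(levels)), key=levels.__getitem__)


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Estimate the arrival angle from the delay between two microphones.

    Assumes a far-field source and two microphones `mic_spacing_m` apart;
    a phase difference at a known frequency corresponds to such a delay.
    The angle is measured from the direction perpendicular to the
    microphone pair.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))
```

In this sketch a small error in the measured delay translates directly into an angular error, and neither function yields any information about how large the speaker appears in the frame.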
In addition, techniques of capturing a participant with a camera and detecting the participant from the resultant image have been proposed. For example, Japanese Patent Laid-Open No. 8-298652 (reference 3) discloses a camera direction controller for a video conference terminal, which detects the contour of a human figure from an image captured after directing a camera in the direction in which speech is detected, and corrects the direction of the camera. Japanese Patent Laid-Open No. 11-8844 (reference 4) discloses an image sensing apparatus controller which displays the movable range of a camera, in which panning and tilting can be performed, as a panoramic image, and allows an operator to designate an arbitrary area within the panoramic image, thereby easily directing the camera toward the designated area.
In the methods proposed by references 1 and 2, in which the direction of a speaker is detected from speech, the directional error is large, and hence it is difficult to control the camera so as to place the speaker in the center of the frame. In addition, since the size of the speaker cannot be detected, a proper field angle of the camera view cannot be set for the speaker.
In the method proposed by reference 3, in which detection is performed by using images, participants other than the speaker are also detected. Therefore, in the conventional methods of detecting a speaker by using speech and images and setting a camera in the detected direction, the direction and field angle of the camera view must be further corrected manually.
In the method proposed by reference 4, the direction of a camera can be controlled by designating, within a panoramic image, the area to be captured. In this method, however, a shooting area must be designated on the panoramic image every time the direction of the camera is changed, even in a case such as a video conference where people do not move much once they are seated, resulting in cumbersome operation.
It is an object of the present invention to provide a video conference system which can easily designate a participant to be captured in a video conference.
It is another object of the present invention to provide a video conference system which can properly capture a picture of a participant.
In order to achieve the above objects, according to the present invention, there is provided a video conference system comprising: a camera whose imaging direction and field angle of the camera view can be changed; camera driving means for controlling the imaging direction and field angle of the camera view in accordance with camera parameters; human figure extracting means for extracting a human figure picture of each participant from a picture obtained by capturing an entire conference room with the camera, and calculating camera parameters associated with the imaging direction and field angle of the camera view with respect to each participant on the basis of the extracted human figure picture; human figure picture storage means for storing the human figure picture extracted by the human figure extracting means; camera parameter storage means for storing the camera parameters calculated by the human figure extracting means; and human figure instructing means for reading out from the camera parameter storage means the camera parameters corresponding to a human figure picture selected as a shooting target from the human figure pictures stored in the human figure picture storage means, and outputting the camera parameters to the camera driving means.
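The data flow among these means can be illustrated by the following minimal sketch. It is not the disclosed implementation: the names `CameraParams`, `params_for_box`, and `ParticipantStore` are hypothetical, human figure extraction itself is assumed to have already produced a bounding box per participant, and the calculation assumes a simple pinhole camera model with the whole room captured in a single wide picture taken at pan 0 and tilt 0.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraParams:
    """Hypothetical camera parameters: direction and field angle in degrees."""
    pan_deg: float
    tilt_deg: float
    field_angle_deg: float


def params_for_box(box, image_size, wide_field_angle_deg, margin=1.5):
    """Calculate camera parameters that frame one extracted human figure.

    `box` is (left, top, width, height) in pixels within the wide picture,
    `image_size` is (width, height) of that picture, and
    `wide_field_angle_deg` is its horizontal field angle. `margin` widens
    the view beyond the figure (assumed value). Pan and tilt are treated
    independently, which is an approximation.
    """
    left, top, width, height = box
    image_w, image_h = image_size
    # Focal length in pixels under the pinhole assumption.
    focal_px = (image_w / 2) / math.tan(math.radians(wide_field_angle_deg) / 2)
    center_x = left + width / 2
    center_y = top + height / 2
    pan = math.degrees(math.atan((center_x - image_w / 2) / focal_px))
    # Image y grows downward, so a figure above center gives a positive tilt.
    tilt = math.degrees(math.atan((image_h / 2 - center_y) / focal_px))
    field = math.degrees(2 * math.atan(margin * width / 2 / focal_px))
    return CameraParams(pan, tilt, field)


class ParticipantStore:
    """Holds the human figure pictures and their camera parameters."""

    def __init__(self):
        self._pictures = {}
        self._params = {}

    def register(self, participant_id, picture, params):
        """Store one participant's extracted picture and calculated parameters."""
        self._pictures[participant_id] = picture
        self._params[participant_id] = params

    def pictures(self):
        """Return the stored pictures, e.g. to present them for selection."""
        return dict(self._pictures)

    def select(self, participant_id, drive_camera):
        """Read out the selected participant's parameters and drive the camera."""
        params = self._params[participant_id]
        drive_camera(params)
        return params
```

Under these assumptions, selecting a stored human figure picture is a single lookup followed by one call to the camera driver, so the operator does not have to designate an area or adjust change amounts each time the shooting target changes.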