1. Field of the Invention
The present invention relates to a display control system for videoconference terminals, and more particularly, to a display control system for videoconference terminals which is capable of displaying natural and proper images of participants in a videoconference.
2. Description of the Related Art
Videoconference systems provide bidirectional video and audio communication between two (or more) distant parties via a video coder and decoder. A simple but important point in using a videoconference system is that the local video camera must be properly positioned so that a person participating in the conference can be seen on a terminal screen at the distant site. Such camera adjustment should be done before the videoconference begins and also during the session. Otherwise, because of improper camera settings, the video reproduced at the receiving end may show only a part of the participant's body or, in the worst case, nothing but a background wall. Such a lack of useful visual images makes it difficult for the participating parties to feel a sense of togetherness, hindering the smooth progress of even a well-prepared videoconference. This problem in capturing participants' images is described in further detail below.
Most videoconference terminals used in today's typical videoconference systems provide simultaneous local and remote video views, allowing users to monitor their own image in a small window that forms part of the monitor screen. With such a videoconference terminal, a conference participant at the sending end should check the monitor screen periodically to confirm that his/her video image is being properly captured and transmitted to the distant receiving end. If the image has moved out of the local video view window, he/she has to adjust the camera angle by hand or by some other means. He/she must therefore always pay attention to the local video view window and sometimes leave his/her seat to correct the camera angle. This inevitably interrupts the discussion and disturbs the smooth progress of the videoconference.
Some systems provide remote camera control capabilities, with zoom, pan, and tilt functions integrated into the camera mechanisms, which allow the local site to control the camera located at the distant site. This feature may be useful, but it still requires someone to operate the remote camera. A participant who has to serve as a camera operator will not be able to concentrate on the discussion. Remote camera control therefore does not solve the problem.
As described above, conventional videoconference systems require an operator who watches the local video view and takes care of the camera angles in order to keep showing a correct picture of the local site (i.e., the sending end) to other participants at the distant site (i.e., the receiving end). To eliminate the need for such a human operator, the following videoconference system has been proposed. This proposed system employs a wide-angle high-resolution camera set up at a fixed position to take a long shot that covers the entire conference room including a participant. The participant's image, as part of the global picture captured by the wide-angle camera, is identified and cut out to obtain his/her portrait image of a fixed size. The clipped picture is delivered to the distant site after appropriate scaling operations have been applied so that it will fit into a monitor screen at the distant site. The system automatically tracks any movement of the participant as long as he/she remains within the field of view of the fixed wide-angle camera, and displays his/her picture in the center of the remote monitor screen.
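The clipping and scaling steps described above can be illustrated with a minimal sketch. It is not taken from the proposed system: the function names, the frame and window sizes, and the choice to shift the window back inside the frame when the subject is near an edge are all assumptions made for illustration.

```python
def clip_window(frame_w, frame_h, center_x, center_y, clip_w, clip_h):
    """Return (left, top, right, bottom) of a fixed-size window centered on
    the subject, shifted where necessary so it stays inside the frame.
    All names and the edge-handling policy are hypothetical."""
    if clip_w > frame_w or clip_h > frame_h:
        raise ValueError("clip window is larger than the captured frame")
    # Center the window on the subject, then clamp it to the frame bounds.
    left = min(max(center_x - clip_w // 2, 0), frame_w - clip_w)
    top = min(max(center_y - clip_h // 2, 0), frame_h - clip_h)
    return left, top, left + clip_w, top + clip_h


def fit_scale(clip_w, clip_h, screen_w, screen_h):
    """Uniform scale factor that makes the clipped picture fit the screen."""
    return min(screen_w / clip_w, screen_h / clip_h)


# Subject in the middle of a 1920x1080 wide-angle frame.
print(clip_window(1920, 1080, 960, 540, 640, 480))  # (640, 300, 1280, 780)
# Subject near the top-left corner: the window is clamped to the frame.
print(clip_window(1920, 1080, 100, 50, 640, 480))   # (0, 0, 640, 480)
# A 640x480 clip shown on a 320x240 remote view is scaled by one half.
print(fit_scale(640, 480, 320, 240))                # 0.5
```

Because the window follows the subject's center from frame to frame, the subject stays fixed in the clipped picture while the background shifts behind it.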
This improved videoconference system with automatic tracking capabilities, however, has a problem in its image identification algorithm, as described later. Also, even if the participant's picture has been successfully obtained, the picture will suffer from a side effect of the automatic tracking function. More specifically, the portrait picture is cut out of the captured image with the clipping window always centered on the participant wherever he/she may move. This makes it difficult for the viewers to grasp the overall scene, including the participant's movement and the background. Further, the clipped picture looks as if the person in it were fixed while the background were moving, as happens when a camera is panned too quickly to follow a subject. This effect confuses the viewers and impairs the clarity of the reproduced picture.
Regarding the subject identification algorithm mentioned above, there is a well-known technique that first captures a picture of the background alone, with no participants included, and uses it as a reference image to compare with each camera shot taken during a conference session, thereby identifying and locating the subjects. The trouble, however, is that this algorithm works only on the assumption that the background image never varies with time, which also means that the camera angle or location cannot be changed during the session.
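A minimal sketch of such a reference-image comparison follows. It is an illustrative assumption, not the algorithm of any particular system: frames are modeled as small grayscale grids (lists of rows), and the function name, the threshold value, and the sample pixel values are hypothetical.

```python
def locate_subject(background, frame, threshold=30):
    """Return the bounding box (left, top, right, bottom) of the pixels that
    differ from the reference background by more than `threshold`, or None
    if no pixel differs. Name and threshold are hypothetical."""
    xs, ys = [], []
    for y, (bg_row, row) in enumerate(zip(background, frame)):
        for x, (bg, px) in enumerate(zip(bg_row, row)):
            if abs(px - bg) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Reference shot: an empty room with uniform gray level 100 (6 x 4 pixels).
background = [[100] * 6 for _ in range(4)]

# Session shot: a subject (gray level 200) occupies columns 2-3, rows 1-2.
frame = [row[:] for row in background]
for y in (1, 2):
    for x in (2, 3):
        frame[y][x] = 200
print(locate_subject(background, frame))    # (2, 1, 3, 2)

# If the background itself changes (modeled here as every pixel becoming
# 160, e.g. after the camera is re-aimed), the whole frame is flagged.
changed = [[160] * 6 for _ in range(4)]
print(locate_subject(background, changed))  # (0, 0, 5, 3)
```

The second call shows the limitation noted above: once the background no longer matches the reference image, the comparison can no longer isolate the subject.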
Another subject identification algorithm has also been proposed, which locates a subject by calculating the differences between one video frame and the next, that is, the magnitude of interframe motion. However, the result is affected by noise introduced by slight differences between frames or by motion in the background image. It is therefore difficult to detect the subject accurately and reliably.
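The interframe approach and its sensitivity to background changes can be sketched as follows. This is an illustrative assumption only: frames are small grayscale grids, and the helper names, the threshold, and the pixel values are hypothetical.

```python
def changed_pixels(prev_frame, next_frame, threshold=30):
    """List the (x, y) positions whose gray level changes by more than
    `threshold` between two consecutive frames (hypothetical helper)."""
    return [(x, y)
            for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame))
            for x, (a, b) in enumerate(zip(prev_row, next_row))
            if abs(a - b) > threshold]


def bounding_box(points):
    """Bounding box (left, top, right, bottom) of the points, or None."""
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def make_frame(subject_left, width=6, height=4):
    """Gray background (100) with a 2 x 2 subject (200) on rows 1-2."""
    frame = [[100] * width for _ in range(height)]
    for y in (1, 2):
        for x in (subject_left, subject_left + 1):
            frame[y][x] = 200
    return frame


prev_frame = make_frame(1)   # subject at columns 1-2
next_frame = make_frame(2)   # subject has moved one pixel to the right

clean_box = bounding_box(changed_pixels(prev_frame, next_frame))
print(clean_box)             # (1, 1, 3, 2)

# A single background pixel flickers between the two frames.
next_frame[0][5] = 150
noisy_box = bounding_box(changed_pixels(prev_frame, next_frame))
print(noisy_box)             # (1, 0, 5, 2)
```

A single changed background pixel is enough to stretch the detected region well beyond the subject, which illustrates why frame differencing alone does not locate the subject reliably.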