Network conference systems have recently come into wide use. In such a system, a plurality of terminals, each having a display, a speaker, a camera, and a microphone, are connected via an internetwork, the Internet, or ISDN (Integrated Services Digital Network), and a plurality of users confer with one another using their respective terminals. Such a network conference system operates on the following basic principle. First, a first user's video and speech are input via the camera and microphone of the first user's terminal. Next, the input information is output via the display and speaker of a second user's terminal, so that the information is conveyed to the second user. By performing this operation in both directions, communication between the first user and the second user is realized. Existing systems include those in which a single virtual conference space is shown on the display and those in which the video of each participant is displayed in a tiled layout.
On the other hand, the importance of non-verbal communication, which uses non-verbal information, has been recognized in communication behavior. In remote communication such as network conferencing in particular, it has been pointed out that much non-verbal information is lost. Because of this loss, participants cannot, for example, properly control the turn of utterance (turn-taking). As a result, the conversation does not proceed smoothly, and its quality suffers.
Accordingly, in prior-art network conference systems, the following two approaches have been attempted in order to recover non-verbal information based on eye contact. As a first approach, techniques have been proposed that effectively eliminate the parallax effect caused by the offset between the center of the camera and the displayed position of the other participant (for example, JP-A H08-195945 (Kokai) (Patent reference 1) and JP-A H11-355804 (Kokai) (Patent reference 2)). However, these methods cannot represent relational information among a plurality of users (for example, “who is looking at whom”), and are therefore of little use for conferences.
As a second approach, the target of eye contact is clearly determined so that eye contact can be established quickly. For example, JP No. 3292248 (Patent reference 3) proposes a technique that represents eye contact among a plurality of users of the conference system by changing the camera position and the layout of each user. JP-A H08-237629 (Kokai) (Patent reference 4) proposes a technique that models the user's head and, in the same way as the virtual camera of Patent reference 2, eliminates the parallax effect. It additionally proposes, for a user who is looking at another user, rotating the head of that user's model by 90 degrees toward the other user. Furthermore, JP-A H08-256316 (Kokai) (Patent reference 5) proposes a technique that displays video based on the distance between the displayed users and represents a user's gaze toward a material.
Regarding the importance of non-verbal communication in communication behavior, much non-verbal information is lost in remote communication such as network conferencing. Of the non-verbal information that is lost, the most important is gaze information, which answers questions such as “Is he/she looking at me?” and “What is he/she looking at?”
With respect to the above problem, the method of Patent reference 3 uses a complicated optical system, and therefore cannot be implemented with inexpensive components. Furthermore, its video display does not reflect the distances among the plurality of displayed users. As a result, the conveyed gaze information is not always accurate and appears unnatural.
The method of Patent reference 4 can be implemented with inexpensive components. However, its video display likewise does not reflect the distances among the plurality of displayed users. In the method of Patent reference 5, a plurality of previously acquired videos are displayed by switching among them. Accordingly, a large number of reference videos must be recorded in advance and distributed to each terminal. In addition, the user's actual state during the conversation is not reflected in the video. As a result, the user's facial expressions and changes in clothing cannot be reproduced, and an unnatural video is displayed.