1. Field of the Invention
The present invention relates generally to teleconferencing systems having video cameras that track an image, and more particularly, to a system that continuously tracks a sound-emitting object and transmits an image of the object and the sound generated thereby.
2. Description of Related Art
Teleconferencing or video-conferencing is an increasingly prevalent communication tool, particularly in business. Teleconferencing, video teleconferencing, video-conferencing, and video-phone each refer to the substantially simultaneous transmission of audio and visual signals between two or more remote locations for communication between those locations. Teleconferencing enables conferees at two or more remote sites to interact as if they were together in a conference room. The increasing sophistication of the infrastructure surrounding teleconferencing, such as the higher data transmission rates made possible by optical fiber lines, has helped to provide a suitable conference environment for the conferees.
Since there are often a number of conferees at each remote site, it is desirable to have a video transmitting device track each of the conferees, such as when they are speaking during the conference, just as one's eyes follow different persons as they speak. A number of prior-art systems have been disclosed for tracking objects, such as conferees in a teleconference. The most prevalent systems for object tracking are visual tracking systems.
A disadvantage of visual tracking, and of teleconferencing in general, is that conferees at each location typically look at the video camera transmitting their actions rather than at the screen displaying the conferees at the remote location, since the video camera is often adjacent to the screen. Thus conferees at each of the locations appear to gaze in directions slightly away from the screen, which is somewhat awkward for participants in the teleconference. Another disadvantage of visual tracking systems is that, since people usually direct their attention by first hearing a sound and then turning their head and eyes toward it, visual tracking produces an unnatural transition between different speakers at a given location.
One visual tracking system is disclosed in U.S. Pat. No. 5,438,357, to McNelley. Disclosed therein is a system for teleconferencing that allows for natural eye contact between conferees. The system includes at least two terminals, each comprising a screen for displaying an image and a video camera for transmitting an image of a conferee to the remote screen. Audio communication means, such as microphones and speakers, are also provided to enable conferees to hear and speak to each other. The video camera at each location is located above the screen, and thus above eye level. Image manipulation is used to alter the image of a conferee and to redirect the apparent direction of the conferee's gaze so that the conferee appears to be looking directly into the screen and at the conferees at the remote location. Image manipulation is further used therein to simulate zooming, tilting, and panning of the camera.
U.S. Pat. No. 5,500,671, to Andersson et al., discloses a video conference system that provides eye contact and a sense of presence to a plurality of conference participants located in respective remotely sited conference rooms. Each room contains at least one video telephone that includes a video camera and an image receiver for displaying image frames of at least one remote conferee. The image receiver, the video camera, and the eyes of the local conferee define a parallax angle. A frame generating system is further provided that, responsive to video signals, analyzes local conferee image frames and generates a corresponding sequence of parallax-compensated frames. A signal indicative of each parallax-compensated frame is transmitted to a corresponding image receiver to provide apparent eye contact between each local conferee and the displayed image of a corresponding remote conferee. When there are more than three conferees, each input image is additionally analyzed for head position, and the head position is reoriented by the frame generating system as necessary to provide a sense of presence.
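As a rough illustration of the parallax angle referred to above, assume (this geometry is not taken from the Andersson et al. reference) that the camera lens sits a vertical distance h above the point on the screen where the remote conferee's eyes are displayed, and that the local conferee's eyes are level with that point at a viewing distance D from the screen. The angle between the conferee's line of sight to the displayed eyes and the line to the camera is then approximately:

```latex
\theta = \arctan\!\left(\frac{h}{D}\right),
\qquad
h = 10\ \text{cm},\; D = 60\ \text{cm}
\;\Rightarrow\;
\theta = \arctan\!\left(\tfrac{1}{6}\right) \approx 9.5^\circ
```

The values of h and D are hypothetical and serve only to show that even a modest camera offset yields a gaze error of several degrees, which is what parallax compensation seeks to remove.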
Another visual based image tracking system is disclosed in U.S. Pat. No. 5,434,617, to Bianchi. The disclosed system utilizes methodology and circuitry for automatically effecting electronic camera movement to track and display the location of a moving object, such as a person presenting a talk to an audience. The system includes a fixed spotting camera for capturing a field of view and a moving tracking camera with pan, tilt, zoom, and focus functions driven to the present location of the moving object. Information for driving the tracking camera is obtained with reference to the pixel difference between a current image and a previous image within the field of view. A tracking algorithm computes the information necessary to drive the tracking camera from these pixel differences as well as data relative to the field of view of the spotting camera and the present tracking camera position.
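The pixel-difference approach described above can be sketched as follows. This is a minimal, hypothetical illustration and not the tracking algorithm of the Bianchi reference: it assumes grayscale frames held as nested lists, a fixed intensity threshold, a linear mapping from spotting-camera pixels to angles, and a tracking camera co-located with the spotting camera. The function names and parameters are invented for the example.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale intensities, row-major (hypothetical format)


def changed_centroid(prev: Frame, curr: Frame,
                     threshold: int = 25) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of pixels whose intensity changed by more
    than `threshold` between two frames, or None if nothing changed."""
    xs, ys, count = 0.0, 0.0, 0
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > threshold:
                xs += x
                ys += y
                count += 1
    if count == 0:
        return None
    return xs / count, ys / count


def pan_tilt_correction(centroid: Tuple[float, float], width: int, height: int,
                        hfov_deg: float, vfov_deg: float,
                        current_pan: float, current_tilt: float) -> Tuple[float, float]:
    """Convert a centroid in the fixed spotting camera's image into a relative
    pan/tilt move for the tracking camera.

    Assumes angle varies linearly with pixel offset from the image centre and
    that both cameras share the same viewpoint (both are simplifications)."""
    cx, cy = centroid
    # Offset of the pixel centre from the image centre, normalized to [-0.5, 0.5].
    dx = (cx + 0.5) / width - 0.5
    dy = (cy + 0.5) / height - 0.5
    target_pan = dx * hfov_deg
    target_tilt = -dy * vfov_deg  # image y grows downward; positive tilt is up
    # The move needed depends on where the tracking camera currently points.
    return target_pan - current_pan, target_tilt - current_tilt
```

Using the present tracking camera position in the last step mirrors the description above, in which the drive information depends both on the spotting camera's field of view and on where the tracking camera already points.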
U.S. Pat. No. 5,418,595, to Iwasaki et al., is directed to a camera having a subject tracking function and a method therefor. The disclosed camera is provided with a light measurement unit that measures light by dividing the subject field into multiple regions and outputs multiple light measurement data relating to the brightness of the subject field. A subject tracking unit tracks the subject by detecting the movement of the subject using the output from the light measurement unit. A focal point detecting unit, which includes multiple focal point detection regions within the subject field, detects the status of focal point adjustment when the focal point of the photographic lens is manually adjusted. In use, if the focal point of the photographic lens is adjusted by the subject tracking unit and at least one of the multiple focal point detection regions is in focus, the subject tracking unit treats the subject in the in-focus focal point detection region as the new subject and tracks its position.
Additional visual-based camera tracking systems are disclosed in U.S. Pat. No. 5,473,369, to Abe, which is directed to an object tracking apparatus; U.S. Pat. No. 5,430,809, to Tomitaka, which discloses a human face tracking system; and U.S. Pat. No. 5,572,317, to Parker et al., which discloses a remote-controlled tracking system, and an associated method, for tracking a remote control unit and positioning and operating a camera.
Another means for enabling a camera to track conferees in a teleconference is to detect their motion. U.S. Pat. No. 5,552,823, to Kageyama, discloses a picture processing apparatus with object tracking. The disclosed apparatus first detects a distribution of motion vectors within a tracking vector detection area and then detects the most characteristic motion vector. In a second embodiment, the power of components in a high frequency band is detected for each predetermined block. Difference data between the picture data of the present frame and that of the previous frame is obtained, the power of the difference data is detected for each predetermined block, and a moving edge is detected on the basis of the detection result.
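The block-wise processing of the second embodiment can be sketched as below. This is a hypothetical illustration under stated assumptions, not the Kageyama circuitry: high-frequency band power is approximated by the sum of squared horizontal and vertical first differences within a block, difference power is the sum of squared frame-to-frame differences, and a block is assumed to contain a moving edge when both powers exceed thresholds. All names and thresholds are invented for the example.

```python
from typing import List, Set, Tuple

Frame = List[List[float]]  # grayscale intensities, row-major (hypothetical format)


def high_freq_power(frame: Frame, y0: int, x0: int, size: int) -> float:
    """Sum of squared horizontal and vertical first differences inside one
    block; a crude stand-in for high-frequency band power (an assumption)."""
    power = 0.0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            if x + 1 < x0 + size:
                power += (frame[y][x + 1] - frame[y][x]) ** 2
            if y + 1 < y0 + size:
                power += (frame[y + 1][x] - frame[y][x]) ** 2
    return power


def diff_power(prev: Frame, curr: Frame, y0: int, x0: int, size: int) -> float:
    """Sum of squared differences between the present and previous frame
    inside one block."""
    return sum((curr[y][x] - prev[y][x]) ** 2
               for y in range(y0, y0 + size)
               for x in range(x0, x0 + size))


def moving_edge_blocks(prev: Frame, curr: Frame, size: int,
                       edge_thresh: float, motion_thresh: float) -> Set[Tuple[int, int]]:
    """Return (block_row, block_col) indices of blocks with both edge energy
    and inter-frame change. Partial blocks at the borders are skipped."""
    rows, cols = len(curr), len(curr[0])
    flagged: Set[Tuple[int, int]] = set()
    for by in range(rows // size):
        for bx in range(cols // size):
            y0, x0 = by * size, bx * size
            if (high_freq_power(curr, y0, x0, size) > edge_thresh
                    and diff_power(prev, curr, y0, x0, size) > motion_thresh):
                flagged.add((by, bx))
    return flagged
```

Requiring both conditions separates a moving edge from a uniform brightness change (large difference power but no edge energy) and from a stationary edge (edge energy but no difference power).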
A disadvantage of the camera tracking devices disclosed in the above-enumerated prior art references, and of teleconferencing systems in general, is that they fail to utilize sound as an effective camera tracking means. Since people usually direct their attention by first hearing a sound and then turning their head and eyes toward it, it would be advantageous to provide a camera tracking system that mimics this process.