People at distributed locations commonly hold meetings through videoconferencing. In a typical scenario, a camera and microphone are placed in a room at each location to transmit video and audio to the other location. The camera and microphone are generally connected to a computing system running videoconferencing software responsible for transmitting the data over telephone lines, the Internet, or another network to a remote location. A computing system at the remote location then receives the video and audio data and delivers them to the conference participants using a display screen and speakers.
A camera's orientation is described in terms of azimuth and elevation, and its distance from an object is described in terms of range. Azimuth refers to the rotational angle around the table that the camera is facing. Magnetic north is designated as zero degrees, such that if the camera is facing magnetic north, the azimuth is zero degrees. The action of changing the camera's azimuth is referred to as panning. Elevation is the angle up or down from level that the camera is facing. A camera that is facing level has an elevation of zero degrees. Angles above level are represented with positive values, while angles below level are represented with negative values. The action of changing the camera's elevation is referred to as tilting. Range refers to the distance from the camera to the object whose image is to be captured. The action of making an object appear larger or smaller in an image is referred to as zooming.
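The definitions above can be illustrated with a minimal sketch. It assumes a coordinate frame with x pointing east, y pointing magnetic north, and z pointing up, and it assumes that azimuth increases clockwise from north; neither convention is stated in the text. The names `CameraPose`, `pan`, `tilt`, and `aim_at` are hypothetical and are not part of any described system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    azimuth_deg: float    # 0 = facing magnetic north; clockwise-positive is an assumption
    elevation_deg: float  # 0 = level, positive = above level, negative = below level


def pan(pose: CameraPose, delta_deg: float) -> CameraPose:
    """Panning changes the azimuth; the result is wrapped into [0, 360)."""
    return CameraPose((pose.azimuth_deg + delta_deg) % 360.0, pose.elevation_deg)


def tilt(pose: CameraPose, delta_deg: float) -> CameraPose:
    """Tilting changes the elevation; the result is clamped to [-90, 90]."""
    elevation = max(-90.0, min(90.0, pose.elevation_deg + delta_deg))
    return CameraPose(pose.azimuth_deg, elevation)


def aim_at(camera_xyz, target_xyz):
    """Return (pose, range_m) that points a camera at a target.

    Both arguments are (east, north, up) positions in meters (assumed frame).
    """
    east = target_xyz[0] - camera_xyz[0]
    north = target_xyz[1] - camera_xyz[1]
    up = target_xyz[2] - camera_xyz[2]
    horizontal = math.hypot(east, north)
    range_m = math.sqrt(east * east + north * north + up * up)
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    elevation = math.degrees(math.atan2(up, horizontal))
    return CameraPose(azimuth, elevation), range_m
```

For example, under these assumptions `aim_at((0, 0, 0), (0, 3, 0))` describes a target three meters due north of the camera, so the resulting pose has an azimuth and elevation of zero degrees and a range of three meters.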
In early videoconferencing systems, the camera was stationary, and only conference participants who were seated directly in front of the camera could be seen at the remote location. Some videoconferencing systems added the ability to manually pan, tilt, and zoom the camera. Later videoconferencing systems automatically panned and tilted the camera (e.g., using sound and/or vision techniques) to allow participants at a remote location to see a speaking participant wherever she was located in the room. Some modern videoconferencing systems use audio from the microphone to position the camera and estimate the distance of the speaking participant from the camera based on volume. However, this often results in choosing an incorrect speaker or a disproportionate image size because, for example, some participants speak more loudly than others, and other noises in the room, such as reflections off objects, may confuse the system.
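The weakness of volume-based distance estimation can be illustrated with a short sketch. It assumes a simple free-field model in which sound pressure level falls by about 6 dB for each doubling of distance, and it assumes an estimator that expects every speaker to produce the same level at one meter. The text does not say how any particular system estimates distance, so the function names and the 60 dB and 66 dB figures are hypothetical.

```python
import math


def level_at_distance(level_at_1m_db: float, distance_m: float) -> float:
    """Free-field sound pressure level at distance_m (assumed model)."""
    return level_at_1m_db - 20.0 * math.log10(distance_m)


def range_from_volume(measured_db: float, assumed_level_at_1m_db: float = 60.0) -> float:
    """Invert the model above, assuming every speaker is equally loud at 1 m."""
    return 10.0 ** ((assumed_level_at_1m_db - measured_db) / 20.0)


# Two participants seated at the same true distance of 2 m.
typical_estimate = range_from_volume(level_at_distance(60.0, 2.0))  # about 2 m
loud_estimate = range_from_volume(level_at_distance(66.0, 2.0))     # about 1 m
```

In this sketch, a participant who speaks 6 dB louder than the assumed level is estimated to be at roughly half the true distance, so a camera zoomed according to that estimate would frame the participant at a disproportionate size.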
Even with movable cameras, it is often difficult to see all of the conference participants. Conference participants are often seated around rectangular tables. Participants seated at the ends of the table, farther from the camera, appear smaller and in less detail than participants seated at the sides of the table, closer to the camera. It is distracting for conference participants to see some participants fill an entire display because they are close to the camera while others fill only a small portion because they are far away from the camera.
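The size disparity described above follows from basic camera geometry and can be sketched as follows. The sketch assumes an idealized pinhole camera with a fixed vertical field of view, in which the apparent size of an object is inversely proportional to its distance from the camera. The function names, the 0.25 m head height, the 40 degree field of view, and the 1 m and 4 m seating distances are hypothetical values chosen only for illustration.

```python
import math


def frame_fraction(object_height_m: float, distance_m: float, vertical_fov_deg: float) -> float:
    """Fraction of the image height that an object occupies (pinhole model)."""
    visible_height_m = 2.0 * distance_m * math.tan(math.radians(vertical_fov_deg) / 2.0)
    return object_height_m / visible_height_m


def zoom_to_match(near_distance_m: float, far_distance_m: float) -> float:
    """Zoom factor that makes the far object appear as large as the near one."""
    return far_distance_m / near_distance_m


# A participant at the side of the table (1 m) and one at the far end (4 m).
near = frame_fraction(0.25, 1.0, 40.0)
far = frame_fraction(0.25, 4.0, 40.0)
ratio = near / far  # the near participant appears four times taller
```

Under these assumptions, the participant at the far end of the table would need a zoom factor of `zoom_to_match(1.0, 4.0)`, or four, to appear the same size as the participant at the side of the table.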