1. Technical Field
The invention relates generally to video conferencing systems, and more particularly to video conferencing systems that allow the use of precomputed panoramic three-dimensional images from the viewpoint of the cameras at all stations involved in the video conference.
2. Description of the Related Art
Video conferencing is becoming a popular medium of communication, as evidenced by the use of products from PictureTel and Compression Labs and of several PC software products such as Intel ProShare. A video conferencing system includes two or more stations that are connected to one another by a communication link. Each station participating in the video conference shares one or more real-time video feeds with each of the other stations. A video feed normally includes both pictures and sound. Each participating station includes a camera that can be controlled locally with a control pad at the base and remotely by any other participating station. The camera transmits images and audio at the best frame rate that the communication link allows. Typically, some form of data compression is used to reduce the amount of video and audio data to be transmitted. In some systems, separate communication links are used for the audio and video to increase available bandwidth. The camera can be rotated to look up or down (elevation) and left or right (azimuth), and can zoom in on or out of a particular region. The base control pad can communicate with the local camera system by either wired or wireless communication.
The drawbacks of video conferencing as available today are jerky images and the low resolution of the video conference window. These drawbacks are due to the limited bandwidth of both POTS (Plain Old Telephone System) and ISDN phone lines, and to limited computer processing power, which prohibits the use of more sophisticated compression and decompression schemes that would reduce the required bandwidth.
As a result of the above drawbacks, video conferencing is less effective than it could be as a collaborative communication medium.
Existing video conferencing techniques use proprietary compression algorithms (such as Indeo, H.320) or public domain compression methods (such as CU-SeeMe and Network Video).
Indeo starts with YUV (Chrominance-Luminance) input images in which U and V are subsampled 4:1 both horizontally and vertically. Indeo supports motion estimation, uses the previous frame to predict values for the current frame, and transmits data only if the difference is significant. Transform encoding is done using an 8×8 Fast Slant Transform (FST) in which all operations are shifts or adds. Quantization and run-length/entropy encoding are used for the coefficients.
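The 4:1 chroma subsampling mentioned above can be illustrated with a minimal Python sketch. It is not Indeo's actual implementation: the function name `subsample_4x4` is hypothetical, averaging each 4×4 block is an assumption (a codec could equally keep one sample per block), and the plane dimensions are assumed to be multiples of 4.

```python
from typing import List

Plane = List[List[int]]  # one chroma plane (U or V) as rows of sample values


def subsample_4x4(plane: Plane) -> Plane:
    """Reduce a chroma plane 4:1 in each direction.

    Assumption: each 4x4 block is replaced by its integer mean.
    Assumption: height and width are multiples of 4.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            sum(plane[y + dy][x + dx] for dy in range(4) for dx in range(4)) // 16
            for x in range(0, width, 4)
        ]
        for y in range(0, height, 4)
    ]
```

Under these assumptions a 4×8 plane reduces to a 1×2 plane, so each chroma plane carries one sixteenth of its original sample count.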
CU-SeeMe from Cornell University uses both intra-frame and inter-frame compression. It represents video input in 16 shades of grey using 4 bits per pixel. The image is divided into 8×8 blocks of pixels for analysis. New frames are compared to previous frames, and if a block has changed significantly it is retransmitted. Blocks are also retransmitted on a periodic basis to account for losses that may have occurred in the network. Transmitted data is compressed by a lossy algorithm that exploits spatial redundancy in the vertical direction.
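The block-comparison step described above can be sketched as follows. This is an illustration only: the text does not say how CU-SeeMe decides that a block has "changed significantly", so the sum-of-absolute-differences metric, the `threshold` parameter, and the function name `changed_blocks` are all assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of 4-bit grey values (0..15)

BLOCK = 8  # block edge length in pixels, as in the description above


def changed_blocks(prev: Frame, curr: Frame, threshold: int) -> List[Tuple[int, int]]:
    """Return (block_row, block_col) for every 8x8 block that changed.

    Assumption: a block counts as changed when the sum of absolute
    pixel differences against the previous frame exceeds `threshold`.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, BLOCK):
        for bx in range(0, width, BLOCK):
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + BLOCK, height))
                for x in range(bx, min(bx + BLOCK, width))
            )
            if sad > threshold:
                changed.append((by // BLOCK, bx // BLOCK))
    return changed
```

Only the blocks returned by such a test would need to be sent for a given frame; the periodic retransmission mentioned above would be scheduled separately.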
Network Video is an Internet video conferencing tool developed at Xerox/PARC and uses both intra-frame and inter-frame compression. The current frame is compared to the previous frame and areas that have changed significantly are compressed using transform coding. Either a Discrete Cosine Transform (DCT) or a Haar wavelet transform is used. The Network Video encoder dynamically uses DCT techniques if network bandwidth is the bottleneck and Haar transform techniques if local computation is the bottleneck. The output of the transform is then quantized and run-length encoded. Higher resolution images of unchanged parts are sent periodically.
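The transform, quantize, and run-length pipeline described above can be illustrated on a single row of pixels. The sketch below uses one level of an unnormalized one-dimensional Haar transform; the actual Network Video encoder works on two-dimensional image regions, and the function names, the quantization step, and the truncating quantizer are assumptions made for illustration.

```python
from typing import List, Tuple


def haar_1d(values: List[float]) -> List[float]:
    """One Haar level: pairwise averages followed by pairwise half-differences.

    Assumption: the input length is even.
    """
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages + details


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Divide by the step size and truncate toward zero."""
    return [int(c / step) for c in coeffs]


def run_length_encode(symbols: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (symbol, run_length) pairs."""
    runs: List[List[int]] = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, n) for s, n in runs]
```

Small detail coefficients quantize to zero, which is what makes the subsequent run-length step effective on smooth image regions.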
Other techniques, such as MJPEG, H.261 (px64), and CellB, are similar to the above and may also be used.
One of the drawbacks of the existing systems is that as the camera at the transmitting end moves to focus on a different speaker, a whole new part of the room comes into the picture, thus introducing large amounts of new data in the image to be sent. A similar situation arises when the camera zooms in on a particular speaker: new areas of the image come into the picture and cannot benefit from temporal (inter-frame) compression. A careful look at the new image shows that the area around and behind the speaker may have changed only minimally in the conference room. Though there is little change, the inter-frame compression technique is not able to exploit the situation. As a result, motion is very jerky, and users prefer not to change focus between speakers at a fine granularity. This encourages users to keep the camera fixed and limits the realism of video conferencing. In a real conference room, all eyes move to look at the speaker and do not stay stationary. Thus we feel that for a more realistic video conference the camera must be able to move rapidly without degrading the quality of the image being shown.
The above-stated problems and related problems of the prior art are solved with the principles of the present invention, video conferencing using camera environment panoramas. Image data is communicated from a source system to a target system. At the source system, a background environment map is generated and communicated to the target system. The source system then captures a source image from a position and field of view of a camera. In addition, the background environment map is rendered according to said position and field of view of said camera to generate a background image visible for said position and field of view of said camera. A difference image is generated representing the difference between the source image and the background image. Finally, the difference image and said position and field of view of the camera are communicated to the target system. At the target system, the background environment map is received from said source system. In addition, the difference image and the position and field of view of the camera are received from the source system. The background environment map is rendered according to the position and field of view of the camera to thereby generate a background image visible for the position and field of view of the camera. Finally, a target image based upon the background image and the difference image is generated for display.
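The exchange between source and target described above can be illustrated with a minimal Python sketch. It rests on strong simplifying assumptions that are not part of the description: the environment map is a small cylindrical panorama stored as rows of grey values, the camera position and field of view are reduced to a pan offset in columns and a view width, and "rendering" is a plain wrap-around crop with no projection or warping. All function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grey values


def render_background(env_map: Image, pan: int, view_width: int) -> Image:
    """Stand-in for rendering the environment map for a camera view.

    Assumption: the view is `view_width` columns starting at column
    `pan`, wrapping around the panorama.
    """
    width = len(env_map[0])
    return [[row[(pan + x) % width] for x in range(view_width)] for row in env_map]


def encode_at_source(source_image: Image, env_map: Image, pan: int) -> Image:
    """Source side: difference between the captured and background images."""
    background = render_background(env_map, pan, len(source_image[0]))
    return [
        [s - b for s, b in zip(src_row, bg_row)]
        for src_row, bg_row in zip(source_image, background)
    ]


def decode_at_target(difference: Image, env_map: Image, pan: int) -> Image:
    """Target side: add the difference image to the locally rendered background."""
    background = render_background(env_map, pan, len(difference[0]))
    return [
        [d + b for d, b in zip(diff_row, bg_row)]
        for diff_row, bg_row in zip(difference, background)
    ]
```

In this sketch, pixels that match the stored background produce zeros in the difference image regardless of where the camera points, so a pan to a new part of the room does not by itself introduce new data to be sent.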