1. Field of Invention
The invention relates to a method and apparatus for efficiently and seamlessly synthesizing a plurality of dynamically configurable compressed views of a scene from compressed imagery of the scene captured by a plurality of video sources.
2. Description of Related Art
High resolution mega-pixel video cameras provide a cost effective way of providing high quality visual coverage of a large area of interest. Achieving detailed coverage of a large area of interest, such as a sports arena, a railway station, a large facility or a building, may require several such cameras, possibly generating several mega-pixels of data at video rate. Multiple users have a need to access this data with different resolutions, field of view coverage and formats.
Despite the availability of such high quality, large coverage, high-resolution imagery, existing systems are limited in their ability to simultaneously synthesize multiple views that can be independently controlled to provide configurable field of view and resolution seamlessly across the covered venue. It is desirable that each user, independent of the others, be able to seamlessly navigate the covered area, similar in experience to having a Pan Tilt Zoom camera solely to themselves.
For instance, multiple security guards may want to view or assess different parts of a building, or follow different intruders in a multi-pronged intrusion. Further, a mobile guard may want to monitor the situation on a handheld device supporting CIF resolution, while a guard in the control room may monitor on a 1080p HD display.
In a sports arena, multiple TV screens may be displaying different views of the play, as determined by the production director assigned to that display. Further, some of the legacy displays may be Standard Definition, while others may be High Definition with resolutions ranging from 720i to 1080p.
In an internet video conferencing scenario, multiple participants generate video at different resolutions and have different viewing resolution and bandwidth constraints. Each user may individually desire to view either all, a subset or just one of the participants on their display. Further, a smart phone user on a 3G network may have very different bandwidth and resolution constraints than a user on a high-speed internet network using a large TV display.
Traditional methods and systems for a personalized interactive visualization experience are computationally prohibitive and unable to support a plurality of concurrent users with user-specific viewing requirements and constraints. The computational cost has increased dramatically, especially due to high-resolution mega-pixel video sources, while the need to support displays ranging from low resolution to high resolution remains. Traditional methods of decompressing all source video to raw format, processing the raw video, synthesizing a user-specific view and recompressing it are cost prohibitive and no longer sustainable when required to support high-resolution video sources and displays.
Several mega-pixel camera manufacturers provide systems that support a few independent views (typically up to 4) that can be simultaneously and independently controlled for field of view and resolution. The synthesized views are, however, limited to within the field of view of each camera. The users are unable to see a view that may be partially covered by two separate cameras. The visualization is therefore not seamless; it is limited to one camera and limited to a few users.
On the other hand, several legacy video visualization systems support 2D or 3D stitching of videos from multiple standard resolution cameras in a geographic reference frame to provide seamless navigation across multiple cameras. U.S. Pat. No. 7,522,186 describes a system for overlaying imagery from multiple fixed cameras onto a 3D textured model of a scene. These systems perform computationally prohibitive full decompression followed by alignment and view synthesis in the image domain, using a special graphics card for visualization. The high computational cost of these image processing steps significantly impedes their ability to synthesize a plurality of views in a scalable and cost-effective way. As a result, such a system is limited to synthesizing only one view, and thus supports only one user. Every additional user requires its own complete visualization system. This approach is not scalable to a large number of users due to the cost of each system and the bandwidth required to transfer possibly tens of mega-pixels of imagery to every such system at video rate.
U.S. Pat. Nos. 5,359,363 and 7,450,165 describe devices for generating user-specific views in raw image format. The devices are limited to supporting a single uncompressed camera source and do not support compressed video sources, and the patents do not address scalability of processing for a plurality of users.
U.S. Pat. No. 6,075,905 describes a method for stitching a sequence of raw images to construct a mosaic. The patent does not address generating a user-specific view of desired resolution and field of view characteristics, and does not address scalability of processing for generating a plurality of user-specific views.
Several methods have been described in the literature for manipulating imagery in the compressed domain for better performance. U.S. Pat. No. 7,680,348 and references therein describe fast compressed domain methods for adjusting video resolution and region of interest for JPEG 2000 compressed imagery. The patent and references do not address compositing video from a plurality of video sources or a processing architecture for generating a plurality of user-specific views.
Consequently, there remains a need in the art for a scalable method and apparatus that supports a plurality of concurrent users and provides personalized control to each of the concurrent users for interactive visualization across a plurality of video sources, with support for user-specific output resolutions and bandwidth constraints.