1. Technical Field
The invention is related to compressing and decompressing video, and more particularly to a system and process for compressing and decompressing multiple, layered, video streams employing spatial and temporal encoding.
2. Background Art
For several years now, viewers of TV commercials and feature films have been seeing the “freeze frame” effect used to create the illusion of stopping time and changing the camera viewpoint. The earliest commercials were produced by using a film-based system, which rapidly jumped between different still cameras arrayed along a rail to give the illusion of moving through a frozen slice of time.
When it first appeared, the effect was fresh and looked spectacular, and soon it was being emulated in many productions, the most famous of which is probably the “bullet time” effect seen in the movie “The Matrix”. Unfortunately, this effect is a one-time, pre-planned affair. The viewpoint trajectory is planned ahead of time, and many man-hours are expended to produce the desired interpolated views. Newer systems are based on video camera arrays, but still rely on having many cameras to avoid software view interpolation.
Thus, existing systems do not allow a user to interactively change to any desired viewpoint while watching a dynamic image-based scene. Most past work on image-based rendering (IBR) has involved rendering static scenes, with two of the best-known techniques being Light Field Rendering [5] and the Lumigraph [3]. Their success in high-quality rendering stems from the use of a large number of sampled images and has inspired a large body of work in the field. One exciting potential extension of this groundbreaking work involves interactively controlling the viewpoint while watching a video. The ability of a user to interactively control the viewpoint of a video enhances the viewing experience considerably, enabling such diverse applications as new-viewpoint instant replays, changing the point of view in dramas, and creating “freeze frame” visual effects at will.
However, extending IBR to dynamic scenes is not trivial because of, among other things, the difficulty (and cost) of synchronizing so many cameras and acquiring the images. One of the earliest attempts at capturing dynamic scenes was Kanade et al.'s Virtualized Reality system [4], which involved 51 cameras arranged around a 5-meter geodesic dome. Carranza et al. [1] used seven synchronized cameras distributed around a room looking towards its center to capture 3D human motion. Yang et al. [7] designed an 8×8 grid of cameras (each 320×240) for capturing a dynamic scene.
Compressing the video data to a workable size for transmission or storage, and then decompressing the compressed data efficiently and quickly with acceptable quality, is also a difficult problem. Compression is needed because, even if only a few cameras are employed in capturing the video data, the amount of data is extremely large (e.g., on the order of 800 MB per second for 8 cameras at 15 fps). Essentially, the amount of data involved is too large to transmit efficiently over a computer network given current typical bandwidth resources. Further, storage of the data is problematic if using currently popular storage media. For example, the storage capacity of a current DVD could easily be exceeded. Thus, compression of the video data is needed to make distribution practical. In addition, the compression scheme should allow the data to be recovered in substantially real time in order to support the rendering of the captured scene from a viewer-selected viewpoint. Current video compression techniques could be employed, but they would not be efficient enough to provide the compression ratio needed to facilitate distribution of the video data or its substantially real-time decompression. One recent attempt at compressing video streams from multiple cameras involved a proof of concept for storing dynamic light fields. Namely, Wilburn et al. [6] demonstrated that it is possible to synchronize six video cameras, and to compress and store all the image data in real time. They have since hooked up 128 cameras. Chang et al. [2] is another example of compressing video streams from multiple cameras using a light field encoding approach. In another attempt, Ziegler et al. [8] exploited the high degree of redundancy inherent in multiple video streams depicting the same dynamic scene, especially as between the streams, to compress the data using a texture domain approach.
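By way of illustration only, the scale of the data-rate example given above (on the order of 800 MB per second for 8 cameras at 15 fps) can be worked through with simple arithmetic. The following sketch uses only those quoted figures, plus two assumptions that are not taken from this description: a nominal single-layer DVD capacity of 4.7 GB, and a hypothetical 60-second clip length. The function and constant names are likewise illustrative.

```python
# Back-of-the-envelope arithmetic for the data-rate example in the text.
# Only the 800 MB/s, 8 camera, and 15 fps figures come from the text.

TOTAL_RATE_MB_PER_S = 800.0   # from the text: "on the order of 800 MB per second"
NUM_CAMERAS = 8               # from the text
FRAMES_PER_SECOND = 15        # from the text
DVD_CAPACITY_MB = 4700.0      # ASSUMPTION: nominal single-layer DVD (4.7 GB)
CLIP_SECONDS = 60.0           # ASSUMPTION: hypothetical clip length


def per_frame_megabytes(total_rate, cameras, fps):
    """Data implied per camera per frame, in MB."""
    return total_rate / (cameras * fps)


def seconds_until_full(capacity, total_rate):
    """How long uncompressed capture takes to fill the given capacity."""
    return capacity / total_rate


def required_compression_ratio(total_rate, duration_s, capacity):
    """Minimum ratio needed to fit a clip of duration_s into capacity."""
    return (total_rate * duration_s) / capacity


if __name__ == "__main__":
    frame_mb = per_frame_megabytes(TOTAL_RATE_MB_PER_S, NUM_CAMERAS, FRAMES_PER_SECOND)
    fill_s = seconds_until_full(DVD_CAPACITY_MB, TOTAL_RATE_MB_PER_S)
    ratio = required_compression_ratio(TOTAL_RATE_MB_PER_S, CLIP_SECONDS, DVD_CAPACITY_MB)
    print(f"Implied data per camera frame: {frame_mb:.2f} MB")
    print(f"Uncompressed capture fills the assumed DVD in {fill_s:.2f} s")
    print(f"Ratio needed to store a {CLIP_SECONDS:.0f} s clip: {ratio:.1f}:1")
```

Under these assumptions, uncompressed capture would exhaust the assumed disc capacity in under six seconds, which is consistent with the statement that the storage capacity of a current DVD could easily be exceeded.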
The present invention tackles this problem of compressing and decompressing multiple video streams of the same dynamic scene in a different and efficient manner.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.