The present invention generally relates to representing time-varying video data, and more specifically, to a method and system for creating, viewing and editing video data encoded to include different spatial and/or time resolutions.
Scientists often run physical simulations of time-varying data in which different parts of the simulation are performed at differing spatial and temporal resolutions. For example, in a simulation of the air flow about an airplane wing, it is useful to run the slowly varying parts of the simulation (generally, the portion of space further from the wing) at a fairly coarse scale, both spatially and temporally, while running the more complex parts (say, the region of turbulence just aft of the wing) at a much higher resolution. The multi-grid techniques frequently used for solving large-scale problems in physics, astronomy, meteorology, and applied mathematics are a common example of this kind of computation.
However, it has been recognized that a new approach, called multiresolution video, needs to be developed for representing the time-varying data produced by such algorithms. This multiresolution video representation should provide means for capturing time-varying image data produced at multiple scales, both spatially and temporally. In addition, it should permit efficient algorithms to be used for viewing multiresolution video at arbitrary scales and speeds. For example, in a sequence depicting the flow of air about a wing, a user should be able to interactively zoom in on an area of relative turbulence, computed at an enhanced spatial resolution. Analogously, fast-changing components in a scene should be represented and viewable at a higher temporal resolution, allowing, for example, a propeller blade to be viewed in slow motion.
Moreover, multiresolution video will preferably have applications that are useful even for conventional uniresolution video. First, the representation should facilitate a variety of viewing applications, such as multiresolution playback, including motion-blurred "fast-forward" and "reverse"; constant-speed viewing of video over a network with varying throughput; and an enhanced form of video "shuttling" or searching. The representation should also provide a controlled degree of lossy compression, particularly in areas of the video that change little from frame to frame. Finally, the representation should support the assembly of complex multiresolution videos from either uniresolution or multiresolution "video clip-art" elements.
Multiresolution representations that have previously been proposed for images include "image pyramids" (see "A Hierarchical Data Structure for Picture Processing," S. L. Tanimoto and T. Pavlidis, Computer Graphics and Image Processing, 4(2):104-119, June 1975) and "MIP maps" (see "Pyramidal Parametrics," L. Williams, Computer Graphics (SIGGRAPH '83 Proceedings), volume 17, pages 1-11, July 1983). A related approach uses wavelet-based representations for images, as described in "Multiresolution Painting and Compositing," by D. F. Berman, J. T. Bartell, and D. H. Salesin, Proceedings of SIGGRAPH '94, Computer Graphics Proceedings, Annual Conference Series, pages 85-90, July 1994, and by K. Perlin and L. Velho in "Live Paint: Painting with Procedural Multiscale Textures," Proceedings of SIGGRAPH '95, Computer Graphics Proceedings, Annual Conference Series, pages 153-160, August 1995. These latter works disclose a representation that is sparse and that supports efficient compositing operations for assembling complex frames from simpler elements, but that lacks other desirable capabilities.
Several commercially available video editing systems support many of the multiresolution video operations that are applicable to uniresolution video. For example, Adobe Corporation's AFTER EFFECTS™ allows the user to view video segments at low resolution and to construct an edit list that is later applied to the high-resolution frames offline. Discreet Logic's FLAME™ and FLINT™ systems also provide digital video compositing and many other digital editing operations on videos of arbitrary resolution. J. Swartz and B. C. Smith describe a language for manipulation of video segments in a resolution-independent fashion in "A Resolution Independent Video Language," ACM Multimedia 95, pages 179-188, ACM, Addison-Wesley, November 1995. However, the input to and output from all of these prior art systems are uniresolution video.
Multiresolution video also allows the user to pan and zoom to explore a flat video environment. This style of interaction is similar in spirit to two image-based environments: Apple Computer's QUICKTIME VR™ and the "plenoptic modeling" system of L. McMillan and G. Bishop, as described in "Plenoptic Modeling: An Image-based Rendering System," Proceedings of SIGGRAPH '95, Computer Graphics Proceedings, Annual Conference Series, pages 39-46, August 1995. These prior art methods provide an image-based representation of an environment that surrounds the viewer. It would be desirable to combine such methods with multiresolution video to create a kind of "multiresolution video QUICKTIME VR," in which a viewer can investigate a panoramic environment by panning and zooming, with the environment changing in time and having different amounts of detail in different locations.
Furthermore, it would be desirable to provide for a simple form of lossy compression applicable to the multiresolution video. Video compression is a heavily studied area. MPEG and Apple Corporation's QUICKTIME™ are two industry standards. Other techniques based on multiscale transforms, as discussed by A. S. Lewis and G. Knowles in "Video Compression Using 3D Wavelet Transforms," Electronics Letters, 26(6):396-398, Mar. 15, 1990, and by A. N. Netravali and B. G. Haskell in Digital Pictures, Plenum Press, New York, 1988, might be adapted to work for multiresolution video.
In accord with the present invention, a method is defined for storing video data that comprise multiple frames so as to provide independent image resolution and time resolution when displaying the video data. The method includes the step of providing a data structure for storing the video data in a memory medium. A flow of time for the video data is encoded in a first portion of the data structure, and a spatial decomposition of the multiple frames of the video data is encoded in a second portion of the data structure that is linked to the first portion of the data structure. The first and second portions of the data structure are decoupled sufficiently from each other so as to enable the video data to be read from the memory medium and displayed with separately selectively variable spatial resolutions and temporal resolutions. Thus, the spatial resolution is generally selectively variable independent of the temporal resolution, and the temporal resolution is generally selectively variable independent of the spatial resolution.
The method preferably further includes the step of writing the video data in the data structure to the memory medium for storage. The amount of storage required to store the video data at a selected spatial resolution and a selected temporal resolution is substantially dependent upon the resolution.
The method also may include the step of transmitting the video data in the data structure over a communication link. At least one of the spatial resolution and the temporal resolution of the video data being transmitted is then automatically variable to fit within an available bandwidth of the communication link. Therefore, if the available bandwidth of the communication link varies during transmission of the video data, the method may include the step of automatically varying at least one of the spatial resolution and the temporal resolution in accord with the varying bandwidth of the communication link.
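One simple way such bandwidth adaptation might work is sketched below. It is an illustration only, not the claimed method: it holds the frame rate fixed and lowers only the spatial resolution. The function name, the parameter names, and the cost model (level j holds a full 2^j x 2^j image at a fixed number of bits per pixel, ignoring sparsity and compression) are all assumptions introduced for the example.

```python
def spatial_level_for_bandwidth(bandwidth_bps, frames_per_second,
                                max_level, bits_per_pixel=24):
    """Pick the finest spatial level whose frames fit the link.

    Assumption (hypothetical cost model): spatial level j is a dense
    2**j x 2**j image, so one frame costs 4**j * bits_per_pixel bits.
    Returns 0 (the coarsest level) if nothing fits.
    """
    best = 0
    for j in range(max_level + 1):
        rate = (4 ** j) * bits_per_pixel * frames_per_second
        if rate <= bandwidth_bps:
            best = j  # rates grow with j, so the last fit is the finest
    return best


# Re-evaluate whenever the measured bandwidth changes during transmission.
for measured in (1_000_000, 250_000):
    level = spatial_level_for_bandwidth(measured, frames_per_second=30,
                                        max_level=9)
    print(measured, "bits/s -> spatial level", level)
```

A real transmitter could equally trade temporal resolution for spatial resolution; the sketch fixes the frame rate only to keep the policy easy to follow.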
In addition, the method may include the step of displaying the video data stored on the medium in the data structure on a display device having a limited resolution, and automatically varying the spatial resolution of the video data being played to conform to the limited resolution of the display device.
A fast forward of the video data stored in the data structure can be provided by varying the temporal resolution of the video data displayed in a forward play direction. Similarly, a fast reverse of the video data stored in the data structure can be provided by varying the temporal resolution of the video data displayed in a reverse play direction. Searching of the video data stored in the data structure is enabled by varying the temporal resolution of the video data when displayed, so that frames of the video data are displayed at a rate substantially faster than normal.
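The relationship between temporal resolution and playback speed can be illustrated with a small sketch. It assumes, purely for illustration, that each frame is reduced to a single number and that each coarser temporal level stores the average of two adjacent finer frames; the names `build_time_pyramid` and `play` are hypothetical. Showing a coarser level at the normal display rate then gives a motion-blurred fast forward, and reversing the order gives fast reverse.

```python
def build_time_pyramid(frames):
    """Return levels[0..n]; levels[n] is the input, levels[0] one frame.

    Each coarser frame is the average of two adjacent finer frames,
    which acts as a simple motion blur. Frames are plain numbers here.
    """
    n = len(frames)
    if n == 0 or n & (n - 1):
        raise ValueError("frame count must be a power of two")
    levels = [list(frames)]
    while len(levels[0]) > 1:
        fine = levels[0]
        coarse = [(fine[i] + fine[i + 1]) / 2
                  for i in range(0, len(fine), 2)]
        levels.insert(0, coarse)
    return levels


def play(levels, speedup_log2=0, reverse=False):
    """Frames to show at the normal display rate for a 2**speedup_log2 speedup."""
    finest = len(levels) - 1
    level = min(finest, max(0, finest - speedup_log2))
    sequence = list(levels[level])
    return sequence[::-1] if reverse else sequence


levels = build_time_pyramid([0, 2, 4, 6, 8, 10, 12, 14])
print(play(levels, speedup_log2=1))                # 2x fast forward
print(play(levels, speedup_log2=2, reverse=True))  # 4x fast reverse
```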
Video data that are at a relatively higher resolution are resampled to produce additional video data having either a relatively lower temporal resolution or a lower spatial resolution, for storage in the data structure. It should also be apparent that the video data stored in the data structure can have a dynamically varying spatial resolution and a dynamically varying temporal resolution.
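As an illustration of spatial resampling, the sketch below halves an image in each dimension by averaging 2 x 2 blocks of pixels. The box filter and the function name `downsample_2x2` are assumptions chosen for simplicity; the text does not specify a particular filter. Pixels are single numbers here, and a color image would apply the same step per channel.

```python
def downsample_2x2(image):
    """Average each 2x2 block of `image` (a list of equal-length rows).

    Returns an image with half the height and half the width.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    return [
        [(image[y][x] + image[y][x + 1]
          + image[y + 1][x] + image[y + 1][x + 1]) / 4
         for x in range(0, width, 2)]
        for y in range(0, height, 2)
    ]


fine = [[0, 0, 4, 4],
        [0, 0, 4, 4],
        [8, 8, 2, 2],
        [8, 8, 2, 6]]
coarse = downsample_2x2(fine)      # 2x2 image
coarsest = downsample_2x2(coarse)  # 1x1 image
```

Applying the step repeatedly yields the successively coarser versions that a multiresolution representation would store alongside the original data.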
Preferably, the data structure comprises a sparse binary tree for encoding the flow of time and sparse quadtrees for encoding the spatial decomposition of frames of the video data. The method may also include the step of enabling lossy compression of the data structure.
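A minimal sketch of how a sparse binary tree over time might be linked to sparse quadtrees over space is given below. The class names, field names, and lookup functions are hypothetical and are not taken from the specification; time and image coordinates are normalized to the range [0, 1) for convenience. A missing child means that no finer detail is stored there, so a lookup falls back to the coarser average held by the parent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]


@dataclass
class QuadNode:
    color: Color  # average color of the region this node covers
    # Four sub-quadrants in row-major order, or None if no finer detail.
    children: Optional[List["QuadNode"]] = None


@dataclass
class TimeNode:
    frame: QuadNode  # spatial decomposition averaged over this time span
    left: Optional["TimeNode"] = None   # first half of the span
    right: Optional["TimeNode"] = None  # second half of the span


def frame_at(node: TimeNode, t: float, max_depth: int) -> QuadNode:
    """Descend the time tree toward t, at most max_depth levels."""
    for _ in range(max_depth):
        bit = 0 if t < 0.5 else 1
        child = node.left if bit == 0 else node.right
        if child is None:  # sparse: no finer temporal detail stored
            break
        node = child
        t = t * 2 - bit
    return node.frame


def color_at(quad: QuadNode, x: float, y: float, max_depth: int) -> Color:
    """Descend the quadtree toward (x, y), at most max_depth levels."""
    for _ in range(max_depth):
        if quad.children is None:  # sparse: no finer spatial detail stored
            break
        ix = 0 if x < 0.5 else 1
        iy = 0 if y < 0.5 else 1
        quad = quad.children[2 * iy + ix]
        x, y = x * 2 - ix, y * 2 - iy
    return quad.color


# The first half of the video has one extra level of spatial detail;
# the second half is stored only as part of the whole-video average.
detail = QuadNode((0.5, 0.5, 0.5), [
    QuadNode((1.0, 0.0, 0.0)), QuadNode((0.0, 1.0, 0.0)),
    QuadNode((0.0, 0.0, 1.0)), QuadNode((1.0, 1.0, 1.0)),
])
root = TimeNode(frame=QuadNode((0.2, 0.2, 0.2)), left=TimeNode(frame=detail))
print(color_at(frame_at(root, 0.25, 3), 0.75, 0.25, 1))
```

Because the two depth limits are passed separately, the temporal and spatial resolutions of a lookup can be chosen independently of one another, which is the property the decoupled structure is meant to provide.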