1. Technical Field
The invention is related to a system and process for viewing panoramic videos, and more particularly to such a system and process that allows a user to control what portion of the scene depicted by the panoramic video is viewed, as well as letting the user select what video should be played, choose when to play or pause the video, and specify what temporal part of the video is played.
2. Background Art
A panoramic video is a video made up of a sequence of panoramic frames depicting a surrounding scene. Ideally, the panoramic video makes available a seamless, 360 degree view of this scene. In this way, a person viewing the panoramic video can select different portions of the scene to view on a real-time basis. In other words, a person viewing the panoramic video on the proper viewer can electronically steer his or her way around in the scene as the video is playing.
A number of different systems for generating and viewing panoramic videos have been previously developed. For the most part, the current generating systems employ a mirror arrangement to capture the surrounding scene. For example, one existing system, referred to as a catadioptric omnidirectional camera system, incorporates mirrors to enhance the field of view of a single camera. Essentially, this system, which is described in a technical report entitled “Catadioptric Omnidirectional Camera” (Shree K. Nayar, Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, June 1997), uses a camera that images a hemispherical mirror to generate panoramic still images with a 360°×210° field of view. Another similar mirror-based system unwarps a spherically distorted video produced by the mirror-and-camera rig into a rectangular video stream, then encodes it using standard streaming authoring tools. The person viewing a video produced via this system sees a sub-region of the scene captured in the panoramic video and can pan within the scene. While these mirror-based single camera systems are capable of producing convincing panoramic stills and video, they can suffer from relatively low resolution when viewed and, owing to the mirror arrangements, require a fairly complex camera rig.
Another current panoramic video system attempts to overcome the resolution and complexity problems by forgoing the use of a mirror and instead employing a multiple-camera head to generate the panoramic video. The head consists of six cameras mounted on the six faces of a 2-inch cube, resulting in a 360°×360° field of view. The system also provides post-processing software to stitch the video streams from the individual cameras into a panorama. This multi-camera system has higher resolution than the catadioptric systems described above, but has the disadvantages of an expensive stitching stage and parallax artifacts due to the cameras not sharing a common center of projection.
One other system of note employs both a mirror arrangement and multiple cameras in an attempt to achieve a higher resolution without the stitching and parallax problems of the non-catadioptric, multi-camera system just described. Essentially, this system uses the mirror arrangement to create a common effective viewpoint for the cameras. While this system improves the resolution and reduces the aforementioned stitching and parallax problems, it still requires the use of a complex mirror-and-camera rig to generate the panoramic videos.
The present invention relates to a viewing system and process suited for playing panoramic videos such as those produced by the foregoing systems.
The present invention relates to a system and process for viewing a panoramic video. The primary components of the panoramic video viewer according to the present invention include a decoder module. The purpose of this module is to input incoming encoded panoramic video data and to output a decoded version thereof. In the context of a panoramic video, the incoming video data will typically represent multiple frames of the video, or portions thereof (as will be discussed later). The incoming data may be provided over a network and originate from a server, or it may simply be read from a storage media, such as a hard drive, CD or DVD. The encoded video data might include an audio component, as well. Further, the incoming video data may be uncompressed or compressed.
In the case where the decoder will be handling compressed data with an audio component embedded therein, the following sub-architecture could be employed. The incoming data is first input into a reader. Generally, the data reader identifies the type of data and extracts the data needed for further processing. Once read, the data is then split in a splitter to extract the audio component from the video component. The audio component is then output to an appropriate audio module which processes the audio component and eventually plays it in conjunction with the display of the panoramic video frame the audio component is associated with. The video component of the data is input into a decompressor module where it is decompressed. It is noted that in the event that the incoming video data does not contain an audio component or is not compressed, the aforementioned splitter and decompressor could be bypassed or eliminated, as desired.
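The reader, splitter and decompressor arrangement described above can be illustrated with a minimal sketch. The `Packet` type and the function names are hypothetical and are introduced only for illustration, and `zlib` merely stands in for whatever video codec is actually employed; nothing here is asserted to be the decoder of the invention.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Packet:
    """Hypothetical unit of incoming data, as identified by the reader."""
    video: bytes
    audio: Optional[bytes] = None   # None when no audio component is embedded
    compressed: bool = False        # False when the video data is uncompressed


def split(packet: Packet) -> Tuple[bytes, Optional[bytes]]:
    """Splitter: separate the audio component from the video component."""
    return packet.video, packet.audio


def decompress(video: bytes, compressed: bool) -> bytes:
    """Decompressor: bypassed when the data is not compressed.

    zlib is an assumed stand-in for a real video codec.
    """
    return zlib.decompress(video) if compressed else video


def decode(packet: Packet) -> Tuple[bytes, Optional[bytes]]:
    """Run one packet through the splitter and the decompressor."""
    video, audio = split(packet)
    return decompress(video, packet.compressed), audio
```

In such a sketch the returned audio would be handed to the audio module and the returned video to the storage module.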
Once decoded, the data associated with each video frame is preferably stored in a storage module. Specifically, the storage module will store the most recently received frame data and provide it to a 3D rendering module. When the next frame (or a desired part thereof) of the panoramic video is decoded by the decoder module, it replaces the previous frame data in the storage module.
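The replace-on-arrival behavior of the storage module can be sketched as a single-slot buffer. The class name `FrameStore` and its methods are hypothetical, and the lock is an assumption made in case the decoder and renderer run in separate threads, which the description does not specify.

```python
import threading
from typing import Optional


class FrameStore:
    """Hypothetical single-slot store holding only the most recent frame data."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[bytes] = None

    def put(self, frame: bytes) -> None:
        """Called by the decoder: the new frame replaces the previous one."""
        with self._lock:
            self._frame = frame

    def latest(self) -> Optional[bytes]:
        """Called by the renderer: return the current frame, or None if empty."""
        with self._lock:
            return self._frame
```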
The 3D rendering module is essentially a texture mapper that takes the frame data and maps the desired views onto an environment model. This is accomplished using conventional computer graphics methods. It is noted that the environment model employed by the 3D rendering module can vary depending on how the incoming video file was created. If the incoming video data requires that a specific environment model be employed by the rendering module, this information might be embedded in the incoming video data, and provided to the rendering module via the decoder module or the storage module. Or, this information could be provided in a separate initialization file, along with other information. If a separate file is provided (again either via a network and server, or directly from a storage media) it can be input and stored in an initialization module, to which all the other modules have access to retrieve any necessary data.
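As one illustration of how frame data might be mapped onto an environment model, the sketch below converts a viewing direction into normalized texture coordinates. It assumes a spherical environment model and a panoramic frame stored in an equirectangular layout; both are assumptions made for this example, since the environment model actually used depends on how the video was created. The function name is hypothetical.

```python
from typing import Tuple


def direction_to_uv(yaw_deg: float, pitch_deg: float) -> Tuple[float, float]:
    """Map a viewing direction to (u, v) texture coordinates in [0, 1].

    Assumes an equirectangular frame: u spans 360 degrees of yaw from left to
    right, and v spans pitch from +90 (top of frame) to -90 (bottom of frame).
    """
    if not -90.0 <= pitch_deg <= 90.0:
        raise ValueError("pitch must lie between -90 and 90 degrees")
    u = (yaw_deg % 360.0) / 360.0       # wrap yaw so the panorama is seamless
    v = (90.0 - pitch_deg) / 180.0      # looking straight ahead gives v = 0.5
    return u, v
```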
An example of some additional information that might be provided and stored in the initialization module is the navigation limits associated with the environment model. The navigation limits would be provided to the 3D rendering module and used, in effect, to limit the regions of the environment that may be visited.
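A minimal sketch of how navigation limits might be applied to a requested view follows. It assumes the limits are expressed as pitch and zoom ranges, with yaw left unrestricted and wrapped around the full 360 degrees; the field names and default values are hypothetical and are not taken from any initialization file format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NavigationLimits:
    """Hypothetical limits as they might be read from an initialization file."""
    min_pitch: float = -45.0
    max_pitch: float = 45.0
    min_zoom: float = 1.0
    max_zoom: float = 4.0


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_limits(yaw: float, pitch: float, zoom: float,
                 limits: NavigationLimits) -> Tuple[float, float, float]:
    """Return the permitted view closest to the one the user requested."""
    return (
        yaw % 360.0,                                        # yaw wraps around
        clamp(pitch, limits.min_pitch, limits.max_pitch),   # pitch is clamped
        clamp(zoom, limits.min_zoom, limits.max_zoom),      # zoom is clamped
    )
```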
The output of the 3D rendering module is provided to a display module where the panoramic video is viewed by a user of the system. Typically, the user will be viewing just a portion of the scene depicted in the panoramic video at any one time, and will be able to control what portion is viewed. Preferably, the panoramic video viewer according to the present invention will allow the user to pan through the scene to the left, right, up or down. In addition, the user would preferably be able to zoom in or out within the portion of the scene being viewed. To this end a user interface module is provided that accepts user commands via some kind of input device, such as a keyboard, mouse or joystick, and provides the user's viewing directives to the appropriate module. For example, the viewing direction and zoom directives would be provided to the 3D rendering module. In response, the rendering module would provide only those portions of the scene requested by the user to the display module. The user interface module could also be employed to accept other user commands, such as what video should be played and when to play or pause the chosen video, and to allow the user to specify what temporal part of the video should be played (i.e., a seek-in-time feature). These types of commands could be forwarded to the decoder module for implementation. To this end the decoder module could include the capability of communicating with the server or storage media through an appropriate interface. For example, in the case where the panoramic video is being provided over a network, the decoder module would request the desired video from the responsible server and tell the server when to send the video data and when to stop (i.e., a pause). In addition, the decoder module could request that certain portions of the panoramic video be transmitted, rather than all the frames in sequence, thereby implementing the aforementioned seek-in-time feature.
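The division of user commands between the 3D rendering module and the decoder module can be sketched as a simple routing step. The command names and the `Command` type are hypothetical; the description says only that viewing directives go to the rendering module and that video selection, play, pause and seek commands could go to the decoder module.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical command vocabularies, grouped by the module that handles them.
RENDERER_COMMANDS = frozenset(
    {"pan_left", "pan_right", "pan_up", "pan_down", "zoom_in", "zoom_out"}
)
DECODER_COMMANDS = frozenset({"select_video", "play", "pause", "seek"})


@dataclass
class Command:
    """A user command with an optional argument (e.g. a seek time in seconds)."""
    name: str
    value: Union[float, str, None] = None


def route(command: Command) -> str:
    """Return the name of the module that should receive the command."""
    if command.name in RENDERER_COMMANDS:
        return "renderer"
    if command.name in DECODER_COMMANDS:
        return "decoder"
    raise ValueError(f"unknown command: {command.name}")
```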
As indicated previously, an audio module generally plays the portion of an audio component associated with the panoramic video frame that is currently being displayed by the viewer. However, it was also indicated previously that the user will be viewing just a portion of the scene depicted in each panoramic video frame. If the audio component of the incoming panoramic video is made up of audio data assigned to each panoramic frame, then the audio module simply plays audio derived from the audio data associated with the panoramic video frame from which the currently displayed portion of the scene was taken. However, if the audio component of the incoming panoramic video is made up of audio data assigned to prescribed portions of each panoramic video frame, then the audio module plays audio derived from the particular portion of the audio data assigned to the portion of the scene that is currently being viewed. If the portion of the scene being viewed is rendered from more than one of the aforementioned portions of a panoramic video frame, the audio data associated with each of these portions is blended (e.g., proportionally), and audio derived from the composited audio data is played.
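The proportional blending mentioned above can be sketched as a weighted mix. The sketch assumes that each frame portion carries a list of audio samples and that the weight given to a portion is the share of the current view rendered from it; how coverage is measured, and the function and parameter names, are assumptions made for illustration.

```python
from typing import Dict, List


def blend_audio(portion_audio: Dict[int, List[float]],
                coverage: Dict[int, float]) -> List[float]:
    """Mix the audio of the portions in view, weighted by their coverage.

    portion_audio maps a portion index to its audio samples; coverage maps
    each portion in view to how much of the view it contributes.
    """
    total = sum(coverage.values())
    if total <= 0:
        raise ValueError("at least one portion must contribute to the view")
    # Mix only as many samples as every contributing portion provides.
    length = min(len(portion_audio[p]) for p in coverage)
    mixed = [0.0] * length
    for portion, amount in coverage.items():
        share = amount / total          # normalize so the weights sum to 1
        samples = portion_audio[portion]
        for i in range(length):
            mixed[i] += share * samples[i]
    return mixed
```

When the view is rendered from a single portion, its share is 1 and the blend reduces to playing that portion's audio unchanged.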
It is also noted that since the viewer will typically only display a small portion of each panoramic video frame to the user at any one time, a large amount of unneeded data is being transferred in the case of a network connection. In addition, for locally stored video as well as for network connections, unneeded data must be processed. This problem can be solved by segmenting each of the panoramic video frames into regions and separately encoding the segments. In this way, it would only be necessary to decode the segments depicting the portion of the scene the user is currently viewing. Should the segmented panoramic video be available, it would be provided to the viewer in one of two preferred ways. In the first scenario, only those portions of each frame that pertain to the part of the scene the user is currently viewing would be sent to the decoder module. This can be accomplished by having the 3D rendering module provide information to the decoder module indicating what part of the scene is being viewed by the user. The decoder module would obtain only those segments of the full panoramic frame that pertain to the portion of the scene being viewed. Each segment associated with the part of each panoramic video frame that is to be shown to the user would be processed by the decoder module as described previously, and then stored in the storage module for transfer to the 3D rendering module. The rendering module would then employ conventional means to map the texture information contained in the segments onto the portion of the environment model of interest, and send the resulting image to the display module.
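As an illustration of the first scenario, the sketch below determines which segments the decoder module would need to request for a given view. It assumes each panoramic frame is divided into equal-width vertical strips spanning the full 360 degrees, numbered from 0 at a yaw of zero; the actual segmentation scheme is not specified here, and the function name is hypothetical.

```python
import math
from typing import List


def visible_segments(yaw_deg: float, fov_deg: float, n_segments: int) -> List[int]:
    """Return the indices of the strips overlapped by the current view.

    yaw_deg is the center of the view and fov_deg its horizontal extent.
    """
    if fov_deg <= 0 or n_segments <= 0:
        raise ValueError("field of view and segment count must be positive")
    if fov_deg >= 360.0:
        return list(range(n_segments))          # the whole panorama is in view
    width = 360.0 / n_segments
    start = (yaw_deg - fov_deg / 2.0) % 360.0   # left edge of the view
    end = start + fov_deg                       # right edge, may exceed 360
    first = int(start // width)
    last = int(math.ceil(end / width)) - 1
    # Wrap indices so a view straddling the 0/360 seam is handled correctly.
    return [i % n_segments for i in range(first, last + 1)]
```

For example, with eight strips a 90 degree view centered on a yaw of zero straddles the seam and needs only strips 7 and 0, so the remaining six need not be transferred or decoded.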
The other scenario would entail the data representing an entire panoramic frame being transferred to or read by the decoder module. However, only those portions representing the segments of the panoramic frame that are needed to render the desired view would be decoded and transferred to the storage module. As before, the 3D rendering module would provide information to the decoder module indicating what part of the scene is being viewed by the user. In addition, the data representing the desired portions of the panoramic video frame would be stored in the storage module for transfer to the 3D rendering module. The rendering module would employ conventional means to map the texture information contained in the segments onto the portion of the environment model of interest, and send the resulting displayable image to the display module.
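The second scenario can be sketched as selective decoding of a fully received frame. The sketch assumes the encoded frame arrives as a mapping from segment index to separately encoded bytes, and `zlib` again stands in for a real video codec; the function name and data layout are hypothetical.

```python
import zlib
from typing import Dict, Iterable


def decode_needed(encoded_frame: Dict[int, bytes],
                  needed: Iterable[int]) -> Dict[int, bytes]:
    """Decode only the segments required to render the current view.

    The whole encoded frame is available, but segments outside `needed`
    are never decompressed, which avoids the unneeded processing.
    """
    return {seg: zlib.decompress(encoded_frame[seg]) for seg in needed}
```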