1. Field of the Invention
The present invention relates generally to video delivery and video-on-demand systems, and more particularly to a video server system and method for indexing between video streams having different presentation rates, i.e., normal play, fast forward and fast reverse video streams.
2. Description of the Related Art
Video-on-demand or video delivery systems enable a plurality of users or viewers to selectively watch movies or other audio/video sequences which are stored on one or more video servers or media servers. The video servers are connected through data transfer channels, such as a broadcast cable system or satellite broadcast system, to the plurality of users or subscribers. The video servers store a plurality of movies or other audio/video sequences, and each user can select one or more movies from the video servers for viewing. Each user has a television or other viewing device, as well as associated decoding logic, for selecting and viewing desired movies. When a user selects a movie, the selected movie is transferred on one of the data transfer channels to the television of the respective user.
Full-motion digital video requires a large amount of storage and data transfer bandwidth. Thus, video-on-demand systems use various types of video compression algorithms to reduce the amount of necessary storage and data transfer bandwidth. In general, different video compression methods exist for still graphic images and for full-motion video. Video compression methods for still graphic images or single video frames are referred to as intraframe compression methods, and compression methods for motion video are referred to as interframe compression methods.
Examples of video data compression for still graphic images are RLE (Run-Length Encoding) and JPEG (Joint Photographic Experts Group) compression. The RLE compression method operates by testing for duplicated pixels in a single line of the bit map and storing the number of consecutive duplicate pixels rather than the data for the pixel itself. JPEG compression is a group of related standards that provide either lossless (no image quality degradation) or lossy (imperceptible to severe degradation) compression types. Although JPEG compression was originally designed for the compression of still images rather than video, JPEG compression is used in some motion video applications.
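The run-length idea described above can be illustrated with a minimal sketch. The function names and the (count, value) pair layout are assumptions chosen for illustration and do not reflect any particular RLE file format.

```python
def rle_encode_row(row):
    """Encode one row of pixels as a list of (count, value) runs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][1] == pixel:
            # Same value as the previous pixel: extend the current run.
            runs[-1] = (runs[-1][0] + 1, pixel)
        else:
            # Value changed (or first pixel): start a new run.
            runs.append((1, pixel))
    return runs


def rle_decode_row(runs):
    """Expand (count, value) runs back into the original row of pixels."""
    row = []
    for count, pixel in runs:
        row.extend([pixel] * count)
    return row


# Hypothetical 9-pixel scan line with three runs of duplicated pixels.
row = [7, 7, 7, 7, 0, 0, 255, 255, 255]
encoded = rle_encode_row(row)
```

In this example the nine pixels reduce to three runs, and decoding the runs reproduces the original line exactly, i.e., this form of RLE is lossless.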
In contrast to compression algorithms for still images, most video compression algorithms are designed to compress full motion video. Video compression algorithms for motion video use a concept referred to as interframe compression, which involves storing only the differences between successive frames in the data file. Interframe compression stores the entire image of a key frame or reference frame, generally in a moderately compressed format. Successive frames are compared with the key frame, and only the differences between the key frame and the successive frames are stored. Periodically, such as when new scenes are displayed, new key frames are stored, and subsequent comparisons begin from this new reference point. It is noted that the interframe compression ratio may be kept constant while varying the video quality. Alternatively, interframe compression ratios may be content-dependent, i.e., if the video clip being compressed includes many abrupt scene transitions from one image to another, the compression is less efficient. Examples of video compression which use an interframe compression technique are MPEG, DVI and Indeo, among others.
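The key frame and difference scheme described above can be sketched as follows. This is a simplified illustration only: frames are modeled as flat lists of pixel values and the differences as a dictionary of changed positions, both of which are assumptions and not the storage layout of any actual codec.

```python
def encode_delta(key_frame, frame):
    """Return {pixel_index: new_value} for pixels that differ from the key frame."""
    return {i: v for i, (k, v) in enumerate(zip(key_frame, frame)) if k != v}


def decode_delta(key_frame, delta):
    """Rebuild a frame by applying the stored differences to the key frame."""
    frame = list(key_frame)
    for index, value in delta.items():
        frame[index] = value
    return frame


# Hypothetical 6-pixel frames: only two pixels change after the key frame.
key_frame = [10, 10, 10, 10, 10, 10]
next_frame = [10, 10, 12, 10, 10, 11]
delta = encode_delta(key_frame, next_frame)
```

Only two entries are stored for the successive frame instead of six pixel values. A frame that follows an abrupt scene transition would differ from the key frame in most positions, which is why such content compresses less efficiently and why a new key frame is stored at that point.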
MPEG Background
A compression standard referred to as MPEG (Moving Picture Experts Group) compression is a set of methods for compression and decompression of full motion video images which uses the interframe compression technique described above. MPEG compression uses both motion compensation and discrete cosine transform (DCT) processes and can yield compression ratios of more than 200:1.
The MPEG standard requires that sound be recorded simultaneously with the video data, and the video and audio data are interleaved in a single file in order to keep the video and audio synchronized during playback. The audio data is typically compressed as well, and the MPEG standard specifies an audio compression method such as MPEG Layer II, also known by the Philips trade name of "MUSICAM".
An MPEG stream includes three types of pictures, referred to as the Intra (I) frame, the Predicted (P) frame, and the Bi-directional Interpolated (B) frame. The I or Intra frames contain the video data for the entire frame of video and are typically placed every 10 to 15 frames. Intra frames provide entry points into the file for random access, and are generally only moderately compressed. Predicted frames are encoded with reference to a past frame, i.e., a prior Intra frame or Predicted frame. Thus P frames only include changes relative to prior I or P frames. In general, Predicted frames receive a fairly high amount of compression and are used as references for future Predicted frames. Thus, both I and P frames are used as references for subsequent frames. Bi-directional frames receive the greatest amount of compression and require both a past and a future reference in order to be encoded. Bi-directional frames are not used as references for other frames.
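The reference relationships among the three picture types can be sketched as follows. The function name, the string encoding of the frame sequence, and the example sequence itself are assumptions for illustration; the sketch lists frames in display order and assumes each B frame has both a past and a future I or P frame available.

```python
def reference_frames(display_order):
    """Map each frame index to the indices of the frames it is predicted from.

    display_order is a string of 'I', 'P' and 'B' characters in display order.
    """
    # Only I and P frames may serve as references.
    anchors = [i for i, kind in enumerate(display_order) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(display_order):
        past = [a for a in anchors if a < i]
        future = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                       # self-contained entry point
        elif kind == "P":
            refs[i] = past[-1:]                # nearest prior I or P frame
        else:
            refs[i] = past[-1:] + future[:1]   # one past and one future reference
    return refs


# Hypothetical frame sequence in display order.
gop = "IBBPBBP"
refs = reference_frames(gop)
```

For this sequence, the B frames at positions 1 and 2 depend on frames 0 and 3, the P frame at position 3 depends only on frame 0, and no B frame appears in any reference list. This is also why a decoder entering the stream at an arbitrary point must wait for an I frame before it can reconstruct a complete picture.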
After the I frames have been created, the MPEG encoder divides each I frame into a grid of a suitable size, e.g., 16×16 pixel squares, called macro blocks. The respective I frame is divided into macro blocks in order to perform motion compensation. Each of the subsequent pictures after the I frame is also divided into these same macro blocks. The encoder then searches for an exact, or near exact, match between the macro blocks of the reference picture and those of succeeding pictures. When a match is found, the encoder transmits a vector movement code or motion vector. The vector movement code or motion vector only includes information on the difference between the reference frame and the respective succeeding picture. The blocks in succeeding pictures that have no change relative to the block in the reference picture or frame are ignored. In general, for the frame(s) following a reference frame, i.e., P and B frames that follow a reference I or P frame, only small portions of these frames are different from the corresponding portions of the respective reference frame. Thus, for these frames, only the differences are captured, compressed and stored. Thus the amount of data that is actually stored for these frames is significantly reduced.
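The search for a matching macro block can be sketched as an exhaustive block-matching search. The matching criterion used here, the sum of absolute differences (SAD), and the exhaustive search strategy are assumptions chosen for simplicity; the MPEG standard does not mandate how an encoder finds its matches. A 2×2 block and 4×4 frames are used in place of 16×16 macro blocks to keep the example short.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block of ref at (rx, ry)
    and a block of cur at (cx, cy), both size x size pixels."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(ref, cur, cx, cy, size, search):
    """Search ref within +/- search pixels of (cx, cy) for the best match to
    the block of cur at (cx, cy). Returns (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


# Hypothetical frames: a bright 2x2 patch moves one pixel right and one down.
ref = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
cost, dx, dy = find_motion_vector(ref, cur, cx=1, cy=1, size=2, search=1)
```

Here the search finds an exact match (cost of zero) one pixel up and to the left in the reference frame, so the block can be represented by the vector (-1, -1) alone rather than by its pixel data.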
After motion vectors have been generated, the encoder then tracks the changes using spatial redundancy. Thus, after finding the changes in location of the macro blocks, the MPEG algorithm further reduces the data by describing the difference between corresponding macro blocks. This is accomplished through a mathematical process referred to as the discrete cosine transform or DCT. This process divides the macro block into a suitable number of sub blocks, e.g., four sub blocks, seeking out changes in color and brightness. Human perception is more sensitive to brightness changes than to color changes. Thus the MPEG algorithm devotes more effort to reducing color information than to reducing brightness information.
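The effect of the DCT on a sub block can be illustrated with a direct, unoptimized sketch of a two-dimensional DCT-II on an 8×8 block. The orthonormal scaling convention and the function name are assumptions for illustration; a practical encoder would use a fast algorithm rather than this four-level loop.

```python
import math


def dct_2d(block):
    """Two-dimensional DCT-II of a square block, with orthonormal scaling."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# Hypothetical sub block of uniform brightness.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

For a uniform block, all of the energy lands in the single coefficient at position (0, 0), and the remaining 63 coefficients are zero to within floating-point error. Blocks with little spatial variation therefore need very few nonzero coefficients, which is the redundancy the transform exposes.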
Each picture or frame also includes a picture header which identifies the frame and includes information for that frame. The MPEG standard also includes sequence headers which identify the start of a video sequence. Sequence headers are only required once before the beginning of a video sequence. However, the MPEG-2 standard allows a sequence header to be transferred before any I frame or P frame. The sequence header includes information relevant to the video sequence, including the frame rate and picture size, among other information. MPEG video streams used in digital television applications generally include a sequence header before every I frame and P frame. This is necessary to facilitate channel surfing between different video channels, which is an important user requirement. In general, when a user switches to a new channel, the video for the new channel cannot be displayed until the next sequence header appears in the stream. This is because the sequence header includes important information about the video sequence which is required by the decoder before the sequence can be displayed. If a sequence header were not included before each I frame and/or P frame, then when the user switched to a new channel, the video for the new channel possibly could not be immediately displayed, i.e., the video could not be displayed until the next sequence header.
The sequence headers in an MPEG encoded stream include presentation timestamps or a time base within the encoded stream. Timestamps provide a user with a time reference relative to the beginning of a movie, enabling the user to accurately select or identify a sequence located midstream of the movie without having to reference the beginning of the movie.
Trick Play Streams
In an interactive video-on-demand (VOD) or near-video-on-demand (NVOD) system, it is highly desirable for the user to be able to selectively fast forward and/or fast reverse through the movie being watched. Thus, some video-on-demand systems include fast forward and fast reverse streams, referred to as trick play streams, for each movie. When the user desires to fast forward or fast reverse through a movie, the user selects the fast forward or fast reverse option. The respective fast forward or fast reverse trick play stream is then transferred to the user, beginning at the point where the user was watching, instead of the normal play stream, thus simulating a fast forward or fast reverse of the movie being watched. Typically, a single video stream, such as a movie, is encoded at different presentation rates to enable the video file to operate in fast forward or fast reverse speed in addition to the normal play presentation rate.
Indexing
Interactive video-on-demand systems which include trick play streams require methods for indexing between the normal play stream and the trick play streams, as well as for indexing between the trick play streams. In other words, when a user is watching a movie and chooses to fast forward for a period of time, a mechanism is needed for the video server to switch from the normal play stream to the appropriate point or frame in the fast forward stream. When the user then desires to resume watching at normal play speed, a mechanism is also needed for the video server to switch from the frame being viewed in the fast forward stream to the appropriate point or frame in the normal play stream. Thus the video server must be able to determine the proper positions within the video files when switching from outputting a first video file at a first presentation rate to outputting a second video file at a second presentation rate.
One approach for indexing between normal play and trick play streams includes using lookup tables to index between the various streams. The look-up tables each include a plurality of indices which reference respective positions or I frames in the various streams. For example, index look-up tables can be generated using the MPEG presentation timestamps from the sequence headers of the normal play stream.
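A look-up table of this kind can be sketched as a sorted list of entries searched by timestamp. The table contents, the (timestamp, file offset) entry layout, and the function name below are hypothetical and serve only to illustrate the look-up step; the sketch assumes every table is keyed by the normal play timestamp of the corresponding I frame.

```python
import bisect


def find_entry(table, timestamp):
    """Return the last (timestamp, file_offset) entry at or before timestamp.

    table must be sorted by timestamp. If timestamp precedes every entry,
    the first entry is returned.
    """
    keys = [ts for ts, _ in table]
    pos = bisect.bisect_right(keys, timestamp) - 1
    return table[max(pos, 0)]


# Hypothetical index tables, one entry per indexed I frame, keyed by the
# normal play timestamp in milliseconds.
normal_play = [(0, 0), (500, 120000), (1000, 250000), (1500, 371000)]
fast_forward = [(0, 0), (1000, 30000), (2000, 61000)]

# The viewer is 1200 ms into the movie and presses fast forward.
entry = find_entry(fast_forward, 1200)
```

The look-up returns the fast forward entry for timestamp 1000, i.e., the nearest indexed I frame at or before the viewer's position, together with the file offset at which the server would begin reading the fast forward stream.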
One drawback to this approach is that the MPEG presentation timestamps are not required to be continuous, e.g., there could be breaks or gaps in the presentation timestamps.
Another problem is that presentation timestamps are presentation-based. Thus, when a fast forward stream which is 5× fast is being played, the presentation timestamps do not advance 5× faster, but advance at the same rate as they do in a normal play stream. Thus in this method the server is required to perform computations on the presentation timestamps to determine the corresponding place in another stream. This increases the real-time processing burdens on the media server.
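The computation the server must perform can be sketched as a simple rescaling. This sketch assumes forward-playing streams of the same content, each encoded at a constant multiple of normal speed with timestamps starting at zero; the function name and parameters are hypothetical.

```python
def map_timestamp(timestamp, from_rate, to_rate):
    """Map a presentation timestamp from one forward stream to another.

    from_rate and to_rate are speed multiples of normal play (1 = normal).
    Assumes both streams cover the same content at a constant rate.
    """
    # Position in the underlying movie, measured in normal play time.
    content_position = timestamp * from_rate
    return content_position / to_rate


# A viewer 12 seconds into a 5x fast forward stream has covered
# 60 seconds of the movie.
resume_at = map_timestamp(12, from_rate=5, to_rate=1)
```

The result is then used to look up the corresponding position in the target stream. A computation of this kind must be repeated on every stream switch, and it is only as accurate as the constant-rate assumption allows.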
This approach also requires each decoder to have intelligence, and further requires the media server to interact with the decoder to accomplish stream switches. For example, when the user selects the fast forward or fast reverse option, in this method the decoder is required to provide information back to the media server of the respective presentation timestamp where the decoder stopped playing, as well as the presentation rate of the stream being played. The media server then uses this information to determine the appropriate presentation timestamp location to begin playing in the new stream. This requirement that the decoder interact with the media server to accomplish stream switches, as well as the computations required to be performed by the media server, increases the overhead of the system. The interaction between the media server and the decoder also requires that each decoder have intelligence, which increases the cost of each decoder.
One such approach based on MPEG presentation timestamps is HP's "PictureNumber, PresentationTimeStamp, FileOffset" format for each table entry. Unfortunately, not all encoding formats are MPEG-based. Further, accurate mapping between presentation rates can be accomplished only if the ratio between the presentation rates is constant, i.e., only if the encoded video stream has a uniform frame rate. Conversely, requiring a uniform frame rate at all presentation rates precludes techniques such as "scene fast forward".
Therefore, an improved system and method are desired for efficiently indexing between normal play streams and trick play video streams in a video delivery system. An improved system and method are further desired which reduce the processing burdens of the media server.