The following references are hereby incorporated by reference.
The ISO/IEC MPEG specification referred to as ISO/IEC 13818 is hereby incorporated by reference in its entirety.
1. Field of the Invention
The present invention relates generally to video-on-demand systems and video compression, and more particularly to a system and method for creating compressed fast forward and fast reverse video bitstreams from a normal play compressed video bitstream.
2. Description of the Related Art
Video-on-demand systems enable a plurality of users or viewers to selectively watch movies or other audio/video sequences which are stored on one or more video servers or media servers. The video servers are connected through data transfer channels, such as a broadcast network, to the plurality of users. For example, the video servers may be connected through a broadcast cable system or satellite broadcast system to a plurality of users or subscribers. The video servers store a plurality of movies or other audio/video sequences, and each user can select one or more movies from the video servers for viewing. Each user includes a television or other viewing device, as well as associated decoding logic, for selecting and viewing desired movies. When a user selects a movie, the selected movie is then transferred on one of the data transfer channels to the television of the respective user.
Full-motion digital video requires a large amount of storage and data transfer bandwidth. Thus, video-on-demand systems use various types of video compression algorithms to reduce the amount of necessary storage and data transfer bandwidth. In general, different video compression methods exist for still graphic images and for full-motion video. Video compression methods for still graphic images or single video frames are referred to as intraframe compression methods, and compression methods for motion video are referred to as interframe compression methods.
Examples of video data compression for still graphic images are RLE (run-length encoding) and JPEG (Joint Photographic Experts Group) compression. The RLE compression method operates by testing for duplicated pixels in a single line of the bit map and storing the number of consecutive duplicate pixels rather than the data for the pixel itself. JPEG compression is a group of related standards that provide either lossless (no image quality degradation) or lossy (imperceptible to severe degradation) compression types. Although JPEG compression was originally designed for the compression of still images rather than video, JPEG compression is used in some motion video applications.
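By way of illustration only, the run-length idea described above can be sketched as follows. The function names and the (count, value) pair layout are assumptions made for this example and are not taken from any particular RLE file format.

```python
def rle_encode_row(pixels):
    """Run-length encode one row of pixel values as (count, value) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value:
            # Same pixel as the previous one: extend the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # Different pixel: start a new run of length 1.
            runs.append((1, value))
    return runs


def rle_decode_row(runs):
    """Expand (count, value) pairs back into the original row of pixels."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels
```

A row with long runs of identical pixels collapses to a few pairs, while a row with no repeated neighbors gains nothing from this scheme.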
In contrast to compression algorithms for still images, most video compression algorithms are designed to compress full motion video. Video compression algorithms for motion video use a concept referred to as interframe compression, which involves storing only the differences between successive frames in the data file. Interframe compression stores the entire image of a key frame or reference frame, generally in a moderately compressed format. Successive frames are compared with the key frame, and only the differences between the key frame and the successive frames are stored. Periodically, such as when new scenes are displayed, new key frames are stored, and subsequent comparisons begin from this new reference point. It is noted that the interframe compression ratio may be kept constant while varying the video quality. Alternatively, interframe compression ratios may be content-dependent, i.e., if the video clip being compressed includes many abrupt scene transitions from one image to another, the compression is less efficient. Examples of video compression which use an interframe compression technique are MPEG, DVI and Indeo, among others.
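As a minimal sketch of the key frame and difference scheme described above, the following example stores the first frame whole and, for each later frame, only the pixel positions that differ from the key frame. Frames are modeled as flat lists of pixel values, and the function names are hypothetical; real interframe coders are considerably more elaborate.

```python
def encode_interframe(frames):
    """Store the first frame as the key frame and later frames as sparse differences."""
    key = list(frames[0])
    deltas = []
    for frame in frames[1:]:
        # Keep only the positions whose value differs from the key frame.
        deltas.append({i: v for i, (k, v) in enumerate(zip(key, frame)) if k != v})
    return key, deltas


def decode_interframe(key, deltas):
    """Rebuild every frame by applying each difference set to a copy of the key frame."""
    frames = [list(key)]
    for delta in deltas:
        frame = list(key)
        for i, v in delta.items():
            frame[i] = v
        frames.append(frame)
    return frames
```

When successive frames resemble the key frame, each difference set is small; after an abrupt scene change nearly every position differs, which is why a new key frame is then stored.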
MPEG Background
A compression standard referred to as MPEG (Moving Picture Experts Group) compression is a set of methods for compression and decompression of full motion video images which uses the interframe compression technique described above. MPEG compression uses both motion compensation and discrete cosine transform (DCT) processes and can yield compression ratios of more than 200:1.
The MPEG standard requires that sound be recorded simultaneously with the video data, and the video and audio data are interleaved in a single file to keep the video and audio synchronized during playback. The audio data is typically compressed as well, and the MPEG standard specifies an audio compression method such as MPEG Layer II, also known by the Philips trade name of “MUSICAM”.
In most video sequences, the background remains relatively stable while action takes place in the foreground. The background may move, but large portions of successive frames in a video sequence are redundant. In generating an MPEG stream, an MPEG encoder creates I or Intra frames and P or Predicted frames, as well as B frames. The I frames contain the video data for the entire frame of video and are typically placed every 10 to 15 frames. The P frames only include changes relative to prior I or P frames. Both I and P frames are used as references for subsequent frames. In general, for the frame(s) following an I or P frame, i.e., frames that follow a reference frame, only small portions of these frames are different from the corresponding portions of the respective reference frame. Thus, for these frames, only the differences are captured, compressed and stored.
After the I frames have been created, the MPEG encoder divides each I frame into a grid of 16×16 pixel squares called macro blocks. The respective I frame is divided into macro blocks in order to perform motion compensation. Each of the subsequent pictures after the I frame is also divided into these same macro blocks. The encoder then searches for an exact, or near exact, match between the reference picture macro block and those in succeeding pictures. When a match is found, the encoder transmits a vector movement code or motion vector. The vector movement code or motion vector only includes information on the difference between the I frame and the respective succeeding picture. The blocks in succeeding pictures that have no change relative to the block in the reference picture or I frame are ignored. Thus the amount of data that is actually stored for these frames is significantly reduced.
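The macro block search described above may be sketched, under stated assumptions, as an exhaustive block-matching search. The sum of absolute differences cost, the search window size, and the convention that the vector points from the current block to its best match in the reference picture are all assumptions of this example; the MPEG standard does not prescribe how an encoder performs the search.

```python
def sad(ref, cur, ry, rx, cy, cx, block):
    """Sum of absolute differences between a reference block and a current block."""
    total = 0
    for dy in range(block):
        for dx in range(block):
            total += abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
    return total


def find_motion_vector(ref, cur, cy, cx, block=16, search=4):
    """Return (dy, dx, cost) of the best match for the block at (cy, cx) in cur.

    ref and cur are 2D lists of pixel values with identical dimensions.
    Candidates lie within +/- search pixels of the block's own position.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            # Skip candidates that fall partly outside the reference picture.
            if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                continue
            cost = sad(ref, cur, ry, rx, cy, cx, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]
```

A cost of zero corresponds to the exact match mentioned above; a small nonzero cost is a near exact match whose residual difference would still need to be coded.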
After motion vectors have been generated, the encoder then exploits spatial redundancy to reduce the remaining data. Thus, after finding the changes in location of the macro blocks, the MPEG algorithm further reduces the data by describing the difference between corresponding macro blocks. This is accomplished through a mathematical process referred to as the discrete cosine transform or DCT. This process divides the macro block into four sub blocks, seeking out changes in color and brightness. Human perception is more sensitive to brightness changes than color changes. Thus the MPEG algorithm devotes more effort to reducing color data than brightness data.
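For illustration, the following sketch splits a 16×16 macro block into four sub blocks and applies a two-dimensional DCT to one of them. It assumes 8×8 sub blocks, as is conventional for the luminance portion of a macro block, and uses a direct, unoptimized orthonormal DCT-II; practical encoders use fast transforms, and the function names here are hypothetical.

```python
import math


def split_macroblock(mb):
    """Split a 16x16 macro block into four 8x8 sub blocks, in row-major order."""
    return [[row[x:x + 8] for row in mb[y:y + 8]] for y in (0, 8) for x in (0, 8)]


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out
```

The coefficient at position [0][0] is the DC term, which is proportional to the average of the block; all other coefficients are AC terms describing variation within the block. A perfectly flat block therefore has a single nonzero coefficient.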
An MPEG stream includes three types of pictures, referred to as the Intra (I) frame, the Predicted (P) frame, and the Bi-directional Interpolated (B) frame. Intra frames provide entry points into the file for random access, and are generally only moderately compressed. Predicted frames are encoded with reference to a past frame, i.e., a prior Intra frame or Predicted frame. In general, Predicted frames receive a fairly high amount of compression and are used as references for future Predicted frames. Bi-directional pictures include the greatest amount of compression and require both a past and a future reference in order to be encoded. Bi-directional frames are never used for references for other frames.
Each picture or frame also includes a picture header which identifies the frame and includes information for that frame. The MPEG standard also includes sequence headers which identify the start of a video sequence. Sequence headers are only required once before the beginning of a video sequence. However, the MPEG-2 standard allows a sequence header to be transferred before any I frame or P frame. The sequence header includes information relevant to the video sequence, including the frame rate and picture size, among other information.
MPEG bitstreams used in digital television applications generally include a sequence header before every I frame and P frame. This is necessary to facilitate channel surfing between different video channels, which is an important user requirement. In general, when a user switches to a new channel, the video for the new channel cannot be displayed until the next sequence header appears in the bitstream. This is because the sequence header includes important information about the video sequence which is required by the decoder before the sequence can be displayed. If a sequence header were not included before each I frame and/or P frame, then when the user switched to a new channel, the video for the new channel possibly could not be immediately displayed, i.e., the video could not be displayed until the next sequence header.
An MPEG encoded stream also includes weighting matrices which are used for decoding the I frames in the MPEG bitstream. Each weighting matrix comprises a matrix of coefficients which are applied to different parameters of the Discrete Cosine Transform (DCT) used in encoding the frame. New weighting matrix values are included at the beginning of every video sequence, and these values are used for the respective frames until a subsequent new weighting matrix appears in the MPEG stream. The weighting matrices are typically included in sequence headers or picture headers. However, weighting matrices may also be inserted in P or B frames.
Trick Play Streams
In an interactive video-on-demand system, it is greatly desirable for the user to be able to selectively fast forward and/or fast reverse through the movie being watched. Thus, some video-on-demand systems include fast forward and fast reverse streams, referred to as trick play streams, for each movie. When the user desires to fast forward or fast reverse through a movie, the user selects the fast forward or fast reverse option. The respective fast forward or fast reverse trick play stream is then transferred to the user at the appropriate point where the user was watching, thus simulating a fast forward or fast reverse of the movie being watched.
Interactive video-on-demand systems which include trick play streams require methods for generating the trick play streams from a normal play bitstream. One current method for generating fast forward and fast reverse bitstreams from a normal play bitstream includes using a look-up table into multiple streams. The look-up table includes a plurality of indices which reference respective I frames, and the video server attempts to jump from index to index on the fly and play only the I-frame at each jump. In other words, the video server indexes into a look-up table to play only the I-frames for fast forward and fast reverse trick play streams. One problem with this method is that a considerable burden is placed on the video server in performing a table lookup and jumping from index to index on the fly while fast forward or fast reverse is being requested. Further, this method has associated bit rate expansion problems.
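The look-up table method described above may be sketched as follows. The index layout, a list of (offset, length) pairs in stream order, and the function name are hypothetical and serve only to show the per-request work the video server performs.

```python
def play_i_frames_via_index(stream, index, start=0, reverse=False):
    """Yield I-frame byte ranges by jumping through an index table at play time.

    stream: the normal play bitstream as bytes.
    index:  list of (offset, length) pairs, one per I-frame, in stream order.
    start:  position in the index at which trick play begins.
    """
    if reverse:
        positions = range(start, -1, -1)
    else:
        positions = range(start, len(index))
    for pos in positions:
        # One table lookup and one jump into the stream for every frame sent.
        offset, length = index[pos]
        yield stream[offset:offset + length]
```

Each frame delivered costs the server a lookup and a jump while the user is waiting, which is the burden noted above.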
Another method that is known to produce trick play fast forward and fast reverse bitstreams is to generate a video stream which does not include the AC coefficients of the DCT, but rather only includes the DC coefficients. This produces a blocky trick play stream and is thus less desirable than other trick play stream generation methods.
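The blockiness of a DC-only stream can be illustrated with a short sketch. With an orthonormal DCT, the DC coefficient is proportional to the block average, so discarding every AC coefficient and inverting the transform yields a block in which every pixel equals that average. The function name is hypothetical.

```python
def dc_only_reconstruction(block):
    """Return the block a decoder would rebuild if only the DC coefficient were kept."""
    n = len(block)
    mean = sum(sum(row) for row in block) / (n * n)
    # All detail inside the block is lost; only its average brightness survives.
    return [[mean] * n for _ in range(n)]
```

Every block becomes a flat tile, so edges and texture within a block disappear and the tile boundaries become visible.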
Therefore, an improved system and method is desired for efficiently generating trick play video streams, i.e. fast forward and fast reverse video streams, from a compressed normal play bitstream.
The present invention comprises a system and method for generating trick play video streams, i.e., fast forward and fast reverse video streams, from a compressed normal play bitstream. The present invention efficiently generates compressed trick play video streams which have reduced storage and data transfer bandwidth requirements. The present invention also does not require real time processing of video data, such as index look-ups.
The system first receives a compressed normal play bitstream, which is either stored on a local medium or received from a remote location. The system then filters the bitstream by extracting and saving only portions of the bitstream. The system preferably extracts I-frames and sequence headers, including all weighting matrices, from the MPEG bitstream and stores this information in one or more new files. The filtering thus removes or deletes portions of the MPEG data stream, including predicted (P) frames and bi-directional (B) frames.
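The filtering step may be sketched as follows, assuming the bitstream has already been parsed into a list of units. The Unit and IFrameTuple types and their field names are hypothetical, and the model in which weighting matrices simply persist until replaced is a simplification made for this example rather than a statement of the MPEG syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    kind: str                            # "seq_header" or "picture"
    payload: bytes
    picture_type: Optional[str] = None   # "I", "P" or "B" for pictures
    matrices: Optional[bytes] = None     # new weighting matrices carried by this unit


@dataclass
class IFrameTuple:
    seq_header: Optional[bytes]
    matrices: Optional[bytes]
    i_frame: bytes


def filter_stream(units: List[Unit]) -> List[IFrameTuple]:
    """Keep sequence headers, weighting matrices and I-frames; drop P and B frames."""
    kept = []
    current_header = None
    current_matrices = None
    for unit in units:
        if unit.matrices is not None:
            # Matrices may arrive in any unit, including P or B frames,
            # so they are tracked even when the unit itself is discarded.
            current_matrices = unit.matrices
        if unit.kind == "seq_header":
            current_header = unit.payload
        elif unit.kind == "picture" and unit.picture_type == "I":
            kept.append(IFrameTuple(current_header, current_matrices, unit.payload))
    return kept
```

Recording the header and matrices in force alongside each I-frame means that every saved grouping can later be decoded regardless of the order in which the groupings are arranged.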
The system then assembles or collates the filtered data into a forward or reverse order to produce a single assembled bitstream. The system also ensures that the weighting matrices properly correspond to the respective I-frames. For a fast forward trick play stream, the assembled bitstream comprises the sequence headers, I frames, and respective weighting matrices in the proper time or sequence order as they appeared in the original MPEG stream. For a fast reverse trick play bitstream, the system reverses the order of header/I frame groupings or tuples to produce a reverse play stream. This produces an assembled bitstream comprising a plurality of sequence headers and I-frames, including associated weighting matrices.
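The assembly step may be sketched as follows, with each grouping modeled as a (sequence header, weighting matrices, I-frame) tuple. The tuple layout, the labels in the output list, and the function name are assumptions made for this example.

```python
def assemble(groupings, reverse=False):
    """Lay out header/matrices/I-frame groupings in forward or reverse order.

    groupings: list of (seq_header, matrices, i_frame) tuples in original
    stream order. Only the order of whole groupings is reversed; the items
    inside each grouping keep their order so each I-frame stays decodable.
    """
    ordered = list(reversed(groupings)) if reverse else list(groupings)
    stream = []
    for seq_header, matrices, i_frame in ordered:
        stream.append(("seq_header", seq_header))
        if matrices is not None:
            # Re-emit the matrices that applied to this I-frame so that
            # reordering cannot pair the frame with the wrong matrices.
            stream.append(("matrices", matrices))
        stream.append(("I", i_frame))
    return stream
```

Because matrices normally remain in force until replaced, simply reversing the frames would leave an I-frame decoded with matrices that originally followed it; carrying the matrices inside each grouping avoids that mismatch.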
The assembled bitstream is then MPEG-2 decoded to produce a new video stream. The new video sequence comprises only one of every X pictures or frames of the original, uncompressed normal play bitstream, wherein 1/X is the frequency of I frames in the original, compressed normal play stream. This output picture stream is then re-encoded with MPEG parameters desired for the trick play stream, thus producing a trick play stream that is a valid MPEG encoded stream. When this new MPEG encoded trick play stream is decoded, the result is a fast forward or fast reverse video sequence which includes only one of every X frames of the original, uncompressed normal play bitstream.
Therefore, the present invention more efficiently generates trick play streams from a compressed normal play bitstream. The resulting trick play stream is a valid MPEG encoded stream and thus has reduced storage and data transfer bandwidth requirements, and this trick play stream can be decoded with known behavior on any MPEG decoder.
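The relationship between X and the apparent playback speed can be worked through with a short sketch. It assumes that the first frame is an I frame, that I frames recur exactly every X frames, and that the trick play stream is displayed at the same frame rate as the normal play stream; the function names are hypothetical.

```python
def trick_play_indices(total_frames, x, reverse=False):
    """Indices of the original frames that survive in the trick play sequence."""
    indices = list(range(0, total_frames, x))
    return indices[::-1] if reverse else indices


def speedup(total_frames, x):
    """Ratio of normal play duration to trick play duration at equal frame rates."""
    kept = len(range(0, total_frames, x))
    return total_frames / kept
```

For example, with 60 original frames and X equal to 15, four frames remain, so the trick play sequence runs 15 times faster than normal play when shown at the same frame rate.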