1. Field of the Invention
The present invention relates to the field of digital motion video and more particularly to a system and techniques for altering and decompressing digital motion video signals in a manner which allows efficient reverse play of the motion video as well as efficient, frame-level access and play of the motion video stream for creation of other special video effects. The system and techniques are compatible with the MPEG-1 standard adopted by the International Standards Organization's (ISO's) Moving Picture Experts Group (MPEG); however, the invention taught herein may also be applied to other video coding algorithms which share some of the features of the MPEG algorithm, such as Intel Corporation's Indeo™ and Indeo Video Interactive algorithms, the Fractal Codec algorithm from Iterated Systems of Atlanta, Ga., MVI from Sirius Publishing of Scottsdale, Ariz., Cinepak from Radius of Sunnyvale, Calif., and the Smacker 2.0 algorithm from RAD Software of Salt Lake City, Utah.
2. Environment
The present invention relates generally to the field of digital video and more specifically to the coding and compression of analog video signals into digital video and the decoding and decompression of the digital bitstream into a displayable video signal. Digital video compression is used in a variety of applications where video images are displayed in a system where available bandwidth is limited, such as video telephone, digital television and interactive multimedia using such digital storage technology as CD-ROM, digital audio tape and magnetic disk. Such applications require digital video coding, or video compression to achieve the necessary high data transfer rates over relatively low bandwidth channels.
Various standards have been proposed and are in use for video coding. The standards vary from application to application in the resolution and frames per second allowed, based, among other things, on the bandwidth available in the particular application. Several of these standards involve algorithms based on a common core of compression techniques, including transform coding, such as that employing the Discrete Cosine Transform. See K. R. Rao and P. Yip, DISCRETE COSINE TRANSFORM, ALGORITHMS, ADVANTAGES, APPLICATIONS, San Diego, Calif., Academic Press, 1990, and N. Ahmed, T. Natarajan, and K. R. Rao, Discrete Cosine Transform, IEEE TRANSACTIONS ON COMPUTERS, pp. 90-93, January 1974. See also U.S. Pat. No. 4,791,598 entitled "Two-Dimensional Discrete Cosine Transform Processor," issued Dec. 13, 1988.
This invention relates most specifically to those digital video applications where the user interacts with the system in ways which can modify the video display, such as in interactive computer games or other interactive multimedia applications. In particular, digital video systems, such as MPEG video players in personal computers or video game machines, would benefit from use of the apparatus and methods of the present invention to allow more efficient and realistic navigation through a video world, creation of special effects, frame-specific search and access to a video stream, and reverse playback of a video stream.
The MPEG-1 Video Compression Algorithm
The ISO's MPEG-1 algorithm is designed to yield a true TV-like image with compression ratios around 180:1 at data rates low enough for use in storage applications with data transfer rates at or below 1.5 Mb/s (megabits/sec), comparable to those used on CD-ROM drives on personal computers. While the algorithm is designed for such data rates, it is usable at higher data rates. The inventor routinely uses data rates of 2 to 2.5 Mb/s. MPEG-1 is designed to work with images having one-fourth of broadcast-quality resolution: 352 by 240 pels. This is approximately the quality of a picture presented by standard VHS video cassettes.
An MPEG-1 stream may consist of 0 to 16 separate video streams, 0 to 32 separate audio streams, any of which may be in stereo, and possibly other customized streams carrying user-specified information and padding bytes. The various streams are multiplexed into a single MPEG composite stream called a "system stream." This invention relates to the manipulation of an MPEG-1 video stream. It also relates to ways of de-multiplexing the system stream to create an actual or virtual non-multiplexed, valid MPEG stream.
The further aspects of the MPEG video standard, including the other data streams which comprise the MPEG system stream, are well known in the art and are extensively discussed in the literature, including International Standard ISO/IEC 11172-2, entitled "Information technology--Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s--Part 2: Video" dated Aug. 1, 1993, and will not be further discussed here. Similarly, the application of the present invention to other systems of video data compression does not require discussion of analogous aspects of those systems.
A single video stream consists of a sequence of pictures. These pictures are also referred to as "frames." The MPEG video stream is normally created by subjecting video data representing a video picture or frame to several digital compression steps. The MPEG-1 encoding scheme includes intra-frame compression which seeks to reduce redundancies within a frame and inter-frame compression which uses motion compensation to identify and eliminate redundancies between sequential frames. Motion compensation takes advantage of the movement of picture elements that remain approximately the same within a series of sequential frames but change position from frame to frame.
In MPEG, motion compensation is accomplished by employing a sequence of types of frames with various characteristics within a related group of pictures. The three types of frames possible in normal MPEG video are I-frames (Intra), P-frames (Predictive), and B-frames (Bidirectional). A fourth type of frame, the D-frame, is defined in the standard but is intended for use only as an indexing and overview feature and cannot be mixed with I, B, and P frames. I and P frames are collectively called reference frames since other frames can be based on them. I frames contain all of the information needed to reconstruct one frame of video. P-frames can use information from the previously displayed reference frame and can add new information. B frames can use information from either the previously displayed reference frame, the next reference frame that will be displayed, or both, and can also add new information.
Since B frames can depend on a frame that will be displayed at some point in the future, the pictures in the MPEG bitstream are encoded and stored in a different order than they will be displayed. The order that the pictures are intended to appear on the screen is referred to as the "display order," and the order that the pictures appear in the bitstream is referred to as the "bitstream order." Bitstream order is optimized to provide the necessary reference pictures at the appropriate time to allow efficient parsing and decoding when the stream is played forward, removing the need to back up or skip forward to display the stream. Because MPEG is optimized for forward play, backward play is especially challenging. Further, before the current invention, efficient backward play, that is backward play of acceptable speed and quality, was not obtainable on a machine with only the memory resources required for acceptable forward play.
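The relationship between display order and bitstream order can be illustrated with a minimal Python sketch. The frame labels (a type letter followed by a display index) and the function name are hypothetical and are not taken from the standard; the sketch assumes only that each reference frame is placed in the bitstream ahead of the B frames that precede it in display order.

```python
def display_to_bitstream_order(display):
    """Reorder frame labels such as 'I0' or 'B1' from display order to bitstream order."""
    bitstream = []
    pending_b = []  # B frames waiting for the reference frame that follows them
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            # A reference frame (I or P) is emitted first, followed by the
            # B frames that precede it in display order.
            bitstream.append(frame)
            bitstream.extend(pending_b)
            pending_b = []
    bitstream.extend(pending_b)  # trailing B frames, if any, keep their position
    return bitstream


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_bitstream_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In this illustration P3 precedes B1 and B2 in the bitstream because both B frames may refer to it, even though it is displayed after them.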
The MPEG standard also defines the concept of a Group of Pictures (GOP). Each GOP contains at least one I-frame and may contain additional I, B, and P frames. There is no limit on the size of a GOP. The GOP may begin in display order with one or more B-frames that refer to the last reference frame in the previous GOP, but no GOP may end with a frame that refers to the next GOP in the display order. Each GOP begins with a header which contains parameters to assist in decoding the video stream. While such parameters can be different for every GOP in the stream, they typically are the same for all GOPs in a stream.
Finally, the MPEG standard defines a "sequence." A sequence is a sequential group of GOPs in an MPEG stream. Each sequence begins with a sequence header which contains parameters which may be used to assist in decoding the sequence.
The video data is broken down into a luminance or Y component and two color difference components, Cr (red chrominance) and Cb (blue chrominance). The individual pictures can be represented as arrays of Y, Cr and Cb values. The Cr and Cb values are subsampled with respect to the Y values by 2:1 in both the horizontal and vertical directions, so there is one Cr and one Cb value for every four Y values.
Pictures are broken down into macroblocks, which are contiguous regions of 16×16 pels. The Y component is represented by four 8×8 contiguous blocks for each 16×16 macroblock. The Cr and Cb components are represented by a single 8×8 block for each component, but due to the subsampling discussed above the Cb 8×8 block and the Cr 8×8 block each cover the same area of the screen as the four 8×8 Y blocks. The macroblock, therefore, consists of six 8×8 blocks, each limited to one component and all superimposed in the 16×16 pel area of the display covered by the four 8×8 Y blocks.
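The block arithmetic described above can be checked with a short Python sketch for the 352 by 240 pel picture size mentioned earlier. The function name and dictionary keys are hypothetical, and rounding partial macroblocks up to a whole macroblock is an assumption made for the illustration.

```python
def macroblock_layout(width, height):
    """Count macroblocks, 8x8 blocks and samples for one picture."""
    mb_cols = (width + 15) // 16   # assumption: partial macroblocks round up
    mb_rows = (height + 15) // 16
    macroblocks = mb_cols * mb_rows
    return {
        "macroblocks": macroblocks,
        "y_blocks": macroblocks * 4,         # four 8x8 Y blocks per macroblock
        "cb_blocks": macroblocks,            # one 8x8 Cb block per macroblock
        "cr_blocks": macroblocks,            # one 8x8 Cr block per macroblock
        "y_samples": macroblocks * 4 * 64,
        "chroma_samples_each": macroblocks * 64,
    }


layout = macroblock_layout(352, 240)
print(layout["macroblocks"])          # 330
print(layout["y_samples"])            # 84480
print(layout["chroma_samples_each"])  # 21120
```

The Y sample count equals 352 times 240, and each chrominance component holds one quarter as many samples, which matches the 2:1 subsampling in both directions.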
In MPEG-1 coding the six blocks comprising the macroblock are each subjected to a Discrete Cosine Transformation (DCT) algorithm that transforms them losslessly into 8×8 matrices whose axes represent increasing horizontal and vertical frequency. Further compression steps take place to reduce the range of the values and encode them using a Huffman-type compression algorithm, but the specific algorithms used in the further compression steps are not relevant to the present invention.
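As an illustration of the transform step, the following Python sketch applies a direct, unoptimized two-dimensional DCT to a single 8×8 block. It uses the common orthonormal DCT-II scaling as an assumption; the function name is hypothetical, and practical decoders use fast factorizations rather than this four-level loop.

```python
import math

N = 8


def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block (orthonormal scaling assumed)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


# A flat block has no spatial variation, so only the zero-frequency
# coefficient at position [0][0] is non-zero.
flat = [[128] * N for _ in range(N)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 6))  # 1024.0
```

Blocks with little spatial detail concentrate their energy in a few low-frequency coefficients, which is what makes the later range reduction and entropy coding steps effective.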
All macroblocks in an I picture are intra-coded. This means that all their DCT coefficients are encoded directly into the bitstream with no references to other pictures.
Each macroblock in a P picture may or may not have its DCT coefficients directly coded (herein referred to as "intra-coded" information), and each may or may not have a reference to a 16×16 pel area in the most recently displayed reference frame (such references are referred to as "motion vectors" in the ISO/IEC standard 11172-2 or as "inter-coded" information). If both motion vector information and intra-coded information are present, the values are added. Either intra-coded information or an inter-coded reference to the previous reference frame must be supplied for each macroblock.
Each macroblock in a B picture may or may not have intra-coded information, may or may not have a reference to a 16×16 pel area in the most recently displayed reference frame, and may or may not have a reference to a 16×16 pel area in the next reference frame that will be displayed. If references to both the previous and next reference frames are present, the values in the two frames are averaged and added to the intra-coded information, if any. Either intra-coded information or an inter-coded reference, either to the next reference frame, the previous reference frame or both the next and previous reference frames, must be available for each macroblock, although the standard allows the information to be inherited from previous macroblocks in some cases.
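The combination rules for P and B macroblocks described above can be sketched in Python as follows. This is a simplified illustration, not the decoding procedure of the standard: the function and parameter names are hypothetical, each area is a short flat list of pel values rather than a 16×16 region, the referenced areas are assumed to have been located already through their motion vectors, and the averaging is done in floating point without the rounding a real decoder would apply.

```python
def reconstruct_block(coded=None, past_ref=None, future_ref=None):
    """Combine directly coded values with referenced areas for one macroblock."""
    refs = [r for r in (past_ref, future_ref) if r is not None]
    if coded is None and not refs:
        raise ValueError("macroblock needs coded data or at least one reference")
    size = len(coded) if coded is not None else len(refs[0])
    out = []
    for i in range(size):
        # With two references the values are averaged; with one, the
        # average is simply that reference; with none, there is no prediction.
        prediction = sum(r[i] for r in refs) / len(refs) if refs else 0
        added = coded[i] if coded is not None else 0
        out.append(prediction + added)
    return out


past = [100, 100, 100, 100]
future = [120, 120, 120, 120]
coded = [1, -1, 0, 2]

print(reconstruct_block(coded, past_ref=past))                     # P-style: [101.0, 99.0, 100.0, 102.0]
print(reconstruct_block(coded, past_ref=past, future_ref=future))  # B-style: [111.0, 109.0, 110.0, 112.0]
```

A P macroblock would pass at most the past reference, while a B macroblock may pass either reference or both.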
As the stream is parsed and decoded, the MPEG player constantly keeps the last two reference frames available for use in decoding B and P frames when they appear. The first reference frame decoded is placed in the future buffer. When a new reference frame is encountered in the decoder's parsing of the bitstream (bitstream order), the previous "future" frame becomes the "past" frame and is normally displayed at that time. The new reference frame is read into the future buffer and becomes the future frame. These available reference frames are known as the "past" and "future" frames or pictures and are normally kept in portions of the computer or decoder memory known respectively as the "past" and "future" buffers. As mentioned above, P frames may refer to past reference frames, and B frames may refer to past and/or future reference frames. The appropriate reference frames must be in the appropriate buffers of the MPEG player for the P and B frames to be properly decoded. The MPEG bitstream is designed so that the proper frames will always be in the appropriate buffers when dependent frames are presented for decoding. If the past and future buffers contain the correct values and the MPEG decoder decodes the B picture, the correct picture will be displayed on the screen. However, the contents of these buffers change frequently during normal play. This makes it difficult to play a dependent frame except in the original linear video order.
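The buffer handling described above can be simulated with a brief Python sketch. Frames are represented only by hypothetical labels (a type letter followed by a display index), no actual decoding is performed, and flushing the last reference frame at the end of the stream is an assumption of the sketch.

```python
def play_forward(bitstream):
    """Simulate the past/future reference buffers during forward play.

    Returns the frame labels in the order they would be displayed.
    """
    past = None
    future = None
    displayed = []
    for frame in bitstream:
        if frame[0] in "IP":
            # A new reference frame arrives: the previous "future" frame
            # becomes the "past" frame and is displayed at this time.
            if future is not None:
                displayed.append(future)
            past = future
            future = frame
        else:
            # A B frame is decoded against the current past and future
            # buffers and displayed immediately.
            displayed.append(frame)
    if future is not None:
        displayed.append(future)  # assumption: flush the last reference at end of stream
    return displayed


bitstream = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
print(play_forward(bitstream))
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```

By the time B4 is decoded in this trace, I0 is no longer held in either buffer, which illustrates why a dependent frame is difficult to play outside the original linear order.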
As used in this disclosure, the terms "parsed" and "decoded" are virtually synonymous. They both refer to the various processes employed by the computer and MPEG player whereby the digital information contained in the compressed video stream is accessed, manipulated, converted into bitmaps and displayed in the proper order. However as explained herein, a compressed, digitized video stream may be partially or completely parsed or decoded. Thus, parsing or decoding may refer to only one or less than all steps necessary for complete decoding of the stream information. Similarly, as is here made evident, a GOP, picture, or portion of a picture may be completely or incompletely parsed for reasons other than display. Depending on the context in which they are used, the terms "parse" or "decode" may refer only to preliminary steps in the decoding process, such as those steps necessary to determine whether a certain picture is an I, P or B frame, or may refer to the entire process of decoding the picture and displaying the resultant bitmap.
As used in this disclosure, the terms "frame" and "picture" are also virtually synonymous. They both refer to a single video picture, whether or not it is coded.
Frame accurate access to the video stream is not necessary for broadcast, satellite or cable video programming applications. However, it is desirable for many other uses of MPEG, particularly in interactive, multimedia computer applications such as computer games. It would be desirable to use MPEG video "worlds" in interactive educational and game programs. It would further be desirable to have frame accurate access to the MPEG video streams comprising such video worlds, subplots, and the like.
Although there are suggestions in ISO/IEC 11172-2 regarding random access, reverse play and other special effects, no adequate methodology has been provided for achieving random access at frames other than I frames or for achieving reverse play of MPEG video with computer memory resources no greater than those required for forward play.
The MPEG standard has been designed primarily to support normal, forward linear playback of a digital video stream in display order. However, the standard also refers to possible additional operations including random access, fast search, reverse playback, error recovery, and editing. Of these, reverse playback poses particular problems because of the directionality enforced by the MPEG standard in encoding groups of pictures. Only I frames can be individually accessed and decoded. Neither B nor P pictures contain sufficient information to generate a complete frame without reference to previous (bitstream order) pictures. As with other digital streams, an MPEG stream has directionality and is incomprehensible if read backwards bit by bit. Further, the bitstream order of an MPEG stream has a definite directionality on the Group of Pictures level as well. Consequently, only reverse play of I frames can be achieved by simply reading the frames into the decoder in either reverse display order or in reverse bitstream order.
The MPEG standard suggests performing reverse playback by decoding GOPs in the ordinary fashion, storing the decoded bitmaps in a memory buffer and then displaying the bitmaps in reverse order. While this method results in reverse playback of equal quality to the forward playback, by requiring storage of decoded pictures before playback it places significantly greater demands on computing resources, particularly memory resources, than does forward MPEG play. Another method is to decode only the I frames in each group of pictures. While this method eliminates the bitmap buffer requirement, it results in either loss of temporal resolution (where a significant number of B and P pictures are skipped) or loss of compression (where the original video sequence is coded primarily in I frames to allow for smoother reverse playback).
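The memory cost of the bitmap-buffering method can be estimated with a rough Python calculation. The figures are assumptions chosen for illustration only: a 352 by 240 picture stored as decoded Y, Cb and Cr samples at one byte each, a GOP of 15 pictures, and three picture buffers (past, future and current) for forward play.

```python
def decoded_frame_bytes(width, height):
    """Bytes for one decoded picture with chrominance subsampled 2:1 in each direction."""
    y = width * height
    chroma = 2 * (width // 2) * (height // 2)  # one Cb plane and one Cr plane
    return y + chroma


frame = decoded_frame_bytes(352, 240)
gop_size = 15            # assumption: pictures per GOP
forward_buffers = 3      # assumption: past, future and current picture

print(frame)                      # 126720
print(frame * forward_buffers)    # 380160  bytes held during forward play
print(frame * gop_size)           # 1900800 bytes to hold one decoded GOP for reverse play
```

Under these assumptions, holding a whole decoded GOP takes five times the memory of forward play, and the gap widens with larger GOPs since the standard places no limit on GOP size.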
Another method of creating a similar effect would be to avoid reverse play of an MPEG stream by storing "forward" and "backward" video contents in standard unidirectional MPEG streams to simulate reverse play by having the run-time system switch from the appropriate forward stream to the corresponding reverse stream when the "reverse" command is given by the user. While eliminating the need for memory resources which the bit map storage method requires and eliminating the loss of temporal resolution which the I frame only method may involve, such a system would double the storage requirements for the video information files which are to be made available for forward and reverse play. Further, such a solution would require limitation of the points along the video stream where a reverse command could be executed and/or complete synchronization of the forward and reverse MPEG streams. Such a solution would also require a seek to the reverse stream every time a reverse command is given, slowing down navigation of the video world.
While reverse play is not necessary for broadcast, satellite, or cable video programming applications, it is desirable for many other uses of MPEG, particularly in interactive, multimedia computer applications, such as computer games. It would be desirable to use MPEG video "worlds" in interactive educational and game programs. It would further be desirable to have such worlds navigable in the forward and reverse directions without doubling the MPEG storage requirements for creation of such a world.
Further, no method has been suggested for creating meaningful transitions between separate MPEG video streams, or for solving the problem of delay in the display of information during the seek time required to transition from one video stream to another. As all of the methods of MPEG frame-specific access, reverse play and stream-to-stream transitions attempted to date have limitations which restrict their use in interactive multimedia personal computer applications, more efficient methods are needed to accomplish these functions.