The present invention relates to compression of image sequences. More particularly, this disclosure provides an improved reverse play system that compresses an output image signal by re-using motion vectors from a compressed image input signal.
Digital video formats such as those used for high definition television (HDTV) will be much more widely used in the next decade, due to improvements in compression technology and also to laws which mandate HDTV broadcast. These digital video formats typically include more picture information than conventional video (e.g., "NTSC," the present television standard in the United States, as well as "SECAM," "PAL," and other conventional standards), and bandwidth constraints will likely require that they be stored and transmitted in compressed format. This requirement conflicts with the operation of present-day televisions (TVs), video cassette recorders (VCRs) and similar equipment, which do not receive and transmit signals in compressed format.
Thus, the new standards will probably require compatible video players that, like present-day equipment, provide search, fast forward, reverse play, and similar functions. However, if the newer digital equipment (e.g., HDTVs) accepts only compressed signals, then video players will need to output compressed signals that already incorporate these functions. Herein lies a difficulty: performing these functions will likely require a video player to completely decompress a stored video signal, re-order the frames to effect a reverse order, and then compress the re-ordered frames, all in real time. This processing is computationally very expensive because compression requires a substantial amount of processing resources.
1. Processing Resources Required for Typical Compression
To understand why substantial processing resources are often required, it will be helpful to first discuss compression techniques that rely upon block-based encoding. There are many compression standards that use block-based encoding, but for purposes of explanation, the discussion below focuses on a standard proposed by the Moving Picture Experts Group, called "MPEG-2." MPEG-2 encodes data in blocks of identically sized, square "tiles," and is explained with respect to FIGS. 1-2.
FIG. 1 shows two image frames, including an earlier frame 11 and a later frame 13. In accordance with typical MPEG-2 compression protocol, the later frame 13 is completely divided into a number of square tiles 15, although only four such tiles are illustrated in FIG. 1 for purposes of discussion. These tiles each represent a group of pixels, for example, sixteen pixels across by sixteen pixels down, or eight pixels across by eight pixels down. The "pixel" is the smallest unit of image data, and can have a unique color and brightness. To reduce the amount of data required to reproduce the later frame, each tile 15 is compressed into information describing how that tile can be reproduced from image data elsewhere (i.e., by copying and modifying another part of either the same image frame or another image frame). Otherwise stated, the compression process yields information indicating where similar data may be found and how that data must be modified in order to recreate the tile of interest. In this example, it will be assumed that each tile 15 illustrated in the later frame will be reproduced from the earlier frame. Notably, decoders usually decode one frame at a time from compressed format to the spatial domain, so that when it comes time to decode the later frame, the earlier frame will already have been decoded.
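As a rough illustration of the tiling just described, the following Python sketch enumerates the top-left corners of the square tiles covering a frame. It assumes, for illustration only, that the frame is padded up to a whole number of tiles in each direction (as MPEG-2 does for its 16×16 macroblocks) and uses a 1920×1080 HDTV frame as the example size; the function name `tile_origins` is hypothetical.

```python
def tile_origins(width, height, tile=16):
    """Return the (x, y) top-left corner of every tile, in raster order.

    Hypothetical helper: the frame is assumed to be padded up to a whole
    number of tiles in each direction.
    """
    cols = -(-width // tile)   # ceiling division: a partial tile is padded
    rows = -(-height // tile)
    return [(c * tile, r * tile) for r in range(rows) for c in range(cols)]


origins = tile_origins(1920, 1080)
print(len(origins))   # 8160 tiles (a 120 x 68 grid)
print(origins[:3])    # [(0, 0), (16, 0), (32, 0)]
```

Even this single padded HDTV frame yields over eight thousand 16×16 tiles, each of which is a candidate for motion search.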
Each frame is tiled in a similar manner, as indicated by corresponding square tiles 17 of the earlier frame. These corresponding tiles 17 might not be the "closest match" with which to recreate the later frame and, for purposes of discussion, it will be assumed that the closest matches are respectively located as identified by reference squares 19.
Therefore, with reference to FIG. 2, each tile of the later frame is recreated using both motion vectors 21 and associated sets of residuals. Four motion vectors 21 are indicated in FIG. 2, each identifying an offset for finding the closest match in the prior frame. The motion vectors 21 correspond to the difference in position (illustrated in FIG. 1) between each tile 15 and its closest match 19. The residuals are simply raw differences in pixel intensity and color (there are usually two sets of residuals per motion vector), which are added to the closest match in order to exactly recreate each of the tiles 15.
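The reconstruction described above can be sketched as follows. This is a simplified, hypothetical model rather than MPEG-2 decoding itself: frames are plain lists of rows holding one intensity value per pixel, a single residual set is used per tile (instead of separate brightness and color sets), and residuals are stored exactly, without the transform coding and quantization a real encoder applies. A motion vector `(dx, dy)` is assumed to be the offset from a tile's corner to its closest match in the earlier frame.

```python
def compute_residual(later, earlier, x, y, mv, size):
    """Encoder side: per-pixel difference between a tile and its closest match."""
    dx, dy = mv
    return [
        [later[y + i][x + j] - earlier[y + dy + i][x + dx + j] for j in range(size)]
        for i in range(size)
    ]


def reconstruct_tile(earlier, x, y, mv, residual):
    """Decoder side: copy the closest match and add the residual back."""
    dx, dy = mv
    size = len(residual)
    return [
        [earlier[y + dy + i][x + dx + j] + residual[i][j] for j in range(size)]
        for i in range(size)
    ]


earlier = [[r * 4 + c for c in range(4)] for r in range(4)]
print(reconstruct_tile(earlier, 0, 0, (1, 1), [[1, 0], [0, -1]]))  # [[6, 6], [9, 9]]
```

Because the decoder only copies and adds, reconstruction is cheap; the expensive part is the encoder's search for the vector that makes the residual small.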
Searching for the closest match is computationally very expensive, because each tile 15 must be compared with many different identically sized groups of data from the earlier frame 11; each tile is typically compared with every possible subset of data falling within a search window in the earlier frame that is four times the tile size or larger. Usually, the result of each comparison is stored, and the closest match is determined by choosing the subset of data that yields the fewest differences. The amount of processing often required for such "motion search" can be appreciated by noting that (a) because a search window is often four times the tile size, either 256 or 64 different comparisons are performed for each tile (for 16×16 and 8×8 tiles, respectively) to determine its closest match, (b) each pixel in each tile usually has at least 8 bits of brightness information and 8 bits of color information, which are often all used in each comparison (e.g., every comparison can involve many thousands of bits), and (c) a typical digital image signal may contain several thousand tiles that are compressed using motion search. Motion search often requires over seventy percent of the processing resources used to compress the video.
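A minimal sketch of such an exhaustive ("full") motion search follows, using the sum of absolute differences (SAD) as the measure of "fewest differences." The SAD metric, the offset range of [-size/2, size/2) on each axis (chosen so that a tile of side `size` has size × size candidate positions, matching the 256 and 64 comparisons cited above), and all function names are assumptions made for illustration; real encoders differ in each of these choices.

```python
def sad(later, earlier, x, y, dx, dy, size):
    """Sum of absolute differences between a later-frame tile and a candidate."""
    return sum(
        abs(later[y + i][x + j] - earlier[y + dy + i][x + dx + j])
        for i in range(size)
        for j in range(size)
    )


def full_search(later, earlier, x, y, size):
    """Exhaustive block matching for the tile whose corner is (x, y).

    Offsets run over [-size//2, size//2) on each axis, i.e. size*size
    candidates; candidates falling outside the earlier frame are skipped.
    Returns (best_mv, best_sad, comparisons_made).
    """
    height, width = len(earlier), len(earlier[0])
    best_mv, best_cost, comparisons = None, None, 0
    for dy in range(-size // 2, size // 2):
        for dx in range(-size // 2, size // 2):
            if not (0 <= x + dx and x + dx + size <= width
                    and 0 <= y + dy and y + dy + size <= height):
                continue
            cost = sad(later, earlier, x, y, dx, dy, size)
            comparisons += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, comparisons


def pattern(r, c):
    # Synthetic test image with no repeating structure at small offsets.
    return (r * 37 + c * 11) % 256


earlier = [[pattern(r, c) for c in range(32)] for r in range(32)]
later = [[pattern(r + 2, c - 3) for c in range(32)] for r in range(32)]
print(full_search(later, earlier, 8, 8, 8))  # ((-3, 2), 0, 64)
```

With `size=16`, the same loops visit up to 256 candidate positions per tile, and each candidate costs 256 pixel differences, which illustrates why motion search consumes so much of the processing budget.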
2. Difficulties in Reversing Play in a Compressed Signal
As indicated, reverse play is conventionally performed by completely decompressing image frames to the spatial domain, re-ordering those frames, and then compressing them again, which includes performing the motion search just described for each tile of the newly ordered frames. In the example of FIGS. 1-2, backward play would be achieved by placing the later frame first and the earlier frame second, and then compressing the two frames again, this time with motion vectors describing how to produce the earlier frame from the later frame instead of vice versa.
Conventionally, decoding frames to the spatial domain is needed because block-based compression is typically a one-way function. That is to say, a later frame can be reproduced from an earlier frame upon which it depends, but the reverse is often not true, for reasons explained with reference to FIG. 3.
As represented by FIG. 3, in reverse play, the earlier frame 11 now follows the later frame 13 in order, and the earlier frame must now be reproduced from the later frame. The original encoding of motion vectors and sets of associated residuals, however, does not necessarily reflect all data in the earlier frame 11. This can be seen by noting that a significant amount of image space for the earlier frame (represented by the shaded region 23 of FIG. 3) may have no closest match in the later frame. Since it is desired that this information also be reproduced from a closest match, conventional backward play calls for full motion search as was described above, in the reverse direction. In other words, the earlier frame would be divided into tiles (as indicated by reference blocks 17 in FIG. 1) and these tiles would then typically each be compared to subsets of data in a search window in the later frame 13 to determine the closest matches.
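The uncovered region 23 can be located with a simple coverage map, sketched below under simplifying assumptions: square tiles of a single size, integer-pixel motion vectors, and a dictionary (a hypothetical format) mapping each later-frame tile corner to its motion vector. Any earlier-frame pixel that falls outside every closest match has no ready counterpart in the later frame.

```python
def uncovered_pixels(width, height, size, vectors):
    """Return the (x, y) earlier-frame pixels that no closest match covers.

    vectors maps each later-frame tile corner (x, y) to its motion vector
    (dx, dy); that tile's closest match then occupies the size x size
    square whose corner is (x + dx, y + dy) in the earlier frame.
    """
    covered = set()
    for (x, y), (dx, dy) in vectors.items():
        mx, my = x + dx, y + dy
        for i in range(size):
            for j in range(size):
                covered.add((mx + j, my + i))
    every_pixel = {(c, r) for r in range(height) for c in range(width)}
    return every_pixel - covered


vectors = {(0, 0): (0, 0), (2, 0): (-1, 0), (0, 2): (0, 0), (2, 2): (-1, 0)}
print(sorted(uncovered_pixels(4, 4, 2, vectors)))  # [(3, 0), (3, 1), (3, 2), (3, 3)]
```

In this toy example every later-frame tile draws on the left three columns of the earlier frame, so the rightmost column plays the role of region 23: the forward vectors say nothing about how to rebuild it.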
Unfortunately, as mentioned, full motion search can consume enormous processing resources; if a signal is to be reversed and output in compressed format in real time, e.g., for reverse play by a VCR or video disk player, then full motion search may have to be performed across tens of millions of bits per second, a significant processing task even for today's very fast computers and film editing machines.
What is needed is a compression system that is less taxing on computational resources. Still further, a need exists for a system that does not require full motion search to provide reverse play capability. Such a system would have ready applicability to VCRs, Internet servers and computers, compact disk and other video players, and the like, especially those systems that are adapted to handle HDTV. The present invention satisfies these needs and provides further, related advantages.
The present invention solves the aforementioned needs by providing a compression system that estimates motion vectors during compression. By using the motion vectors already present in the input signal to compute new motion vectors that point in the reverse direction, the present invention eliminates a potentially significant portion of the motion search that would otherwise be required. The present invention thus significantly assists real-time processing by video players, such as VCRs, video disk players, Internet servers and the like, that can provide reverse play and other functions while remaining compatible with the new digital video standards.
One form of the present invention provides a system that receives a compressed input including a later frame which depends on an earlier frame. These frames are to be reversed in dependency by de-compressing both frames and then re-compressing the earlier frame to now depend upon the later frame (the order in which the frames are actually output may also be changed). As would be conventional, the frames are first converted to the spatial domain for re-ordering or other processing.
According to the present invention, however, at least one motion vector is extracted from the compressed image input. This motion vector describes a position where a closest match may be found in an earlier frame, with which to reconstruct part of a later frame. According to the method, this motion vector is used to derive reverse-direction data; in the preferred embodiment, for example, the reverse direction data can be a location of closest match data in the earlier frame.
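The derivation of reverse-direction data might be sketched as follows, assuming integer motion vectors that give the offset from a later-frame tile corner to its closest match in the earlier frame. The `ReverseEntry` record and its field names are hypothetical; the sketch simply adds the vector to the tile position to locate the closest match, and negates the vector so that it points from the earlier frame back toward the later frame.

```python
from typing import NamedTuple


class ReverseEntry(NamedTuple):
    match_x: int  # corner of the closest match in the earlier frame
    match_y: int
    inv_dx: int   # extracted vector, inverted: earlier frame -> later frame
    inv_dy: int


def reverse_entry(tile_x, tile_y, dx, dy):
    """Derive reverse-direction data from one extracted motion vector."""
    return ReverseEntry(tile_x + dx, tile_y + dy, -dx, -dy)


entry = reverse_entry(32, 16, -5, 3)
print(entry)  # ReverseEntry(match_x=27, match_y=19, inv_dx=5, inv_dy=-3)
```

Adding the inverted vector to the closest-match corner, (27 + 5, 19 - 3), returns the original tile corner (32, 16), which is what lets the vector be re-used in the reverse direction.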
During re-compression, the earlier frame is divided into data blocks, for example, 8×8 or 16×16 pixel tiles as described above, or variable-size data blocks such as those of the MPEG-4 format. The location of the closest match is compared with a local neighborhood defined by the position of a data block to determine whether the two overlap. If they overlap, the extracted motion vector is inverted and used to calculate a new motion vector.
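A sketch of the overlap test follows. It assumes equal-sized square blocks and treats a block's "local neighborhood" as the block's own square, optionally widened by a margin; the `margin` parameter, the function names, and the choice to use the inverted vector directly as the new motion vector are illustrative assumptions, since the text says only that the inverted vector is used to calculate the new one.

```python
def overlap_area(a, b):
    """Overlap, in pixels, of two rectangles given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def reuse_vector(block, match, extracted_mv, size, margin=0):
    """Return a new vector for `block` if `match` overlaps its neighborhood.

    block, match -- (x, y) corners in the earlier frame
    extracted_mv -- (dx, dy) taken from the compressed input
    margin       -- hypothetical widening of the neighborhood, in pixels
    """
    bx, by = block
    neighborhood = (bx - margin, by - margin, size + 2 * margin, size + 2 * margin)
    if overlap_area(neighborhood, (match[0], match[1], size, size)) > 0:
        dx, dy = extracted_mv
        return (-dx, -dy)  # inverted vector re-used as the new motion vector
    return None            # no overlap: this vector cannot be re-used


print(reuse_vector((16, 16), (20, 10), (-4, 6), 16))  # (4, -6)
print(reuse_vector((16, 16), (40, 40), (1, 1), 16))   # None
```

A wider margin makes re-use more likely at the cost of accepting matches that cover less of the block; the source text does not specify how the neighborhood is sized.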
Articulated somewhat differently, the present invention extracts a motion vector from the compressed input and determines whether that motion vector may be inverted and "re-used." This determination is based on whether the tile being compressed is similar to a closest match reflected by the input signal; if the two are very similar, then extensive motion search does not have to be performed, because the inverted motion vector already provides information that may be used to find a "closest match."
In more particular features of the invention, all motion vectors from the compressed input are inverted and combined with the locations of their data blocks or tiles to yield closest-match positions. These positions can be stored in a table in memory; when it comes time to compress data in the reverse direction, compression software consults the table as each data block or tile is processed, to determine whether any closest-match location overlaps the neighborhood of the data block or tile presently being compressed. If overlap exists, the amount of motion search is reduced substantially. Preferably, if multiple closest matches overlap the data block or tile in question, the closest match having the largest overlap is identified, and only its motion vector is used to derive a new motion vector. The preferred embodiment can then proceed directly to calculate residuals, without any motion search. Alternatively, the system can resolve multiple overlaps by examining residual energy and selecting the motion vector corresponding to the lowest residual energy.
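The table-driven selection might look like the following sketch. Its assumptions are illustrative: square blocks of one size, integer-pixel vectors, a block's neighborhood equal to its own square, the table held as a plain Python list, and residual energy measured as the sum of squared differences. All function names are hypothetical.

```python
def build_table(vectors):
    """Closest-match table: one (match_x, match_y, inv_dx, inv_dy) per tile.

    vectors maps each later-frame tile corner (x, y) to its extracted (dx, dy).
    """
    return [(x + dx, y + dy, -dx, -dy) for (x, y), (dx, dy) in vectors.items()]


def overlap(ax, ay, bx, by, size):
    """Overlap, in pixels, of two size x size squares."""
    return max(size - abs(ax - bx), 0) * max(size - abs(ay - by), 0)


def residual_energy(earlier, later, bx, by, mv, size):
    """Sum of squared differences between a block and the data mv points to."""
    vx, vy = mv
    return sum(
        (earlier[by + i][bx + j] - later[by + vy + i][bx + vx + j]) ** 2
        for i in range(size)
        for j in range(size)
    )


def choose_vector(bx, by, table, size, earlier=None, later=None):
    """Pick a re-usable vector for the earlier-frame block at (bx, by).

    By default the entry with the largest overlap wins; when both frames
    are supplied, the in-frame candidate with the lowest residual energy
    wins instead. None means nothing overlaps, so an encoder would fall
    back to ordinary motion search.
    """
    hits = []
    for mx, my, ivx, ivy in table:
        area = overlap(bx, by, mx, my, size)
        if area > 0:
            hits.append((area, (ivx, ivy)))
    if not hits:
        return None
    if earlier is None or later is None:
        return max(hits, key=lambda hit: hit[0])[1]
    height, width = len(later), len(later[0])
    in_frame = [mv for _, mv in hits
                if 0 <= bx + mv[0] <= width - size and 0 <= by + mv[1] <= height - size]
    if not in_frame:
        return None
    return min(in_frame, key=lambda mv: residual_energy(earlier, later, bx, by, mv, size))


table = build_table({(4, 0): (-3, 0), (8, 0): (-6, 0)})
earlier = [[1] * 12 for _ in range(4)]
later = [[1 if 6 <= c < 10 else 0 for c in range(12)] for _ in range(4)]
print(choose_vector(0, 0, table, 4))                  # (3, 0)
print(choose_vector(0, 0, table, 4, earlier, later))  # (6, 0)
```

Here the match from the tile at (4, 0) overlaps the block at (0, 0) more (12 pixels versus 8), so it wins under the largest-overlap rule, whereas the residual-energy rule prefers the vector from the tile at (8, 0), which points at data identical to the block.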
The invention may be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawings. The detailed description of a particular preferred embodiment, set out below to enable one to build and use one particular implementation of the invention, is not intended to limit the enumerated claims, but to serve as a particular example thereof.