As with many of today's technologies, the current trend in image sequence developing and editing is to use digital formats. Even with motion picture film, editing of image sequences (including image splicing, color processing, and special effects) can be much more precisely accomplished by first converting images to a digital format, and performing desired edits upon the digital format. If desired, images can then be converted back to the original format.
Unfortunately, digital formats usually require enormous amounts of memory and transmission bandwidth. A single image with a resolution of 200×300 pixels can occupy megabytes of memory. When it is considered that many applications (for example, motion picture film processing) require far greater resolution, and that image sequences can include hundreds or thousands of images, it becomes very apparent that many applications are called upon to handle gigabytes of information, creating a bandwidth problem, in terms of computational and transmission resources.
To solve the bandwidth problem, standards have been proposed for image compression. These standards generally rely upon spatial or temporal redundancies which exist in one or more images.
A single image, for example, may have spatial redundancies in the form of regions having the same color (intensity and hue); a single, all blue image could potentially be represented simply by its intensity and hue, and information indicating that the entire frame has the same characteristics.
Temporal redundancies typically exist in sequences of images, and compression usually exploits these redundancies as well. For example, adjacent images in a sequence can be very much alike; exploiting redundancies, a compressed image sequence may include data on how to reconstruct current image frames based upon previously decoded frames. This data can be expressed as a series of vectors and difference information. To obtain this information, pixels in the second frame are grouped into image squares of 8×8 or 16×16 pixels ("blocks" of pixels), and a search is made in a similar location in a prior frame for the closest match. The vectors and difference information direct a decoder to reconstruct each image block of the second frame by going back to the first frame, taking a close match of data (identified by the vector) and making some adjustments (identified by the difference information), to completely reconstruct the second frame.
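The block search just described can be illustrated with a minimal sketch. It assumes frames held as lists of rows of integer intensities, an exhaustive search over a small window, and a sum-of-absolute-differences match criterion; the function names, the window size and the match criterion are all hypothetical choices made for illustration and are not prescribed by any particular standard.

```python
def get_block(frame, top, left, size):
    """Copy a size-by-size block whose upper-left pixel is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def find_motion_vector(prev_frame, cur_frame, top, left, size=8, search=4):
    """Search prev_frame near (top, left) for the closest match to a block
    of cur_frame; return the vector (dy, dx) and the difference information."""
    height, width = len(prev_frame), len(prev_frame[0])
    target = get_block(cur_frame, top, left, size)
    best = None  # (cost, dy, dx) of the closest match found so far
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate positions that fall outside the prior frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(prev_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    _, dy, dx = best
    match = get_block(prev_frame, top + dy, left + dx, size)
    # Difference information: what must be added to the match to get the target.
    residual = [[t - m for t, m in zip(t_row, m_row)]
                for t_row, m_row in zip(target, match)]
    return (dy, dx), residual


def reconstruct_block(prev_frame, top, left, vector, residual):
    """Decoder side: fetch the matched block and apply the adjustments."""
    dy, dx = vector
    size = len(residual)
    match = get_block(prev_frame, top + dy, left + dx, size)
    return [[m + r for m, r in zip(m_row, r_row)]
            for m_row, r_row in zip(match, residual)]
```

In this sketch the decoder needs only the prior frame, the vector and the difference information to rebuild the block, which is why a well-matched block can be represented with far less data than its raw pixels.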
One group of standards currently popular for compression of image sequences has been defined by the Moving Picture Experts Group, and these standards are generally referred to as "MPEG." The MPEG standards generally call for compression of individual images into three different types of compressed image frames: compressed independent ("I") frames exploit only spatial redundancies, and contain all the information necessary to reconstruct a single frame; compressed prediction ("P") frames exploit temporal redundancies from a prior frame (either a prior P or I frame) and typically require only about 1/3 as much data as an I frame for complete frame reconstruction; and, compressed bidirectional ("B") frames can use data from either or both of prior and future frames (P or I frames) to provide frame reconstruction, and may only require 1/4 as much data as a P frame. Other compression standards also rely upon exploitation of temporal image redundancies, for example, H.261 and H.263.
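As a rough worked example of the ratios just given, the sketch below totals the data required by a sequence of frame types, taking an I frame as one unit, a P frame as 1/3 of a unit and a B frame as 1/4 of a P frame (1/12 of a unit). The ten-frame pattern used is purely hypothetical, and actual frame sizes vary with image content; the figures are approximations only.

```python
from fractions import Fraction

# Approximate data per frame type, relative to one I frame (ratios from the text).
RELATIVE_SIZE = {
    "I": Fraction(1),
    "P": Fraction(1, 3),
    "B": Fraction(1, 3) * Fraction(1, 4),
}


def relative_size(pattern):
    """Total data for a string of frame types, in units of one I frame."""
    return sum(RELATIVE_SIZE[frame_type] for frame_type in pattern)


pattern = "IBBPBBPBBP"  # hypothetical ten-frame sequence, for illustration only
print(relative_size(pattern))        # data with I, P and B frames
print(relative_size("I" * len(pattern)))  # data if every frame were an I frame
```

Under these assumed ratios, the ten-frame pattern needs about 2.5 I-frame units of data, against 10 units if every frame were independently coded.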
While very useful in reducing the amount of data necessary to completely reconstruct an image sequence, image compression techniques are typically not compatible with editors which manipulate image sequences. For example, it may be desired to play a compressed image sequence in reverse; because a compressed image sequence typically includes prediction frames, which are predicted based on previously decoded frames, playing such a sequence backward is extremely difficult unless the entire sequence is first decoded, and then reordered to a last-to-first format. If it is desired to cut and splice an image sequence, then it may not be possible to place a cut beginning with one image frame without prior decoding of many prior frames, because if the frame in question is encoded as (in MPEG compression) a P or B frame, information from one or more prior, cut frames may be needed in order to splice or reorder desired image frames. Typically, images in an MPEG sequence are formed into groups of pictures ("GOPs"), each of which does not depend upon frames outside the group, with all images in a GOP being completely decoded prior to any mixing of frames. For these reasons, editors typically operate in the image domain only, upon fully decompressed image sequences.
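The dependency problem can be made concrete with a small sketch that lists every frame which must be decoded before a chosen frame can be reconstructed. It assumes a simplified model, adopted here only for illustration: frames are given in display order, each P frame references the nearest preceding I or P frame, and each B frame references both the nearest preceding and the nearest following I or P frame. The function names and the frame pattern are hypothetical, and the sequence is assumed to be well formed (every P or B frame has the anchors it needs).

```python
def reference_frames(types, i):
    """Indices of the frames that frame i references directly."""
    kind = types[i]
    if kind == "I":
        return []  # I frames are self-contained
    # Nearest preceding anchor (I or P) frame.
    prev_anchor = next(j for j in range(i - 1, -1, -1) if types[j] in "IP")
    if kind == "P":
        return [prev_anchor]
    # B frame: also uses the nearest following anchor frame.
    next_anchor = next(j for j in range(i + 1, len(types)) if types[j] in "IP")
    return [prev_anchor, next_anchor]


def frames_needed(types, i):
    """All frames (including i) that must be decoded to reconstruct frame i."""
    needed, stack = set(), [i]
    while stack:
        j = stack.pop()
        if j not in needed:
            needed.add(j)
            stack.extend(reference_frames(types, j))
    return sorted(needed)


types = "IBBPBBPBBP"  # hypothetical sequence in display order
print(frames_needed(types, 7))  # frames required to begin a cut at frame 7
```

In this model, beginning a cut at the B frame with index 7 requires decoding frames 0, 3, 6 and 9 in addition to frame 7 itself, even though none of the first six frames is to be retained.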
Conventional editing of compressed digital image sequences is illustrated in FIG. 1, in which a "head portion" 11 of a first compressed image sequence 13 is to be combined with a "tail portion" 15 of a second compressed image sequence 17, to form a composite image sequence 19. The composite image sequence may also be compressed, as indicated by reference numeral 21 in FIG. 1. Compressed frames representing the head portion 11, that is, a group of compressed frames from the first image sequence that will be retained, are seen in bold underlined text in coded order, as is a group of compressed frames representing the tail portion 15 of the second image sequence. Each sequence 13 and 17 is compressed using an MPEG standard, such that each of ten frames of the first image sequence, and each of ten frames of the second image sequence, is normally maintained as an I, P or B frame; the image sequences can be stored in memory or on CD-ROM, or transmitted via cable, the internet or modem, or made available in some other fashion.
To combine the head and tail portions 11 and 15, it is first conventionally necessary to decode compressed images, i.e., reconstitute each entire visual frame 23, as indicated by conversion arrows 25 and 27 and by decoders 28. As graphically indicated in FIG. 1, a first shaded group 29 of five frames (the end of the head portion) is to be retained from the first image sequence and a second hatched group 31 of five frames (the beginning of the tail portion) is to be retained from the second image sequence. Once completely decoded, the individual frames from the two image sequences can simply be reordered or combined in any fashion. If it is desired to have a compressed composite image, the composite image sequence 19 must typically then be re-encoded using an MPEG standard, as indicated by the reference numeral 33 of FIG. 1 and encoder 34. As is illustrated by comparing the three encoded sequences 13, 17 and 21, the encoded frames (I, P, B) for the composite image may be quite different from the encoded frames for each of the first and second sequences.
Unfortunately, manipulation of individual images of a sequence heavily taxes computational resources when images must first be decompressed. For example, a typical editing application as just described can require over 20,000 MOPS ("million operations per second"), which necessitates enormous memory and computational resources. Further, data transmission rates are typically twenty million bits per second (20 mbps) for compressed image sequences, and one billion bits per second (1 bbps) for uncompressed image sequences. As can be seen, therefore, image editing as described cannot practically be implemented for use in either real-time applications, or upon a typical personal computer as, for example, in many video or internet applications.
A definite need exists for a system which permits manipulation of individual images, including reverse-play and image splicing, and which does not require compressed image sequences to be completely decoded. Ideally, such a system would have the capability to splice image sequences or otherwise manipulate images, entirely within the compressed domain (or at least without decoding entire GOPs). A need further exists for an inexpensive system, which can be implemented in software for use upon readily available image processing equipment and, more generally, upon readily available computing devices. Finally, a need exists for a real-time editor for compressed image sequences. The present invention satisfies these needs and provides further, related advantages.