The present invention relates to image sequence compression. More particularly, this disclosure provides a compression system that utilizes independently coded regions to permit select extraction of image objects, or editing of select areas of an image frame, without necessarily decompressing all image data in each frame. This disclosure also provides a mechanism of tracking the objects and regions across multiple frames such that, if desired, they may be independently coded and extracted from a video sequence.
Conventional editing or other processing of film or video images is performed in the “spatial” domain, that is, upon actual images rather than upon a compressed representation of those images. Since the final product of such editing or processing is frequently an uncompressed signal (such as a typical “NTSC” television signal), such editing or processing can sometimes be accomplished in real time with today’s digital editors and computers. With the increasing tendency toward high resolution pictures such as high definition television (“HDTV”), however, Internet, cable, television network and other service providers will likely all have to begin directly providing compressed signals as the final product of editing. As used herein, the term “video” will refer to any electronic signal that represents a moving picture sequence, whether digital, NTSC, or another format.
One problem with the new digital standards concerns processing video efficiently and quickly; with video stored or transmitted in compressed format under the new standards, it is computationally difficult to decompress video, process that video in the spatial domain, and then recompress output video. Examples of processing compressed video prior to display include providing fast forward, reverse and other effects typically associated with VCRs. Other processing examples associated with the production or broadcast of video include color correction, logo insertion, blue matting, and other conventional processes.
To take one example of this computational difficulty, in logo insertion, a local television station might receive a compressed satellite feed, insert its own TV station logo in a corner of the image that will be seen on viewers’ TV sets, and then broadcast a TV signal over cable, back over satellite or through the airwaves. Conventionally, the processing could be performed in real time or with a short delay, because it is relatively easy to decompress an image, modify that image in the spatial domain and transmit a spatial domain signal (e.g., an uncompressed NTSC signal). With HDTV and other new digital standards, which call for all transmissions in a compressed format, this quick processing becomes much more difficult, since it is very computationally expensive to compress a video signal.
All of the video examples given above, e.g., logo insertion, color correction, fast forward, reverse, blue matting, and similar types of editing and processing procedures, will collectively be referred to interchangeably as “editing” or “processing” in this disclosure. “Fast forward” and similar features commonly associated with a video cassette recorder (“VCR”) are referred to in this manner, because it may be desired to change the sequence or display rate of frames (thereby modifying an original video signal) and output a new, compressed output signal that includes these changes. The compressed output signal will often require that frames be reordered and re-encoded in a different format (e.g., to depend upon different frames), and therefore is regarded as one type of “editing”.
In most of the examples given, since editing or processing is typically done entirely in the spatial domain, a video signal must typically be entirely decompressed to the spatial domain, and then recompressed. These operations are typically required even if only a small part of an image frame (or group of frames) is being edited. For example, taking the case of logo insertion in the bottom right corner of an image frame, it is extremely difficult to determine which part of a compressed bit stream represents a frame’s bottom right corner and, consequently, each frame of the video sequence is typically entirely decompressed and edited. If it is desired to form a compressed output signal, frames of the edited signal must then typically be compressed anew.
In this regard, many compression formats are based upon “motion estimation” and “motion compensation.” In these compression formats, blocks or objects in a “current” frame are recreated from similar blocks or objects in one or two “anchor” frames; “motion estimation” refers to a part of the encoding process where, for each block or object of a current frame, a computer searches for a similar image pattern within a fairly large area of each anchor frame and determines a closest match within this area. The result of this process is a motion vector, which usually describes the relative position of the closest match in an anchor frame. “Motion compensation” refers to another part of the encoding process, where differences between each block or object and its closest match are taken, and these differences (which are ideally all zeros if the match is “good”) are then encoded in some compact fashion, often using a discrete cosine transform (“DCT”). These processes simply imply that each portion of the current frame can be almost exactly reconstructed using the location of a similar looking portion of the anchor frame as well as difference values. Not every frame in a sequence is compressed in this manner.
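By way of illustration only, the two steps just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the encoder of any particular standard: frames are assumed to be lists of rows of integer intensities, the match criterion is assumed to be a sum of absolute differences, and the names `sad`, `get_block`, `estimate_motion` and `residual` are hypothetical. The transform coding of the differences (e.g., a DCT) is omitted.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Copy a size x size block whose upper-left pixel is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def estimate_motion(current, anchor, top, left, size, search_range):
    """Motion estimation by exhaustive search (an assumed strategy).

    Tries every displacement (dy, dx) within +/- search_range whose
    reference block fits inside the anchor frame, and returns
    ((dy, dx), cost) for the closest match, or None if nothing fits.
    """
    height, width = len(anchor), len(anchor[0])
    target = get_block(current, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # reference block would leave the anchor frame
            cost = sad(target, get_block(anchor, y, x, size))
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


def residual(current, anchor, top, left, size, vector):
    """Motion compensation: differences between a block and its match."""
    dy, dx = vector
    cur = get_block(current, top, left, size)
    ref = get_block(anchor, top + dy, left + dx, size)
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]
```

When the match is exact, every value returned by `residual` is zero, which is the ideal case mentioned above; the current block can then be rebuilt from the motion vector alone.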
Motion estimation is very computationally expensive. For example, in applying the MPEG-2 standard, a system typically takes each block of 8×8 pixels and searches for a closest match within a 15×15 pixel search window, centered about the expected location for the closest match; such a search involves 64 comparisons to find the closest match, and each comparison in turn requires 64 separate subtractions of multi-bit intensity values. When it is considered that a typical image frame can have thousands of 8×8 pixel blocks, and that this searching is typically performed for the majority of frames in a video sequence, it becomes quite apparent that motion estimation is a computationally expensive task.
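The figures in the preceding paragraph can be checked with a short calculation. The sketch below assumes that a candidate position counts only when the 8×8 block lies wholly inside the 15×15 window, which is the reading that yields 64 comparisons. The 720×480 frame size is an illustrative assumption and is not taken from this disclosure; the function names are hypothetical.

```python
def candidate_positions(block_size, window_size):
    """Positions at which a block fits wholly inside a square window."""
    per_axis = window_size - block_size + 1
    return per_axis * per_axis


def subtractions_per_block(block_size, window_size):
    """One subtraction per pixel for every candidate position."""
    return candidate_positions(block_size, window_size) * block_size ** 2


def blocks_per_frame(width, height, block_size):
    """Whole blocks in a frame (frame size is an assumed example)."""
    return (width // block_size) * (height // block_size)


if __name__ == "__main__":
    comparisons = candidate_positions(8, 15)
    per_block = subtractions_per_block(8, 15)
    blocks = blocks_per_frame(720, 480, 8)  # assumed frame size
    print(comparisons, per_block, blocks, per_block * blocks)
```

Under these assumptions each block costs 64 × 64 = 4096 subtractions, and a frame of 5400 blocks costs roughly 22 million subtractions for a single anchor frame, before any further encoding steps are counted.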
With the expected migration to digital video and more compact compressed transmission formats, it is apparent that a definite need exists for quick compression systems and for systems which provide quick editing ability. Ideally, such a system should permit decoding and editing of a compressed signal (e.g., VCR functions, logo insertion, et cetera) yet permit real-time construction and output of a compressed, edited video signal that can be accepted by HDTV and other new digital systems. Ideally, such a system would operate in a manner compatible with existing object-based and block-based standards and desired editing procedures, e.g., such that it can specially handle a logo to be inserted into a compressed signal, as well as other forms of editing and processing. Further still, such a system ideally should be implemented as much as possible in software, so as to be compatible with existing computers and other machines which process video. The present invention satisfies these needs and provides further, related advantages.
The present invention solves the aforementioned needs by providing a system having independently coded regions. Using these regions, one may specially compress and encode an image sequence in a manner that permits extraction or editing of select image objects in the spatial domain, without need to decode and decompress the entire image sequence. If it is desired to modify a compressed output signal to include modified data for an object (e.g., for an edited object), new data can be inserted as appropriate in the place of the extracted object; with the object being independently coded, all other compressed data for the image (e.g., background or other objects) may be exactly re-used. In real time applications (such as logo insertion), this ability facilitates editing and production of a compressed output signal, using standard computer and editing equipment. As can be seen therefore, the present invention should have ready application to production, post production, network syndication, Internet video, and other applications which call for the production of compressed video.
More particularly, one form of the invention provides an improved signal format that is adapted for independent region processing. This signal may also be standard-compliant such that it can still be decoded or processed with any standard-compliant decoder or processor. A region can be user-defined to be an object of interest appearing in a video sequence, or a geographic location within video frames. A corresponding region for each frame is search-limited during compression such that any motion vectors and residuals (if applicable to the frame) point only to a corresponding region within an anchor frame. Preferably, each frame has multiple regions, each independently coded from one another, with each object represented by a region thereby adapted for ready extraction.
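The search limitation described above can be sketched as follows. This is a hedged illustration only: it assumes rectangular regions with exclusive bottom and right edges, frames stored as lists of rows of integer intensities, and a sum-of-absolute-differences match criterion. The names `Region`, `allowed_vectors` and `region_limited_search` are hypothetical and do not appear in this disclosure.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Assumed rectangular region; bottom and right are exclusive."""
    top: int
    left: int
    bottom: int
    right: int


def allowed_vectors(top, left, size, search_range, anchor_region):
    """Displacements whose reference block lies wholly inside the
    corresponding region of the anchor frame."""
    vectors = []
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            inside = (anchor_region.top <= y
                      and y + size <= anchor_region.bottom
                      and anchor_region.left <= x
                      and x + size <= anchor_region.right)
            if inside:
                vectors.append((dy, dx))
    return vectors


def region_limited_search(current, anchor, top, left, size,
                          search_range, anchor_region):
    """Closest match among region-limited candidates only.

    Returns (vector, cost), or (None, None) if no candidate fits.
    """
    def block(frame, t, l):
        return [row[l:l + size] for row in frame[t:t + size]]

    def sad(a, b):
        return sum(abs(p - q)
                   for row_a, row_b in zip(a, b)
                   for p, q in zip(row_a, row_b))

    target = block(current, top, left)
    best_vector, best_cost = None, None
    for dy, dx in allowed_vectors(top, left, size,
                                  search_range, anchor_region):
        cost = sad(target, block(anchor, top + dy, left + dx))
        if best_cost is None or cost < best_cost:
            best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

Because a better match outside the region is never considered, the chosen vector may have a larger residual than an unrestricted search would give. That is the trade made in this sketch so that data for a region depends only on the corresponding region of the anchor frame.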
As can be seen from the foregoing, the present invention facilitates extraction of objects or regions from compressed image sequences and, further, facilitates subsequent editing and re-compression with minimal use of processing resources; that is to say, with video compressed to have independently coded regions in accordance with the present invention, it should be possible to subsequently extract and edit one region in real time without requiring extensive computational resources. The present invention can therefore be expected to have significant utility in digital image processing, especially the processing of digital video.
The invention may be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawings. The detailed description of a particular preferred embodiment, set out below to enable one to build and use one particular implementation of the invention, is not intended to limit the enumerated claims, but to serve as a particular example thereof.