1. Field of the Invention
The present inventions relate to methods and apparatus for processing data in a compressed format, including the processing of a plurality of frames of input data while in a transform domain and, for example, a method and apparatus for processing input vectors associated with the input data frames to determine contributions of cross-products of the input vectors to an output vector associated with a frame of output data in consideration of an orthogonal characteristic of a convolution operation employed to generate the output data.
2. Description of the Related Art
As with many of today's technologies, the current trend in image sequence developing and editing is to use digital formats. Even with motion picture film, editing of image sequences (including image splicing, color processing, and special effects) can be much more precisely accomplished by first converting images to a digital format, and performing desired edits upon the digital format. If desired, images can then be converted back to the original format.
Unfortunately, digital formats usually use enormous amounts of memory and transmission bandwidth. A single image with a resolution of 200×300 pixels can occupy megabytes of memory. When it is considered that many applications (for example, motion picture film processing) use far greater resolution, and that image sequences can include hundreds or thousands of images, it becomes very apparent that many applications are called upon to handle gigabytes of information, creating a bandwidth problem, in terms of computational and transmission resources.
To solve the bandwidth problem, standards have been proposed for image compression. These standards generally rely upon spatial or temporal redundancies which exist in one or more images.
A single image, for example, may have spatial redundancies in the form of regions having the same color (intensity and hue); a single, all blue image could potentially be represented simply by its intensity and hue, and information indicating that the entire frame has the same characteristics.
Temporal redundancies typically exist in sequences of images, and compression usually exploits these redundancies as well. For example, adjacent images in a sequence can be very much alike; exploiting redundancies, a compressed image sequence may include data on how to reconstruct current image frames based upon previously decoded frames. This data can be expressed as a series of vectors and difference information. To obtain this information, pixels in the second frame are grouped into image squares of 8×8 or 16×16 pixels (“blocks” of pixels), and a search is made in a similar location in a prior frame for the closest match. The vectors and difference information direct a decoder to reconstruct each image block of the second frame by going back to the first frame, taking a close match of the data (identified by the vector) and making some adjustments (identified by the difference information), to completely reconstruct the second frame.
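By way of illustration only, the block search and reconstruction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an encoder defined by any standard: the function names (`sad`, `best_match`, `reconstruct`), the use of a sum of absolute differences as the matching cost, the exhaustive search, and the search range of 4 pixels are all hypothetical choices made for the example. Frames are assumed to be lists of rows of single-channel intensities.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block of prev and a block of cur."""
    return sum(
        abs(prev[py + r][px + c] - cur[cy + r][cx + c])
        for r in range(size)
        for c in range(size)
    )


def best_match(prev, cur, bx, by, size=8, search=4):
    """Find the closest match in prev for the block of cur at (bx, by).

    Returns ((dx, dy), residual): the vector pointing into the prior frame
    and the per-pixel difference information for that block.
    """
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            # Skip candidate blocks that fall outside the prior frame.
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue
            cost = sad(prev, cur, px, py, bx, by, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    residual = [
        [cur[by + r][bx + c] - prev[by + dy + r][bx + dx + c] for c in range(size)]
        for r in range(size)
    ]
    return (dx, dy), residual


def reconstruct(prev, bx, by, vector, residual):
    """Decoder side: fetch the matched block from prev and add the differences."""
    dx, dy = vector
    size = len(residual)
    return [
        [prev[by + dy + r][bx + dx + c] + residual[r][c] for c in range(size)]
        for r in range(size)
    ]
```

A decoder that holds the prior frame, the vector and the difference information can call `reconstruct` to recover the block exactly; practical encoders typically use faster search strategies and quantize the difference information.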
One group of standards currently popular for compression of image sequences has been defined by the Moving Pictures Experts' Group, and these standards are generally referred to as “MPEG.” The MPEG standards generally call for compression of individual images into three different types of compressed image frames: compressed independent (“I”) frames exploit only spatial redundancies, and contain all the information used to reconstruct a single frame; compressed prediction (“P”) frames exploit temporal redundancies from a prior frame (either a P or I frame) and typically only use about ⅓ as much data as an I frame for complete frame reconstruction; and compressed bi-directional (“B”) frames can use data from either or both of prior and future frames (P or I frames) to provide frame reconstruction, and may only use ¼ as much data as a P frame. Other compression standards also rely upon exploitation of temporal image redundancies, for example, H.261 and H.263.
Chroma-keying or blue screen matting is used widely in digital video editing to create the illusion of motion or presence at some specific place. In such applications, an object is filmed against a blue background which, in the editing process, is replaced by a static or a moving shot at some specific place to create the desired illusion. Unlike simple overlapping, in which the background of the overlapping image/video is black (a zero in digital image representation) and can be done via masking, chroma-keying uses the transparency of the chromakey pixels thus making a pixel-wise operation necessary. The degree of transparency of each pixel is called the alpha channel or alpha image. Accordingly, chroma-keying is also referred to as “alpha blending”. The alpha channel, which represents the transparency of each pixel, can be derived from the chromakey specified by the video editor, and then a pixel multiplication operation is performed to accomplish the chroma-keying effect.
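The pixel-wise operation described above can be sketched, for illustration only, in the spatial domain. The sketch makes several assumptions that are not taken from the text: the names `alpha_from_key` and `blend` are hypothetical, the key is a hard threshold on Euclidean color distance with an arbitrary tolerance of 60, and alpha is treated here as the weight given to the object pixel (1.0 keeps the object, 0.0 shows the background). Pixels are assumed to be (R, G, B) tuples.

```python
def alpha_from_key(pixel, key, tolerance=60.0):
    """Derive a per-pixel alpha from the chromakey (hard key, assumed rule).

    Pixels within `tolerance` of the key color get 0.0, all others 1.0.
    """
    distance = sum((p - k) ** 2 for p, k in zip(pixel, key)) ** 0.5
    return 0.0 if distance <= tolerance else 1.0


def blend(foreground, background, key):
    """Composite two equally sized frames by pixel multiplication."""
    out = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg, bg in zip(fg_row, bg_row):
            a = alpha_from_key(fg, key)
            # Each channel is a weighted sum: alpha * object + (1 - alpha) * background.
            row.append(tuple(round(a * f + (1.0 - a) * b) for f, b in zip(fg, bg)))
        out.append(row)
    return out


blue = (0, 0, 255)
object_frame = [[(0, 0, 255), (200, 50, 50)]]
background_frame = [[(10, 20, 30), (40, 50, 60)]]
composite = blend(object_frame, background_frame, blue)
```

In this example the first pixel matches the key and is replaced by the background, while the second pixel is kept from the object frame. A soft key would return fractional alpha values near the edges of the object.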
In digital TV broadcasting, regular TV programs (live or pre-recorded) are typically stored and transmitted in a compressed form. MPEG-2 is a compressed form used in many digital TV consortia such as HDTV or ATSC. Conventional processing of compressed image or video data involves first decompressing the data, and then applying the desired processing function. The processed data is then recompressed for transmission or storage.
Compressed domain processing may yield several advantages vis-a-vis spatial domain processing such as (a) smaller data volume, (b) lower computation complexity since the processes of complete decompression and compression can be avoided, and (c) preservation of image fidelity since decompression-compression processes can often be eliminated. Thus, it would be helpful to replace the spatial domain processing scheme with an equivalent processing of the compressed domain representation.
A conventional way of performing chroma-keying on an MPEG sequence is to decompress the sequence, apply the chroma-keying operation, and recompress the result. Within this loop, the costly DCT and motion estimation operations may make the approach effectively impossible for real time applications. Therefore, it would be helpful to have a chroma-keying technique applicable in the compressed domain to avoid the DCT and motion estimation bottlenecks.
For example, a logo keying operation is used frequently in the digital video broadcasting environment. Conventionally, the compressed stream is fully decompressed, and then compressed again after the logo-keying operation.
FIG. 2 is a functional block diagram of a conventional logo keying operation 200. After entropy decoding of the compressed stream at block 202, a forward discrete cosine transform (FDCT) is employed at block 204. Logo image data is represented by block 206. The output of the FDCT and the logo image data are both provided to block 208 which represents keying in the spatial domain. An inverse discrete cosine transform (IDCT) is applied at block 210 to the output of block 208. And finally, block 212 represents entropy encoding of the output of block 210 to provide the recompressed stream.
In the case of chroma-keying or blue screen matting, it would be helpful if both the object video and the background video could be processed in compressed form, especially in distributed systems, such as where the object video, the background video and/or the composite video are generated in different locations.
The present inventions provide methods and apparatus for combining separate data segments while still in a compressed form, such as mixing two compressed video segments into a single combined video segment, such as a composite, while keeping the two compressed video segments substantially in their compressed formats during processing. In one aspect of the present inventions, methods and apparatus are provided for pixel multiplication in a compressed domain such as the DCT domain, which leads to an efficient scheme for chroma-keying or blue screen matting. By way of example, one aspect of the present inventions provides for direct manipulation of a JPEG or MPEG compressed domain representation to achieve the desired spatial domain chroma-keying.
In accordance with one aspect of the present inventions, methods and apparatus are provided for receiving a plurality of input data segments, which may be arranged in groups such as frames of data in the form of input video data frames. Data segments are received as, or are converted into, a plurality of data elements such as vectors, at least several of which are orthogonal with respect to each other, or can be converted into a form in which they are orthogonal to each other. For example, the data segments can be in the form of input vectors associated with transform representations of input video data frames, and the input vectors can be evaluated to determine which cross-products of those input vectors yield non-zero output vectors. By identifying such input vectors, processing steps such as combining data segments such as video segments into a single data segment can be streamlined by considering only such input vectors and ignoring or reducing the weight accorded to other vectors whose cross products yield zero or negligible output vectors.
In a further form of the inventions, methods and apparatus are provided for determining weighting factors for cross products of input data representations. The weighting factors can be developed as a function of a convolution operation. As a result, the resulting combined data segment will be produced in a way that gives appropriate weight to the respective contributions of the input data segments. For example, a composite video will be produced which preferably gives a more accurate view or representation (visually) of the combined images as though they had been originally recorded with the images combined. In other words, appropriate weight will be given to each contribution from the input video so that the output video has the desired appearance. Similar results are also desirable for combining other forms of data, such as audio segments and other data segments. Moreover, the weighting can be modified as desired to produce other effects.
In a further form of one aspect of the present inventions, a logo keying operation can be performed using the methods and apparatus of the present inventions. For purposes of illustration only of one aspect of the present inventions, FIG. 3 is a high-level functional block diagram of a DCT logo keying operation 300 according to one aspect of the present inventions. The DCT domain keying method of the present inventions facilitates operation in the compressed domain; thus, only entropy coding is used. Generally, a spatial domain pixel multiplication is replaced with a compressed domain DCT convolution operation. Referring again to FIG. 3, conventional entropy decoding 202 and entropy encoding 212 processes may be employed at the beginning and end, respectively, of the operation 300. A DCT logo image is represented by block 302. As described in greater detail below, the output of the entropy decoding block 202 and the DCT logo image block 302 are provided to block 304 which represents DCT domain keying according to one aspect of the present inventions.
However, performing a convolution operation in the DCT domain results in replacing a simpler element-to-element multiplication process with a more complex convolution process. Notwithstanding, the method of one aspect of the present inventions is more efficient than the conventional element-to-element multiplication process because it exploits properties, e.g., symmetry and orthogonality in the DCT convolution function, which are not available in the spatial domain. By way of example, a set of DCT convolution theorems according to one aspect of the present inventions is formed and then used to derive efficient chroma-keying systems which provide a significant reduction in computation complexity.
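For illustration only, the relationship between spatial domain multiplication and a DCT domain convolution can be written out in one dimension. The derivation below is a sketch based on the standard orthonormal DCT-II basis and ordinary trigonometric identities; the symbols $c_k$, $\alpha_k$, $W_{kij}$, $X_i$, $Y_j$ and $Z_k$ are notation assumed for this example and are not the specific convolution theorems of the present inventions.

```latex
% Orthonormal N-point DCT-II basis (N = 8 for the usual 8x8 blocks)
\[
c_k(n) = \alpha_k \cos\!\left(\frac{(2n+1)k\pi}{2N}\right), \qquad
\alpha_0 = \sqrt{1/N}, \quad \alpha_k = \sqrt{2/N} \ \ (k \ge 1)
\]

% Two spatial signals expressed through their DCT coefficients
\[
x(n) = \sum_{i=0}^{N-1} X_i \, c_i(n), \qquad
y(n) = \sum_{j=0}^{N-1} Y_j \, c_j(n)
\]

% DCT of the element-to-element product z(n) = x(n) y(n)
\[
Z_k = \sum_{n=0}^{N-1} c_k(n)\, x(n)\, y(n)
    = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} W_{kij} \, X_i \, Y_j, \qquad
W_{kij} = \sum_{n=0}^{N-1} c_k(n)\, c_i(n)\, c_j(n)
\]

% Orthogonality of the cosines removes most cross-products
\[
W_{kij} \neq 0 \ \Longrightarrow \ k \in \{\, i+j,\ |i-j|,\ 2N-i-j \,\}
\]
```

Each output coefficient $Z_k$ is thus a weighted sum of cross-products $X_i Y_j$, and for any pair $(i, j)$ at most three values of $k$ receive a contribution. The two-dimensional case follows from the separability of the DCT, with weights that are products of the one-dimensional weights along rows and columns.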
In the case of chroma-keying, an exemplary preferred method according to another aspect of the present inventions provides that the compressed stream is processed in the compressed (e.g., DCT) domain without explicit decompression and spatial domain keying so that the resulting compressed stream corresponds to a chroma-keyed image. One method disclosed herein ensures that the resulting compressed stream conforms to the standard syntax of 8×8 DCT matrices. For typical data sets, this approach of chroma-keying in the compressed domain results in computation savings of around 70% as compared with traditional spatial domain methods for chroma-keying, wherein the data is first decompressed, then keyed in the spatial domain, and then recompressed for storage or transmission.
In accordance with a specific illustrative embodiment of one aspect of the present inventions, a method for receiving and processing digitized video data in a transform domain includes the steps of: providing a plurality of input video data frames in a transform domain; identifying input vectors associated with the input video data frames; in consideration of an orthogonal characteristic of a convolution function, determining which cross-products of the input vectors would yield non-zero output vectors and determining weighting factors for the cross-products; and generating an output video data frame by determining the cross-products of only the input vectors which would yield non-zero output vectors and by applying the weighting factors to the cross-products.
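The steps recited above can be sketched, for illustration only, with a one-dimensional 8-point DCT. This is a minimal sketch under stated assumptions rather than the claimed method: it uses the orthonormal DCT-II, it precomputes a weighting factor for every triple of indices and keeps only those that are non-zero, and all names (`basis`, `dct`, `build_weight_table`, `WEIGHTS`, `multiply_in_dct_domain`) and the threshold `eps` are hypothetical.

```python
import math

N = 8  # block length (one row of an 8x8 block)


def basis(k, n):
    """Orthonormal DCT-II basis function c_k(n)."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct(x):
    """Forward DCT, used here only to prepare inputs and check results."""
    return [sum(basis(k, n) * x[n] for n in range(N)) for k in range(N)]


def build_weight_table(eps=1e-12):
    """List (k, i, j, w) for every cross-product X[i]*Y[j] that reaches Z[k].

    Triples whose weighting factor vanishes by orthogonality are dropped.
    """
    table = []
    for k in range(N):
        for i in range(N):
            for j in range(N):
                w = sum(basis(k, n) * basis(i, n) * basis(j, n) for n in range(N))
                if abs(w) > eps:
                    table.append((k, i, j, w))
    return table


WEIGHTS = build_weight_table()  # computed once, reused for every block


def multiply_in_dct_domain(X, Y, table=WEIGHTS):
    """DCT coefficients of the pixel-wise product, without leaving the DCT domain."""
    Z = [0.0] * N
    for k, i, j, w in table:
        Z[k] += w * X[i] * Y[j]
    return Z


pixels = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
alpha = [1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
keyed = multiply_in_dct_domain(dct(pixels), dct(alpha))
```

Here `keyed` agrees, to within floating-point error, with the DCT of the element-wise product of `pixels` and `alpha`, while `WEIGHTS` holds far fewer than the 512 possible index triples. Real compressed blocks also tend to contain many zero coefficients after quantization, which would allow further cross-products to be skipped.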
In a further aspect of the present inventions, the output video data frame represents a chroma-keying or alpha blending operation associated with the input video data frames.
In another aspect of the present inventions, an apparatus includes: a mechanism for receiving a plurality of input video data frames; a mechanism for identifying input vectors associated with a transform representation of the input video data frames; and a mechanism for determining which cross-products of the input vectors would yield non-zero output vectors and for determining weighting factors for the cross-products in consideration of an orthogonal property associated with a convolution operation.
In another aspect of the present inventions, an apparatus operative to receive, process and output digitized data frames includes: machine readable media; and instructions stored on the machine readable media that instruct a machine to receive a plurality of static input data frames associated with an orthogonal transform, identify input vectors associated with the static input data frames, determine contributions of cross-products of the input vectors to output vectors and determine weighting factors for the cross-products in consideration of an orthogonal characteristic of a convolution operation, and generate an output data frame associated with the cross-products and weighting factors.
In a further aspect of the present inventions, the convolution operation is employed to generate the output data frame and includes two discrete cosine transform (DCT) convolutions.