1. Field of Invention
The present invention relates generally to enlarging two- and three-dimensional digital data arrays representing physical entities, and more particularly to the application of the Discrete Cosine Transform (DCT) as an interpolation function for video data. This invention has application in the up-conversion or down-conversion of video formats, e.g., for the purpose of HDTV format conversion.
2. Description of Prior Art
The use of mathematical functions for the purpose of up-sampling (interpolation) to increase a number of data points and for down-sampling to reduce a number of data points is well-known. Examples of the use of such functions will now be explained with reference to FIGS. 1–3.
FIG. 1 is a data flow chart showing the conventional application of the average function, which is perhaps the simplest interpolation function, to data points 10 described by formulas 12 forming new data points 14, described by formulas 16. Each of these new, intermediate data points has a value that is the average of the adjacent data points. Many additional filters having special properties (blurring, sharpening, contrast enhancement, etc.) have been described in the literature.
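The average function described above can be illustrated with a minimal sketch. The function name and the sample values are hypothetical and are not taken from FIG. 1; the sketch only assumes that one new point is placed between each pair of adjacent original points.

```python
def interpolate_by_averaging(points):
    """Insert the average of each adjacent pair between the original points."""
    if not points:
        return []
    out = [points[0]]
    for prev, nxt in zip(points, points[1:]):
        out.append((prev + nxt) / 2)  # new intermediate point
        out.append(nxt)               # original point is kept unchanged
    return out


print(interpolate_by_averaging([10, 20, 40]))  # [10, 15.0, 20, 30.0, 40]
```

An input of N points yields 2N - 1 points, with the original values preserved at every other position.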
It is well established in the literature of the field of video compression that video can be well-modeled as a stationary Markov-1 process. This statistical model predicts the video behavior quite well, with measured correlations as high as 0.99 in the frame direction.
It is also well known that the Karhunen-Loeve Transform (KLT) perfectly decorrelates Markov-distributed video. This means that the basis of the KLT is an independent set of vectors which encode (i.e., predict) the pixel values of the video sequence.
It is a further result that many discrete transforms well approximate the KLT for large correlation values. Perhaps the best-known such function is the DCT, although many other functions (DST, WHT, etc.) serve as reasonable approximations to the KLT.
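The relationship between the DCT and the KLT for highly correlated data can be checked numerically. The following is a sketch under stated assumptions: the block length of 8, the correlation value of 0.95, and the decorrelation-efficiency measure (one minus the ratio of off-diagonal magnitudes after and before the transform) are illustrative choices and do not come from the text above.

```python
import numpy as np


def markov1_covariance(n, rho):
    """Covariance of a stationary Markov-1 process: R[i, j] = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d


def decorrelation_efficiency(transform, cov):
    """1 - (off-diagonal magnitude after transform) / (before transform)."""
    def offdiag(a):
        return np.abs(a).sum() - np.abs(np.diag(a)).sum()
    transformed = transform @ cov @ transform.T
    return 1.0 - offdiag(transformed) / offdiag(cov)


cov = markov1_covariance(8, 0.95)
_, eigvecs = np.linalg.eigh(cov)   # the KLT basis is the set of eigenvectors
klt = eigvecs.T
klt_eff = decorrelation_efficiency(klt, cov)
dct_eff = decorrelation_efficiency(dct_matrix(8), cov)
print(klt_eff, dct_eff)
```

The KLT efficiency is 1 up to rounding error, since the eigenvectors diagonalize the covariance exactly; the DCT value indicates how closely a fixed, signal-independent basis approaches that limit.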
If a discrete transform that approximates the KLT representation is considered a continuous function, that function ideally approximates the video in that each of the independent basis functions predicts video values smoothed at the sampling rate of the original video. Thus the sampled IDCT (or polynomially-fit IKLT coefficients, IDST, or other continuous approximation to the inverse KLT) forms an optimal filter for prediction of interpolated pixel values.
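The idea of treating the inverse DCT as a continuous function and sampling it between the original pixel positions can be sketched as follows. This is an illustrative one-dimensional example, not the claimed implementation; the function names, the sample values, and the half-pixel spacing are assumptions.

```python
import numpy as np


def dct2(x):
    """Orthonormal forward DCT-II of a 1-D array."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ x)


def idct_at(coeffs, positions):
    """Evaluate the inverse DCT cosine series at arbitrary real positions.

    Integer positions 0..n-1 reproduce the original samples; fractional
    positions give interpolated values.
    """
    n = len(coeffs)
    k = np.arange(n)[None, :]
    t = np.asarray(positions, dtype=float)[:, None]
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return np.cos(np.pi * (2 * t + 1) * k / (2 * n)) @ (scale * coeffs)


x = np.array([10.0, 12.0, 15.0, 14.0, 11.0, 9.0, 8.0, 10.0])
positions = np.arange(0, 7.5, 0.5)        # 0, 0.5, 1, ..., 7
upsampled = idct_at(dct2(x), positions)   # 15 values from 8 samples
print(upsampled)
```

Every other output value coincides with an original sample, and the values in between are read off the same cosine series. Other sampling ratios follow by changing `positions`.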
The patent literature includes a number of patents and publications describing methods for using interpolation functions to expand an image array in the horizontal and/or vertical directions. For example, U.S. Pat. No. 6,141,017 to Cubillo et al. describes a method and apparatus using data interpolation to provide a scaled expanded image array of an original image array representing a physical entity, such as an image or sequence of frames, by utilizing fractal transform methods. Fractal transform methods are used to select a best domain/range block correspondence in an original image array, with the range block being larger than the domain block. A subset of the data values in the range block is selected. Finally, an area of the expanded image array is populated with the data values in the selected subset where the area corresponds to the domain block location in the original image array.
U.S. Pat. App. Pub. No. 2002/0136293 to Washino describes a multi-format digital video production system enabling a user to process an input video program to produce an output video of the program in a final format, which may have a different frame rate, pixel dimensions, or both. In the preferred embodiment, specialized graphics processing capabilities are included in a high-performance personal computer or workstation. Images are resized horizontally and vertically by pixel interpolation. Frame rates are adapted by inter-frame interpolation, or by traditional schemes.
U.S. Pat. No. 6,356,315 to Chen et al. describes a method for achieving magnification or reduction by a single FIR filter under the control of a DDA (digital differential analyzer) as would be used to simulate a perfectly straight line on a two-dimensional raster. The single FIR filter combines the processes of interpolation, filtering, and decimation. The DDA is programmed with the desired magnification or reduction ratio to provide signals that control shifting of input samples into the FIR filter and selection of FIR coefficients for the FIR filter.
U.S. Pat. App. Pub. No. 2002/01364446 to Slavin describes a system and method for producing a resampled image from a source image by solving coefficients for a cubic polynomial transition model.
U.S. Pat. App. Pub. No. 2002/0009225 to Takahashi et al. describes a resolution conversion method in which an image signal expressing pel (pixel) values and a significance signal indicating whether the pel is significant are supplied as input signals. By referring to the input significance signal values for the pels proximal to the pel being processed, the significant pels are identified, and a resolution conversion characteristics selector selects one of two or more frequency conversion characteristics to be used for resolution conversion of the image signal using only significant pels.
U.S. Pat. App. Pub. No. 2001/0024515 to Martins, et al. describes a method and apparatus to interpolate video frames, in which the method comprises, for a plurality of interpolated pixels in an interpolated video frame, classifying an interpolated pixel of the plurality as one of stationary, moving, covered, and uncovered, and then setting components of the interpolated pixel to components of a previous pixel from a previous video frame, the previous pixel corresponding to the interpolated pixel in the video frame.
U.S. Pat. No. 5,657,082 to Harada et al. describes an imaging apparatus capable of outputting video signals. The imaging apparatus includes: a color separating optical system for separating incident image light into light beams of primary colors; first, second, and third imaging portions respectively including a first, second, and third set of pixels arranged at regular pitches in a first direction and in a second direction which is perpendicular to the first direction, the first, second, and third imaging portions each receiving one of the light beams of the primary colors and respectively accumulating them in the first, second, and third sets of pixels as the image signals, the first set of pixels being shifted by ½ pitch with respect to the second and third sets of pixels in the first direction and the second direction; an A/D converter for converting the image signals of primary colors accumulated in the first, second, and third sets of pixels into digital signals; an interpolation processor for performing interpolation processing on the image signals of the primary colors which are converted into the digital signals by the A/D converter, thereby doubling the number of pixels in the second direction; and a pixel number converter for converting the image signals of primary colors which are interpolated in the interpolation processor into image signals based on any one of the plurality of formats.
A number of other patents describe methods for interpolating the movements of visual objects occurring among frames of data to increase the number of frames per second in moving video data. For example, U.S. Pat. No. 6,229,570 to Bugwadia et al. describes a process for up-converting an existing video source signal having a low frequency, in frames per second, to a high frequency signal for use with HDTV (high definition television). The process samples the existing frames in the existing video signal and calculates integer displacements of pels (pixels) within the existing frames. A polynomial curve fit is then performed on the displacements to obtain estimates of horizontal and vertical displacements of each block in each existing frame. Based on the alignments of the blocks within a sampling grid on each frame, the blocks are segregated into groups. The block groups are then used to interpolate missing or required frames of the high frequency signal in a piecemeal manner by utilizing blocks of a particular block group to estimate a corresponding block of the high frequency signal.
U.S. Pat. No. 6,377,621 to Borer describes a method for performing improved motion compensated interpolation of moving images using motion vectors of variable reliability. By taking into account the reliability of the motion vectors, produced by a separate motion estimation device, a subjectively pleasing interpolation can be produced. The method allows a gradual transition between motion compensated and non-motion compensated interpolation depending on the reliability of the motion vector used. This is achieved by modifying the temporal interpolation timing, using a look up table, controlled by a vector reliability signal produced by the motion estimator.
U.S. Pat. No. 6,452,639 to Wagner et al. describes a method and system for converting from interlaced to progressive video frames, using an interpolation algorithm that is scalable, having several levels or modes with an increasing computational complexity, depending on the level of resources available for the conversion process.
U.S. Pat. App. Pub. No. 2000/085114 to Ojo, et al. describes a processing circuit for removing interlacing from video signals. The processing circuit comprises a line memory, a de-interlacing circuit, a frame memory, and a cache memory, in which a pixel mixer is interposed between the cache memory and the de-interlacing circuit.
U.S. Pat. App. Pub. No. 2002/0036705 to Lee et al. describes a format converter and method that performs frame-rate conversion and de-interlacing using a bi-directional motion vector. The method includes steps of (a) estimating a bi-directional motion vector between the current frame and the previous frame from a frame to be interpolated; (b) setting the motion vector of a neighboring block that has the minimum error distortion, among motion vectors estimated in step (a); and (c) forming a frame to be interpolated with the motion vector set in step (b).
What is needed is a method that applies the same simple process to change both the format of video data, in the horizontal and vertical directions, and the frame rate of the video data.
FIG. 2 is a data flow chart showing the application of a first conventional sampling technique for down-sampling to reduce the number of data points 20 described by formulas 22, in which intermediate data points are discarded to form data points 24 described by formulas 26.
FIG. 3 is a data flow chart showing the application of a second conventional sampling technique to reduce the number of data points 20 described by formulas 22, with adjacent pairs of values being averaged to form single data points 28 described by formulas 30.
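The two conventional down-sampling techniques described with reference to FIGS. 2 and 3 can be sketched as follows. The function names and sample values are hypothetical, and the sketch assumes that the first technique retains the even-indexed points, which the text does not specify.

```python
def decimate(points):
    """First technique: keep every other point, discard the ones in between.

    Assumes the even-indexed points are the ones retained.
    """
    return points[::2]


def average_pairs(points):
    """Second technique: replace each adjacent pair with its average.

    A trailing unpaired point in an odd-length input is dropped.
    """
    return [(a + b) / 2 for a, b in zip(points[0::2], points[1::2])]


data = [10, 20, 30, 50, 40, 60]
print(decimate(data))       # [10, 30, 40]
print(average_pairs(data))  # [15.0, 40.0, 50.0]
```

Both techniques halve the number of points; the second uses every input value, while the first ignores half of them.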
The DCT (discrete cosine transform) is a mathematical transform that is widely used in the compression of still video images to simplify their transmission and storage both for use as still images and for use as portions of motion images. In particular, the DCT is used in the JPEG (Joint Photographic Experts Group) standard for digital compression, which is the most popular method for digital compression of still images, with the DCT transform being easily implemented in both software and hardware.
The JPEG compression process begins with subdividing an image into a number of blocks, each of which includes 64 pixels from an 8×8 square region of the original image. Each of these blocks is then sent into the DCT-based encoder portion of the compression routine, with a functional unit called the FDCT (forward discrete cosine transform) receiving the block to apply a transform to the pixels therein. The transformed data is then sent to a quantizer, which performs the first stage of compression, in which data is lost. The quantized data is then sent to an entropy encoder, in which a coding system such as Huffman coding is applied. Finally, the data is converted into a bit stream for storage or for immediate transmission.
The JPEG decompression technique is symmetrical with the compression technique, applying inverse processes to reverse the processes previously applied. The process begins with an entropy decoder, which undoes the process performed by the entropy encoder. Data from the entropy decoder moves to a dequantization stage, and, then, to a stage applying an IDCT (inverse discrete cosine transform) process to reverse the FDCT process applied during compression.
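The transform and quantization stages of this pipeline, together with their inverses, can be sketched on a single 8×8 block. This is a simplified illustration rather than a JPEG codec: a single uniform quantizer step is assumed in place of a per-frequency quantization table, the entropy coding stage is omitted, and all function names and the step value are hypothetical.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d


def encode_block(block, step):
    """FDCT followed by uniform quantization (the lossy stage)."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T              # separable 2-D forward DCT
    return np.round(coeffs / step).astype(int)


def decode_block(levels, step):
    """Dequantization followed by IDCT."""
    d = dct_matrix(levels.shape[0])
    return d.T @ (levels * step) @ d      # separable 2-D inverse DCT


rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
levels = encode_block(block - 128.0, step=16.0)   # level shift, then encode
restored = decode_block(levels, step=16.0) + 128.0
rms_error = np.sqrt(np.mean((restored - block) ** 2))
print(rms_error)
```

Because the transform is orthonormal, the reconstruction error comes entirely from the rounding in the quantizer, so its root-mean-square value cannot exceed half the quantizer step.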
A Motion JPEG (M-JPEG) compression technique is an extension of the JPEG technique to handle moving pictures, with each frame in a video sequence being individually compressed using JPEG. Like JPEG, M-JPEG attains compression ratios of 10:1 to 15:1.
The MPEG (Moving Picture Experts Group) standard uses a much more complex technique to attain compression ratios of 30:1. Some frames in a sequence are encoded as I frames (intra frames), which are self-contained and are encoded using a DCT-based technique similar to JPEG. Other frames are encoded as P frames (predicted frames) using forward predictive coding, in which the actual frame is coded with reference to a previous I or P frame. Still other frames are encoded as B frames (bidirectional or interpolated frames), which are coded using a past frame and a future frame, which may be I or P frames, as reference frames.
The patent literature further includes a number of patents describing systems and methods for improving the implementation of IDCT transforms to decompress video input signals transmitted according to the MPEG standard, and particularly to decompress such input signals to provide variations in the resolution and frame rate of the resulting video signals. For example, U.S. Pat. No. 5,563,660 to Tsukagoshi describes an apparatus for expanding a compressed digital video signal representing a motion picture to provide a digital video output signal. The apparatus includes a decoder for the compressed digital video signal using a common memory for decoding and for a 2-3 pull-down conversion of the reconstructed interlaced frames stored in the frame memory with a frame rate of 24 Hz to provide the pictures of the digital video output signal with a picture rate of at least 49 Hz.
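The general principle of a pull-down conversion from a 24 Hz frame rate can be sketched as follows. This is a generic illustration of the widely used pattern in which source frames alternately contribute two and three output fields; it is not a description of the apparatus of the cited patent, and the function name and the field-parity labels are hypothetical.

```python
def pulldown_2_3(num_frames):
    """Return (frame_index, field_parity) pairs for a 2-3 pull-down cadence.

    Even-numbered source frames contribute 2 fields and odd-numbered frames
    contribute 3, so every 4 frames become 10 fields (24 frames -> 60 fields).
    """
    fields = []
    for i in range(num_frames):
        repeats = 2 if i % 2 == 0 else 3
        for _ in range(repeats):
            # Output fields strictly alternate between top and bottom parity.
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((i, parity))
    return fields


one_second = pulldown_2_3(24)
print(len(one_second))   # 60
print(one_second[:5])
```

The frame rate is raised purely by repetition; no new picture content is synthesized, which distinguishes this approach from interpolation.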
U.S. Pat. App. Pub. No. 2002/0136308 to La Maguet et al. describes a method for generating a down-sampled video from an input coded video coded according to a block-based technique and comprising quantized DCT coefficients defining DCT blocks. The down-sampled video is composed of frames having a smaller format than the frame used to produce the input coded video.
U.S. Pat. No. 6,442,201 to Choi describes a down-conversion decoding device of a digital television producing a high-quality video image from a coded video signal with a reduced-size memory.
U.S. Pat. App. Pub. No. 2002/0141501 to Krishnamachari describes a system that increases a resolution of at least a portion of a reference frame of video based on pixels in the reference frame and pixels in one or more succeeding target frames of the video. In the particular case of MPEG-coded video, blocks in the target frames are located using motion vector information present in the MPEG bit stream. Values of additional pixels are determined based on values of pixels in the first block and on values of pixels in the one or more blocks, whereafter the additional pixels are added among the pixels in the first block in order to increase the block's resolution.
U.S. Pat. No. 6,456,663 to Kim describes a DCT-domain down conversion system that compensates for an IDCT mismatch to reduce aliasing. In the conversion system, a DCT domain filter is applied to the unquantized DCT coefficient values, and IDCT mismatch control processing is implemented. The DCT domain filter sets the filter coefficient corresponding to the highest frequency band to unity to prevent modification of any coefficient value that has been modified by the IDCT mismatch operation.
U.S. Pat. App. Pub. No. 2001/0055340 to Kim et al. describes an HDTV down conversion system including an apparatus for forming a low-resolution, 2:1 down converted video signal from an encoded video signal representing a video image, with the encoded video signal being a frequency-domain transformed high-resolution video signal with motion prediction.
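The general idea of down conversion in the DCT domain can be sketched as follows: the low-frequency quarter of an 8×8 coefficient block is retained and inverse transformed with a 4×4 IDCT. This is a textbook-style illustration under stated assumptions and does not reproduce the filters, mismatch control, or motion handling of the systems cited above; the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d


def downconvert_2to1(block):
    """Halve a square block in both directions by truncating DCT coefficients."""
    n = block.shape[0]
    half = n // 2
    big = dct_matrix(n)
    coeffs = big @ block @ big.T
    # Keep only the low-frequency quarter; the factor half / n compensates
    # for the different normalization of the smaller inverse transform.
    low = coeffs[:half, :half] * (half / n)
    small = dct_matrix(half)
    return small.T @ low @ small


flat = np.full((8, 8), 100.0)
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
print(downconvert_2to1(flat))
print(downconvert_2to1(ramp))
```

Discarding the high-frequency coefficients acts as the low-pass filter that limits aliasing, so filtering and decimation are performed in a single step.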
The DCT is used because it predicts a representation of data that mirrors the statistical behavior of video data. This means that the interpolated data is that which, on the average, would have been present in typical video images. The synthesized higher-resolution image is therefore more accurate than that of any other filter. What is needed is a simple method using the DCT to produce a change both in the resolution of individual frames of video data and in the frame rate at which frames of video data are presented.