Video coders/decoders are widely used to provide motion video in many present day applications including video conferencing, video mail, video telephony and image database browsing. In order to provide high quality video in such applications, a video system must have high data storage capability.
To illustrate, one frame of digitized NTSC (National Television System Committee) quality video comprising 720×480 pixels requires approximately one half megabyte of digital data to represent the image. As a result, an NTSC system that operates at 30 frames per second generates over 15 megabytes of data per second. Thus, depending on the video system, video imaging can create significant data storage and data rate problems.
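The figures above can be checked with a short calculation. This is only a sketch: the text does not state the pixel format, so the 1.5 bytes per pixel used here (as in 4:2:0 chroma subsampling) is an assumption chosen because it reproduces the "one half megabyte" figure.

```python
# Back-of-the-envelope data rate for uncompressed digitized video.
# ASSUMPTION: 12 bits (1.5 bytes) per pixel; the text gives no pixel format.
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL = 1.5      # assumed, not from the text
FRAMES_PER_SECOND = 30


def frame_bytes(width=WIDTH, height=HEIGHT, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes needed to store one uncompressed frame."""
    return int(width * height * bytes_per_pixel)


def bytes_per_second(fps=FRAMES_PER_SECOND):
    """Bytes generated per second of uncompressed video."""
    return frame_bytes() * fps


if __name__ == "__main__":
    print(frame_bytes())       # 518400   (about one half megabyte)
    print(bytes_per_second())  # 15552000 (over 15 megabytes per second)
```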
It has been determined, however, that video can be compressed in some form to reduce these data storage and data rate demands, and thus increase the overall throughput of the system. One method of video data compression involves estimating the motion between successive frames of video. Generally, it has been found that when there is little or no motion of objects within the scene, from one frame to the next, there is a great amount of redundancy between successive frames of the video sequence. As a result, it is unnecessary to send the entire data block for each frame to reliably recreate the image at the receiver (decoder). Rather, the transmitter (encoder) need only send information on those changes or motion of objects between successive frames. That is, it has been determined that the throughput and efficiency of the video encoder could be greatly enhanced by identifying or estimating the motion of objects between successive frames. Consequently, those skilled in the art have found that motion estimation plays an important role in achieving high data compression, and thus high efficiency video encoders.
Motion estimation is a tool or method of predicting a present frame using a target frame to exploit the redundancy between frames, and thus reduce the amount of information transmitted to the receiver. The target frame is a frame other than the present frame. For example, the target frame can be the frame immediately preceding the present frame in time. A motion estimator predicts the motion of objects between frames of video, usually by computing displacement vectors, which is accomplished by computing the differences between blocks of data in successive frames (i.e. block difference calculations). The calculation or extraction of this motion information is computationally intensive, thus placing a burden on the hardware designed to perform such a task. In fact, motion estimation is the most computationally demanding task in encoding video in the H.261, H.262 and H.263 standards.
Presently, many video systems implement a block-based method of motion estimation. In such block-based systems, an image is divided into blocks of pixels. Each pixel defines both a coordinate (or displacement vector) within the frame and an integral value that represents the luminance content at that coordinate. To estimate motion, the integral values of each block of pixels in a present frame (hereinafter called "reference block") are compared against the integral values of similarly-sized blocks of pixels in a region of a target frame (hereinafter called "search area"). The search area data blocks that most closely match the reference block provide the best estimate of the change in position of objects in the frame. Thus, if the reference block and the closest matching search area block have the same coordinates, the encoder reports no motion for that block of data. Conversely, if the reference block and the closest matching search area block have significantly differing coordinates, the system assumes motion occurred between frames.
A match may be determined using a number of procedures. For example, a match may be found by taking some metric representing a comparison between pixels, and then summing the metrics, pixel by pixel, over the entire block. Since the metric is typically one of absolute difference, mean-squared difference, etc., the summation of metrics is called a block difference calculation. Moreover, since each pixel within a frame also defines a coordinate or displacement vector, each block difference has a displacement vector associated with it. Thus, the displacement vector associated with the best match represents the direction of motion of an object in the present frame with respect to the target frame. As a result, the primary function of the block based motion estimator is to find a block of pixels in the target frame that most closely matches the block of pixels in the present frame to determine if and how much motion has occurred between these frames, and thus enable a host microprocessor (i.e. of a video codec) to minimize the amount of information that must be sent from frame to frame to recreate the images at the receiver.
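The block difference calculation described above can be sketched as follows. The function name `block_difference` and the `metric` parameter are illustrative choices, not names from the text; the default metric is the absolute difference, and a squared difference is shown as one alternative.

```python
def block_difference(ref_block, cand_block, metric=lambda r, c: abs(c - r)):
    """Sum a per-pixel metric over two equal-sized blocks (lists of rows)."""
    return sum(
        metric(r, c)
        for ref_row, cand_row in zip(ref_block, cand_block)
        for r, c in zip(ref_row, cand_row)
    )


if __name__ == "__main__":
    ref = [[10, 20], [30, 40]]
    cand = [[12, 18], [30, 45]]
    print(block_difference(ref, cand))                              # 9
    print(block_difference(ref, cand, lambda r, c: (c - r) ** 2))   # 33
```

A smaller result means a closer match, so a block-based estimator would keep the displacement vector of the candidate block with the smallest block difference.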
Depending on the desired quality of the video encoder, the design of the architecture of such block based motion estimators can become fairly complex. In addition, the algorithm required to direct the estimator architecture to perform the computations necessary to find the block difference for each location in a search area can also become quite complex. As a result, over the years many block-based motion estimation algorithms have been proposed, each algorithm presenting different levels of video quality at the cost of different levels of architecture and algorithm complexity.
One such algorithm is the full search algorithm. In full search motion estimation, the estimator systematically searches through every vector point within the search area (target frame) and compares each and every unique reference-block-sized region within the search area with the reference block to find the best match. Since most pixels are shared between successive reference-block-sized regions within a search area, search areas are typically traversed one pixel offset per block difference. As a result, the block difference computation is a key operation in motion estimation (See FIG. 1). FIG. 1 shows a reference block 80 traversing a target or search area 81, pixel by pixel, wherein displacement vector 82 indicates the present position of reference block 80.
Although many different motion estimation algorithms have been developed over the years, they all involve a block difference computation similar to the computation for an M×N block by the distortion function D(x,y), defined as:

$$D(x,y) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| s(i+x,\, j+y) - r(i,j) \right|$$

where s(i,j) is the search area pixel, r(i,j) is the reference macroblock pixel and (x,y) is the block displacement in the search. The present day motion estimators differ mainly in the size of the M×N data blocks and the size of the search area (indicated by the range of x and y) that is used to calculate the block differences. In any event, however, the data in all present day motion estimators is stored in memory.
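A full search built on the distortion function can be sketched as below. The sketch assumes D(x,y) is a sum of absolute differences over the M×N block, which matches the |s(m+x,n+y) - r(m,n)| term the text describes for the processing elements. The function names and the list-of-lists layout (`search[i][j]` standing for s(i,j)) are illustrative assumptions.

```python
def distortion(search, ref, x, y):
    """D(x, y): sum of |s(i+x, j+y) - r(i, j)| over the M x N reference block."""
    M, N = len(ref), len(ref[0])
    return sum(
        abs(search[i + x][j + y] - ref[i][j])
        for i in range(M)
        for j in range(N)
    )


def full_search(search, ref):
    """Evaluate every displacement and return (best D, x, y)."""
    M, N = len(ref), len(ref[0])
    best = None
    for x in range(len(search) - M + 1):
        for y in range(len(search[0]) - N + 1):
            d = distortion(search, ref, x, y)
            if best is None or d < best[0]:
                best = (d, x, y)
    return best


if __name__ == "__main__":
    # 8x8 reference block with values 1..64, pasted into an all-zero
    # 29x29 search area so that x and y each range over 0..21.
    ref = [[8 * i + j + 1 for j in range(8)] for i in range(8)]
    search = [[0] * 29 for _ in range(29)]
    for i in range(8):
        for j in range(8):
            search[5 + i][17 + j] = ref[i][j]
    print(full_search(search, ref))  # (0, 5, 17)
```

The two nested loops in `full_search` make the cost visible: every one of the 22×22 displacements requires a full 64-pixel block difference.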
To illustrate the manner in which present day estimators compute block differences, an M=N=8 block is searched through a search area having an index range between (0,0) and (21,21). That is, 0 ≤ x ≤ 21 and 0 ≤ y ≤ 21. Typically, the computation is done through an array having a set of processing elements. The processing elements hold a specific r(m,n) content, compute the absolute difference |s(m+x,n+y)-r(m,n)|, and add the absolute difference to a partially accumulated sum. FIG. 2 illustrates the block difference computation for the given search area 85.
As shown, the reference pels r(m,n) are stored in the array. The reference pels represent the pixel data of a block of pixels in the present or reference frame. The memory then pumps search area pels into the array, which computes the difference between each search area pixel and the corresponding reference pixel, the difference being called a partial score. The array then sums up the partial scores of the pixel by pixel difference computations for the entire reference block, and generates block differences D(x,y) therefrom. This is often done in pel stripes. For example, the computation is performed for the first stripe D(0,0), D(0,1), D(0,2), . . . , D(0,21), and then the second stripe D(1,0), D(1,1), D(1,2), . . . , D(1,21), and finally the 22nd stripe D(21,0), D(21,1), D(21,2), . . . , D(21,21).
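The stripe-by-stripe accumulation can be modeled in software as shown below. This is a behavioral sketch only, not a description of any particular hardware: the `ProcessingElement` class and the function names are hypothetical, and the model ignores timing and data movement.

```python
class ProcessingElement:
    """Holds one reference pel r(m, n) and produces partial scores."""

    def __init__(self, ref_pel):
        self.ref_pel = ref_pel

    def partial_score(self, search_pel):
        return abs(search_pel - self.ref_pel)


def stripe_scores(search, ref, x):
    """Block differences D(x, 0), D(x, 1), ... for one pel stripe."""
    M, N = len(ref), len(ref[0])
    array = [[ProcessingElement(ref[m][n]) for n in range(N)] for m in range(M)]
    scores = []
    for y in range(len(search[0]) - N + 1):
        total = 0  # partially accumulated sum for D(x, y)
        for m in range(M):
            for n in range(N):
                total += array[m][n].partial_score(search[m + x][n + y])
        scores.append(total)
    return scores


def all_stripes(search, ref):
    """Stripe 0 first, then stripe 1, and so on, as in the text."""
    return [stripe_scores(search, ref, x) for x in range(len(search) - len(ref) + 1)]


if __name__ == "__main__":
    ref = [[1, 2], [3, 4]]
    search = [[1, 2, 0], [3, 4, 0], [0, 0, 0]]
    print(all_stripes(search, ref))  # [[0, 8], [11, 12]]
```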
There are two types of conventional memory-array architecture in performing such block difference computation. The first type is a serial 1D array as illustrated in FIG. 3. Here, the memory 87 feeds the 1D array 88 in 1-pel wide stripes while the array 88 uses side registers 89 to align the partial sums computed by the processing elements. This approach to computing block differences was implemented in many video encoder subsystems and motion estimators.
Another approach to calculating block differences is to use a parallel memory and a parallel 2D array structure as illustrated in FIG. 4. Note that in FIG. 4 the data needs to be realigned before the array 90 computes a new stripe of D(x,y). For example, when computing the first stripe D(0,0), D(0,1), . . . D(0,21), a processing element 91 with r(0,0) computes the absolute difference between r(0,0) and s(0,i) to be accumulated, where 0 ≤ i ≤ 21. Therefore, the processing element should receive a pel stream of s(0,i) from the memory.
In general, for the first stripe, processing elements 91 on column 0 should receive pel stream s(0,i), processing elements 91 on column 1 should receive pel stream s(1,i), . . . , and processing elements 91 on column 7 should receive pel stream s(7,i). For the second stripe D(1,0), D(1,1), . . . , D(1,21), the processing element with r(0,0) computes the absolute difference between r(0,0) and s(1,i). Thus, processing elements 91 on column 0 receive a pel stream s(1,i) from the memory. Similarly, column 1 receives pel stream s(2,i), column 2 receives pel stream s(3,i), etc.
Thus, in this approach to calculating block differences, the pel stream alignment is shifted and the data is aligned in a pattern as shown in FIG. 5. As shown, for the first stripe, D(0,0), D(0,1), . . . D(0,21), the reference block is matched against the pel stripe 93 delineated with solid lines within the search region 85 shown in FIG. 2. For the second stripe D(1,0), D(1,1), . . . , D(1,21), the reference block is moved one pel to the right to match against the second pel stripe 94 delineated with dashed lines in FIG. 2. This one pel shift per stripe is repeated until the last search stripe is shifted in.
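The per-stripe realignment amounts to a simple index rule: on stripe x, array column c receives pel stream s(c+x, i). A minimal sketch of that rule follows; the function names are hypothetical, and `search[k]` is assumed to hold the pel stream s(k, i).

```python
def column_streams(stripe, columns=8):
    """First index k of the pel stream s(k, i) fed to each array column."""
    return [stripe + col for col in range(columns)]


def aligned_streams(search, stripe, columns=8):
    """Pel streams for one stripe, ordered by array column."""
    return [search[k] for k in column_streams(stripe, columns)]


if __name__ == "__main__":
    print(column_streams(0))  # [0, 1, 2, 3, 4, 5, 6, 7]
    print(column_streams(1))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Every column's source stream changes from one stripe to the next, which is why the data must be realigned before each new stripe is computed.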
In order to support the data alignment pattern of this approach, however, the memory must provide additional bandwidth to retrieve all the pels. Moreover, the windowing/shifting block must properly select the appropriate pels and align them in the appropriate array columns. As shown in FIG. 4, a typical architecture that accommodates this approach utilizes a 16-pel wide memory to extract data from an 8-pel window.
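The windowing step can be pictured as selecting 8 consecutive pels out of each 16-pel memory read. The sketch below is an assumption about how such a selection might behave; the text does not describe the windowing/shifting block in detail, and the name `select_window` and the `shift` parameter are hypothetical.

```python
MEMORY_WORD_PELS = 16  # width of one memory read, per the text
WINDOW_PELS = 8        # width of the array window, per the text


def select_window(memory_word, shift):
    """Pick the 8 pels starting at `shift` out of a 16-pel memory word."""
    if len(memory_word) != MEMORY_WORD_PELS:
        raise ValueError("expected a 16-pel memory word")
    if not 0 <= shift <= MEMORY_WORD_PELS - WINDOW_PELS:
        raise ValueError("window would fall outside the memory word")
    return memory_word[shift:shift + WINDOW_PELS]


if __name__ == "__main__":
    word = list(range(16))
    print(select_window(word, 3))  # [3, 4, 5, 6, 7, 8, 9, 10]
```

Under this model only half of the pels fetched on each read reach the array, which is one way to see where the additional memory bandwidth goes.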
Although there are motion estimators that implement other approaches to manipulating the reference data and the search area data, such as reloading the reference pels, they all require additional memory access bandwidth. Consequently, present day motion estimators implement full search algorithms and architecture that require very complex methods of aligning the data in the array for making the block difference calculation. As a result, present day video encoding subsystems have motion estimators that implement search algorithms and complex architecture that require much processing power.