1. Field of the Invention
The present invention relates to signal processing and, in particular, to computer-implemented processes and apparatuses for performing motion estimation with an enhanced camera interface.
2. Description of the Related Art
Motion estimation is commonly utilized by video encoders in signal processing techniques that compress successive frames of video data ("video frames"). For example, a plurality of video frames, each represented by a bitstream, may represent successive images of a motion video. When these video frames are to be transmitted via a communication medium of limited bandwidth, or are to be stored in a storage medium having limited storage capacity, it is often desirable to first compress the data contained in the bitstreams.
Motion estimation is one of the most computationally intensive of the various techniques utilized to compress data. Motion estimation techniques exploit the temporal correlation that often exists between consecutive video frames, in which some objects or image features tend to move within restricted boundaries from one location to another from frame to frame.
For instance, frame 1 may contain an object, and frame 2 may contain an identical set of pixels corresponding to the object spatially displaced by a few pixels from the location of the same set of pixels in frame 1. If frame 1 is transmitted by a video processor to a remote pixel processor or video processor (which performs any necessary decompression or other decoding), frame 2 may be transmitted without the pixels corresponding to the object. Instead, information such as motion vectors or pointers is sent along with frame 2 (which may also be compressed using other techniques). These motion vectors may be utilized by the remote receiving video processor when decoding the received video frame 2 to reproduce the object from frame 1 at a new location within frame 2. Since such motion vectors can be represented with fewer bits than the pixels that comprise the object, fewer bits need to be transmitted (or stored) in order to recreate the object in frame 2.
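By way of illustration only, the decoder-side use of a motion vector described above may be sketched as follows. The function name `copy_block`, the 6x6 frame size, the pixel value 200, and the list-of-rows frame layout are assumptions made for this example and do not appear in the description.

```python
def copy_block(prev_frame, cur_frame, top, left, size, dy, dx):
    """Reproduce a size x size block of prev_frame in cur_frame,
    displaced by the motion vector (dy, dx)."""
    for r in range(size):
        for c in range(size):
            cur_frame[top + dy + r][left + dx + c] = prev_frame[top + r][left + c]
    return cur_frame

# Frame 1 (hypothetical): dark background with a 2x2 bright object at row 1, column 1.
frame1 = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        frame1[r][c] = 200

# Frame 2 arrives without the object's pixels; only the vector (0, 3) is sent.
frame2 = [[0] * 6 for _ in range(6)]
copy_block(frame1, frame2, top=1, left=1, size=2, dy=0, dx=3)
```

In this sketch a single pair of small integers stands in for the four object pixels, which is the source of the bit savings noted above.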
The motion estimation procedure may be performed at the encoder level by comparing given regions or blocks within a current video frame to many regions or blocks within the previous video frame. The process of comparing a given block of one frame to blocks of another frame to find a sufficiently similar match is often called "block matching," and the process of comparing one frame against another in this manner is often called "frame differencing." Frame differencing and block matching are thus essential elements of the motion estimation procedure. Blocks are matched by determining a "difference measurement" between any given pair of blocks. A difference measurement corresponds to the overall degree of difference of the two regions. If the difference measurement is below a predetermined threshold, the blocks are considered to be similar enough that a block match is indicated. If so, the block in the previous video frame may be utilized as described above by the video decoder to reproduce the same block in the current video frame.
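One possible form of the block matching procedure described above is sketched below. The description does not specify a particular difference measurement; the sum of absolute differences used here, the exhaustive search window, and the names `sad` and `find_match` are assumptions chosen for illustration.

```python
def sad(frame_a, top_a, left_a, frame_b, top_b, left_b, size):
    """Difference measurement (assumed): sum of absolute pixel differences."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(frame_a[top_a + r][left_a + c] - frame_b[top_b + r][left_b + c])
    return total

def find_match(cur, prev, top, left, size, search, threshold):
    """Compare one block of cur against blocks of prev within +/- search pixels.
    Return (dy, dx, difference) for the best candidate if its difference is
    below the threshold; otherwise return None (no block match)."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block falls outside the previous frame
            d = sad(cur, top, left, prev, t, l, size)
            if best is None or d < best[2]:
                best = (dy, dx, d)
    if best is not None and best[2] < threshold:
        return best
    return None

# Hypothetical frames: a 2x2 object moves two columns to the right.
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        prev[r][c] = 200
        cur[r][c + 2] = 200

match = find_match(cur, prev, top=1, left=3, size=2, search=2, threshold=50)
```

Here the returned offset points from the block in the current frame back to the matching block in the previous frame; a result of `None` means no candidate fell below the threshold.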
The video frames which are to be encoded via motion estimation are typically received from a video camera. When analog video cameras are used that produce analog video signals, the signals are digitized and converted to digital pixel data. Digital video cameras with charge-coupled devices (CCDs) may also be utilized which directly generate digital data representing video frames that may be provided to a video processor. Theoretically, still background areas of a video sequence should have zero difference from one frame to the next. However, it has been observed that there is often a constant "churning" of the pixels that make up the stationary background of a motion video clip in which the video camera is stationary. This can be attributed to at least two factors. First, the signal-to-noise ratio of the video signal digitizer results in least significant bit (LSB) fluctuation of the digitized signal. Second, automatic exposure settings such as the automatic gain control (AGC) built into every NTSC camera cause successive video frames to differ. Other exposure settings include, for example, gamma curves, color balance, automatic focus, automatic wipeout and fade, and zoom capabilities.
The LSB problem may be dealt with by calibrating a compression algorithm to ignore signal fluctuations below a specified noise rejection threshold. However, frame-to-frame pixel fluctuations caused by automatic exposure settings such as AGC still pose a problem to motion estimation techniques. Furthermore, the noise rejection threshold scheme for minimizing the LSB problem may be ineffective because of the fluctuations caused by automatic exposure settings. The reason for this problem is that successive video frames received by a video processor from a video camera may vary in overall exposure characteristics from frame to frame. With AGC, for example, the video camera automatically adjusts the gain of the CCD image sensor to produce the most image detail (best contrast balance) in any given situation. This is a continuous-feedback regulation mechanism which takes place at all times, even where the background and other image features are stationary. While such frame-by-frame automatic exposure settings are not very noticeable to a human viewer, the changes in pixels from frame to frame are very noticeable to a motion estimation technique, and indeed may be interpreted as "motion," thereby preventing blocks from being matched. Alternatively, because non-identical blocks are considered to be "matched" if their difference is below a certain threshold, a higher percentage of the block matches that are made may have an error level closer to the threshold. This can cause the reconstructed video frame to have a poorer quality than if better block matches, i.e. those with less difference error, had been found. Thus, although automatic exposure settings improve the quality of individual video frames in some contexts, such automatic exposure settings can reduce the ability of a video processor to efficiently perform motion estimation.
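The interaction between a noise rejection threshold and an exposure change may be illustrated with a small numerical sketch. The pixel values, the threshold of 2, the 5% gain increase, and the name `changed_pixels` are hypothetical and are chosen only to make the effect visible.

```python
def changed_pixels(prev, cur, noise_threshold):
    """Count pixels whose frame-to-frame fluctuation is not below the
    noise rejection threshold (fluctuations below it are ignored)."""
    return sum(1 for p, c in zip(prev, cur) if abs(c - p) >= noise_threshold)

# Stationary background pixels in one frame (hypothetical values).
background = [100, 101, 99, 100]

# Same pixels in the next frame with only LSB fluctuation (+/- 1).
lsb_noise = [101, 100, 100, 99]

# Same pixels after an assumed 5% gain increase applied by the camera.
gain_shifted = [p * 105 // 100 for p in background]

ignored = changed_pixels(background, lsb_noise, noise_threshold=2)
flagged = changed_pixels(background, gain_shifted, noise_threshold=2)
```

Under these assumptions the LSB fluctuation is rejected entirely, while the gain change moves every background pixel past the threshold even though nothing in the scene has moved.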
As a concrete example, video frame 1 and video frame 2 may be nearly identical, successive images of the face of a person wearing glasses. In video frame 2 the face may have tilted slightly to one side, so that the glasses now reflect the bright glare of a light. Normally, two such similar video frames might have many very similar blocks or features that could allow a high degree of motion estimation-type compression, since the block matching procedure will detect many similar blocks between the two frames. However, the glare in video frame 2 can cause many pixels in video frame 2 to be altered by the video camera before the video processor receives video frame 2.
For example, the glare may trigger the video camera's automatic gain control aspect of the automatic exposure settings to reduce the intensity of each pixel in video frame 2 to lower video frame 2's average brightness. Blocks within the two video frames that represent similar image features, such as eyes, ears, or portions thereof, might not be matched because of the different overall brightness between blocks. Thus, the number of blocks that will be matched is reduced even though many blocks contain almost identical image features, thereby hindering the ability to utilize motion estimation to compress data. Similarly, other automatic exposure adjustments can significantly interfere with the block matching operations performed in motion estimation, because similar features between video frames are more difficult to detect when successive video frames are generated with different exposure settings.
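The effect of such a brightness reduction on a single block may be sketched as follows. The 10% gain reduction, the pixel values, the match threshold of 16, and the use of a sum of absolute differences as the difference measurement are all assumptions made for this example.

```python
def block_sad(block_a, block_b):
    """Assumed difference measurement: sum of absolute pixel differences
    between two blocks given as flat lists of equal length."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

# A block showing some image feature in video frame 1 (hypothetical values).
reference = [100, 150, 200, 250]

# The same feature in video frame 2 after the camera lowers the gain by 10%.
dimmed = [p * 90 // 100 for p in reference]

threshold = 16  # hypothetical block match threshold

same_exposure_matches = block_sad(reference, reference) < threshold
dimmed_matches = block_sad(reference, dimmed) < threshold
```

With these values the dimmed block differs from the reference by a total of 70, well above the threshold, so the block is not matched although the underlying image feature is unchanged.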
There is thus a need for a video processing system that obtains the advantages of automatic exposure settings without reducing the efficiency of motion estimation procedures.
It is accordingly an object of this invention to overcome the disadvantages and drawbacks of the known art and to provide a computer-implemented process and apparatus for performing motion estimation with an enhanced camera interface.
Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.