1. Technical Field
The invention relates to the compression of video information for storage on and transmission among data processing systems. More particularly, the invention relates to a system and a method of matching patterns in image frames with library patterns within predetermined error tolerances.
2. Description of the Related Art
A video signal comprises a sequence of frames, which when displayed at a given minimum frame rate (e.g., 15 to 30 frames per second in a personal computer), simulate the appearance of motion to a human observer. In a personal computer system, each frame of the video image comprises a matrix of picture elements or "pixels." A common image matrix has 320 columns by 240 rows of pixels. A pixel is the minimum unit of the picture which may be assigned a luminance intensity, and in color video, a color. Depending upon the data format used, as many as three bytes of data can be used to define visual information for a pixel. A pixel by pixel color description of all pixels for an entire frame can require over two hundred thousand bytes of data.
To display a video segment, if such full frames were replaced at a frame rate of 30 frames per second, a computer could be required to recover from storage and write to video memory nearly 7 million bytes of data each second (230,400 bytes per frame at 30 frames per second). Few contemporary mass data storage devices have both the bandwidth required to pass such quantities of data and the storage capacity to hold more than a few minutes' worth of directly stored digital video information. As used here, bandwidth means the volume of data per unit time which can be recovered from an auxiliary storage device. Data compression is used to accommodate auxiliary storage devices in the storage and recovery of video segments for playback in real time and to reduce traffic on the system bus.
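The frame size and data rate discussed above follow from simple arithmetic. The sketch below works it through for a 320 by 240 frame, assuming three bytes per pixel and 30 frames per second. The function names are illustrative only, and the 640 by 480 case is an assumption added purely for comparison.

```python
def frame_bytes(columns, rows, bytes_per_pixel=3):
    """Bytes needed for an uncompressed, pixel-by-pixel description of one frame."""
    return columns * rows * bytes_per_pixel


def data_rate(columns, rows, frames_per_second, bytes_per_pixel=3):
    """Bytes per second that must be read from storage and written to video memory."""
    return frame_bytes(columns, rows, bytes_per_pixel) * frames_per_second


small_frame = frame_bytes(320, 240)     # 230,400 bytes per frame
small_rate = data_rate(320, 240, 30)    # 6,912,000 bytes per second
large_rate = data_rate(640, 480, 30)    # 27,648,000 bytes per second (comparison only)
```

The rate scales linearly with pixel count, pixel depth and frame rate, which is why even modest frame sizes strain storage bandwidth.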
Data compression allows an image or video segment to be transmitted and stored in substantially fewer bytes of data than required for full frame reproduction. Data compression can be based on eliminating redundant information from frame to frame in a digitized video segment (temporal compression), or on eliminating redundant information from pixel to pixel within individual frames (spatial compression). In addition, compression may exploit superior human perception of luminance intensity detail over color detail by averaging color over a block of pixels while preserving luminance detail.
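As a minimal sketch of averaging color over a block while preserving luminance, the example below assumes each pixel is already expressed as one luminance value and two color values. That representation, and the function name, are assumptions made for illustration and are not specified in the text.

```python
def average_color_over_block(block):
    """Keep every luminance value but replace per-pixel color with one average.

    block: list of (luminance, color_a, color_b) tuples for one block of pixels.
    Returns (list of luminance values, (average color_a, average color_b)).
    """
    n = len(block)
    luminance = [y for y, _, _ in block]
    avg_a = sum(a for _, a, _ in block) // n
    avg_b = sum(b for _, _, b in block) // n
    return luminance, (avg_a, avg_b)


# A two by two block: 12 values in, 4 luminance values plus 2 color values out.
block = [(10, 100, 200), (20, 102, 198), (30, 104, 202), (40, 98, 200)]
luma, color = average_color_over_block(block)
```

For a two by two block this halves the number of values stored, at the cost of color detail that a viewer is less likely to notice.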
Frame differencing compression methods exploit the temporal redundancy that exists between digital video frames from the same scene recorded moments apart in time. This reduces the data needed to encode each frame. Two successive frames from a sequence of digital video frames are compared region by region. The comparison process determines whether two corresponding regions are the same or different. The size and location of each region, the exact nature of the comparison and the definition of same and different in terms of the threshold supplied are outside the scope of this invention.
Necessarily, one frame represents a point in time after another frame. If two regions being compared are the same, then the pixels in the region from frame N do not need to be encoded and stored if the pixels in frame N-1 are already known. When two regions are different, the pixels in the later frame must be encoded and stored. When each region of the two frames has been compared and, where necessary, encoded and stored, the process moves to the next pair of frames. During playback, the decompression process adds the stored information for each period to the current state of the display memory using a process that is the logical reverse of the encoding process. This is called conditional replenishment.
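Conditional replenishment as described above can be sketched as follows. The sketch assumes a frame is a list of regions, each region a tuple of pixel values, and that two regions are "the same" when they are exactly equal. The text leaves the region layout and the comparison rule open, so both choices, along with the function names, are hypothetical.

```python
def encode_frame(previous, current, same=lambda a, b: a == b):
    """Return (region index, pixels) pairs only for regions that changed.

    previous, current: lists of regions for frames N-1 and N.
    same: comparison rule; exact equality is assumed by default.
    """
    return [(i, region) for i, region in enumerate(current)
            if not same(previous[i], region)]


def decode_frame(display, updates):
    """Apply the stored updates to the current state of display memory."""
    frame = list(display)
    for i, region in updates:
        frame[i] = region
    return frame


frame_0 = [(1, 1), (2, 2), (3, 3)]
frame_1 = [(1, 1), (9, 9), (3, 3)]      # only the middle region changed
updates = encode_frame(frame_0, frame_1)
restored = decode_frame(frame_0, updates)
```

Only one of the three regions is stored for the second frame, and the decoder recovers the full frame by replenishing just that region.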
The method fails when a digital motion video contains very little temporal redundancy. A motion video sequence of a flower growing, shot at 30 frames per second, contains a great deal of redundancy and compresses well using conditional replenishment. Conversely, a sequence recorded through a moving camera contains little redundancy and does not compress well, assuming motion compensation algorithms are not employed.
Still greater compression could be achieved, particularly for transmission, if the playback platform could access libraries of image portions to recreate an image. In such a system, the compressed data could carry a code calling on the playback platform to generate, say, a forest background. The location of the code in the compressed video stream would determine a selected set of pixel locations for the background. Regions of an image, such as forest backgrounds, brick walls, or sand, can be categorized by type and stored as combinations of color and luminance types in blocks of standardized patterns. However, the lack of methods for rapid recognition of such patterns has hampered the use of codes into library patterns in generating compressed video streams. A rapid, non-memory-intensive process for pattern recognition is desirable.
Were no error allowed, the comparison of a binary pattern from an image with a binary pattern in a table of patterns would be straightforward. The process would loop through the table, executing a comparison of patterns until a match was found or the table was exhausted. Upon locating a match, the elementary unit would be represented by an offset into the table. However, pattern matching with error tolerance is more difficult to implement. One implementation approach would be to compare the binary pattern from an elementary unit with each table entry using an exclusive OR operation. The number of 1 bits in the output of the exclusive OR operation equals the number of pixel mismatches between the patterns, so this output would be scanned to count them. However, contemporary microprocessors generally lack an instruction that performs the exclusive OR operation and counts the number of mismatches. Thus the process must be written to shift bits out of the result of the exclusive OR operation one at a time. A count must be incremented when the bit shifted out has a 1 value. Such a sequence of instructions (i.e., shift, test and conditionally increment) must loop until the number of mismatched bits exceeds the tolerance or the entire word has been tested. In contemporary machines, such a video compression process would represent an unacceptably large processing load.
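The table search with an exclusive OR and a shift, test and conditionally increment loop can be sketched as below. This illustrates the approach the passage describes as too costly, not a recommended method. The 16-bit pattern width, the function names and the sample table are assumptions made for the example.

```python
def count_mismatches(a, b, width=16, limit=None):
    """Count differing bits between two patterns, one shifted-out bit at a time.

    Stops early once the count exceeds limit, if a limit is given.
    """
    diff = a ^ b                      # 1 bits mark mismatched pixels
    count = 0
    for _ in range(width):
        if diff & 1:                  # test the bit about to be shifted out
            count += 1                # conditionally increment
            if limit is not None and count > limit:
                break                 # too many mismatches; stop testing
        diff >>= 1                    # shift
    return count


def find_match(pattern, table, tolerance=0, width=16):
    """Return the offset of the first table entry within tolerance, else None."""
    for offset, entry in enumerate(table):
        if count_mismatches(pattern, entry, width, tolerance) <= tolerance:
            return offset
    return None


table = [0x0000, 0xFFFF, 0x0F0F]              # hypothetical library patterns
exact = find_match(0x0F0F, table)             # exact match at offset 2
missed = find_match(0x0F0E, table)            # no exact match
close = find_match(0x0F0E, table, tolerance=1)  # one-bit error tolerated
```

With a tolerance of zero the search reduces to the exact comparison loop. With a nonzero tolerance, every table entry costs up to one shift, test and increment per bit, which is the processing load the passage objects to.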
Alternatively, computational burdens could be reduced by using a direct lookup table, with the pattern from the elementary unit being used as an offset into the lookup table. A four by four rectangular region yields a 16-bit pattern, so the table needs 65,536 entries; at one byte per entry, this would require a lookup table of 64 kilobytes. Some specialized video hardware based upon digital signal processors does not have sufficient random access memory to support such a relatively large lookup table.
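The direct lookup alternative can be sketched as follows, assuming one byte per entry and a reserved value to mark "no match". The sentinel value, the function name and the sample library patterns are hypothetical choices for this example.

```python
NO_MATCH = 255  # assumed sentinel; limits the library to fewer than 255 patterns


def build_lookup(table, tolerance):
    """Build a 65,536-entry table indexed directly by a 16-bit pattern.

    Each entry holds the offset of the first library pattern within the
    tolerance, or NO_MATCH when none qualifies.
    """
    lookup = bytearray([NO_MATCH]) * (1 << 16)
    for pattern in range(1 << 16):
        for offset, entry in enumerate(table):
            if bin(pattern ^ entry).count("1") <= tolerance:
                lookup[pattern] = offset
                break
    return lookup


lookup = build_lookup([0x0000, 0xFFFF, 0x0F0F], tolerance=1)
size_in_bytes = len(lookup)        # 65,536 bytes, i.e. 64 kilobytes
offset = lookup[0x0F0E]            # matching is now a single indexed read
```

All of the bit counting moves into the one-time table construction, so matching at compression time costs a single memory access, but the table itself occupies the full 64 kilobytes regardless of how few library patterns there are.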