1. Field of the Invention
This invention relates to image processing.
2. Related Technology
ANSI Standard C “memcpy” Function
A given computer hardware architecture will have an optimal means of copying a block of data from one location in a memory to another location. Complex Instruction Set Computing (CISC) architectures implement instructions that move a block of data over a number of CPU cycles. Reduced Instruction Set Computing (RISC) architectures optimize the instruction set to process each instruction in one or two CPU cycles but also include instructions that can be used to implement a short routine that will accomplish the block move in an optimal manner. An efficient routine for copying a block of data can be implemented for each specific computer architecture.
Some computer architectures include Direct Memory Access (DMA) circuitry that transfers data between memory and input/output (I/O) devices without continual central processing unit (CPU) intervention.
The ANSI standard for the C Programming Language defines a “memcpy” library function as an interface to an efficient routine for copying a block of bytes to another location.
Graphical Images
A television screen has a 4:3 aspect ratio. In the United States, television signals contain 525 scan lines, of which 480 lines are visible on most televisions. When an analog video signal is digitized, each of the 480 lines is sampled 640 times, and each sample is represented by a number. Each sample point is called a picture element, or pixel. A two-dimensional array is created that is 640 pixels wide and 480 pixels high. This 640×480 pixel array is a still graphical image that is considered to be full frame. The human eye can optimally perceive approximately 16.7 million colors. A pixel value comprised of 24 bits can represent each perceivable color. A graphical image made up of 24-bit pixels is considered to be full color. A standard Super VGA (SVGA) computer display has a screen resolution of 640 by 480 pixels. Twenty-four bits is three bytes. It is common to use a fourth byte for each pixel to specify a mask value or alpha channel. A typical image being processed may contain over 1.2 million bytes of data.
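The figures in the preceding paragraph can be checked with a few lines of arithmetic. The following Python sketch uses only the frame geometry and pixel layout stated above; the variable names are illustrative and are not part of any standard.

```python
# Frame geometry and pixel layout as described in the text.
WIDTH, HEIGHT = 640, 480
BITS_PER_COLOR_PIXEL = 24
BYTES_PER_PIXEL = 4  # three color bytes plus one mask/alpha byte

pixels = WIDTH * HEIGHT                    # samples in one full frame
colors = 2 ** BITS_PER_COLOR_PIXEL         # distinct values of a 24-bit pixel
frame_bytes = pixels * BYTES_PER_PIXEL     # storage for one frame with alpha

print(pixels)       # 307200
print(colors)       # 16777216
print(frame_bytes)  # 1228800
```

A 24-bit pixel therefore distinguishes about 16.7 million values, and one full frame with a fourth byte per pixel occupies slightly more than 1.2 million bytes.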
When digitizing a video signal, or when manipulating the graphics to be output as a video signal or to be displayed on a computer display, it may be necessary to copy the image data to another area of memory (a buffer) for some type of image processing. However, the copied buffer takes up significant memory resources. Also, the time it takes to copy the image can be significant, especially when the image processing must be done in real time. Those skilled in the art realize that, to improve processing performance, the number of memory buffers containing a copy of the same data should be reduced to the minimum set possible.
Display Video RAM
The memory of a computer system may be physically implemented in different areas or on different boards. The main memory is used for storage of program instructions and data. A special memory area called “video RAM” may be dedicated to storing the image that is to be displayed on the computer display. The video RAM has special hardware that allows it to be accessed to update the display over 60 times a second.
Capture Video RAM
A video digitizer or video capture card may also contain a special memory area similar to display video RAM for capturing the digital samples from the video signal. This RAM may also have special hardware that allows it to be updated 60 times a second.
Cache Memory
Many computer architectures implement one or more levels of memory caching whereby blocks of memory data are stored in a cache memory that may be accessed more rapidly by the CPU. Typically input and output (I/O) memories such as video RAM, capture RAM, or hard disk buffers are not cached.
3. Description of Prior Art
In the last few years, there have been tremendous advances in the speed of computer processors and in the available bandwidth of worldwide computer networks such as the Internet. These advances have led to a point where businesses and households now commonly have both the computing power and network connectivity necessary to have point-to-point digital communications of audio, rich graphical images, and video. However, the transmission of video signals with the full resolution and quality of television is still out of reach. In order to achieve an acceptable level of video quality, the video signal must be compressed significantly without losing either spatial or temporal quality.
A number of different approaches have been taken, but each has produced less than acceptable results. These approaches and their disadvantages are disclosed by Mark Nelson in a book entitled The Data Compression Book, Second Edition, published by M&T Books in 1996. Mark Morrison also discusses the state of the art in a book entitled The Magic of Image Processing, published by Sams Publishing in 1993.
Video Signals
Standard video signals are analog in nature. In the United States, television signals contain 525 scan lines of which 480 lines are visible on most televisions. The video signal represents a continuous stream of still images, also known as frames, which are fully scanned, transmitted and displayed at a rate of 30 frames per second. This frame rate is considered full motion.
A television screen has a 4:3 aspect ratio.
When an analog video signal is digitized, each of the 480 lines is sampled 640 times, and each sample is represented by a number. Each sample point is called a picture element, or pixel. A two-dimensional array is created that is 640 pixels wide and 480 pixels high. This 640×480 pixel array is a still graphical image that is considered to be full frame. The human eye can perceive approximately 16.7 million colors. A pixel value comprised of 24 bits can represent each perceivable color. A graphical image made up of 24-bit pixels is considered to be full color. A single, second-long, full frame, full color video requires over 220 million bits of data.
The transmission of 640×480 pixels × 24 bits per pixel × 30 frames per second requires the transmission of 221,184,000 bits per second. A T1 Internet connection can transfer up to 1.54 million bits per second. A high-speed (56 Kb) modem can transfer data at a maximum rate of 56 thousand bits per second. The transfer of full motion, full frame, full color digital video over a T1 Internet connection, or 56 Kb modem, will require an effective data compression of approximately 144:1, or over 3949:1, respectively.
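The bandwidth figures above follow directly from the frame geometry and the two link speeds. The sketch below reproduces that arithmetic in Python; the function name `required_ratio` and the constant names are illustrative assumptions, and the link speeds are the nominal values quoted in the text.

```python
# Frame geometry, color depth, and frame rate as given in the text.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24
FRAMES_PER_SECOND = 30

# Nominal link speeds quoted in the text, in bits per second.
T1_BPS = 1_540_000
MODEM_BPS = 56_000

raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND

def required_ratio(link_bps):
    """Compression ratio needed to fit the raw video into the given link."""
    return raw_bps / link_bps

print(raw_bps)                              # 221184000
print(round(required_ratio(T1_BPS), 1))     # 143.6
print(round(required_ratio(MODEM_BPS), 1))  # 3949.7
```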
A video signal typically will contain some signal noise. In the case where the image is generated based on sampled data, such as in an ultrasound machine, there are often noise and artificial spikes in the signal. A video signal recorded on magnetic tape may have fluctuations due to irregularities in the recording media. Fluorescent or improper lighting may cause a solid background to flicker or appear grainy. Such noise exists in the real world but may reduce the quality of the perceived image and lower the compression ratio that could be achieved by conventional methods.
Basic Run-Length Encoding
An early technique for data compression is run-length encoding, where a repeated series of items is replaced with one sample item and a count of the number of times the sample repeats. Prior art shows run-length encoding of both individual bits and bytes. These simple approaches by themselves have failed to achieve the necessary compression ratios.
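As an illustration of byte-wise run-length encoding as described above, the following Python sketch replaces each run with a (value, count) pair. It is a minimal example under assumed conventions (unbounded counts, pairs held in a list), not the format of any particular prior art encoder; the function names are hypothetical.

```python
def rle_encode(data):
    """Return a list of (value, count) pairs, one pair per run in data."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            # Same value as the current run: extend the count.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # A new value starts a new run of length one.
            runs.append((value, 1))
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back into the original bytes."""
    return bytes(value for value, count in runs for _ in range(count))

encoded = rle_encode(b"aaabcc")
print(encoded)              # [(97, 3), (98, 1), (99, 2)]
print(rle_decode(encoded))  # b'aaabcc'
```

Data with few repeated values gains nothing from this scheme, since every single item still costs one pair, which is consistent with the limited compression ratios noted above.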
Variable Length Encoding
In the late 1940s, Claude Shannon at Bell Labs and R. M. Fano at MIT pioneered the field of data compression. Their work resulted in a technique of using variable length codes, where symbols with low probabilities are assigned codes with more bits and symbols with higher probabilities are assigned codes with fewer bits. This approach requires multiple passes through the data: one to determine the symbol probabilities and another to encode the data. This approach also has failed to achieve the necessary compression ratios.
D. A. Huffman disclosed a more efficient approach of variable length encoding known as Huffman coding in a paper entitled “A Method for the Construction of Minimum-Redundancy Codes,” published in 1952. This approach also has failed to achieve the necessary compression ratios.
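The construction of a variable length code can be sketched as follows. This Python example builds a Huffman code table from symbol frequencies by repeatedly merging the two least frequent entries; it is an illustrative sketch, and the function name `huffman_code` and the use of bit strings rather than packed bits are assumptions made for readability.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Return {symbol: bitstring}; frequent symbols get shorter codes."""
    freq = Counter(data)  # first pass: count symbol frequencies
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # least frequent subtree
        w2, _, codes2 = heapq.heappop(heap)  # next least frequent subtree
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_code("aaaabbc")
encoded = "".join(codes[s] for s in "aaaabbc")  # second pass: encode
print(len(codes["a"]), len(codes["b"]), len(codes["c"]))  # 1 2 2
print(len(encoded))                                       # 10
```

In this example, seven symbols that would occupy 56 bits as 8-bit characters are encoded in 10 bits, not counting the cost of transmitting the code table.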
Arithmetic, Finite Context, and Adaptive Coding
In the 1980s, arithmetic coding, finite context modeling, and adaptive coding provided a slight improvement over the earlier methods. These approaches require extensive computer processing and have failed to achieve the necessary compression ratios.
Dictionary-Based Compression
Dictionary-based compression uses a completely different method to compress data. Variable length strings of symbols are encoded as single tokens. The tokens form an index to a dictionary. In 1977, Abraham Lempel and Jacob Ziv published a paper entitled, “A Universal Algorithm for Sequential Data Compression” in IEEE Transactions on Information Theory, which disclosed a compression technique commonly known as LZ77. The same authors published a 1978 sequel entitled, “Compression of Individual Sequences via Variable-Rate Coding,” which disclosed a compression technique commonly known as LZ78 (see U.S. Pat. No. 4,464,650). Terry Welch published an article entitled, “A Technique for High-Performance Data Compression,” in the June 1984 issue of IEEE Computer, which disclosed an algorithm commonly known as LZW, which is the basis for the GIF algorithm (see U.S. Pat. Nos. 4,558,302, 4,814,746, and 4,876,541). In 1989, Stac Electronics implemented an LZ77-based method called QIC-122 (see U.S. Pat. Nos. 5,532,694, 5,506,580, and 5,463,390).
These lossless (method where no data is lost) compression methods can achieve up to 10:1 compression ratios on graphic images typical of a video image. While these dictionary-based algorithms are popular, these approaches require extensive computer processing and have failed to achieve the necessary compression ratios.
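The idea of encoding variable length strings as dictionary tokens can be illustrated with a textbook form of LZW. The Python sketch below is a simplified illustration only: it emits a list of integer codes, lets the dictionary grow without limit, and omits the variable-width code packing and dictionary resets used by practical formats such as GIF. The function names are hypothetical.

```python
def lzw_compress(data):
    """Encode bytes as a list of dictionary indexes (tokens)."""
    dictionary = {bytes([i]): i for i in range(256)}  # single-byte strings
    next_code = 256
    current = b""
    out = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate              # keep extending the match
        else:
            out.append(dictionary[current])  # emit token for longest match
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(dictionary[current])
    return out

def lzw_decompress(codes):
    """Rebuild the bytes; the dictionary is reconstructed while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[:1]  # token defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
tokens = lzw_compress(sample)
print(len(sample), len(tokens))          # 24 16
print(lzw_decompress(tokens) == sample)  # True
```

Because the decoder rebuilds the same dictionary from the token stream, no dictionary needs to be transmitted, and no data is lost in the round trip.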
JPEG and MPEG
Graphical images have an advantage over conventional computer data files: they can be slightly modified during the compression/decompression cycle without affecting the perceived quality on the part of the viewer. By allowing some loss of data, compression ratios of 25:1 have been achieved without major degradation of the perceived image. The Joint Photographic Experts Group (JPEG) has developed a standard for graphical image compression. The JPEG lossy (method where some data is lost) compression algorithm first divides the color image into three color planes and divides each plane into 8 by 8 blocks, and then the algorithm operates in three successive stages:
(a) A mathematical transformation known as the Discrete Cosine Transform (DCT) takes a set of points from the spatial domain and transforms them into an equivalent representation in the frequency domain.
(b) A lossy quantization is performed using a quantization matrix to reduce the precision of the coefficients.
(c) The zero values are encoded in a zig-zag sequence (see Nelson, pp. 341-342).
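The three stages can be sketched for a single 8 by 8 block as follows. This Python example is a simplified illustration and not a JPEG implementation: it assumes a single uniform quantization step in place of a full quantization matrix, omits the level shift and entropy coding, and uses hypothetical function names.

```python
import math

N = 8  # block size used by JPEG

def dct_8x8(block):
    """Stage (a): two-dimensional DCT of one 8x8 block."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

def quantize(coeffs, step=16):
    """Stage (b): reduce precision; a uniform step is assumed here."""
    return [[int(round(value / step)) for value in row] for row in coeffs]

# Stage (c): visit coefficients along anti-diagonals, alternating direction.
ZIGZAG = sorted(
    ((i, j) for i in range(N) for j in range(N)),
    key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
)

def zigzag(block):
    """Flatten an 8x8 block into the zig-zag sequence."""
    return [block[i][j] for i, j in ZIGZAG]

flat = [[128] * N for _ in range(N)]  # a uniformly gray block
sequence = zigzag(quantize(dct_8x8(flat)))
print(sequence[0])       # 64
print(any(sequence[1:])) # False
```

For a block of uniform color, all of the energy lands in the first coefficient and the remaining 63 values quantize to zero, so the zig-zag sequence ends in one long run of zeros that is cheap to encode.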
JPEG can be scaled to achieve higher compression ratios by allowing more loss in the quantization stage of the compression. However, this loss results in certain blocks of the image being compressed such that areas of the image have a blocky appearance, and the edges of the 8 by 8 blocks become apparent because they no longer match the colors of their adjacent blocks. Another disadvantage of JPEG is smearing: the true edges in an image get blurred due to the lossy compression method.
The Moving Picture Experts Group (MPEG) standard uses JPEG-based techniques combined with forward and reverse temporal differencing. MPEG compares adjacent frames and, for those blocks that are identical to those in a previous or subsequent frame, only a description of the previous or subsequent identical block is encoded. MPEG suffers from the same blocking and smearing problems as JPEG.
These approaches require extensive computer processing and have failed to achieve the necessary compression ratios without unacceptable loss of image quality and artificially induced distortion.
QuickTime: CinePak, Sorenson, H.263
Apple Computer, Inc. released a component architecture for digital video compression and decompression, named QuickTime. Any number of methods can be encoded into a QuickTime compressor/decompressor (codec). Some popular codecs are CinePak, Sorenson, and H.263. CinePak and Sorenson both require extensive computer processing to prepare a digital video sequence for playback in real time; neither can be used for live compression. H.263 compresses in real time but does so by sacrificing image quality, resulting in severe blocking and smearing.
Fractal and Wavelet Compression
Extremely high compression ratios are achievable with fractal and wavelet compression algorithms. These approaches require extensive computer processing and generally cannot be completed in real time.
Sub-Sampling
Sub-sampling is the selection of a subset of data from a larger set of data. For example, when every other pixel of every other row of a video image is selected, the resulting image has half the width and half the height. This is image sub-sampling. Other types of sub-sampling include frame sub-sampling, area sub-sampling, and bit-wise sub-sampling.
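The image sub-sampling example above can be expressed in a few lines. This Python sketch assumes the image is held as a list of rows, each row a list of pixel values; the function name `subsample` and the `step` parameter are illustrative.

```python
def subsample(image, step=2):
    """Keep every step-th pixel of every step-th row."""
    return [row[::step] for row in image[::step]]

image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(subsample(image))  # [[0, 2], [8, 10]]
```

With a step of two, the result has half the width and half the height, and therefore one quarter of the pixels of the original.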
Image Stretching
If an image is to be enlarged but maintain the same number of pixels per inch, data must be filled in for the new pixels that are added. Various methods of stretching an image and filling in the new pixels to maintain image consistency are known in the art. Some methods known in the art are dithering (using adjacent colors that appear to be a blended color), error diffusion, “nearest neighbor,” bilinear, and bicubic.
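Of the methods listed, "nearest neighbor" is the simplest to illustrate: each new pixel takes the value of the closest source pixel. The Python sketch below assumes the image is a list of rows of pixel values and uses integer scaling of coordinates; the function name `stretch_nearest` is hypothetical.

```python
def stretch_nearest(image, new_width, new_height):
    """Enlarge an image by copying the nearest source pixel into each new pixel."""
    old_height = len(image)
    old_width = len(image[0])
    return [
        [
            # Map each destination coordinate back to a source coordinate.
            image[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]

small = [[1, 2],
         [3, 4]]
for row in stretch_nearest(small, 4, 4):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

Nearest neighbor introduces no new colors but produces visibly blocky results at large scale factors, which is the trade-off that bilinear and bicubic interpolation address at greater computational cost.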
Doppler Enhancement
Doppler techniques are used to determine the velocities of one or more small objects. Some common uses of Doppler techniques include without limitation:
1. Radar used to detect rain
2. Radar used to determine speed of vehicles or aircraft
3. Ultrasound blood flow analysis
Doppler velocity scales are often incorporated with grayscale images.
In the case of ultrasound blood flow analysis, average velocities toward the sensing probe are encoded as a shade of red and velocities away from the sensing probe are encoded as a shade of blue. Although the image appears to be in color, there are really three monochromatic values: a grayscale, a red scale, and a blue scale. The base image plane (grayscale ultrasound) is generated more often (typically 15-30 frames per second) than the overlay plane showing the Doppler red and blue scales (typically 3-10 frames per second).
In the case of rain, the base map of the earth is generated only once and the Doppler colors that indicate the intensity of the precipitation are laid over the base map.
Moving Pictures
A video or movie is comprised of a series of still images that, when displayed in sequence, appear to the human eye as a live motion image. Each still image is called a frame. Television in the USA displays frames at the rate of 30 frames per second. Theater motion pictures are displayed at 24 frames per second. Cartoon animation is typically displayed at 8-12 frames per second.
Compression Methods
The ZLN and ZLD methods are effective ways to compress video images. Other compression algorithms are known in the prior art, including RLE, GIF (LZW), MPEG, Cinepak, Motion-JPEG, Sorenson, Fractal, and many others.
Each of these methods treats a frame of video as a basic unit of compression, applying the compression method uniformly to the entire image.
Color Plane Separation
It is well known in the art that an image can be uniformly separated into color planes based on the red, green, and blue component values for each pixel, based on hue, saturation, and brightness component values for each pixel, or based on ink colors, such as cyan, yellow, magenta, and black. However, these color plane separations are not done to reduce data size or to aid compression. They are used to facilitate the display (such as on an RGB or YUV computer monitor) or the printing of the image (for example, four-color printing).
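A uniform separation into red, green, and blue planes can be sketched as follows. This Python example assumes each pixel is an (R, G, B) tuple and the image is a list of rows; the function name `split_rgb_planes` is illustrative.

```python
def split_rgb_planes(image):
    """Return three planes (red, green, blue) with the same shape as image."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue

image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
red, green, blue = split_rgb_planes(image)
print(red)    # [[255, 0], [0, 10]]
print(green)  # [[0, 255], [0, 20]]
print(blue)   # [[0, 0], [255, 30]]
```

The three planes together hold exactly as many values as the original image, which is consistent with the observation above that this kind of separation does not by itself reduce data size.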
Frame Differencing
MPEG and some other compression methods compare adjacent frames in a stream of frames. Under certain circumstances these methods send only a subset of a frame (namely a rectangular portion that contains a change when compared to the adjacent frame) which is then overlaid on the unchanged data for the adjacent frame.
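The rectangular-subset idea described above can be sketched as follows. This Python example is an illustration of the general technique and not of MPEG itself: it finds the smallest rectangle enclosing all changed pixels between two frames of equal size, and then overlays that rectangle on the earlier frame. Frames are assumed to be lists of rows, and the function names are hypothetical.

```python
def changed_rectangle(previous, current):
    """Return (top, left, patch) covering all changed pixels, or None."""
    rows = [y for y, (a, b) in enumerate(zip(previous, current)) if a != b]
    if not rows:
        return None  # frames are identical; nothing to send
    cols = [
        x for x in range(len(current[0]))
        if any(previous[y][x] != current[y][x] for y in rows)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    patch = [row[left:right + 1] for row in current[top:bottom + 1]]
    return top, left, patch

def apply_patch(previous, top, left, patch):
    """Overlay the patch on a copy of the previous frame."""
    frame = [list(row) for row in previous]
    for dy, row in enumerate(patch):
        frame[top + dy][left:left + len(row)] = row
    return frame

previous = [[0] * 4 for _ in range(4)]
current = [list(row) for row in previous]
current[1][2] = 5
current[2][1] = 7

top, left, patch = changed_rectangle(previous, current)
print(top, left, patch)                                    # 1 1 [[0, 5], [7, 0]]
print(apply_patch(previous, top, left, patch) == current)  # True
```

Here only 4 of the 16 pixels are sent. The saving shrinks as changes spread across the frame, since a single enclosing rectangle must also carry every unchanged pixel that lies between the changed ones.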