Video compression methods are employed to reduce the number of bits needed to transmit and store a digital video signal. As a result, a lower bandwidth communication channel can be employed to transmit a compressed video signal in comparison to an uncompressed video signal. Similarly, a reduced capacity of a storage device, which can comprise a memory or a magnetic storage medium, is required for storing the compressed video signal. A general video compression method includes encoding, which converts the video signal into a compressed signal, and decoding, which reconstructs the video signal based upon the compressed signal.
A video signal is a sequence of image frames that are created when a scanning system captures an image at periodic intervals of time. Each frame of the sequence is a function of two spatial variables x and y and a temporal variable t. A frame consists of an array of digital picture elements or pixels located along the horizontal spatial variable x and the vertical spatial variable y. As examples, a frame may consist of an array of 512×480 pixels, 176×144 pixels, 720×576 pixels, or the like.
To the observer, the pixels form images or objects in the frame. Each of the objects has an outer edge which is referred to as the contour of the object. The contour of an object separates that object from other objects in the frame.
Each pixel is represented as a digital code word having a given number of bits. A bit can have a value of zero or one. Many facsimile and printing devices have one bit allocated to each pixel. If the bit has a value of one, then ink is applied to the pixel location. If the bit has a value of zero, then no ink is applied to the pixel location. Thus, these devices are known as bi-level.
Many applications such as multimedia and the like use digital code words having more than one bit allocated to each pixel. Typically, eight to thirty-two bit digital code words may be used to represent a pixel.
For example, an eight bit code word can have a value, in base 10 numerals, ranging from zero to 255. Consequently, each pixel can have many more color shades or brightness levels than the bi-level devices because there is more information available in the eight bit code word.
For instance, a gray scale image frame consists of pixels having several defined levels of brightness with the neutral color gray. In essence, assuming that each pixel in a gray scale image frame is represented by an eight bit code word, the level of brightness of a pixel is scaled from a value of zero to 255. Similar to bi-level devices, a pixel is black if the code word has a value of zero and is white if the code word has a value of 255. Unlike the bi-level devices, a pixel in a gray scale image frame can embody a variety of shades. If the code word has a value close to 255 then the pixel is light gray or if the value is close to zero then the pixel is dark gray. Finally, if the code word has a value of 127 then the pixel is true gray. Thus, a gray scale image frame is known as a multilevel continuous-tone image frame.
A gray scale image frame must be compressed to reduce the transmission time and meet stringent memory limitations for virtually all applications. For instance, an image frame consisting of an array of 720×576 pixels with each pixel represented by an eight bit code word would take about 52 seconds to transmit over a 64 Kbit/second transmission line. The amount of memory required to store the frame would be over 3.3 Mbits. The transmission time and the amount of memory needed are much too high for the bandwidth and memory storage constraints of a typical application.
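The figures in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below assumes that 64 Kbit/second means 64,000 bits per second; the function names are illustrative only and do not come from the text.

```python
def frame_bits(width, height, bits_per_pixel):
    """Number of bits in one uncompressed frame."""
    return width * height * bits_per_pixel


def transmit_seconds(total_bits, line_rate_bps):
    """Seconds needed to send total_bits over a line carrying line_rate_bps bits/second."""
    return total_bits / line_rate_bps


# 720x576 pixels, eight bits per pixel, sent over an assumed 64,000 bit/s line.
bits = frame_bits(720, 576, 8)            # 3,317,760 bits, a little over 3.3 Mbits
seconds = transmit_seconds(bits, 64_000)  # about 51.8 seconds
print(f"{bits} bits, {seconds:.1f} s")
```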
A gray scale alpha mask is a code or an instruction that identifies the transparency of each pixel of the image frame. A gray scale alpha mask is structurally the same as a gray scale image frame. But while a gray scale image frame is meant to be viewed as an image, a gray scale alpha mask is a code for each pixel that identifies its transparency. For the sake of simplicity, we will refer to gray scale alpha masks as gray scale image frames and the alpha values as shades of gray. For a large percentage of gray scale alpha masks, the textures are relatively simple. The texture of a gray scale alpha mask refers to the alpha values of a group of pixels in a portion of the frame or all the pixels in the whole frame. Alpha refers to the value of the digital code word representing the transparency of a pixel.
One example of a simple texture is a gray scale alpha mask consisting of pixels having a constant gray shade. Thus, each pixel in this alpha mask is represented by the same digital code word having a given value. For instance, a value of 127 means that each pixel in the frame has a true gray color or, equivalently, that all pixels have equal transparency.
Another example of a simple texture is a gray scale alpha mask that consists of a binary alpha mask with the code word value of the pixels around the edges of the binary alpha mask tapered from 255 to zero to provide a smooth compositing with the background. A binary alpha mask refers to an image having pixels which are represented by code words having a value of zero or 255. Thus, alpha for a binary alpha mask is either zero or 255. Because the binary alpha mask has pixels which can only have two discrete values, the code words representing the pixels of the binary alpha mask can be encoded and decoded as if they were a one bit code word. Thus, a binary image frame is similar to the image frames generated by the bi-level devices discussed earlier.
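As an illustration of the two ideas in this paragraph, the following minimal sketch treats one row of a binary alpha mask. It maps the two permitted code words to a single bit and back, and it tapers the opaque pixels near an edge toward zero. The linear ramp, the `ramp` parameter, and all function names are assumptions made for the example; the text does not specify the shape of the taper.

```python
def pack_binary_row(row):
    """Map a row of binary alpha values (0 or 255) to one bit per pixel."""
    if any(a not in (0, 255) for a in row):
        raise ValueError("binary alpha mask values must be 0 or 255")
    return [a // 255 for a in row]


def unpack_binary_row(bits):
    """Inverse of pack_binary_row: bit 1 becomes 255, bit 0 becomes 0."""
    return [255 * b for b in bits]


def taper_row(row, ramp):
    """Taper opaque pixels within `ramp` pixels of a transparent pixel.

    Assumed linear ramp: an opaque pixel at distance d from the nearest
    transparent pixel gets alpha 255 * d / (ramp + 1), rounded, when d <= ramp.
    """
    zeros = [i for i, a in enumerate(row) if a == 0]
    out = []
    for i, a in enumerate(row):
        if a == 0 or not zeros:
            out.append(a)  # transparent pixel, or no edge in this row
            continue
        d = min(abs(i - z) for z in zeros)
        if d > ramp:
            out.append(255)
        else:
            out.append((255 * d + (ramp + 1) // 2) // (ramp + 1))
    return out


row = [0, 0, 255, 255, 255, 255, 255, 255, 0, 0]
bits = pack_binary_row(row)      # [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
tapered = taper_row(row, ramp=2)  # [0, 0, 85, 170, 255, 255, 170, 85, 0, 0]
```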
Two of the main functions of gray scale alpha masks are to feather object edges for smoother compositing and to fade an object in or out of a video sequence. Feathering refers to a blurring or anti-aliasing of the boundaries of an object to provide a smooth visual transition between objects in the image frame when they are composited together. A fade in or out refers to having an object materialize into, or disappear from, a scene slowly over a number of frames of the video sequence by having its pixels transition from completely transparent to opaque or vice versa. Another common function of gray scale alpha masks is to provide the illusion of depth to an object in the image frame by having it slowly disappear in the distance. This spatial fade or dissolve effect is achieved by having the gray scale alpha values of an object in a single frame transition as a function of their pixel position. For example, the alpha values of the pixels may decrease as a function of their horizontal or vertical position.
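The fade effects described above can be sketched as follows. The example assumes eight bit alpha values, linear transitions, and the conventional blend in which an alpha of 255 shows only the foreground and an alpha of zero shows only the background; none of these specifics is stated in the text, and the function names are hypothetical.

```python
def composite(fg, bg, alpha):
    """Blend one 8-bit foreground sample over a background sample.

    alpha is 0..255: 255 is fully opaque, 0 is fully transparent (assumed convention).
    """
    return (alpha * fg + (255 - alpha) * bg + 127) // 255


def horizontal_fade(width):
    """Spatial fade: alpha falls linearly from 255 at the left column to 0 at the right."""
    if width == 1:
        return [255]
    return [255 - (255 * x) // (width - 1) for x in range(width)]


def fade_in_alphas(num_frames):
    """Temporal fade in: one alpha per frame, rising linearly from 0 to 255."""
    if num_frames == 1:
        return [255]
    return [(255 * t) // (num_frames - 1) for t in range(num_frames)]


row_alpha = horizontal_fade(6)    # [255, 204, 153, 102, 51, 0]
frame_alpha = fade_in_alphas(4)   # [0, 85, 170, 255]
# An object of brightness 200 over a background of brightness 50, fading across the row.
blended = [composite(200, 50, a) for a in row_alpha]
```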
Most current compression algorithms, such as the Joint Photographic Experts Group (JPEG) standard and the quadtree decomposition algorithm, achieve bit reduction by compressing the gray scale image frame as sequences of blocks or pixels. Unfortunately, neither of these two methods represents the image frames as compactly as possible, because neither recognizes the functionality achieved by utilizing gray scale alpha masks. Instead, each tries to represent the original frame as closely as possible rather than just capturing the functionality embodied in the original data. Accordingly, these algorithms require a significant number of bits to achieve the gray scale alpha mask functionality. As a result, these methods are unacceptable for many bandwidth constrained applications.
Another recent technique is to decompose a gray scale alpha mask into a binary alpha mask and a texture. The binary alpha mask is encoded with any algorithm applicable to binary alpha masks. The texture is encoded by transforming it with the Discrete Cosine Transform (DCT) and then quantizing the resulting coefficients. This technique also strives to represent the original data efficiently rather than just capturing the functionality of the original data. A primary disadvantage of transforming the texture with the DCT and then quantizing the coefficients is that the bandwidth requirements are significantly greater than with the present invention.
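For readers unfamiliar with the texture coding step mentioned above, the following is a minimal sketch of a one-dimensional orthonormal DCT followed by uniform quantization. It is not the codec the text refers to: the one-dimensional transform, the eight sample row, and the quantizer step size of 16 are all assumptions chosen for illustration, and the decomposition into a binary mask and a texture is not shown.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        coeffs.append(scale * total)
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by step and round to an integer."""
    return [round(c / step) for c in coeffs]


# A constant texture (every alpha equal to 127) has energy only in the first coefficient.
texture_row = [127] * 8
levels = quantize(dct_1d(texture_row), step=16)  # [22, 0, 0, 0, 0, 0, 0, 0]
```

Even for this simplest texture, a transform coder of this kind still has to signal a coefficient block for every region of the mask, which is the sort of overhead the paragraph above describes.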