A digital image in uncompressed form comprises an array of image pixels, or picture elements. Each pixel, in turn, is represented by a certain number of bits, which carry information about the brightness (luminance) and/or color (chrominance) of the pixel. Different schemes exist for representing the luminance and/or chrominance of pixels in a digital image. Commonly, a so-called YUV color model is used. The luminance, or Y, component represents the luminance of the pixel, while the color of the pixel is represented by two chrominance, or color difference, components, labeled U and V. Other color models, such as the RGB (Red, Green, Blue) color model, which is based on components representing the three primary colors of light, are also commonly used.
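The relationship between the two color models can be illustrated with one common luma/chroma definition. The scale factors below follow ITU-R BT.601; other YUV variants exist, so this is a sketch of one convention rather than the only definition:

```python
def rgb_to_yuv(r, g, b):
    # ITU-R BT.601 luma/chroma conversion (one common YUV definition).
    # Inputs are 0-255 RGB values; Y is luminance, U and V are the two
    # color-difference (chrominance) components.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)   # U: scaled blue minus luminance
    v = 0.877 * (r - y)   # V: scaled red minus luminance
    return y, u, v
```

For a neutral gray pixel the two chrominance components are zero, reflecting the fact that U and V carry only color information, not brightness.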
Many systems can encode images, producing a compressed version of an image (for storing it on a storage medium or for communicating it to another system) in a variety of compressed formats. Each format possesses different intrinsic characteristics, which suit it to particular types of images. For example, two common formats currently in use are the GIF (Graphics Interchange Format) and JPEG (Joint Photographic Experts Group) formats. GIF is well suited to storing computer-generated graphics, which may contain rasterized text, regions of solid color, and sharp lines. By contrast, the JPEG format is well suited to encoding natural scenes, such as photographs of real world scenes or real world objects.
In compression according to the baseline mode of the JPEG standard, an image to be encoded is first divided into an array of non-overlapping square blocks, each block comprising, for example, an 8×8 array of image pixels. A two-dimensional Discrete Cosine Transform (DCT) is then applied independently to each of the image blocks. This converts the image data from the pixel value domain to the spatial frequency domain and produces a corresponding set of coefficient values, each of which is a weighting factor for a basis function of the two-dimensional DCT. The coefficient values thus produced are quantized and then coded losslessly using entropy coding to further reduce the amount of data (i.e. the number of bits) required for their representation. In the JPEG baseline mode, the entropy coder employs only Huffman coding to produce a compressed bit-stream, although in other modes arithmetic coding may alternatively be used. Finally, data describing the image and coding parameters (e.g. type of compression, quantization and coding tables, image size, etc.) is embedded in the bit-stream produced by the entropy encoder. Embedding this data is necessary because the JPEG standard comprises four alternative coding modes and places few constraints on the quantization and coding tables that can be used. For a JPEG compressed bit-stream to be communicated to a receiving platform, and for the image to be reconstructed without any ambiguity, the receiving platform must therefore know which coding mode and which tables were used.
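The transform step described above can be sketched as follows. The function below is a naive, unoptimized two-dimensional DCT-II over a single 8×8 block; real JPEG codecs use fast factored implementations, so this is purely illustrative:

```python
import math

def dct_2d(block):
    # Naive two-dimensional DCT-II over an 8x8 block, as used by baseline
    # JPEG. Each output coefficient out[u][v] is the weighting factor for
    # the (u, v) cosine basis function of the block.
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = cu * cv * s
    return out
```

For a block of constant pixel value, all of the signal energy lands in the single DC coefficient `out[0][0]` and every AC coefficient is zero, which is why smooth regions compress so well after quantization.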
The size of a JPEG file does not depend upon the number of colors in the image; rather, it depends on the frequency composition of the image, i.e. whether the image includes slow, subtle changes (such as background tones), which have a low frequency, or sharper, edge-like changes, which have a high frequency. For example, a photograph of the sky would not have a large high-frequency component because it is mostly solid blue. A photograph of a tree would have a much larger high-frequency component because of the edges of the leaves. A graphic (artificial) image often has very sharp edges and thus a large high-frequency component. JPEG is designed to favor low frequencies and discard most of the high-frequency content, because in natural images the human eye cannot perceive high frequencies very well. As a result, sharp edges tend to become blurred when a GIF image is encoded as JPEG. Because low-frequency components are favored, a photograph of the sky will have a smaller JPEG file size than a photograph of a tree, even if both files represent the same number of pixels. Additionally, the JPEG file size depends upon the number of pixels in the original image; assuming the frequency content is held constant, the relationship is proportional. Thus, if two images have the same frequency content and one has half as many pixels, its output file size will be half that of the other.
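The frequency argument above can be made concrete with a small, hypothetical experiment: apply a one-dimensional DCT (the same family of transform JPEG uses in two dimensions) to a gradual ramp and to a sharply alternating signal, and compare where the energy lands:

```python
import math

def dct_1d(xs):
    # Unnormalized one-dimensional DCT-II; coefficient k measures how much
    # of spatial frequency k is present in the input signal.
    n = len(xs)
    return [sum(x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for i, x in enumerate(xs))
            for k in range(n)]

smooth = [10, 20, 30, 40, 50, 60, 70, 80]   # gradual, "sky-like" change
edges = [0, 255, 0, 255, 0, 255, 0, 255]    # sharp, "graphics-like" edges
```

For the ramp, nearly all of the non-DC energy sits in the lowest-frequency coefficient; for the alternating signal, the highest-frequency coefficient dominates. A quantizer that discards high-frequency coefficients therefore barely affects the ramp but visibly blurs the edges.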
The size of a GIF file, on the other hand, does not depend at all upon frequency content. It depends upon two things: the number of colors in the image, and the “pattern” in which those colors are arranged. Specifically, GIF looks for patterns in the sequence of palette indexes which correspond to the colors of the pixels in the image, stores the patterns in a dictionary, and encodes these patterns rather than the individual indexes. Because of this, GIF is called a dictionary-based scheme. (The term “index” as used here refers to the so-called “palette index” used in GIF, unless otherwise stated. In GIF, the palette index operates in conjunction with a lookup table of colors, called the palette; the palette index for a pixel is used to look up in the palette what color, i.e. what RGB value, to use for that pixel. The palette index thus serves to indicate the characteristics of the image at a particular location, i.e. for a particular pixel.)
With this approach to encoding an image, the size of a GIF image depends largely upon how many indexes one pattern in the dictionary can represent, and how often the patterns the encoder encounters are already present in the dictionary. If there is a large solid region, and one dictionary element can represent many pixels, the compression will be very efficient. By contrast, if there is no real pattern in the image and each dictionary entry can only represent one or two pixels, the compression will not save very much. Since natural images contain continuously varying tones, they exhibit no dominant patterns, and so GIF compression is not usually efficient for natural images. In essence, the size of a GIF image depends partly upon the number of colors, and partly upon how those colors are arranged in the image.
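The dictionary-based scheme described above is the LZW algorithm. The sketch below shows only the core idea applied to a sequence of palette indexes; a real GIF encoder additionally uses variable-width codes, clear and end-of-information codes, and sub-block framing:

```python
def lzw_encode(indexes):
    # Minimal sketch of LZW-style dictionary coding over a sequence of
    # palette indexes. The dictionary starts with every single-index
    # pattern and grows one entry each time a new pattern is seen.
    dictionary = {(i,): i for i in range(256)}
    next_code = 256
    pattern = ()
    output = []
    for idx in indexes:
        candidate = pattern + (idx,)
        if candidate in dictionary:
            pattern = candidate                # keep growing the pattern
        else:
            output.append(dictionary[pattern])  # emit the known pattern
            dictionary[candidate] = next_code   # remember the new one
            next_code += 1
            pattern = (idx,)
    if pattern:
        output.append(dictionary[pattern])      # flush the final pattern
    return output

solid = [7] * 64          # 64 pixels of one palette color
noisy = list(range(64))   # 64 pixels with no repeated pattern
```

Encoding the 64-pixel solid region yields only a handful of codes, because successive dictionary entries cover ever-longer runs, while the patternless sequence yields one code per pixel, mirroring the file-size behaviour described above.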
So, if a GIF image of a natural scene is re-encoded as JPEG, the file size will almost always decrease, and if a GIF image of an artificial scene is re-encoded as JPEG, the file size will often increase unless it is somehow constrained from doing so. Similarly, if a JPEG image of a natural scene is re-encoded as a GIF, the file size will increase unless it is somehow constrained from doing so. In the final case, when an arbitrary JPEG image of an artificial scene is to be re-encoded as a GIF file, the file size cannot be predicted, because much depends upon the quality of the initially encoded JPEG image (e.g. how blurred it has become).
Thus, encoding an image in a format less suited to the type of image (natural vs. graphic) can produce undesirable results. For example, encoding a photograph as a GIF image may result in a grainy image having a much larger file size than would be the case if the JPEG format were used. Some systems are designed to handle one particular type of image (e.g. digital cameras are designed to store photographs of real world scenes or real world objects), so the choice of encoding format can be reliably predetermined. Other systems, such as personal computers, allow an end user to decide on an encoding format for an image. Still other systems must handle a variety of image types in situations where user input to decide what format to use is either undesirable or impractical. For example, a system that processes a sequence of image types and is expected to decide quickly which format to use is preferably an automated system rather than one that relies on input from an end user, since an end user might take too long to decide or cannot be relied on to make the best decision in a high enough percentage of cases. An example of such a system, in which the decision is preferably automated, is an image conversion server: a system that accepts incoming images and converts various properties (resolution, size in bytes, and other properties) to meet predetermined target requirements. Because the content of incoming images may vary widely, and because many images need to be processed quickly, an automated method of selecting the appropriate encoding format is desirable.
What is needed, therefore, is an automated system for choosing, from among a predetermined set of formats, a suitable format for encoding or re-encoding an image; preferably, such a system is able to process a sequence of images and determine a suitable encoding format for each in a time acceptable for real world applications.