It is generally known that image compression is effective in reducing the amount of image data for transmission or storage. In particular, with the introduction of scalable image coding formats like JPEG2000, it has become possible to send and receive only a fraction of the image file and still reconstruct a high-quality image at the receiving end. This is a desirable feature, because the size of a compressed digital image stored on one device must, in many cases, be further reduced in order for that image to be transmitted to or be displayed on a different device. However, many current digital imaging systems create and maintain content in the JPEG format, which uses a discrete cosine transform (DCT) block-based compression scheme. Unlike JPEG2000, if part of the image file corresponding to a JPEG image is omitted, the image becomes corrupted and the quality generally degrades to such an extent that the image is useless. Thus the JPEG format does not “scale” in terms of image file size.
To illustrate how the size of a previously compressed digital image stored on one device may need to be reduced in order for it to be stored or displayed on a different device, consider, for example, a large, high-quality digital image stored on a server. Such an image may exceed the memory limitations of a mobile device. In order for the mobile device to store and display the image, it would be necessary to reduce the size of the previously compressed image.
Continuing the foregoing example, if memory capacity were the only limitation, it would be possible to devise an algorithm that reduces the image size as the image is received by the mobile device (prior to storage), rather than having the sender reduce the size prior to transmission. In reality, however, other limitations also apply. For example, some protocols may limit the maximum message size that can be transmitted to the mobile device, which means that the image size must be reduced before (not after) transmission. Additionally, reducing image size on the receiving end may waste significant bandwidth, because the full-size image must first be transmitted, resulting in cost inefficiencies.
If such a digital image transmission system were to operate on an automated basis, reducing the size of many digital images per second without human intervention, then the processing capability required to handle a specified number of images per second would be directly influenced by the efficiency of the size-reduction operation. That is, if image size can be reduced efficiently, less computational power is required to meet a given processing goal. Thus, there is a clear economic relationship between the time taken to reduce the size of an image and the cost of doing so. Such a relationship exists for multimedia messaging services, motivating the need for an efficient size-reduction method. In this specification, "size" means the number of bytes occupied by the compressed image; thus, a "large" image is one that occupies many bytes of storage space.
In the past, size reduction has been carried out using a number of approaches. These approaches generally possess one or more of the following characteristics:
a) the image is recompressed several times as size reduction is carried out in an iterative fashion;
b) the original (uncompressed) image data is assumed to be available when the image is resized;
c) the image, if already compressed, is fully decompressed prior to size reduction, and the resized image is recompressed for storage or transmission;
d) the image quality is unduly sacrificed in exchange for efficiency, resulting in a visually unacceptable product.
The iterative method is inefficient because it uses a trial-and-error approach that does not make use of the information contained in the image. Such a brute-force method is illustrated in FIG. 1: the input image is decoded to obtain pixel values, and the quality needed to produce an image of the target size is visually judged. The image is then re-compressed with the estimated quality. If the size of the re-compressed image is too large or too small compared with the target size, the quality is adjusted and the image is re-compressed again, and this is repeated until a valid quality scaling factor is found. It should be noted that, when performing size reduction, each non-zero pixel value must be multiplied by the quality scaling factor (QSF) in a floating-point operation. Because floating-point operations are computationally expensive, a significant portion of the overall time to reduce the image size is spent in the final encoding phase.
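The iterative loop of FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of FIG. 1 itself: the hypothetical `recompress(qsf)` callback stands in for one complete decode/re-encode pass and returns the compressed size in bytes, the output size is assumed to be non-decreasing in QSF, and the quality is adjusted by bisection, which is only one possible adjustment rule. The point it shows is that every adjustment costs a full re-compression.

```python
from typing import Callable, Tuple

def brute_force_qsf(
    recompress: Callable[[float], int],  # hypothetical: one full re-encode, returns size in bytes
    target_bytes: int,
    initial_qsf: float = 0.75,
    max_passes: int = 20,
    tolerance: float = 0.01,
) -> Tuple[float, int]:
    """Search for the largest QSF whose re-compressed size fits target_bytes.

    Returns (qsf, passes). Assumes size never decreases as QSF increases,
    and that QSF 0.0 (everything discarded) always fits.
    """
    lo, hi = 0.0, 1.0   # current bounds on the QSF being searched
    best = 0.0          # largest QSF found so far that fits the target
    qsf = initial_qsf   # first guess, standing in for the visual estimate
    passes = 0
    while passes < max_passes and hi - lo > tolerance:
        size = recompress(qsf)  # the expensive step: a complete re-compression
        passes += 1
        if size <= target_bytes:
            best = qsf
            lo = qsf    # fits: try a higher quality next
        else:
            hi = qsf    # too large: lower the quality next
        qsf = (lo + hi) / 2
    return best, passes
```

With a toy size model such as `lambda q: int(20000 * q)` and a 12,000-byte target, the search settles just below QSF 0.6 after several passes; with a real encoder, each of those passes is a full re-compression of the image.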
Because JPEG is a variable-length scheme, the ultimate size of the compressed image depends not only upon the quality setting, but also upon the image content or the characteristics of the image itself. In other words, two different images with the same resolution and same quality setting may be compressed into different sizes. Thus, no clear relationship between image size and quality setting can be pre-defined: it varies according to the image content and may only be estimated statistically. In many cases such a statistical estimate is sufficiently accurate and may be the only option. For example, the remaining capacity on a storage card in a digital camera is estimated in such a fashion. The brute-force method does not take advantage of the availability of the information that can be extracted from an image.
A more "intelligent" approach is disclosed in Farkash et al. (U.S. Pat. No. 5,594,554, hereafter referred to as Farkash) and Yovanof et al. (U.S. Pat. No. 5,677,689, hereafter referred to as Yovanof), wherein certain characteristics of the image in question are used when determining the relationship between image quality and image size. In Yovanof, an activity metric reflecting the complexity of the input image is computed from the image data after the image has been transformed using a Discrete Cosine Transform (DCT) and quantized using a predetermined Q-factor. Based on the activity metric, a new Q-factor is used to adjust the quantization coefficients of the partially JPEG-compressed image. In Farkash, the activity of an image is computed in a statistical first pass prior to the actual compression pass, and a scale factor for quantization is computed based on that activity. While the approaches that use the activity metric of the image, as disclosed in Yovanof and Farkash, are useful, they address the problem of estimating file size when encoding an original image. This means that the original image data is assumed to be available when the image is resized. In most cases, however, the original image is not available. JPEG is generally used as a "lossy" format, meaning that an image that is first encoded and then decoded will not be identical to the original, although the differences may not be visually discernible. Consequently, any method that relies on the original image may not be useful.
To use bandwidth on the Internet effectively, Mogul et al. (U.S. Pat. No. 6,243,761, hereafter referred to as Mogul) discloses a size-reduction method that does not require the original image. Mogul treats image size reduction as a small component of a much larger "content adaptation" system and suggests that the input image be fully decompressed and then re-compressed. This approach is inherently inefficient, because it unnecessarily repeats all the steps that were already used to process the input image.
Ratnakar et al. (U.S. Pat. No. 6,243,761, hereafter referred to as Ratnakar) discloses a method of image size reduction wherein statistics about a compressed image are gathered after the image is partially decompressed, and a threshold is used to reduce the size of the image to suit the available bandwidth. In Ratnakar, DCT coefficients whose magnitudes fall below the threshold are removed (set to zero), while the other DCT coefficients are left unmodified. When such a threshold is used to discard coefficients, the resulting image usually lacks granularity. Although the approach disclosed in Ratnakar "beats" the size target with reduced computational complexity, the quality of the image is unduly sacrificed. Furthermore, the size of the reduced image usually cannot be estimated before the resizing process is actually carried out.
TABLE I
QSF     Image size reduced by
1.00    0%
0.50    14%
0.49    47%
0.25    52%
0.24    64%
0.13    80%
0.00    100%
Applying a threshold to coefficients, as disclosed in Ratnakar, involves selectively removing some DCT coefficients and leaving the remainder unmodified. An alternative is to scale each coefficient value by some constant, which can be called the quality scaling factor (QSF).
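To make the contrast between the two operations concrete, the sketch below applies each one to a single block of already-quantized coefficients. It is an illustration under assumptions, not Ratnakar's implementation or the method of this specification: coefficients are plain integers, the helper names are hypothetical, scaled values are rounded half away from zero (an assumed rounding rule), and the corresponding update of the quantization table is omitted.

```python
import math
from typing import List

def threshold_coefficients(coeffs: List[int], t: int) -> List[int]:
    """Thresholding (as described for Ratnakar): zero every coefficient whose
    magnitude is below t and leave all other coefficients unmodified."""
    return [c if abs(c) >= t else 0 for c in coeffs]

def scale_coefficients(coeffs: List[int], qsf: float) -> List[int]:
    """Scaling: multiply every quantized coefficient by the QSF and round
    half away from zero (assumed rule). The matching quantization-table
    update is deliberately left out of this sketch."""
    scaled = []
    for c in coeffs:
        magnitude = math.floor(abs(c) * qsf + 0.5)  # round half away from zero
        scaled.append(magnitude if c >= 0 else -magnitude)
    return scaled

# Example block (DC term first); values chosen for illustration only
block = [12, -3, 2, 1, -1, 0, 0, 0]
print(threshold_coefficients(block, 2))   # [12, -3, 2, 0, 0, 0, 0, 0]
print(scale_coefficients(block, 0.50))    # [6, -2, 1, 1, -1, 0, 0, 0]
print(scale_coefficients(block, 0.49))    # [6, -1, 1, 0, 0, 0, 0, 0]
```

With this rounding rule, scaling by 0.50 keeps the coefficients of magnitude 1, whereas scaling by 0.49 rounds them all to zero. Because small values such as ±1 are typically the most common non-zero coefficients in a quantized block, a jump of this kind is one plausible contributor to the sharp step near QSF=0.5 in Table I.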
For the statistical model of a "typical image," there is a characteristic relationship between the amount by which an image is to be reduced in size and the required QSF. This relationship can be found by applying a number of quality scaling factors to a plurality of different images and measuring the resulting reduction percentages; it is shown in Table I and FIG. 2. As shown, the curve contains a number of discontinuous steps. The discontinuities arise because the operation is performed on a previously quantized image; the same plot for an uncompressed image would be a smooth curve without discontinuities. This difference, together with the fact that few images fit the "typical" curve exactly, means that it is almost impossible to develop a sufficiently accurate lookup table for determining a QSF. If one relies on this behavior to predict the resulting size from a selected QSF, the actual reduction is likely to differ from the predicted reduction by 5 to 10 percent. Likewise, using a lookup table to determine a QSF from a target reduction is likely to produce an image of sub-optimal quality. Consider, for example, a target size reduction of 20 percent. A size reduction of 1 to 14 percent usually corresponds to a QSF between 0.5 and 1.0. According to Table I, the required QSF must therefore be smaller than 0.5, because a 20 percent reduction exceeds the 14 percent reduction that corresponds to QSF=0.5. However, because of the discontinuity around QSF=0.50, a QSF marginally under 0.5 results in a reduction of roughly 50 percent (47 percent in Table I), far greater than the target of 20 percent. Accordingly, if the goal is to reduce a 15 KB original image to fit the 12 KB limit of a particular display, the result is an image of about 7.5 KB.
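The lookup-table pitfall in the 20 percent example can be reproduced in a few lines. This is a hypothetical sketch: `TABLE_I` simply transcribes Table I, `qsf_from_table` is an illustrative name, and the selection rule (take the largest tabulated QSF whose typical reduction meets the target) is an assumption about how such a table would be used. Because Table I lists 47 percent for QSF=0.49, the predicted size comes out near 7.95 KB rather than the rounded 7.5 KB used above.

```python
from typing import List, Tuple

# Table I transcribed: (QSF, percentage reduction in size) for a "typical image"
TABLE_I: List[Tuple[float, float]] = [
    (1.00, 0.0), (0.50, 14.0), (0.49, 47.0), (0.25, 52.0),
    (0.24, 64.0), (0.13, 80.0), (0.00, 100.0),
]

def qsf_from_table(target_reduction_pct: float) -> Tuple[float, float]:
    """Hypothetical lookup: return (qsf, predicted_reduction_pct) for the largest
    tabulated QSF whose typical reduction meets or exceeds the target.
    A larger QSF keeps more detail, so it is the best quality the table offers."""
    candidates = [(q, r) for q, r in TABLE_I if r >= target_reduction_pct]
    return max(candidates)  # tuples compare by QSF first

# Example from the text: fit a 15 KB image into 12 KB (a 20 percent reduction)
original_kb, target_kb = 15.0, 12.0
target_pct = 100.0 * (1.0 - target_kb / original_kb)
qsf, predicted_pct = qsf_from_table(target_pct)
predicted_kb = original_kb * (1.0 - predicted_pct / 100.0)
print(qsf, predicted_pct, round(predicted_kb, 2))  # 0.49 47.0 7.95
```

The table can deliver the requested 20 percent only by jumping to 47 percent, and nothing in it records where that step actually falls for a particular image.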
If the image does not conform to the "typical image" behavior, a QSF of 0.5 may in fact meet the 12 KB target exactly, in which case a QSF chosen solely from a lookup table would reduce the size more than required, i.e., to 7.5 KB instead of 12 KB. Because size is not proportional to perceptual quality, this image is likely to look much worse than necessary. Similarly, reducing a 20 KB original image to 10 KB requires a target reduction of 50 percent, for which Table I suggests a QSF of approximately 0.3. However, because of the margin of error (i.e., the difference between an actual image and a "typical image"), this may yield only a 40 to 45 percent reduction, so the resulting image is between 11 KB and 12 KB, larger than the target size. This presents a serious challenge: not only must a new QSF be calculated, but the image must also be re-compressed one or more times.
It is therefore desirable to provide an efficient method of reducing the size of a previously compressed image when the original image is not available, the method using information particular to the image being reduced to aid the reduction process.