This invention relates to video signal processing and more particularly to a method and system for re-sizing images and zooming into and out of images in a way that makes use of non-spatial interpretations of the images.
Images can be imported into digital systems using opto-electronic transducing devices, such as video cameras and computer scanning devices. Also, images can be created within digital systems using a variety of computer software programs, such as drawing and animation programs. Images are commonly represented and stored in digital systems as arrays of digital numbers.
Images are typically displayed on a suitable display device, such as the cathode ray tube ("CRT") of a television or computer monitor, as a two-dimensional spatial representation on the surface of the display device. A series of slightly different images can be displayed in rapid temporal sequence in order to create the perception of smooth motion as, for example, in the case of television image sequences.
Pixels are relatively small localized fixed regions on the surface of the image display device. Displayed images are often composed of many thousands of pixels, wherein each pixel has attributes of size, shape, intensity and color. Objects within displayed images are represented by groups of pixels. Generally, the pixels are sufficiently close to each other on the surface of the display device to ensure that when the displayed image is viewed from a sufficient distance objects are perceived to have the same characteristics of shape, texture, edges and shading as similar objects in the real world.
Pixels are often, but not always, arranged on the surface of the display in the form of a rectangular grid consisting of uniform rows and columns of pixels. The horizontal and vertical spatial resolutions of an image are determined by the average number of pixels per unit of distance in the horizontal direction along the rows of pixels and in the vertical direction along the columns of pixels. The spatial resolution determines the minimum distance at which an image must be viewed so that the human vision system ("HVS") perceives objects within the image and not the individual pixels.
The spatial resolution of a displayed image can be increased by increasing the horizontal resolution and the vertical resolution. A high resolution image will yield a more natural looking approximation of an original real world image than a low resolution image. The quality of the image results from the HVS's perception of the lines and edges within displayed images. In low resolution images, the lines and edges are often perceived to be jagged because of the staircase effect caused by rectangular-shaped pixels in the displayed image.
The perception of jagged edges and lines, due to the staircase effect, is less discernible in high resolution images because of the reduced size of the rectangular-shaped pixels in the displayed image.
The finite resolution of the displayed image is a major factor limiting the faithfulness by which objects may be represented.
A digital image is a set of numbers where each number corresponds to a pixel of the displayed image. For example, a displayed image might consist of 512 by 640 pixels where each pixel is characterized by a range of possible luminous intensities and colors. If we decompose the color of each pixel into its primary colors of red ("R"), blue ("B") and green ("G"), then the displayed image may be numerically represented as the combination of the R, B and G component images. Each of the R, B and G component images is a monochromatic image in which each pixel is characterized only by a digital number representing the luminous intensity of the pixel. A displayed image, consisting of a rectangular array of pixels, may therefore be represented as three monochromatic component images, each of which may be represented in digital form as a corresponding rectangular array of numbers. We refer to such a rectangular array of numbers as a digital image where it is understood that it may represent a monochromatic component of a displayed color image or some suitable combination of the monochromatic components.
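By way of illustration only, the decomposition of a color image into three monochromatic digital images might be sketched as follows. The 2 by 2 sample image, the (R, G, B) tuple layout and the name `split_components` are assumptions made for this example and do not appear in the specification.

```python
# Hypothetical 2x2 color image: each pixel is an (R, G, B) tuple of intensities.
color_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]


def split_components(image):
    """Return three monochromatic digital images (R, G, B), each a list of rows."""
    return tuple(
        [[pixel[channel] for pixel in row] for row in image]
        for channel in range(3)
    )


# Each component is a rectangular array of numbers, one number per pixel.
red, green, blue = split_components(color_image)
```

Each of the three resulting arrays is a digital image in the sense used above.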
An image display device has a fixed viewing area which is usually a rectangular viewing screen. Images displayed on the viewing screen often completely fill the viewing screen. Re-sizing of the image means that the image is re-displayed on the viewing screen in such a way that the image is perceived by the HVS to have been horizontally or vertically stretched or compressed. The ratio of the horizontal size of the image after re-sizing to the horizontal size before re-sizing is the horizontal scaling factor. Similarly, the vertical scaling factor is the ratio of the vertical size of the image after re-sizing to the vertical size before re-sizing. Scaling factors greater than unity represent stretching and scaling factors less than unity represent shrinking of a displayed image.
The aspect ratio of a displayed image is the ratio of its width to its height. If the image is resized using equal horizontal and vertical scaling factors, then the resized image will have the same aspect ratio as the original. By using equal horizontal and vertical scaling factors, objects will grow or shrink without changing their shapes and the re-sizing operation will be perceived by the HVS as having caused all objects within the image to have moved closer to or farther away from the viewer.
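The definitions of the scaling factors and of the aspect ratio can be expressed, purely as an illustrative sketch, in a few lines. The function names and the 640 by 480 example size are assumptions introduced for this example.

```python
from fractions import Fraction


def scaling_factors(old_size, new_size):
    """Return (horizontal, vertical) scaling factors for (width, height) sizes."""
    old_w, old_h = old_size
    new_w, new_h = new_size
    return Fraction(new_w, old_w), Fraction(new_h, old_h)


def aspect_ratio(size):
    """Ratio of width to height."""
    width, height = size
    return Fraction(width, height)


# Equal horizontal and vertical factors leave the aspect ratio unchanged.
original = (640, 480)
stretched = (1280, 960)
h_scale, v_scale = scaling_factors(original, stretched)
same_shape = aspect_ratio(original) == aspect_ratio(stretched)
```

Factors greater than unity correspond to stretching and factors less than unity to shrinking, as stated above.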
Resized images may exceed the dimensions of the viewing screen. Thus, some parts of the resized image may lie outside of the viewing screen and may not be visible. A cropped image is that portion of the image that is displayed in the viewing screen.
Zooming is the re-sizing of a set of digitized images or displayed images to create the perception in the HVS that objects within the displayed image are growing or shrinking with time. Zooming creates the illusion that the distances between the viewer and the objects in the displayed image are growing or shrinking.
Smooth zooming is zooming such that the scaling factors increase or decrease sufficiently slowly over the set of displayed images to create the perception in the HVS that objects within the displayed image are growing or shrinking continuously with time during the process of zooming.
If the scaling factors are increasing with time the zooming process is in the zoom-in mode and if the scaling factors are decreasing with time the zooming process is in the zoom-out mode. For example, a smooth zoom in the zoom-in mode can be achieved in 11 successive frames having scale factors, relative to the first frame, of 1.0, 1.1, 1.2 . . . 1.8, 1.9 and 2.0, resulting in the final scaling factor of 2.
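The eleven-frame example above can be reproduced with a short sketch. The names `zoom_schedule` and `zoom_mode` are hypothetical, and evenly spaced scale factors are an assumption; the specification only requires that the factors change sufficiently slowly.

```python
from fractions import Fraction


def zoom_schedule(first, last, frames):
    """Evenly spaced scale factors, relative to the first frame."""
    if frames < 2:
        raise ValueError("a zoom needs at least two frames")
    step = (Fraction(last) - Fraction(first)) / (frames - 1)
    return [float(Fraction(first) + i * step) for i in range(frames)]


def zoom_mode(schedule):
    """Classify a schedule as zoom-in, zoom-out or mixed."""
    pairs = list(zip(schedule, schedule[1:]))
    if all(b > a for a, b in pairs):
        return "zoom-in"
    if all(b < a for a, b in pairs):
        return "zoom-out"
    return "mixed"


schedule = zoom_schedule(1, 2, 11)  # 1.0, 1.1, ..., 2.0
```

Most of the intermediate factors are non-integer, which is why integer-only re-sizing methods are unsuitable for smooth zooming.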
Prior art methods for re-sizing and zooming digitized images operate directly on the numbers which represent each of the pixels in the displayed image. That is, they operate in the spatial domain.
One prior art method of re-sizing images is the pixel replication method which uses integer scale factors. The pixel replication method simply copies each pixel some integer number of times in both the horizontal and vertical directions. For example, with a scaling factor of 3, each pixel of the original image is replicated to form a corresponding 3×3 square of pixels in the resized image. A primary disadvantage of pixel replication methods is that the jagged staircase effects of the pixelization increase in proportion to the scaling factor. Thus, pixel replication introduces significant and often unacceptable distortion of the image after re-sizing. A further disadvantage of the pixel replication method is that it is limited to enlarging images by integer scaling factors. Therefore, it cannot be used to reduce the size of the image, nor can it be used to re-size an image by non-integer scale factors, such as 1.1, 1.2, etc., as required for smooth zooming operations.
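A minimal sketch of pixel replication, assuming a digital image stored as a list of rows, is given below. The function name `replicate_pixels` is hypothetical.

```python
def replicate_pixels(image, factor):
    """Enlarge a digital image by copying each pixel factor x factor times."""
    if not isinstance(factor, int) or factor < 1:
        # Replication cannot express factors such as 1.1 or 1/2.
        raise ValueError("pixel replication needs a positive integer factor")
    resized = []
    for row in image:
        # Repeat every number horizontally, then repeat the whole row vertically.
        stretched = [value for value in row for _ in range(factor)]
        resized.extend(list(stretched) for _ in range(factor))
    return resized


enlarged = replicate_pixels([[1, 2], [3, 4]], 3)  # each pixel becomes a 3x3 square
```

Because every original pixel becomes a uniform square, the staircase along any diagonal edge grows with the factor.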
Another prior art method is the pixel sub-sampling method, which is used to reduce the size of images using scaling factors that are the reciprocals of integers, such as ½, ⅓ or ¼. This method implements the scaling factor 1/Q in the horizontal or vertical directions, where Q is an integer, by retaining only one of every Q numbers, or pixels, in the digital image in the respective directions. The pixel sub-sampling method creates undesirable distortions in displayed images. Such distortions lead to the loss of edge definition in objects and are noticeable by the HVS in displayed images that are resized using this method, especially for large values of Q.
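Pixel sub-sampling by 1/Q in both directions may be sketched as follows, again assuming a list-of-rows image. The striped test image is an assumption chosen to make the distortion visible.

```python
def subsample(image, q):
    """Keep one of every q pixels in each row and one of every q rows."""
    if not isinstance(q, int) or q < 1:
        raise ValueError("Q must be a positive integer")
    return [row[::q] for row in image[::q]]


# 4x8 image of alternating dark (0) and bright (255) columns.
striped = [[0, 255] * 4 for _ in range(4)]

# With Q = 2 only the dark columns survive, so the stripes vanish entirely.
reduced = subsample(striped, 2)
```

The striped pattern is lost completely in the reduced image, which is an extreme case of the distortion described above.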
Abrupt spatial variations of the intensity of the resized image are especially susceptible to distortion and are therefore objectionable to the HVS. This distortion can be alleviated if the intensity variation between pixels is made to vary smoothly in all directions after pixel replication or before pixel sub-sampling. The prior art includes a variety of methods for achieving the required spatial domain smoothing. These prior art spatial domain methods have a number of disadvantages, including the need for operating directly on each of the pixels in the unscaled spatial domain version of the digital image. Other versions of the image, such as encoded or compressed versions, in which the image is not spatially represented pixel by pixel, cannot be used with the prior art spatial domain smoothing methods.
In the case of pixel replication, the required smoothed image is often obtained by means of a two-dimensional spatial domain low-pass filtering operation that effectively smooths out the spatial intensity variations in the digitized image. Two-dimensional spatial domain low-pass filtering methods vary in sophistication depending upon the level of distortion that can be tolerated in the displayed image. Simple spatial domain weighted averaging of regional pixels can be used to smooth the intensity variations in the pixel-replicated version of the image, thereby reducing the jagged edge effects. More complicated spatial domain filtering methods smooth each number in each row and column of the resized digital image. This may require hundreds of multiplication and addition operations to be performed on each pixel, which may lead to millions of operations for each resized image. For example, if an image of size 512×512 is resized to 4096×4096 (scaling factor of 8) by pixel replication and a smoothing operation requiring 100 multiplication operations per pixel is employed on the resized image, then the total number of multiplication operations that is required for smoothing each resized image is 4096×4096×100. This results in over 1,600,000,000 multiplication operations per resized image.
If the image is cropped to a 512×512 size, then the minimum number of multiplication operations is reduced to 512×512×100, which is over 25,000,000 per cropped image. Accordingly, the prior art methods for the spatial domain reduction of distortion due to pixel replication involve the use of extensive hardware and software resources that operate on a pixel by pixel basis.
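The operation counts quoted for the pixel replication example follow directly from the image dimensions, as this short check shows. The function name is hypothetical; the figure of 100 multiplications per pixel is the one assumed in the example above.

```python
def smoothing_multiplications(width, height, per_pixel=100):
    """Multiplications needed to smooth every pixel of a width x height image."""
    return width * height * per_pixel


full_image = smoothing_multiplications(4096, 4096)   # whole resized image
cropped_image = smoothing_multiplications(512, 512)  # visible portion only
```

The two totals are 1,677,721,600 and 26,214,400 respectively, consistent with the "over 1,600,000,000" and "over 25,000,000" figures given above.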
In the case of pixel sub-sampling, a two-dimensional low-pass filtering operation is performed on the image prior to sub-sampling to smooth out the intensity variations. This spatial domain smoothing method has the effect of pre-distorting the image to minimize the distortion effects in the resized image following pixel sub-sampling. The effect of this smoothing operation is to blur many objects in the image so that their sharp edges are no longer distinct after pixel sub-sampling. The required spatial domain filtering operations, like those used in pixel replication, vary in sophistication and complexity. When simple weighted averaging is used to achieve this filtering, it is often necessary to use hundreds of multiplication and addition operations per pixel. For example, to re-size a 512×512 image to a 64×64 image (scaling factor of ⅛) by pixel sub-sampling using a smoothing operation that requires 100 multiplication operations per pixel of the resized image, the total number of multiplication operations required for smoothing each resized image is 64×64×100. This results in over 400,000 multiplication operations per resized image of size 64×64. Pre-distorting the original image using this smoothing operation requires 512×512×100 multiplication operations on the original digital image. This method, like the pixel replication method, also requires extensive hardware and software resources that operate on a pixel by pixel basis.
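As a simple illustration of smoothing prior to sub-sampling, the sketch below averages each Q by Q block of pixels, which is equivalent to applying an unweighted Q by Q averaging filter and keeping one output per block. The choice of an unweighted block average is an assumption made for brevity; the prior art filters referred to above are generally more elaborate. The function name is hypothetical.

```python
def average_then_subsample(image, q):
    """Reduce an image by 1/q, replacing each q x q block with its mean."""
    rows, cols = len(image), len(image[0])
    if rows % q or cols % q:
        raise ValueError("image dimensions must be multiples of q")
    reduced = []
    for top in range(0, rows, q):
        out_row = []
        for left in range(0, cols, q):
            block = [image[r][c]
                     for r in range(top, top + q)
                     for c in range(left, left + q)]
            out_row.append(sum(block) / (q * q))
        reduced.append(out_row)
    return reduced


# Alternating dark and bright columns become a uniform mid-grey.
striped = [[0, 255] * 2 for _ in range(4)]
smoothed = average_then_subsample(striped, 2)
```

The stripes are replaced by a uniform intermediate intensity rather than vanishing, at the cost of the blurring of sharp edges described above.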
There are prior art methods for re-sizing images in the spatial domain by fractional scaling factors, such as 22/73 or 101/100. This capability is required for smooth zooming and other re-sizing effects. Almost all of the prior art methods involve a combination of the pixel replication and pixel sub-sampling methods described above, along with the appropriate smoothing operations. The prior art methods for fractional scaling factors essentially combine pixel replication and pixel sub-sampling in a single system; such systems are referred to in the digital signal processing literature as multirate digital systems. Such methods operate on the image in the spatial domain, pixel by pixel, and they employ complicated, computationally intensive digital filtering techniques to smooth out the jagged edges and other distortions in the resized image.
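The combination of replication and sub-sampling for a fractional factor P/Q can be sketched for a single row of pixels as follows. The smoothing filters that a practical multirate system would apply between the two stages are deliberately omitted, and the function name `resize_row` is hypothetical.

```python
def resize_row(row, p, q):
    """Scale one row of pixels by p/q: replicate by p, then keep every q-th."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    replicated = [value for value in row for _ in range(p)]
    return replicated[::q]


# Scaling factor 3/2: four pixels become six.
stretched = resize_row([10, 20, 30, 40], 3, 2)
```

Without smoothing, some pixels are duplicated and others are not, which produces the uneven, jagged appearance that the filtering stages are intended to reduce.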
Panning is the cropping of a set of digitized images or displayed images to create the perception in the HVS that all objects within the displayed image are uniformly translated between any two adjacent images in the displayed sequence of images.
Smooth panning is panning such that displacements of objects in successive displayed images are sufficiently small to create the perception in the HVS that all objects within the displayed image are moving continuously with time in the displayed sequence of images.
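Panning by cropping can be illustrated with a window that moves across a larger image by a small offset per frame. The list-of-rows layout, the function names and the one-pixel step are assumptions made for this sketch.

```python
def crop(image, left, top, width, height):
    """Return the width x height portion of the image starting at (left, top)."""
    return [row[left:left + width] for row in image[top:top + height]]


def pan_frames(image, width, height, offsets):
    """Crop the same image at a sequence of (left, top) window positions."""
    return [crop(image, left, top, width, height) for left, top in offsets]


# 3x5 test image whose values encode position (10 * row + column).
image = [[10 * r + c for c in range(5)] for r in range(3)]

# Moving the window one pixel to the right per frame shifts every object
# one pixel to the left in the displayed sequence.
frames = pan_frames(image, 3, 2, [(0, 0), (1, 0), (2, 0)])
```

Small offsets between successive frames correspond to smooth panning as defined above.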
When objects in displayed images are moving during the zooming or panning processes, the perceptions of zooming and panning are retained by the HVS provided that the movements over time of objects in images are sufficiently slow.
Image compression is the process by which digital images are represented by fewer bits of information. Image compression allows images to be stored in fewer bits of digital memory and thereby reduces the size of computer files that are required for storing images. Image compression also allows the compressed version of digital images to be transmitted over communication channels at a faster rate than uncompressed images.
Compressed versions of spatial domain images must be decompressed prior to display.
Most image compression methods employ mathematical techniques that transform the spatial domain representation of the image to a corresponding transform domain version of the image that is more suitable for compression. For example, the transform domain version typically employs data that has the mathematical property that it is significantly less correlated than the data in the spatial domain which is a desirable property for subsequent compression. A wide variety of compression methods, including quantization and coding, are routinely employed to compress the transform domain version of the image.
The decompression process attempts to recover the original transform domain version. When it does so exactly, the compression-decompression process is said to be lossless. In practice, the decompression algorithms often recover only an approximation to the original spatial domain image, and the process is said to be lossy.
The transform domain versions of images do not necessarily contain data that corresponds to spatial domain pixels and therefore may not be operated upon by the above mentioned prior art re-sizing and zooming methods to achieve re-sizing and zooming of displayed images.
International standards have been developed for accomplishing compression of digital images. For example, the MPEG 1 and 2 compression standards are the most widely used for digital image sequences while JPEG is commonly used for still images.
These standards use the discrete cosine transform (DCT) to convert spatial domain images to the transform domain, and the inverse discrete cosine transform (IDCT) to convert the transform domain version to the corresponding decompressed spatial domain image. Prior art methods of re-sizing, zooming or panning displayed images that have been compressed using the MPEG standards operate on the spatial domain image.
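For reference, a one-dimensional DCT and IDCT pair may be sketched as follows. The orthonormal DCT-II scaling used here is an assumption chosen for clarity; the standards apply a two-dimensional transform to blocks of pixels and use their own normalization. The sample row of eight intensities is hypothetical.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        coeffs.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(signal)))
    return coeffs


def idct(coeffs):
    """Inverse of dct(): recover the samples from the coefficients."""
    n = len(coeffs)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical row of 8 intensities
recovered = idct(dct(row))               # equals row up to rounding error
```

A row of constant intensity produces a single non-zero coefficient, which illustrates why transform domain data is well suited to compression.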
The present invention provides a method of re-sizing a spatial domain image by modifying the transform domain method that is used to transform the transform domain version of the image to the spatial domain version of the image. By modifying the transform domain method, there is no need to perform further pixel-by-pixel image re-sizing, zooming or panning operations on the image in the spatial domain. The detrimental effects caused by re-sizing an image in the spatial domain, such as jagged edges, blurring, aliasing or other image degradations, are reduced in the present invention.
The standard transform method, which transforms the image from the transform domain to the spatial domain, is modified. Transformation of the transform domain image, using the Modified Transform Method, produces an altered spatial domain image. These modifications to the transform operation are performed by the Modified Transform Method in such a way as to achieve desirable re-sizing or zooming effects in the altered spatial domain image after transformation.
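The general idea of obtaining a re-sized image directly from the inverse transform can be illustrated in one dimension. The sketch below is not the Modified Transform Method of the invention, whose details are given in the detailed description; it is only one simple possibility, assumed for illustration, in which the inverse DCT basis functions are evaluated at a different number of output samples. All names are hypothetical, and the orthonormal DCT-II scaling is an assumption.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    return [
        math.sqrt((1 if k == 0 else 2) / n) * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(signal))
        for k in range(n)
    ]


def idct_resized(coeffs, out_len):
    """Inverse DCT evaluated at out_len samples instead of len(coeffs).

    Assumption for this sketch: coefficients beyond out_len are dropped when
    shrinking, and missing high-order coefficients are treated as zero when
    enlarging. The gain keeps the average intensity unchanged.
    """
    n = len(coeffs)
    kept = min(n, out_len)
    gain = math.sqrt(out_len / n)
    samples = []
    for m in range(out_len):
        total = 0.0
        for k in range(kept):
            scale = math.sqrt((1 if k == 0 else 2) / out_len)
            total += scale * coeffs[k] * math.cos(
                math.pi * (2 * m + 1) * k / (2 * out_len))
        samples.append(gain * total)
    return samples


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical row of 8 intensities
coeffs = dct(row)
enlarged = idct_resized(coeffs, 12)      # scaling factor 12/8 = 1.5
reduced = idct_resized(coeffs, 4)        # scaling factor 4/8 = 0.5
```

In this sketch the re-sized samples emerge from the inverse transform itself, so no separate pixel replication, sub-sampling or spatial smoothing step is applied afterwards, and non-integer scaling factors such as 1.5 are handled in the same way as integer ones.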
It is a technical advantage of the present invention to provide a method of modifying the appearance of the spatial domain representation of a digital image by applying a Modified Transform Method to the transform domain version of the image. The re-sized spatial domain image can be achieved by modifying a large number of transformation methods, such as the Inverse Discrete Cosine Transform method, the Inverse Discrete Fourier Transform method and the Inverse Discrete Wavelet Transform method. Furthermore, the invention can be applied to the block images (that is, blocks) used in standard image compression techniques, such as MPEG or JPEG.
It is a further technical advantage of the invention that no further modification of the spatial domain image is required after performing the Modified Transform Method. This precludes the need for extensive pixel-by-pixel modification of the spatial domain image in order to re-size or zoom.
An additional technical advantage of the invention is to provide for modification of digital images without using extensive hardware and software resources. Re-sizing and zooming modification of the spatial domain representation of an image requires expensive and time-consuming pixel-by-pixel processing. In the present invention, the spatial domain images created by the Modified Transform Method are in their re-sized or zoomed form and are capable of being displayed without further modification or processing.
The foregoing has rather broadly outlined the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the concept and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.