The invention relates to the field of signal processing, with applications in computer graphics, and in particular to the 2D image processing field. Broadly speaking, computer images, whether video or still images, are normally stored as pixel intensity values, usually in the form of digital information, in a succession of rows of pixel intensity values.
The invention relates particularly to image scaling of a digital image, for example, to produce a different output format and/or size and has many industrial applications, for example in real-time manipulation of an on-screen image (for instance to allow resizing in an arbitrary sized window) or transferring images to different output formats. The invention is particularly suitable for applications in video, broadcasting and HDTV.
The process of scaling an image generally consists of three steps: reading or capturing the input data, performing the transformation (by sampling and any necessary corrections) and storing the resultant image.
For an analogue input, a pixel representation of the image is usually obtained by sampling a continuous input signal associated with a real object (the signal could be an analogue output of a video camera or a mathematical representation of an object) at a specific sampling rate. This allows conversion of the continuous (analogue) signal into its discrete (digital) representation. Digital input signals may also be resampled further (possibly with a different sampling rate) to change the size and/or resolution of the images which they represent.
The problems of scaling an analogue or digital image can be perceived in the broader context of signal processing theory. The sampling procedure may lead to a loss of information contained in the image. Mathematically, the minimum sampling frequency at which the input signal must be sampled in order to retain all the frequencies contained within it is twice that of the highest frequency component present in the input signal. This sampling frequency is known as the Nyquist Frequency.
If the higher frequencies are undersampled (that is, the sampling is at too low a frequency) they will be misrepresented in the output as lower harmonics; this is known as aliasing. One way to eliminate aliasing is to increase the sampling frequency. Where this is not possible, the high frequencies that will be misrepresented must be removed from the input signal. This can be achieved by performing a Fourier Transform on the input signal, limiting the frequency spectrum to up to half of the Nyquist Frequency and performing the Inverse Fourier Transform to return to the spatial domain. However, in the case of real-time systems, performing a Fourier Transform may be computationally too time consuming.
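The aliasing effect described above can be illustrated numerically. The following is a minimal sketch only; the helper name `sample_sine` and the chosen frequencies are assumptions made for illustration and do not appear in the original text. A 7 Hz sine requires a sampling frequency of at least 14 Hz; when sampled at 10 Hz, its samples coincide with those of a sign-inverted 3 Hz sine.

```python
import math

def sample_sine(freq_hz, rate_hz, n):
    """Return n samples of sin(2*pi*freq*t) taken at rate_hz samples per second."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

# A 7 Hz component sampled at 10 Hz, i.e. below the required 14 Hz.
undersampled = sample_sine(7.0, 10.0, 20)

# The same sample values are produced by a sign-inverted 3 Hz sine:
# the 7 Hz component is misrepresented as a lower frequency.
alias = [-s for s in sample_sine(3.0, 10.0, 20)]

max_difference = max(abs(a - b) for a, b in zip(undersampled, alias))
```

Here `max_difference` is zero up to floating-point rounding, so the two signals cannot be told apart once sampled.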
Another way of removing the high frequency components is through the use of digital filters in the spatial domain. The term "digital filter" refers to a computational process or algorithm by which a digital signal or sequence of numbers (acting as an input) is transformed into a second sequence of numbers termed the output digital signal. There are two broad classes of such filters: Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters. Both are well known.
In digital image scaling the general purpose of an FIR filter is to work out a weighted sum of contributions from source pixels to a target pixel.
The output of an FIR filter can be defined by the convolution of the filtering function (F) with the signal intensity function (I):
$$\xi(x) = \sum_{t=-Fw_{1/2}}^{Fw_{1/2}} I(x-t) \cdot F(t)\,dt,$$
where $Fw_{1/2}$ represents half of the filter width expressed in pixel units. A convention may be adopted in which the filter is centred on the midpoint of the central pixel of its support range (the pixels it filters) and the total filter width $Fw$ is therefore given as $2 \cdot Fw_{1/2} + 1$ (pixels). However, other conventions are equally valid.
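The weighted sum performed by an FIR filter may be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the function name `fir_filter`, the list `taps` (holding the filter values for offsets from minus the half width to plus the half width) and the clamping of indices at the ends of the line are all assumptions introduced here.

```python
def fir_filter(intensity, taps):
    """Filter one line of pixel intensities with an odd-length list of taps.

    taps[t + half] holds the filter value for offset t, where
    half = len(taps) // 2 is the half width in pixel units.
    """
    half = len(taps) // 2
    last = len(intensity) - 1
    out = []
    for x in range(len(intensity)):
        acc = 0.0
        for t in range(-half, half + 1):
            # Assumption: indices beyond the line ends are clamped to the edge pixel.
            src = min(max(x - t, 0), last)
            acc += intensity[src] * taps[t + half]
        out.append(acc)
    return out

row = [0, 0, 8, 0, 0]
smoothed = fir_filter(row, [0.25, 0.5, 0.25])  # total filter width 2*1 + 1 = 3
```

With these inputs the single bright pixel is spread over its neighbours, giving `[0.0, 2.0, 4.0, 2.0, 0.0]`.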
Digital image scaling may be defined as (re)sampling of an input digital signal representing a digital image, possibly using a different sampling frequency from the original frequency to give a different resolution. The target may be smaller or larger than the original (or source) image and/or have a different aspect ratio. Downscaling (reduction of image size) gives a smaller target than source image and upscaling (increase of image size) gives a larger size.
There are many scaling methods available. The simplest and fastest scaling method is probably the pixel decimation/replication technique. Here, some of the original sampled pixels are simply omitted for downscaling and replicated for upscaling. The image quality produced is, however, often poor. Additional measures aimed at improving the image quality, such as replicating original samples prior to resampling, are often employed (U.S. Pat. No. 5,825,367). A possible problem with this approach is that it not only ignores any frequency consideration, which leads to presence of aliasing, but it also introduces other artifacts (image distortions) such as unwanted, often jagged lines and/or large blocks of equally coloured pixels in the image.
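Pixel decimation and replication can be sketched in a few lines. The function name `decimate_or_replicate` and the integer index rule are assumptions chosen for illustration; the cited patent may use a different selection rule.

```python
def decimate_or_replicate(row, target_len):
    """Rescale a line by picking exactly one source pixel per target pixel.

    When target_len < len(row) some source pixels are omitted (decimation);
    when target_len > len(row) source pixels are repeated (replication).
    """
    src_len = len(row)
    return [row[(i * src_len) // target_len] for i in range(target_len)]

row = [10, 20, 30, 40]
smaller = decimate_or_replicate(row, 2)  # [10, 30]
larger = decimate_or_replicate(row, 8)   # [10, 10, 20, 20, 30, 30, 40, 40]
```

In the downscaled line the values 20 and 40 are discarded outright, which is the loss of information that leads to the artifacts noted above.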
Partial improvement may be achieved through interpolation. In this technique, broadly speaking, rather than replicating source pixels to arrive at the additional pixel values during upscaling, there is interpolation between two or more source pixel values (for example using higher-order polynomial interpolation). While aliasing artifacts are still likely to be present, the overall image quality is improved and the image is smoothed. Such smoothing may lead to a loss of contrast and the interpolated images often look blurry (U.S. Pat. No. 5,793,379). There are a number of possible refinements to interpolation in its simplest one-dimensional form. Probably the most advanced of these is three-dimensional interpolation as described, for example, in U.S. Pat. No. 5,384,904.
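Interpolation in its simplest one-dimensional form may be sketched as linear interpolation between the two nearest source pixels. The function name `upscale_linear`, the centre-aligned coordinate convention and the clamping at the line ends are assumptions made for this sketch; the cited patents describe more elaborate schemes.

```python
def upscale_linear(row, target_len):
    """Upscale one line by linear interpolation between neighbouring source pixels."""
    n = len(row)
    out = []
    for i in range(target_len):
        # Centre of target pixel i expressed in source pixel coordinates.
        pos = (i + 0.5) * n / target_len - 0.5
        # Assumption: positions outside the line are clamped to the edge pixels.
        pos = min(max(pos, 0.0), n - 1.0)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append((1 - frac) * row[left] + frac * row[right])
    return out

ramp = upscale_linear([0, 10], 4)
```

For the two-pixel line above the result is `[0.0, 2.5, 7.5, 10.0]`: the hard step between the source pixels is smoothed into a gradient, which illustrates both the improvement and the loss of contrast.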
All the above approaches suffer from the same basic drawback: they do not provide high frequency adjustments and thus inevitably lead to the introduction of aliasing (and therefore artifacts).
As explained in previous paragraphs, the application of FIR filters removes this problem to some extent. However, although not always as computationally expensive as the Fourier transform, FIR filters still pose serious challenges for use in real-time environments. When implemented in hardware, FIR filters tend to occupy a large area of silicon in order to ensure that a sufficiently large number of sample points, or filter taps, is taken into account for computation. The FIR filter computes the value of the convolution of the filtering and the image intensity functions. The larger the number of sample points, the sharper the frequency cut-off of a filter and the smaller the spectrum of offending high frequencies passing through the filter. The number of points at which the convolution has to be evaluated (i.e. the number of filter taps) increases with the scaling ratio. Thus there is a threshold value above which the number of input pixels required for filter support exceeds the number of taps available in silicon. To allow for higher scaling ratios, some method of limiting the number of input points or simulating wider filters using narrower ones must be implemented. An example of such an implementation, using decimating filters, can be found in U.S. Pat. No. 5,550,764. Unfortunately, as with all decimation, some of the input information is discarded and the quality of the output is thus degraded.
Software implementations do not exhibit these constraints, but due to potentially large amounts of input data required for generating a single output pixel, the performance of such implementations sometimes renders them unsuitable for real time processing.
The present invention aims to overcome or mitigate at least some of the disadvantages inherent in the prior art.
According to a first aspect of the invention there is provided a parallel processing method and system for scaling a digital source image consisting of a matrix of X by Y pixels into a target image of a different resolution, comprising the steps of:
mapping the higher resolution pixels onto the lower resolution pixels; scaling the source image in the X or Y direction to produce intermediate pixels that are scaled in one direction by determining contributions to each intermediate pixel using a suitable digital filter function and accumulating (or summing) the contributions for each intermediate pixel, wherein each source pixel contributes to one or more intermediate pixels and each intermediate pixel receives contributions from one or more source pixels; and subsequently scaling the intermediate pixels in the other direction by determining the contributions to each target pixel using the filter function and accumulating the contributions for each target pixel; wherein each intermediate pixel contributes to one or more target pixels and each target pixel receives contributions from one or more intermediate pixels.
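The two-pass structure of the method (scaling in one direction to produce intermediate pixels, then scaling those in the other direction) may be sketched serially as follows. This is a simplified sketch under stated assumptions: it covers downscaling only, it uses a box filter in which each source pixel contributes to exactly one target pixel, and the names `downscale_line` and `downscale_image` are hypothetical. It does not model the parallel execution.

```python
def downscale_line(line, target_len):
    """Accumulate source pixel contributions into target pixels along one line.

    Assumption: a box filter, so each source pixel contributes only to the
    target pixel whose footprint contains the source pixel midpoint.
    """
    src_len = len(line)
    sums = [0.0] * target_len
    counts = [0] * target_len
    for j, value in enumerate(line):
        k = ((2 * j + 1) * target_len) // (2 * src_len)
        sums[k] += value
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]

def downscale_image(image, target_w, target_h):
    """Scale in X to obtain intermediate pixels, then scale those in Y."""
    intermediate = [downscale_line(row, target_w) for row in image]
    columns = [downscale_line(list(col), target_h) for col in zip(*intermediate)]
    return [list(row) for row in zip(*columns)]

image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [9, 11, 13, 15],
    [9, 11, 13, 15],
]
target = downscale_image(image, 2, 2)
```

For the 4 by 4 image above, the X pass yields intermediate lines such as `[2.0, 6.0]` and the Y pass then yields the target `[[2.0, 6.0], [10.0, 14.0]]`. No source pixel is discarded; each one is accumulated into a target pixel.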
The system and method according to embodiments of the invention seek to improve on the prior-art methods for digital scaling. Parallel processing of the source/intermediate pixels (which may be of up to one entire line) allows a real-time process with faster production of the target image. The present invention can be implemented using an FIR filter implemented in software on a SIMD (Single Instruction Multiple Data) processing array in which each PE (processing element of the array) receives the same instruction to parallel process the pixels. Such an arrangement possesses a high level of flexibility and adaptability while exceeding the performance of typical dedicated hardware implementations. Alternative parallel processing systems may also be used. Each source or intermediate pixel (hereinafter source pixel for brevity) contributes to the target image and usually to more than one target pixel. This avoids any decimation with the attendant disadvantages.
The present invention provides a high performance and high quality method for scaling images on a SIMD computing device. Additional advantageous features, in particular relating to suitable methods for feeding data into and out of the SIMD processing array, are presented in the method as detailed below.
The system may be in the form of hardware and/or software with suitable tools, such as apparatus, circuitry, and/or code structures to carry out the method defined above. It may further comprise additional tools to carry out the further method steps as detailed below.
Reference herein to a matrix is to any two dimensional array of pixels, such as a grid or a skewed grid or other two-dimensional array.
Reference herein to the X direction is generally to the direction across the screen, conventionally to the right along the lines (or viewing screen) and reference to the Y direction is generally to the "column" direction down the lines (or screen). However, any suitable X and Y directions (preferably at right angles) may be used to correspond to the array of digital information which represents the pixels.
Reference herein to parallel processing is to processing in which more than one pixel value is processed simultaneously. In many cases, an entire line or column of values may be processed simultaneously.
Reference herein to mapping is to determining the spatial correspondence between images, usually between the higher and lower resolution pixels. Reference to resolution is to the number of pixels of the pixel grid forming an image. The more pixels making up an image, the higher the resolution.
Reference herein to scaling is to changing the aspect ratio of the image and/or the resolution of the image, this latter so that the resultant number of pixels in the target image is smaller (lower resolution) or larger (higher resolution) than in the source image. If the source and target images are displayed with pixels of the same size, the higher resolution image will be larger.
Reference herein to a digital filter or digital filter function is to the overall computational process or algorithm by which a digital signal or set of pixel values is transformed into a second set of numbers. This process preferably includes an integration/convolution function.
The method preferably includes the step of mapping a cluster of the higher-resolution pixels of one image onto each of the lower-resolution pixels of the other image. The cluster may be mapped in one or both directions. A cluster may be defined as all the higher-resolution pixels falling within the footprint of a lower-resolution pixel. This correspondence can be seen, for example, when the two images and their respective pixel boundaries are superimposed, with the actual measurements of the images being identical. The higher resolution image will be made up of more pixels (within the same space) than the lower resolution image.
In one embodiment a higher resolution pixel belongs to the cluster of a lower resolution pixel if the midpoint (or centre) of the higher resolution pixel considered falls within (the footprint of) that lower resolution pixel.
This cluster feature is particularly applicable in downscaling, when the digital filter is combined with a cluster mapping step. Cluster mapping allows simplified computational processes, in that pixel correspondence in the X and/or Y directions is easily determined. This has a particular advantage in the X direction that, once the X clusterisation has been defined along the first line, it does not vary as the process continues to further lines.
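The midpoint rule for cluster membership may be sketched as follows. The function names `cluster_index` and `build_clusters` are hypothetical, and the use of integer arithmetic to test the midpoint is an implementation choice made here, not one stated in the text.

```python
def cluster_index(j, high_len, low_len):
    """Index of the lower-resolution pixel whose footprint contains the
    midpoint of higher-resolution pixel j (both images span the same extent)."""
    # The midpoint of pixel j lies at (j + 0.5) * low_len / high_len in
    # lower-resolution units; integer arithmetic avoids rounding error.
    return ((2 * j + 1) * low_len) // (2 * high_len)

def build_clusters(high_len, low_len):
    """Group higher-resolution pixel indices by the pixel they map onto."""
    clusters = [[] for _ in range(low_len)]
    for j in range(high_len):
        clusters[cluster_index(j, high_len, low_len)].append(j)
    return clusters

x_clusters = build_clusters(7, 3)  # [[0, 1], [2, 3, 4], [5, 6]]
```

Because the result depends only on the two line lengths, the X clusterisation can be computed once and reused for every subsequent line.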
Advantageously, the method includes the step of calculating the distance between the centre of each higher resolution pixel and the centre of each lower resolution pixel to which it contributes, or from which it receives contributions in the X and Y directions. The filter function may then be applied to give a filter factor (or filter function value). The filter factor is subsequently multiplied by the pixel intensity to give a contribution to a final convolution value for a target pixel. The final convolution value is the sum of all the contributions. Thus the contribution that each pixel makes is determined by its distance from the centre of the lower resolution pixels as well as its intensity.
Preferably, the filter factors are determined for the X direction prior to reading scanlines. These values do not change for subsequent lines: the jth source pixel along a line will always have the same filter function applied to it.
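The determination of filter factors from centre-to-centre distances, prior to reading any scanline, may be sketched as below. Several details are assumptions introduced for illustration: the triangular filter function `tent`, the `radius` parameter expressed in target pixel units, the table layout, and the normalisation by the sum of the filter factors.

```python
def tent(distance, radius):
    """Triangular filter function, chosen here purely for illustration."""
    return max(0.0, 1.0 - abs(distance) / radius)

def x_filter_factors(src_len, target_len, radius=1.0):
    """For each source pixel j, list (target index, filter factor) pairs.

    Distances are measured between pixel centres in target pixel units.
    """
    scale = target_len / src_len
    table = []
    for j in range(src_len):
        centre = (j + 0.5) * scale
        entries = []
        for k in range(target_len):
            w = tent(centre - (k + 0.5), radius)
            if w > 0.0:
                entries.append((k, w))
        table.append(entries)
    return table

def scale_line(line, table, target_len):
    """Apply the precomputed factors to one scanline and accumulate."""
    acc = [0.0] * target_len
    norm = [0.0] * target_len
    for value, entries in zip(line, table):
        for k, w in entries:
            acc[k] += w * value   # filter factor multiplied by pixel intensity
            norm[k] += w          # assumption: normalise by the total weight
    return [a / n for a, n in zip(acc, norm)]

table = x_filter_factors(4, 2)          # computed once, before any line is read
line_out = scale_line([4, 4, 4, 4], table, 2)
```

The same `table` serves every scanline of the image, since the jth source pixel along a line always receives the same filter factors.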
Preferably, the method also comprises the step of calculating the distance from the cluster boundary of the source (or intermediate) pixels as they are read in and defining a process change or increment to occur when the cluster border is crossed. This is particularly appropriate for the Y direction. Process increments may be, for example, application of the digital filter function in its next position, to give the next filter support range in source space (source pixels to which the digital filter is applied).
Advantageously, the filter function is evaluated analytically, to obtain contributions to the target pixels; that is, without approximation of the integration. This is particularly appropriate for parallel processing, in which computation cost may be lower.
The filter function at each position may be evaluated at any number of points suitable for the scaling required. For example, the number of points in downscaling is dependent on the spread of the filter function (or filter footprint) in source space and is known as the filter footprint number. The filter footprint number in downscaling is thus the number of source pixels in a single filter support range. Each point corresponds to a separate source or target pixel (in downscaling and upscaling respectively).
In downscaling (and in each direction), each target pixel preferably receives contributions from a number of source pixels equal to the filter footprint number; in upscaling (and in each direction), each source pixel contributes to a number of target pixels equal to the filter footprint number.
The filter width is a process constant and may be defined as the filter support range in target space for downscaling and in source space for upscaling.
The number of contributions from each source pixel in downscaling and the number of contributions to each target pixel in upscaling is constant and set according to how many footprints of neighbouring filters spanned in target space (or source space respectively) overlap. This number always corresponds to the filter width.
In one extreme case, the filter width is one and the filter footprints do not overlap; each source pixel contributes to one target pixel only.
From the above it can be seen that the two-way contribution link between the source and target pixels preferably depends at least partially on the filter width and at least partially on the filter footprint number.
The method can be carried out on any suitable processing means, such as a programmable array of memory cells, connected workstations or serial processors with SIMD extensions. Preferably the process uses a processing element (PE) array. Each PE may include a number of memory cells which may be implemented in hardware or software. Values corresponding to pixels or combinations of pixels may be stored, and shifted between the memory cells. Preferably, the memory cells form an addressable array, such as a data queue or shift stack. The data queue is preferably of first-in first-out structure.
PE interconnections may allow data to be moved along the PE array (hereinafter swazzled). The PE interconnections may result in a 1D ring/line array or a 2D grid array. Preferably, a 1D line array is provided.
There are many possible mappings of source/target pixels onto the PEs. In a first mapping, one PE is provided per pixel of the higher resolution image. This is a simple mapping, but swazzling distances of data along the PE array to reach the target pixels may be rather long, depending on the scaling factor. Scaling may therefore be restricted to some extent because it is not possible to swazzle more than the array length. This PE-pixel mapping is preferably used with a memory cell array of the same length as the filter width.
In another embodiment, one PE is provided per pixel of the lower resolution image. This second embodiment is particularly advantageous for downscaling in the X direction and may be suitable in the case where each line of the source image has more pixels than PEs in the array. This alternative mapping has the advantage that it limits the swazzle length. In contrast to the first mapping, scaling may be arbitrary, since higher resolution pixels (for example, those in one cluster) are "squashed" into a PE of the lower resolution image. Each PE then reads and processes two or more neighbouring source pixels sequentially and writes the target pixels sequentially.
The contributions are again stored in an array of memory cells or queue, preferably of the same length as the filter width.
In downscaling, one PE per source pixel may be provided for Y scaling and one PE per target pixel in X scaling. In general however, mapping for scaling in one direction follows the mapping imposed by the scaling in the other direction.
The pixel-PE mapping may be selected automatically according to process conditions such as the scaling required and/or relative dimensions of the source or target image and PE array.
To allow real-time functioning, the method according to the present invention should be carried out in parallel (for example, on each PE in the array simultaneously). If a method step is not required for one or more PEs in the array, it is then disabled. Preferably, the parallel processing is SIMD processing.
The method may additionally include the steps of reading the source image into memory before scaling and analysing the source and target dimensions and writing the target pixels to external memory (outside the PE array) after scaling. The intermediate pixels may also be written to external memory or may be used immediately in the next (X or Y) scaling step.
In one preferred embodiment, values are shifted in the array of memory cells during Y scaling. The values may be shifted up the memory cell array (which is possibly in the form of a data queue or shift stack) when the process crosses a cluster boundary. Preferably data is swazzled between PEs for X scaling. The data may be swazzled a certain distance across a predetermined number of cluster boundaries. As with the previous features, this may apply to both up- and downscaling.
The Y scaling step thus preferably involves shifting the contents of the memory cell array by one position when a cluster boundary is crossed.
A preferred method for the Y scaling step especially suitable for downscaling includes shift and accumulation in the memory cell array. It may also include the steps of multiplying the source pixel or intermediate pixel (hereinafter source pixels) by the appropriate filter function value and then reading the resultant contributions for each source pixel into the memory cell array, adding them to any contributions from one or more lines above the present scanline which are already in the memory cell. The process may be incremented to shift the array by one position by moving the top cell value out (preferably to external memory) when a cluster border is crossed (and the top cell thus has all its contributions). The lowest contribution of the next source pixel read into the bottom cell will be the first in a new target pixel to be produced.
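The shift-and-accumulate scheme for Y downscaling may be sketched for a single column, that is, for the work of one PE. This is a serial sketch under stated assumptions rather than the method as implemented: the triangular filter `tent`, the parameter `half` (so that the filter width is 2*half + 1 target pixels), the storage of a weight alongside each accumulated value for normalisation, and the dropping of cells that fall outside the image are all introduced here for illustration.

```python
from collections import deque

def tent(distance, radius):
    """Triangular filter function (an illustrative choice)."""
    return max(0.0, 1.0 - abs(distance) / radius)

def y_downscale_column(column, target_len, half=1):
    """Downscale one column line by line using a queue of memory cells."""
    width = 2 * half + 1                      # filter width in target pixels
    scale = target_len / len(column)
    # Each memory cell holds [accumulated contributions, accumulated weights].
    cells = deque([0.0, 0.0] for _ in range(width))
    top_target = -half                        # target pixel held in the top cell
    out = []

    def emit(cell, target):
        if 0 <= target < target_len:          # cells outside the image are dropped
            out.append(cell[0] / cell[1])

    for y, value in enumerate(column):        # one source line at a time
        centre = (y + 0.5) * scale            # source centre in target units
        cluster = int(centre)
        while cluster - half > top_target:    # a cluster border has been crossed
            emit(cells.popleft(), top_target) # top cell has all its contributions
            cells.append([0.0, 0.0])          # bottom cell starts a new target pixel
            top_target += 1
        for i, cell in enumerate(cells):
            w = tent(centre - (top_target + i + 0.5), half + 0.5)
            cell[0] += w * value
            cell[1] += w
    for i, cell in enumerate(cells):          # flush the cells still in the queue
        emit(cell, top_target + i)
    return out

result = y_downscale_column([0, 0, 8, 8], 2)
```

Only `width` cells are held per column at any time, irrespective of the image height, and each source line is read exactly once.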
A preferred method for the Y scaling step especially suitable for upscaling involves only shifting (no accumulation) in the array of memory cells. It may include the steps of reading each pixel value into a cell of the array, calculating the contributions for each pixel using the filter function and summing the contributions (in a number corresponding to the filter width) for each target pixel. The method may further involve shifting the source pixels up one position, in order to read in a new source pixel and discard an old source pixel when the target boundary is crossed.
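The shift-only scheme for Y upscaling may likewise be sketched for a single column. The sketch rests on assumptions not found in the text: a triangular filter defined in source space, a window of 2*half + 1 source pixels, replication of the edge pixels beyond the image, and normalisation by the total weight. The function name `y_upscale_column` is hypothetical.

```python
from collections import deque

def y_upscale_column(column, target_len, half=1):
    """Upscale one column; the cell array only shifts, it never accumulates."""
    n = len(column)
    radius = half + 0.5                        # filter half extent in source pixels

    def read(s):
        # Assumption: source pixels beyond the image replicate the edge pixel.
        return column[min(max(s, 0), n - 1)]

    # The memory cells hold the source pixels currently under the filter.
    window = deque((read(s) for s in range(-half, half + 1)), maxlen=2 * half + 1)
    centre_src = 0                             # source pixel in the middle cell
    out = []
    for t in range(target_len):
        c = (t + 0.5) * n / target_len         # target centre in source units
        while int(c) > centre_src:             # boundary crossed: shift by one
            centre_src += 1
            window.append(read(centre_src + half))  # oldest pixel is discarded
        total = 0.0
        weight = 0.0
        for i, value in enumerate(window):
            d = c - (centre_src - half + i + 0.5)
            w = max(0.0, 1.0 - abs(d) / radius)
            total += w * value
            weight += w
        out.append(total / weight)
    return out

upscaled = y_upscale_column([0, 10], 4)
```

Each target pixel is formed by summing a number of contributions equal to the window length, and a new source pixel is read in only when a boundary is crossed.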
X scaling advantageously involves accumulating contributions in one accumulating PE per cluster. The contributions for each source pixel are swazzled across the PE array to their correct accumulating PE.
A preferred method for the X scaling step especially suitable for downscaling involves unidirectional swazzles. The process preferably calculates the contributions for each source pixel (preferably using the values already available from X preparation); addresses them according to the distance they must travel and stores them in an array of memory cells for each PE, then shifts each contribution in turn along the PE array to the correct accumulating PE. The addressing preferably relates to the number of cluster boundaries the contribution must cross.
A preferred method for the X scaling step especially suitable for upscaling is generally uni-directional, although some movement in the opposite direction may be required. Advantageously the method involves loading the PE array with a repeating sequence of the source pixels in the corresponding cluster. The sequence is preferably indexed to increase by one and restart each time a cluster boundary is crossed. The sequence advantageously restarts with the first pixel in the sequence. Preferably it has a length equal to the filter width. The value loaded is conveniently held in one cell of the array of memory cells. The preferred swazzle movement moves the values one step at a time along the PE array to fill a different cell with its value. This preferred method works particularly well for narrow filters, but is less advantageous in extreme circumstances (such as for very high scaling ratios).
It may be that a uni-directional swazzle will not be sufficient to fill all the cells of all the PEs (due to indexing). In this case one or more swazzle steps in the other direction may be provided to fill the empty cells. After the cells have been filled, the convolution is calculated.
According to a further aspect the invention relates to a program for carrying out the method as hereinbefore described. The program may be embodied on a carrier, such as a CD or carrier wave and may be a computer program product. Alternatively, the program may be embedded in on-chip ROM, thus becoming equivalent to a hardware part of a chip.
According to a further aspect the invention relates to a device, such as a computer or set-top box, comprising a PE array as hereinbefore described and calculating tools (in hardware or software) for carrying out the method as hereinbefore described. Further details of the device may include input tools for user parameters and connection tools to other devices. The device may further comprise tools for reading and writing image data as pixels and display means.