1. Field of the Invention
The present invention is generally related to the field of analog and digital signal processing, and more particularly, to apparatus and methods for the efficient representation and processing of signal or image data.
2. Description of the Prior Art
FIG. 1 is a block diagram of a typical prior art signal processing system 100. As shown in the figure, such systems typically include an input stage 102, a processing stage 104, an output stage 106, and data storage element(s) 108.
Input stage 102 may include elements such as sensors, transducers, receivers, or means of reading data from a storage element. The input stage provides data which are informative of man-made and/or naturally occurring phenomena. The informative component of the data may be masked or contaminated by the presence of an unwanted signal, which is usually characterized as noise. In some applications, an input element may be employed to provide additional control of the input or processing stages by a user, a feedback loop, or an external source.
The input data, in the form of a data stream, array, or packet, may be presented to the processing stage directly or through an intermediate storage element 108 in accordance with a predefined transfer protocol. Processing stage 104 may take the form of dedicated analog or digital devices, or programmable devices such as central processing units (CPUs), digital signal processors (DSPs), or field programmable gate arrays (FPGAs) to execute a desired set of data processing operations. Processing stage 104 may also include one or more CODECs (COder/DECoders).
Output stage 106 produces a signal, display, or other response which is capable of affecting a user or external apparatus. Typically, an output device is employed to generate an indicator signal, a display, a hardcopy, a representation of processed data in storage, or to initiate transmission of data to a remote site, for example. It may also be employed to provide an intermediate signal for use in subsequent processing operations and/or as a control element in the control of processing operations.
When employed, storage element 108 may be either permanent, such as photographic film and read-only media, or volatile, such as dynamic random access memory (DRAM). It is not uncommon for a single signal processing system to include several types of storage elements, with the elements having various relationships to the input, processing, and output stages. Examples of such storage elements include input buffers, output buffers, and processing caches.
The primary objective of signal or information processing system 100 is to process input data to produce an output which is meaningful for a specific application. In order to accomplish this goal, a variety of processing operations may be utilized, including noise reduction or cancellation, feature extraction, data categorization, event detection, editing, data selection, and data re-coding.
The design of a signal processing system is influenced by the intended use of the system and the expected characteristics of the source signal used as an input. In most cases, the performance efficiency required, which is affected by the available storage capacity and computational complexity of a particular application, will also be a significant design factor.
In some cases, the characteristics of the source signal can adversely impact the goal of efficient data processing. Except for applications in which the input data are naturally or deliberately constrained to have narrowly definable characteristics (such as a limited set of symbol values or a narrow bandwidth), intrinsic variability of the source data can be an obstacle to processing the data in a reliable and efficient manner without introducing errors arising from ad hoc engineering assumptions. In this regard, it is noted that many data sources which produce poorly constrained data, such as sound and visual images, are of importance to people.
Conventional image processing methods suffer from a number of inefficiencies which are manifested in the form of slow data communication speeds, large storage requirements, and disturbing perceptual artifacts. These can be serious problems because of the variety of ways in which it is desired to use and manipulate image data, and because of the innate sensitivity people have to visual information.
Specifically, an "optimal" image or signal processing system would be characterized by, among other things, swift, efficient, reliable, and robust methods for performing a desired set of processing operations. Such operations include the transduction, storage, transmission, display, compression, editing, encryption, enhancement, sorting, categorization, feature detection and recognition, and aesthetic transformation of data, and integration of such processed data with other information sources. Equally important, in the case of an image processing system, the outputs should be capable of interacting with human vision as naturally as possible by avoiding the introduction of perceptual distractions and distortion.
That a signal processing method should be robust means that its speed, efficiency, and quality (for example) should not depend strongly on the particular characteristics of the input data, i.e., it should perform "optimally," or near that level, for any plausible input.
This is an important aspect because a common inadequacy suffered by signal processing methods is their failure to be robust. JPEG-type methods in imaging, for example, perform better for "photographic" images having gentle gradations in color and luminance than for graphic images and others having sharp discontinuities. On the other hand, image compression methods such as those embodied in the GIF format perform best when an image has few of the complexities found in photographic images. Similar examples may be cited with regard to processing operations performed on audio and other classes of input data.
In part, conventional image processing methods lack robustness because there are an infinite number of possible images. Adding to this is the complication that in most situations, it is impossible to know beforehand exactly what features and complexities an image will possess. Thus, to describe an image entirely, one approach is to determine the luminance and color of every point in the image. However, the volume of information needed to accomplish this task can exceed several megabytes for a digital image of moderate size, making it burdensome to store, process, and transmit such information. Even then, the digital representation is an inexact record of the original image owing to the limitations inherent in constructing binary value based representations of continuous analog signals.
Information is lost in any discrete representation of continuous-valued data because discrete sampling over any finite duration or area cannot capture all of the variations in the source data. Similarly, information is lost in any quantization process when the full range of values in the source data cannot be represented by a set of discrete values.
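The two kinds of loss described above can be illustrated with a minimal sketch. The source function, the sample count, and the number of levels below are hypothetical values chosen only for illustration; they are not taken from any particular system.

```python
import math

def quantize(value, levels, lo=0.0, hi=1.0):
    """Snap a continuous value in [lo, hi] to the nearest of `levels` evenly spaced values."""
    step = (hi - lo) / (levels - 1)
    index = round((value - lo) / step)
    index = max(0, min(levels - 1, index))  # clamp out-of-range inputs
    return lo + index * step

def source(t):
    """Hypothetical continuous source: a slow wave plus a fast, low-amplitude ripple."""
    slow = 0.5 + 0.4 * math.sin(2 * math.pi * t)
    ripple = 0.05 * math.sin(2 * math.pi * 40 * t)
    return slow + ripple

N_SAMPLES = 16   # assumed sampling density over the interval [0, 1)
LEVELS = 4       # assumed number of discrete output values

times = [i / N_SAMPLES for i in range(N_SAMPLES)]

# Sampling loss: the 40-cycle ripple is zero (up to rounding) at every one of
# the 16 sample instants, so the samples carry no trace of it.
ripple_seen = max(abs(0.05 * math.sin(2 * math.pi * 40 * t)) for t in times)

# Quantization loss: each sample is moved to the nearest of the 4 levels.
quantized = [quantize(source(t), LEVELS) for t in times]
max_quant_error = max(abs(source(t) - q) for t, q in zip(times, quantized))

print(f"largest ripple value seen by the sampler: {ripple_seen:.2e}")
print(f"largest quantization error: {max_quant_error:.3f}")
```

In this sketch the ripple is present in the source at every instant between samples, yet it is invisible in the sampled data, and the quantization error is bounded only by half the spacing between levels.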
In addition to difficulties imposed by the nature or implementation of a processing operation, other problems must be addressed when contaminating noise sources mask or distort the component of an input that is assumed to represent a signal of interest. However, it is rarely appreciated that there are other forms of randomness and unpredictability which cannot be defined legitimately as noise but which are nonetheless the source of problems with regard to the quality and robustness of signal processing methods. These forms of unpredictability may be considered in terms of intrinsic randomness and ensemble variability. Intrinsic randomness refers to randomness that is inseparable from the medium or source of data. The quantal randomness of photon capture is an example of intrinsic randomness.
Ensemble variability refers to any unpredictability in a class of data or information sources. Data representative of visual information has a very large degree of ensemble variability because visual information is practically unconstrained. Visual data may represent any temporal series, spatial pattern, or spatio-temporal sequence that can be formed by light. There is no way to define visual information more precisely. Data representative of audio information is another class of data having a large ensemble variability. Music, speech, animal calls, wind rustling through the leaves, and other sounds share no inherent characteristics other than being representative of pressure waves. The fact that people can only hear certain sounds and are more sensitive to certain frequencies than to others is a characteristic of human audio processing rather than the nature of sound. Examples of similarly variable classes of data and information sources can be found throughout nature and for man-made phenomena.
The unpredictability resulting from noise, intrinsic randomness, and ensemble variability, individually and in combinations, makes it difficult and usually impossible to extract the informative or signal component from input data. Any attempt to do so requires that a signal and noise model be implicitly or explicitly defined. However, no signal and noise model can be employed which is able to assign with absolute confidence a component of input data to the category of informative signal as opposed to uninformative noise when there is any possibility that the noise, intrinsic randomness, or ensemble variability share characteristics.
A signal and noise model is implicitly or explicitly built into a signal processing operation in order to limit the variability in its output and to make the processing operation tractable. Signal processors generally impose some form of constraint or structure on the manner in which the data is represented or interpreted. As a result, such methods introduce systematic errors which can impact the quality of the output, the confidence with which the output may be regarded, and the type of subsequent processing tasks that can reliably be performed on the data.
An often unstated but significant assumption employed in signal processing methods is that source data can be represented or approximated by a combination of symbols or functions. In making this assumption, such processing methods essentially impose criteria by which values and correlations in an input are defined or judged to be significant. A signal processing method must embody some concept of what is to be regarded as signal. However, the implicit or explicit presumption that a certain set of values or certain kinds of correlation can be used to provide a complete definition of a signal is often unfounded and leads to processing errors and inefficiencies. By defining a signal in terms of a set of values or correlations, a processing method is effectively assigning all other values and correlations to the category of noise. Such an approach is valid only when it is known that: 1) the information source that the input data represents takes on only a certain set of values or correlations; and 2) noise or randomness in the input data never causes the input to take on those values or correlations by chance. Conditions of this sort are rare at best and arguably never occur in real life. These conditions are certainly not true for visual, audio, or other information sources which have an unconstrained ensemble variability. For such classes of data, a finite set of values or correlations is insufficient to completely cover the range of variability that exists. As a result, some values or correlations which are representative of an information source will be inevitably and erroneously assigned to the category of noise. It should be noted that the inventive method herein does not presume such a set of specific values or correlations.
To further illustrate some of the limitations of signal and noise models in general, we discuss in this section several processing techniques which are found in the field of image processing. Among conventional image and signal processing techniques are histogram methods, predictive coding methods, error coding methods, and methods which represent data in terms of a set of basis functions such as JPEG, MPEG, and wavelet-based techniques.
Histogram methods are based on categorizing the luminance and color values in an image, and include the concept of palettes. A histogram is related to a probability density function which describes how frequently particular values fall within specified range limits. Histogram methods are used to quantize source data in order to reduce the number of alternative values needed to provide a representation of the data. In one form or another, a histogram method has been applied to every digital image that has been derived from continuous-valued source data. Histogram methods are also used for aesthetic effect in applications such as histogram equalization, color re-mapping, and thresholding.
However, a disadvantage of histogram techniques is that the processing scheme used to implement such methods must determine which ranges of value and color are more important or beneficial than others. This conflicts with the fact that the distribution of values in an image varies dramatically from one image to the next. Similarly, the number and location of peaks and valleys in a histogram vary significantly between images. As a result, histogram methods are computationally complicated and produce results of varying degrees of quality for different kinds of images. They also tend to produce an output having noticeable pixelation and unnatural color structure.
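The image dependence of histogram methods can be seen in a minimal sketch of textbook histogram equalization, one of the applications named above. The function names and the two small pixel lists are hypothetical, and the sketch assumes integer pixel values in the range 0 to 255.

```python
def histogram(values, n_bins=256):
    """Count how often each integer value in [0, n_bins) occurs."""
    counts = [0] * n_bins
    for v in values:
        counts[v] += 1
    return counts

def equalize(values, n_bins=256):
    """Remap values so that their cumulative distribution is spread over the full range."""
    counts = histogram(values, n_bins)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # cumulative count at the smallest value present
    n = len(values)
    if n == cdf_min:                         # constant image: nothing to spread
        return list(values)
    return [round((cdf[v] - cdf_min) / (n - cdf_min) * (n_bins - 1)) for v in values]

# Two hypothetical "images" that both contain the value 30.
image_a = [10, 10, 20, 30, 30, 30]
image_b = [30, 30, 40, 50, 50, 50]

print(equalize(image_a))  # value 30 is the brightest value present in image_a
print(equalize(image_b))  # value 30 is the darkest value present in image_b
```

The same input value is mapped to opposite ends of the output range in the two cases, because the mapping is derived entirely from the distribution of the particular image being processed.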
Predictive coding methods attempt to compensate for some of the limitations of histogram methods by considering the relationship between the image values at multiple image points in addition to the overall distribution of values. Predictive coding techniques are suited to data having naturally limited variability, such as bi-tonal images. Such methods are an important part of the JBIG and Group 3/4 standards used for facsimile communications. However, for more complicated image data such as multi-level grayscale and full color images, predictive coding methods have not been as effective.
Predictive coding techniques are based on the hypothesis that there are correlations in image data which can be used to predict the value of an image at a particular point based on the values at other points in the image. Such methods may be used to cancel noise by ignoring variations in an image that deviate too significantly from a predicted value. Such methods may also be used in image compression schemes by coding an image point only when it deviates significantly from the value predicted.
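The idea of coding a point only when it deviates significantly from a predicted value can be sketched as follows. This is an illustrative example under simple assumptions, not a description of JBIG or any other standard: the predictor is simply the most recently coded value, and the threshold and sample values are hypothetical.

```python
def predictive_encode(samples, threshold):
    """Emit (index, value) only where a sample deviates from the prediction by more than threshold."""
    coded = []
    predicted = 0  # assumed initial prediction shared by encoder and decoder
    for i, s in enumerate(samples):
        if abs(s - predicted) > threshold:
            coded.append((i, s))
            predicted = s  # the prediction tracks the last coded value
        # otherwise the sample is treated as "as predicted" and nothing is emitted
    return coded

def predictive_decode(coded, length):
    """Rebuild a signal by holding the last coded value until the next coded point."""
    updates = dict(coded)
    current = 0
    out = []
    for i in range(length):
        current = updates.get(i, current)
        out.append(current)
    return out

samples = [10, 11, 10, 12, 40, 41, 39, 10]
coded = predictive_encode(samples, threshold=3)
decoded = predictive_decode(coded, len(samples))

print(coded)    # only three of the eight samples are coded
print(decoded)  # small variations have been flattened
```

The sketch makes the trade-off visible: the small variations that the threshold suppresses are discarded whether they were noise or information, and the choice of threshold is exactly the kind of decision the encoder cannot make with certainty.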
However, one of the problems encountered in predictive coding is the difficulty in deciding that a particular deviation in an image is an important piece of information rather than noise. Another source of difficulty is that correlations in an image differ from place to place as well as between images. At present, no conventional predictive coding method has employed a sufficiently robust algorithm to minimize processing errors over a realistic range of images. As a result, conventional predictive coding methods tend to homogenize variations between images.
Error coding methods extend predictive methods by coding the error between a predicted value and the actual value. Conventional error coding methods tend to produce a representation of the input data in which small values near zero are more common than larger values. However, such methods typically do not reduce the total dynamic range from that of the input data and may even increase the range. As a result, error coding methods are susceptible to noise and quantization errors, particularly when attempting to reconstruct the original source data from the error-coded representation. In addition, since error coding is an extension of predictive coding, these two classes of methods share many of the same problems and disadvantages.
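A minimal sketch of error coding, assuming the simplest possible predictor (each value is predicted to equal the previous one), shows both properties noted above: the coded range can exceed the input range, and a single error in the coded data affects everything reconstructed after it. The sample values are hypothetical 8-bit values.

```python
def residuals(samples):
    """Code each sample as its difference from the previous sample (previous-value predictor)."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def reconstruct(res):
    """Invert residual coding by accumulating the differences."""
    total = 0
    out = []
    for r in res:
        total += r
        out.append(total)
    return out

samples = [100, 102, 101, 103, 0, 255, 254]   # all within 0..255
res = residuals(samples)
print(res)                      # mostly small values, but the span is wider than 0..255
print(max(res) - min(res))      # range of the coded values

# A single error in one coded value shifts every later reconstructed sample.
corrupted = list(res)
corrupted[1] += 5
print(reconstruct(corrupted))
```

Most residuals are near zero, but the abrupt transitions produce values outside the span of the original data, and the corrupted residual offsets every later sample by the same amount.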
Representation of data using a set of basis functions is well known, with Fourier techniques being perhaps the most familiar. Other transform methods include the fast Fourier transform (FFT), the discrete cosine transform (DCT), and a variety of wavelet transforms. The rationale for such transform methods is that the basis functions can be encoded by coefficient values and that certain coefficients may be treated as more significant than others based on the information content of the original source data. In doing so, they effectively regard certain coefficient values and correlations of the sort mimicked by the basis functions as more important than any other values or correlations. In essence, transform methods are a means of categorizing the correlations in an image. The limitations of such methods are a result of the unpredictability of the correlations. The variations in luminance and color that characterize an image are often localized and change across the face of the image. As a result, FFT and DCT based methods, such as JPEG, often first segment an image into a number of blocks so that the analysis of correlations can be restricted to a small area of the image. A consequence of this approach is that bothersome discontinuities can occur at the edges of the blocks.
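The discontinuities at block edges can be reproduced with a one-dimensional sketch. It assumes an orthonormal DCT-II applied to blocks of eight samples and a deliberately extreme coding rule that keeps only the first coefficient of each block. This is an illustration of block-based transform coding in general and is not the JPEG algorithm.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct(): rebuild samples from orthonormal DCT-II coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / n) * c
                       * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                       for k, c in enumerate(coeffs)))
    return out

def block_code(samples, block_size, keep):
    """Transform each block independently and zero all but the first `keep` coefficients."""
    out = []
    for start in range(0, len(samples), block_size):
        coeffs = dct(samples[start:start + block_size])
        coeffs = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
        out.extend(idct(coeffs))
    return out

ramp = list(range(16))                       # a smooth gradient: 0, 1, ..., 15
coarse = block_code(ramp, block_size=8, keep=1)

print([round(v, 2) for v in coarse])
print("step at block edge:", round(coarse[8] - coarse[7], 2))
```

With only one coefficient kept, each block collapses to its mean, so a gradient that rises by 1 per sample acquires a jump of 8 at the block boundary. Keeping more coefficients shrinks the jump, but because each block is coded independently, nothing forces adjacent blocks to agree at their shared edge.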
Wavelet-based methods avoid this "blocking effect" somewhat by using basis functions that are more localized than sine and cosine functions. However, a problem with wavelet-based methods is that they assume that a particular function is appropriate for an image and that the entire image may be described by the superposition of scaled versions of that function centered at different places within the image. Given the complexity of image data, such a presumption is often unjustified. Consequently, wavelet-based methods tend to produce textural blurring and noticeable changes in processing and coding quality within and between images.
To address some of the problems arising from the complexity of images as an information source, a number of attempts have been made to incorporate models of human perception into data processing methods. These are based on the belief that by using human visual capabilities as a guide, many of the errors and distortions introduced during processing can be rendered inconsequential. In essence, use of human perceptual models provides a basis for deciding that some visual information is more important than other information. For example, television and several computer image formats explicitly treat luminance information as more important than color information and preferentially devote coding and processing resources to grayscale data. While this approach shows promise, there is no sufficiently accurate model of human perception currently available to assist in processing image data. As a result, attempts to design processes incorporating such models have resulted in images that are noticeably imperfect.
What is desired and needed are apparatus and methods for the processing of general signal and image data which are more efficient than conventional approaches. In particular, signal and image processing apparatus and methods are desired which are less computationally complex and have reduced data storage requirements compared to conventional methods. Apparatus and methods for reconstructing signals and images from processed data without the degradation of signal or image quality found in conventional approaches are also desired.
The present invention provides such apparatus and methods.