This invention relates to methods for sampling objects. In particular it provides a system implementing a rigorous mathematical framework and provably correct practical algorithmic implementations for digitally recording a signal having any number of degrees of freedom.
A physical or virtual object is recorded for application specific analysis by sampling or measuring it at various points. An object might be a one dimensional electrical signal, price history or other time series, a two dimensional image, a three dimensional vector field, or an N dimensional surface in, say, configuration space, momentum space, or phase space, all useful in physics. One important example is the current use of uniformly spaced samples of signals or images to analyze these objects for subsequent processing like filtering or convolution. This analysis involves calculating approximations to the Fourier transform, wavelet transform or other related transforms. The aggregate of sample points generated by the measuring process creates a mosaic from the original physical object. The degree of congruence of the mosaic to the original depends on the number of sample points used as well as on the judicious location of the sample points.
A classical example is the recording of an audio signal for replay by sampling the signal amplitude at uniform time intervals less than half the period of the highest frequency present in the signal. The so-called Nyquist theorem assures that such a recording enables reconstruction of the original signal. If the uniform sampling occurred at a lesser rate, in many cases an attempt to replay the sampled signal would result in a distortion termed "artifacting" in which harmonics of the signal would not correctly reproduce.
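The distortion described above (commonly called aliasing) can be illustrated with a minimal sketch. The function name `sample_sine` and the chosen frequencies are illustrative assumptions, not part of the original disclosure. The sketch samples a 7 Hz sine at a uniform rate of 10 Hz, which is below the required rate, and shows that the samples are indistinguishable from those of a sine at 7 - 10 = -3 Hz.

```python
import math

def sample_sine(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at the uniform times t = k / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz)
            for k in range(n_samples)]

rate_hz = 10.0  # uniform sampling rate; frequencies above rate_hz / 2 = 5 Hz alias

# A 7 Hz tone lies above the 5 Hz limit for this rate.
high = sample_sine(7.0, rate_hz, 20)

# The same sample values arise from a tone at 7 Hz - 10 Hz = -3 Hz.
alias = sample_sine(-3.0, rate_hz, 20)

# The two sample sequences agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(max_diff)
```

Because the two sequences coincide, no processing applied after sampling can tell the 7 Hz tone from the 3 Hz one at this rate.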
Limited resources place constraints upon the generation of sample points. Therefore sampling is often a fairly expensive operation in terms of system resources consumed. Depending on the application, the number of sample points generated might be constrained by the resolution of the measuring equipment used (cost constraint), by the amount of storage available to save the samples (space constraint) as well as by the observation time window of opportunity to obtain the samples (time constraint). Thus, a set of samples can be regarded as a somewhat scarce resource. Therefore, each sample point should have impact and provide a maximal contribution of information content to the whole mosaic.
Since there is a cost associated with the generation of each sample point, some optimization of sampling is generally required. Particularly in the case of objects having periodically recurring patterns there is much redundancy in samples taken in uniform intervals. Or, more often, an object might have mostly unchanging features with little useful information content, while, say, 10% of it contains volatile features of great interest. This has led to the use of non-uniformly spaced, or irregularly patterned, data samples.
For instance, a signal might exhibit rapid change (or volatility) in one concentrated region, but otherwise be fairly smooth. Or, an image might exhibit sharp edges as well as regions of unchanging, or predictable, patterns. In both cases, many sample points are required in certain concentrated intervals or areas where much information can be gleaned from the complex changes occurring there, while few sample points are needed elsewhere where little useful information can be extracted.
The number of sample points required using the standard uniformly spaced samples rises dramatically as the number of dimensions increases. Thus if a one dimensional signal requires O(N) samples, then a two dimensional image would need O(N^2) samples, a three dimensional object O(N^3) samples, and so on. (A quantity Q(N) is said to be of order N^k, written O(N^k), if Q(N)/N^k is bounded.)
What is required is a method to specify the manner of sampling an object wherein the totality of samples provides comprehensive and non-redundant information about the object, while minimizing the number of samples required, especially as the dimension of the sampled object increases.
The following methods are generally known in the prior art:
Accept the limitations of uniform sampling and utilize Fourier Transform approximation techniques like the Discrete Fourier transform and its FFT implementation;
Approximate, generally via interpolation, non-uniformly spaced samples by uniformly spaced samples of sufficiently high resolution in critical regions tailored to the particular object;
Using pseudo-random numbers, generate an irregular scale for measurements. A Monte Carlo method is typically used to implement this;
Use multi-resolution wavelets; and
Adaptively sample data.
The Fourier Transform plays a dual role in N dimensional data recording. First consider a one dimensional signal representing time variations of some variable of interest, say voltage or price. The Fourier expansion of a signal is capable of representing an analog signal by a sequence of complex numbered Fourier coefficients, the amplitudes and phases of the Fourier components, whereby the content of a signal with a bounded frequency spectrum is accurately represented by a discrete data series. This itself results in effective data compression, especially for signals having significant periodic components. Secondly, the Fourier coefficients of the signal are determined by sampling the signal, which brings one back to the need for the present invention.
The standard technique is to employ a Discrete Fourier Transform (DFT) as a substitute for the Fourier Transform. Furthermore the FFT or Fast Fourier Transform has been developed, which is an efficient implementation of the DFT. The DFT is therefore an approximation that inherently distorts the spectrum generated by the Fourier transform. Nevertheless, it is tolerated as the most commonly used digital signal and image processing technique because it is analytically tractable and readily implemented on diverse hardware/software platforms.
The approximation, generally via interpolation, of non-uniformly spaced samples by uniformly spaced samples of sufficiently high resolution also distorts the calculated spectrum. It is also potentially wasteful of sample points since it requires uniformly spaced samples as anchors around which irregularly spaced samples are interpolated. Its mathematics is also difficult to implement. Indeed, the overwhelming general practice is still to use the first approach.
The use of pseudo-random numbers is analytically or mathematically difficult to describe, if not pragmatically intractable. That is, it is not readily describable mathematically, say as a set of equations. A mathematical description is generally needed in practice, to develop algorithms that automatically map sample points to the irregularly spaced random points. It is also computationally complex because the deltas, or differences between each sample point and neighboring random points, generally must be recalculated because the random points used might not be reproducible.
Another, relatively new, general technique is multi-resolution analysis. This involves splitting the stream of possibly non-uniformly spaced data into streams of different resolution. In some circumstances, this technique is also referred to as "decimation". Wavelets also use a multi-resolution analysis. The use of multi-resolution wavelets tries to solve the problem of mathematically decomposing non-stationary or highly irregular signals and images into simpler components on multi-tiered scales. But the approach is generally difficult to implement in practice. Although there have been claims for very specific applications, its effectiveness vis-a-vis standard DFT based techniques for general applications is not apparent. Indeed, the 1965 Cooley-Tukey FFT algorithm immediately paved the way for the revolution in medical imaging that accelerated in the 1970s. A similar impact is not apparent for wavelets, even more than 13 years after the framework for wavelets was articulated in a coherent manner.
There were other attempts, preceding wavelets, at calculating the Fourier transform of non-stationary objects, notably the Gabor Transform. But again, these techniques were not general enough to be of use off-the-shelf, as the FFT is. Indeed, a major motivation underlying wavelets was to supplant the Gabor transform method with a multi-resolution approach.
Adaptive sampling has been used in many specialized applications in connection with the FFT. Its use of specialized and sometimes customized sampling algorithms (to decide on how to adapt to data changes) limits its usefulness as a general purpose tool. It requires much a priori knowledge of the specific data set being sampled.
U.S. Pat. No. 5,708,432 appears to disclose a sampling digitizer with a sampling master clock divided by a factor K which is derived from prime integers. The '432 patent is concerned with sampling a periodic function, where the number of samples desired during each cycle exceeds the response ability of the sampling system. A technique is used in which fewer samples are taken from successive cycles of the signal and are combined in order to simulate the effect of many samples taken during one cycle. To do this, a clock is divided by an integer K which is varied according to an algorithm. N consecutive samples are taken over M cycles of the signal. Coherent signaling is achieved when the signal frequency and the divided clock frequency are relatively prime. The particular algorithm finds the relatively prime fraction by selection from a Farey series of a given order. Unlike the present invention, the '432 invention selects a set of samples separated by a constant interval. The use of relatively prime ratios to guarantee that the same points are not selected on a perfectly periodic signal is different from the present technique, which would work to sample even a pulse, and does not require that the signal be periodic.
An abstract in the prior art makes reference to sampling employing prime numbers. See Proceedings of the European Signal Processing Conference held in Trieste on Sep. 10-13, 1996.
The abstract states:
We demonstrate that it can be advantageous from a computational point of view to use a two-stage realization instead of a single-stage realization for sample rate conversions with prime numbers. One of the stages performs a conversion by a factor of two . . . . The other stage changes the sample rate by the rational factor N/2 . . . .
U.S. Pat. No. 5,243,343 uses variable time interval data sampling. See col. 5, lines 18-19. U.S. Pat. No. 5,444,459 is essentially identical with the '343 patent.
Other prior art shows
1. non-linear sampling without reference to the prime integer algorithm of the present invention (U.S. Pat. Nos. 4,188,583; 4,142,146; 4,903,021; and 5,229,668 [FIG. 3])
2. the use of prime numbers in Hartley transform processors (U.S. Pat. No. 4,062,060; Boussakta et al., 24 Elect. Ltrs. No. 15, pp. 926-28 (Jul. 21, 1988); 136 IEE Proc. Pt. G, No. 5 (October 1989) pp. 269-77.)
3. Other patents of interest include U.S. Pat. No. 5,712,809 entitled "Method and apparatus for performing fast reduced coefficient discrete cosine transform"; U.S. Pat. No. 5,682,524 entitled "Databank system with method for efficiently storing non-uniform data records"; U.S. Pat. No. 5,497,152 entitled "Digital to digital conversion using non-uniform sample rates"; U.S. Pat. No. 4,271,500 entitled "Device for converting a non-uniformly sampled signal with short-time spectrum to a uniformly sampled signal"; U.S. Pat. No. 5,712,635 entitled "Digital-to-analog conversion using non-uniform sample rates"; U.S. Pat. No. 4,999,799 entitled "Signal processing apparatus for generating a Fourier transform"; and U.S. Pat. No. 4,969,700 entitled "Computer aided holography".
The invention relates to signal processing, primarily digital signal processing employing N dimensional data sampling required to extract information about the object for subsequent processing. In order to extract information from a signal it is sampled at discrete points and the sampled values are fed into calculation algorithms. For example the sampled values may be used to calculate the coefficients of a Fourier transform of the signal by known algorithmic techniques, thereby extracting useful information about the Fourier spectrum of the signal. Other transforms may be calculated using other kernel functions to generate Radon, Hartley, or discrete sine and cosine transforms. The discrete points at which sampling occurs may for example be discrete times for a signal that varies with time. As a specific example of the utility of the invention, it is utilized to process an N dimensional Fourier and related transform using the non-uniform samples of the D scale and FFTs. This decomposition of a Fourier transform into a series of FFTs will be detailed later.
This invention is concerned with the selection of the values of the discrete points at which sampling occurs. Since most signals are composed of a few periodic components, there is much redundancy when the sampling takes place at regular intervals. Sampling at regular intervals also leads to artifacting, i.e. false data caused by the confusion of an interval with its harmonics. On the other hand it is algorithmically simpler to sample at regular intervals and it is necessary to keep the calculations manageable.
This invention provides a method including algorithms to generate an irregularly spaced set of sampling points which still provides the practical calculation advantage of uniform sampling. This methodology can also be tuned or configured to generate an approximation of a truly (pseudo)random set of points. Further, it can be tailored such that the resolution or width between neighboring points in the set is always within pre-specified maximum and minimum bounds. That is useful when sampling different data sets, each with a different number of significant digits. For a one dimensional situation, i.e. where the signal is a function of one parameter such as time, the set of sampling points is composed as the union of n sets of points, each set being points separated by regular intervals of 1/p_n where p_n is a prime number. Each set is denoted a "scale" for prime number p_n. The union is therefore characterized by a set {p_n} of distinct prime numbers. The union is intrinsically non-periodic. And for even a small set of distinct primes the distribution of points in the union appears almost random. This union is used to determine where the signal is sampled. The points at which sampling takes place are not dependent on the values of the signal. If the signal cannot be accurately reconstructed from the sampled values one merely recurses and increases the set of distinct primes, which is always possible because of their infinite number. Alternately, there might be a priori knowledge of an object's information bandwidth. Then the prime numbers used to generate the set of points can be pre-determined, such that the highest prime number just exceeds the known, effective, bandwidth. For example, audio applications generally require bandwidth ranges of 5-20 kHz. Therefore there is no need to use prime numbers of order 10^5 when this application domain only uses an effective bandwidth of order 10^4.
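The union of prime scales described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm of the invention itself: the points are restricted to the unit interval [0, 1], exact rational arithmetic is used, and the function names `prime_scale` and `d_scale` are hypothetical.

```python
from fractions import Fraction

def prime_scale(p):
    """Points 0, 1/p, 2/p, ..., 1: uniform spacing 1/p on the unit interval."""
    return {Fraction(k, p) for k in range(p + 1)}

def d_scale(primes):
    """Sorted union of the prime scales for a set of distinct primes."""
    points = set()
    for p in primes:
        points |= prime_scale(p)
    return sorted(points)

# Union of the scales for the primes 2, 3 and 5.
points = d_scale([2, 3, 5])
print([str(x) for x in points])
# prints ['0', '1/5', '1/3', '2/5', '1/2', '3/5', '2/3', '4/5', '1']
```

Each component scale is uniform, yet the spacing between neighboring points of the union varies from point to point, which is the irregularity the text relies on.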
The invention may be implemented in a real-time measurement operation by running a series of clocks ticking at intervals characterized by 1/p_n for the nth clock, and sampling the signal at each clock tick. For a more than one dimensional situation, where the signal is characterized by multiple parameters, each parameter is treated as in the one dimensional case, and the sampling space is treated as the Cartesian product of one dimensional spaces. The invention relies upon the incommensurate nature of different primes to prevent the invention from degenerating into an inefficient sampling technique such as using regular intervals would provide.
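The multidimensional case can be sketched as a Cartesian product of one dimensional point sets. The sketch below assumes each axis is restricted to the unit interval and may use its own set of primes; the names `d_scale` and `d_grid` are illustrative and do not come from the original text.

```python
from fractions import Fraction
from itertools import product

def d_scale(primes):
    """Sorted union of the points k/p, k = 0..p, for each prime p."""
    return sorted({Fraction(k, p) for p in primes for k in range(p + 1)})

def d_grid(primes_per_axis):
    """Cartesian product of one dimensional scales, one per axis."""
    axes = [d_scale(primes) for primes in primes_per_axis]
    return list(product(*axes))

# Two dimensional example: primes {2, 3} on the first axis,
# primes {2, 3, 5} on the second axis.
grid = d_grid([[2, 3], [2, 3, 5]])
print(len(grid))  # 5 points on the first axis times 9 on the second
```

Each element of `grid` is a tuple of coordinates at which the object would be sampled.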
Although the invention is characterized by the use of primes, there are other techniques that may be used to generate the sampling points. The key features are that the points are derived from the union of sets of points generated by simple calculations and the union avoids a periodic structure. Another variation of this sampling technique is to offset the points associated with each prime scale. The offsets might be random or also prime numbers. The intent would be to better capture phase information in data analysis.
An efficient method is presented for sampling, and thereby enabling the recording or measurement of, objects, which is successful regardless of the irregularity or periodicity of the object. The sampling occurs at values of an independent variable. For example, where the sampled object is an electrical signal, the independent variable could be time. Where the sampled object is a multidimensional physical object, the independent variable could be a spatial coordinate parametrizing the space in which the physical object is embedded. Collectively, the points or values of the independent variable at which samples are taken are termed a "scale". This terminology is consistent with the usual association of a scale with the ruler used to measure quantities.
This new measurement scale is presented with special properties that make it particularly effective for sampling non-uniform data. The new scale is contrasted with the familiar ruler scale as a point of reference. The new scale is based, fundamentally, on the existing mathematics of set theory, real analysis of the continuum, and number theory. That is, it applies theoretical knowledge from set theory, real analysis and number theory to solve practical problems related to data sampling. The theoretical framework provides the necessary rigorous foundation to guarantee that this new scale yields provably correct results in all applications.
The new scale, herein termed a D scale, is applicable to problems in signal processing, including digital signal processing (DSP), image processing, scientific and engineering computational applications, data acquisition and statistical data analysis, to mention a few application domains.
The D scale can also be configured or tailored to match the varying bandwidth, resolution or accuracy, and point distribution requirements of the widely disparate application domains mentioned above. Specifically, the minimum and maximum resolutions of any particular D scale can be guaranteed by an algorithm, discussed later, that determines which prime scales to include when a D scale is constructed. Further, the D scale can be constructed with prime scale components such that the distribution of its points is made somewhat random for certain applications. Or, the points can be distributed to accentuate the multi-frequency components of an application domain, resulting in more points towards the center of a unit interval, and tapering off at the edges. This flexibility in generating point distributions is effected by including more or fewer prime scales in a D scale.
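The selection algorithm itself is discussed later in the specification and is not reproduced here. As a hedged illustration only, the resolution bounds of a candidate D scale can be checked by direct computation. The sketch assumes the unit interval, and the names `gap_bounds` and `within_bounds` are hypothetical helpers, not the algorithm of the invention.

```python
from fractions import Fraction

def d_scale(primes):
    """Sorted union of the points k/p, k = 0..p, for each prime p."""
    return sorted({Fraction(k, p) for p in primes for k in range(p + 1)})

def gap_bounds(primes):
    """Return (smallest, largest) spacing between neighboring points."""
    points = d_scale(primes)
    gaps = [b - a for a, b in zip(points, points[1:])]
    return min(gaps), max(gaps)

def within_bounds(primes, min_res, max_res):
    """True if every neighboring spacing lies in [min_res, max_res]."""
    smallest, largest = gap_bounds(primes)
    return min_res <= smallest and largest <= max_res

print(gap_bounds([2, 3, 5]))  # (Fraction(1, 15), Fraction(1, 5))

# Adding the prime 7 creates a spacing of 1/35 (between 2/5 and 3/7),
# which violates a minimum resolution of 1/20.
print(within_bounds([2, 3, 5], Fraction(1, 20), Fraction(1, 4)))     # True
print(within_bounds([2, 3, 5, 7], Fraction(1, 20), Fraction(1, 4)))  # False
```

A check of this kind could be used to accept or reject a proposed set of prime scales before sampling begins.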
The invention employs a method, or an algorithm, to perform nonuniform sampling. It can be embodied or implemented as a product (hardware based, software based or combined hardware/software) to implement nonuniform sampling.
Among possible commercial applications are the following:
Real-time Data Acquisition
The D scale, along with its sampling algorithm, improves the efficiency of real-time data acquisition. Data acquisition is at the heart of many processes in such diverse fields as seismology, oceanography, medical imaging and process control, to name a few. The D scale and its algorithm sample judiciously, say when volatile data is detected, and not when the data is uninteresting. In contrast with standard techniques, this leads to reduced cost of data acquisition since fewer sample points are generally used. The effectiveness of the sampling process is improved since useful samples are generated, not the many extraneous samples of a uniformly spaced procedure.
Post-acquisition Data Analysis
The D scale can also be used to analyze data already generated by a data acquisition or sampling device. For example, the D scale is used to generate a closed form expression for the Fourier Transform (described later). Using the closed form expression, it is then possible to quantify the bias in not using points in regions that are not varying enough to warrant attention. Therefore, it can effectively filter the acquired data or samples to focus on the more significant data sections, thereby reducing the quantity of data that must be processed to characterize the object sampled. In this manner the D scale acts as a means for data compression. This has application in the analysis of large quantities of data produced by NMR or tomography devices, and reduces the burden of analyzing such data. The same is true of the large quantity of data produced by seismography measuring apparatus, as an example. Further, other application domains produce resonance shapes which might contain only 10% useful information content.
Transforming Irregular Data into Periodic Components
The D scale can be used to partition an original, irregularly patterned data set to create multiple components. Each component contains points which, although not periodically patterned, are all multiples of the same reciprocal prime 1/p_k, which is what binds them to the same prime scale. Then, because the points on each prime scale are bandlimited by p_k, the set of points on each prime scale can be decomposed via standard Fourier analysis into periodic components. Thus, the D scale enables a two stage decomposition. The first stage decomposes the original set of irregularly spaced points into sets of uniformly spaced and bandlimited points, each associated with a prime scale. Then the second stage decomposes each bandlimited set into the usual periodic components of a Fourier series. Also, each partition has a different resolution (1/p_k) such that the components form a nested set of resolutions.
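The first stage of this decomposition can be sketched as follows. The sketch assumes exact rational sample locations on the unit interval and assigns the shared end points 0 and 1 to every prime scale; the function name `partition_by_prime_scale` is hypothetical. The second stage (Fourier analysis of each uniformly spaced component) is not shown.

```python
from fractions import Fraction

def partition_by_prime_scale(points, primes):
    """Group points by the prime scale(s) they lie on.

    A point x lies on the scale of prime p when x * p is an integer,
    i.e. when x is a multiple of 1/p.
    """
    components = {p: [] for p in primes}
    for x in sorted(points):
        for p in primes:
            if (x * p).denominator == 1:
                components[p].append(x)
    return components

primes = [2, 3, 5]
# Irregularly spaced point set: the union of the three prime scales.
points = {Fraction(k, p) for p in primes for k in range(p + 1)}

components = partition_by_prime_scale(points, primes)
for p in primes:
    print(p, [str(x) for x in components[p]])
```

Each resulting component is uniformly spaced with spacing 1/p, so it can be handed to a standard uniform-sample transform.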
Characterization of Noisy data
The D scale can filter random noise into periodic components thereby providing a useful characterization of the data, which reduces the effect of random noise in the data.