In the prior art, as shown in FIG. 1, a signal processing system 100 is generally modeled as follows. A dynamic system 110 generates a primary signal 111. As used herein, the primary signal 111 is a dynamic time series, e.g., human speech.
The primary signal 111 is subject 120 to a corrupting and additive secondary signal 121, e.g., stationary random, white, or Gaussian noise, to produce a combined signal 122. Because the statistical properties of the noise do not change over time, i.e., the noise “looks” the same at any instant in time, it can be considered “stationary.” The problem is to substantially recover the primary signal 111 from the combined signal 122.
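The additive model described above can be illustrated with a minimal sketch. The sinusoid standing in for the primary signal 111, the noise level, and the function names are assumptions chosen only for illustration; they do not appear in the prior art description.

```python
import math
import random


def make_combined_signal(num_samples=2000, noise_std=0.3, seed=0):
    """Return (primary, noise, combined) lists for the additive model."""
    rng = random.Random(seed)
    # Hypothetical stand-in for primary signal 111: a slowly varying sinusoid.
    # Real speech is far more complex; this is for illustration only.
    primary = [math.sin(2.0 * math.pi * t / 200.0) for t in range(num_samples)]
    # Hypothetical stand-in for secondary signal 121: zero-mean white Gaussian
    # noise with the same standard deviation at every sample (stationary).
    noise = [rng.gauss(0.0, noise_std) for _ in range(num_samples)]
    # Combined signal 122: the noise is simply added to the primary signal.
    combined = [p + n for p, n in zip(primary, noise)]
    return primary, noise, combined


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    primary, noise, combined = make_combined_signal()
    half = len(noise) // 2
    # For stationary noise, the two halves should show similar variance.
    print("noise variance, first half: ", variance(noise[:half]))
    print("noise variance, second half:", variance(noise[half:]))
```

Comparing the variance of the two halves is only a rough check of stationarity; it shows that the noise statistics do not drift over the record, in contrast to the primary signal, whose value changes from sample to sample.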
Therefore, in the prior art, the combined signal 122 is measured to obtain samples 130. An estimate 141 of the stationary noise is determined 140 based on an understanding or model of the dynamic system 110 that generated the primary signal 111, i.e., the speech signal. The estimated noise 141 is then removed 150 from the samples 130 to recover the primary signal 111 having a reduced level of noise.
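The measure, estimate, and remove steps above can be sketched as follows. This sketch assumes, purely for illustration, that the dynamic system 110 is a first-order linear-Gaussian process and that the estimator is a scalar Kalman filter; the prior art description does not name a specific estimator, and all function names and parameter values here are hypothetical.

```python
import random


def simulate(num_samples=2000, a=0.99, process_var=0.01, noise_var=1.0, seed=1):
    """Generate a primary signal and noisy samples (assumed AR(1) model)."""
    rng = random.Random(seed)
    primary, samples = [], []
    x = 0.0
    for _ in range(num_samples):
        # Assumed dynamic system 110: x_t = a * x_{t-1} + process noise.
        x = a * x + rng.gauss(0.0, process_var ** 0.5)
        primary.append(x)
        # Samples 130 of the combined signal: primary plus stationary noise.
        samples.append(x + rng.gauss(0.0, noise_var ** 0.5))
    return primary, samples


def estimate_noise(samples, a=0.99, process_var=0.01, noise_var=1.0):
    """Step 140 (sketch): noise estimate = sample minus filtered state."""
    x_hat, p = 0.0, 1.0  # initial state estimate and its variance
    noise_estimates = []
    for y in samples:
        # Predict the next state from the model of the dynamic system.
        x_pred = a * x_hat
        p_pred = a * a * p + process_var
        # Correct the prediction with the new sample.
        gain = p_pred / (p_pred + noise_var)
        x_hat = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        # Whatever the model does not explain is attributed to noise.
        noise_estimates.append(y - x_hat)
    return noise_estimates


def remove_noise(samples, noise_estimates):
    """Step 150 (sketch): subtract the estimated noise from the samples."""
    return [y - n for y, n in zip(samples, noise_estimates)]


def mean_squared_error(u_vals, v_vals):
    return sum((u - v) ** 2 for u, v in zip(u_vals, v_vals)) / len(u_vals)


if __name__ == "__main__":
    primary, samples = simulate()
    recovered = remove_noise(samples, estimate_noise(samples))
    print("MSE before removal:", mean_squared_error(samples, primary))
    print("MSE after removal: ", mean_squared_error(recovered, primary))
```

The recovered signal still contains residual noise; as stated above, the result is the primary signal with a reduced level of noise, not a perfect reconstruction.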
The prior art model 100 assumes that the noise in the combined time series data 122 is the output of some underlying process. Because the nature or the parameters of that process may not be fully known, the process is generally modeled as a random process.
Additional formulations represent what is known about the underlying primary signal. Dynamic systems 110 are a convenient tool for representing the primary signal because they can accommodate arbitrarily complex processes and diverse sources of information, and are amenable to standard analytical tools when simplified to suitable forms.
A conventional approach to estimating 140 the noise 141 affecting the combined signal 122 is to model the speech signal as an output 111 of the dynamic system 110, such as a hidden Markov model (HMM), and to estimate 140 the noise 141 based on variations of the measured signal 130 from typical output of the known underlying system 110.
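The HMM-based approach described above can be sketched as follows, under strong simplifying assumptions: each hidden state emits a single fixed "typical" output value, the additive noise is zero-mean Gaussian with a known standard deviation, and the state belief is tracked with a forward (filtering) pass only. The two-state model, its parameters, and the function names are hypothetical.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hmm_noise_estimate(samples, state_means, transition, initial, noise_std):
    """Estimate noise as each sample minus the HMM's expected output."""
    num_states = len(state_means)
    belief = list(initial)  # probability of each hidden state
    noise_estimates = []
    for t, y in enumerate(samples):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(num_states))
                for j in range(num_states)
            ]
        # Update: weight each state by how well it explains the sample.
        # (A sample extremely far from every state mean could underflow to
        # zero here; this sketch does not guard against that case.)
        weighted = [
            belief[j] * gaussian_pdf(y, state_means[j], noise_std)
            for j in range(num_states)
        ]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        # Typical output of the system given the current state belief.
        typical_output = sum(b * m for b, m in zip(belief, state_means))
        # The variation of the sample from the typical output is the estimate.
        noise_estimates.append(y - typical_output)
    return noise_estimates


# Hypothetical two-state model used only for illustration.
STATE_MEANS = [-1.0, 1.0]
TRANSITION = [[0.9, 0.1], [0.1, 0.9]]
INITIAL = [0.5, 0.5]

if __name__ == "__main__":
    samples = [1.1, 0.9, -1.2, -0.8]
    print(hmm_noise_estimate(samples, STATE_MEANS, TRANSITION, INITIAL, 0.5))
```

With only a handful of discrete states, the state belief is a short list of probabilities and each update is a closed-form computation. That convenience is what is lost when the state is continuous and the densities are mixtures.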
Tracking dynamic systems with a continuum of states in an analytical manner becomes difficult when conditional densities of the combined signal 122 are mixtures of many component densities. Unfortunately, this is the case in most real-world systems, where speech is subject to both stationary noise and dynamic or non-stationary noise, e.g., background conversation, music, environmental acoustics, traffic, etc. This analytical intractability is primarily due to two conditions.
First, the complexity of the estimated distribution for the state of the system, as measured by the number of parameters in the system, increases exponentially over time. Second, when the relationship between the measured output and the true output of the system is non-linear, the estimated state distributions may not have a closed form. Both of these problems are encountered in continuous-state dynamic systems used to estimate time series data.
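The first condition can be illustrated with a small sketch. It assumes a hypothetical scalar linear system x_t = a * x_{t-1} + w_t in which the driving term w_t is drawn from a Gaussian mixture with K components. If no components are merged or pruned, the exact predicted state density after T steps is a Gaussian mixture with K^T components. The model, the value of a, and the function names are assumptions for illustration only.

```python
def propagate(state_mixture, driving_mixture, a=0.9):
    """One step of x_t = a * x_{t-1} + w_t for Gaussian-mixture densities.

    Each mixture is a list of (weight, mean, variance) tuples. Every state
    component pairs with every driving component, so the number of
    components is multiplied by len(driving_mixture) at each step.
    """
    return [
        (w_s * w_d, a * m_s + m_d, a * a * v_s + v_d)
        for (w_s, m_s, v_s) in state_mixture
        for (w_d, m_d, v_d) in driving_mixture
    ]


def component_counts(driving_mixture, num_steps, a=0.9):
    """Track how many mixture components the state density needs over time."""
    # Assumed initial state density: a single Gaussian component.
    state = [(1.0, 0.0, 1.0)]
    counts = []
    for _ in range(num_steps):
        state = propagate(state, driving_mixture, a)
        counts.append(len(state))
    return counts, state


if __name__ == "__main__":
    # Hypothetical three-component driving mixture (weights sum to 1).
    driving = [(0.5, -1.0, 0.1), (0.3, 0.0, 0.2), (0.2, 2.0, 0.5)]
    counts, state = component_counts(driving, num_steps=5)
    print(counts)  # [3, 9, 27, 81, 243]
```

Each component carries a weight, a mean, and a variance, so the number of parameters grows in proportion to the component count. Practical trackers therefore have to approximate the density, which is the difficulty described above.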