When acoustic recordings are made, multiple sound sources are often present simultaneously. These can be different speech signals, noise (e.g., from fans), or similar signals. For further analysis, it is useful to separate these interfering signals. Signal separation can be used, for example, for speech recognition or acoustic scene analysis. The human auditory system can separate harmonic signals based on their fundamental frequency. See A. Bregman. Auditory Scene Analysis. MIT Press, 1990, which is incorporated by reference herein in its entirety. Note that a speech signal in general contains many voiced, and hence harmonic, segments.
In conventional approaches, the input signal is split into different frequency bands via band-pass filters. In a later stage, for each band at each instant in time, an evidence value between 0 and 1 is calculated, indicating the degree to which this band originates from a given fundamental frequency. A simple yes/no decision can be interpreted as using binary evidence values. This yields a three-dimensional description of the signal with the following axes: fundamental frequency, frequency band, and time. A similar kind of representation is also found in the human auditory system. See G. Langner, H. Schulze, M. Sams, and P. Heil, The topographic representation of periodicity pitch in the auditory cortex, Proc. of the NATO Adv. Study Inst. on Comp. Hearing, pages 91-97, 1998, which is incorporated by reference herein in its entirety. Based on these previously calculated evidence values, groups of bands with a common fundamental frequency can be formed. Each group thus contains the harmonics that emanate from one fundamental frequency and therefore belong to one sound source. In this way, the sound sources can be separated.
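The grouping step can be illustrated with a minimal sketch for a single instant in time. The function name `group_bands_by_f0`, the threshold of 0.5, the rule of assigning each band to its best-supported fundamental, and the evidence values themselves are all assumptions made for illustration; they are not taken from the cited approaches.

```python
from collections import defaultdict


def group_bands_by_f0(evidence, threshold=0.5):
    """Group frequency bands by the fundamental frequency they best support.

    `evidence` maps each candidate fundamental frequency (Hz) to a list of
    per-band evidence values in [0, 1] for one instant in time.
    A band joins the group of its highest-evidence fundamental, provided
    that evidence reaches `threshold` (an assumed, illustrative rule).
    """
    groups = defaultdict(list)
    n_bands = len(next(iter(evidence.values())))
    for band in range(n_bands):
        # Candidate fundamental with the strongest evidence for this band.
        best_f0 = max(evidence, key=lambda f0: evidence[f0][band])
        if evidence[best_f0][band] >= threshold:
            groups[best_f0].append(band)
    return dict(groups)


# Hypothetical evidence values for two candidate fundamentals and five bands.
evidence_frame = {
    100.0: [0.9, 0.1, 0.8, 0.2, 0.3],
    150.0: [0.1, 0.7, 0.1, 0.9, 0.4],
}

groups = group_bands_by_f0(evidence_frame)
print(groups)  # {100.0: [0, 2], 150.0: [1, 3]}
```

In this toy frame, band 4 has no evidence value of at least 0.5 and therefore remains unassigned. Repeating the grouping for every instant in time would cover the third axis of the representation.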
A crucial step in the separation of sound sources is determining whether two harmonics emanate from a common fundamental frequency and hence from a single sound source. In conventional approaches, this is done via the auto-correlation function. See G. Hu and D. Wang, Monaural speech segregation based on pitch tracking and amplitude, IEEE Trans. on Neural Networks, 2004, which is incorporated by reference herein in its entirety. The auto-correlation is determined for each frequency band, and frequencies that are in a harmonic relation share peaks in the lag domain. In each band, a peak also occurs at the lag corresponding to the period of the harmonic (the inverse of its frequency) and at multiples of this lag. Further, biological principles for sound source separation are also known. See B. Moore, An Introduction to the Psychology of Hearing. Fifth Edition, Academic Press, 2003, which is incorporated by reference herein in its entirety. However, conventional techniques do not provide high precision and are unable to identify cases in which signals do not emanate from one common fundamental but are only coincidentally close to a harmonic relation.
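The shared-peak behavior described above can be sketched with two pure sinusoids standing in for the outputs of two band-pass filters. The sampling rate, the 100 Hz fundamental, the peak threshold, and all function names are assumptions chosen for illustration; this is not the implementation of the cited work.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed)
F0 = 100            # Hz, assumed common fundamental frequency


def sinusoid(freq, n_samples=1600):
    """Pure tone standing in for the output of one band-pass filter."""
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(n_samples)]


def autocorrelation(signal, max_lag):
    """Auto-correlation for lags 0..max_lag, normalized to 1 at lag 0."""
    n = len(signal)
    energy = sum(x * x for x in signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def peak_lags(acf, threshold=0.5):
    """Non-zero lags at which the auto-correlation has a local maximum."""
    return [
        lag for lag in range(1, len(acf) - 1)
        if acf[lag] > threshold
        and acf[lag] >= acf[lag - 1]
        and acf[lag] > acf[lag + 1]
    ]


band_a = sinusoid(2 * F0)  # second harmonic, 200 Hz
band_b = sinusoid(3 * F0)  # third harmonic, 300 Hz

peaks_a = peak_lags(autocorrelation(band_a, max_lag=100))
peaks_b = peak_lags(autocorrelation(band_b, max_lag=100))

# Lags at which both bands peak point to a common fundamental period.
common_lags = sorted(set(peaks_a) & set(peaks_b))
print(common_lags)  # [80], i.e. 80 / 8000 s = 10 ms = 1 / (100 Hz)
```

Each band also peaks at multiples of its own period (40 samples for 200 Hz, roughly 27 samples for 300 Hz), but only the lag of 80 samples is shared. Because lags are quantized to whole samples and peaks have finite width, a component that is merely close to a harmonic relation can produce a nearly coincident peak, which is the limitation noted above.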
What is needed are more efficient techniques for separating signals from different sound sources.