This invention relates to time-frequency directional processing of audio signals.
Use of spoken input with personal user devices, for example smartphones and automobiles, can be challenging because of the acoustic environment in which the desired signal from a speaker is acquired. One broad approach to separating the signal of a source of interest from multiple microphone signals is beamforming, which uses multiple microphones separated by distances on the order of a wavelength or more to give the microphone system directional sensitivity. However, beamforming approaches may be limited, for example, by inadequate separation of the microphones.
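For illustration only, the beamforming idea described above can be sketched as a simple delay-and-sum beamformer. This is a minimal sketch under stated assumptions, not the approach of any particular system: a far-field (plane-wave) source, known microphone positions, a speed of sound of 343 m/s, and frequency-domain delays that are circular over the block. The names `delay_and_sum`, `mic_positions`, and `direction` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second; assumed value for air at room temperature


def delay_and_sum(signals, mic_positions, direction, fs):
    """Steer a microphone array toward `direction` by aligning and averaging channels.

    signals:       array of shape (n_mics, n_samples), one row per microphone
    mic_positions: array of shape (n_mics, 3), in metres
    direction:     3-vector pointing from the array toward the source
    fs:            sampling rate in Hz
    """
    n_samples = signals.shape[1]
    direction = direction / np.linalg.norm(direction)

    # A plane wave from `direction` reaches a microphone earlier, relative to
    # the origin, by the projection of its position onto `direction` over c.
    advances = mic_positions @ direction / SPEED_OF_SOUND  # seconds

    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)

    # Delay each channel by its advance so all channels line up in time.
    # Applying the delay as a phase shift makes it circular over the block.
    phase = np.exp(-2j * np.pi * freqs[None, :] * advances[:, None])
    aligned = spectra * phase

    # Averaging reinforces the steered direction and attenuates others.
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

In this sketch the attenuation of off-axis sources depends on the microphone spacing relative to the wavelength, which is consistent with the limitation noted above when the microphones are inadequately separated.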
A number of techniques have been developed for unsupervised (e.g., “blind”) source separation from a single microphone signal, including techniques that make use of time versus frequency decompositions. Some such techniques make use of Non-Negative Matrix Factorization (NMF). Some techniques have been applied to situations in which multiple microphone signals are available, for example, with widely spaced microphones.
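As a hedged illustration of Non-Negative Matrix Factorization applied to a time versus frequency decomposition, the sketch below factors a non-negative matrix `V` (for example, a magnitude spectrogram with frequencies as rows and frames as columns) into non-negative factors `W` and `H` using the well-known multiplicative updates for squared Euclidean error. It is one common formulation and is not necessarily the one used by the techniques referred to above; the function name `nmf`, the rank, the iteration count, and the random initialization are assumptions made for the example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative V (n_freq x n_frames) as W @ H.

    W (n_freq x rank) holds spectral templates and H (rank x n_frames)
    holds their activations over time; both stay non-negative.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape

    # Random positive initialization (an assumption of this sketch).
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps

    for _ in range(n_iter):
        # Multiplicative updates: each factor is scaled elementwise by a
        # non-negative ratio, so non-negativity is preserved.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)

    return W, H
```

In a source separation setting, subsets of the columns of `W` and the corresponding rows of `H` would typically be attributed to different sources; how that assignment is made is outside the scope of this sketch.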
An approach used for speech processing, for example speech recognition, makes use of some processing capacity at a user's device along with transmission of the result of such processing to a server computer, where further processing is performed. An example of such an approach is described, for instance, in U.S. Pat. No. 8,666,963, “Method and Apparatus for Processing Spoken Search Queries.”