Some embodiments relate to speech extraction, and more particularly, to systems and methods of speech extraction.
Known speech technologies (e.g., automatic speech recognition or speaker identification) typically encounter speech signals that are obscured by external factors such as background noise, interfering speakers, and channel distortions. For example, in known communication systems (e.g., mobile phones, landline phones, other wireless technology, and Voice-over-IP technology), the speech signals being transmitted are routinely obscured by external sources of noise and interference. Similarly, users of hearing aids and cochlear implant devices are often plagued by external disturbances that interfere with the speech signals they are struggling to understand. These disturbances can become so overwhelming that users often prefer to turn their medical devices off and, as a result, these medical devices are useless to some users in certain situations. A speech extraction process, therefore, is needed to improve the quality of the speech signals produced by these devices (e.g., medical devices or communication devices).
Additionally, known speech extraction processes often attempt to perform speech separation (e.g., separating interfering speech signals or separating background noise from speech) by relying on multiple sensors (e.g., microphones) and exploiting their geometrical spacing to improve the quality of speech signals. Most of the communication systems and medical devices previously described, however, include only one sensor (or some other limited number of sensors). The known speech extraction processes, therefore, are not suitable for use with these systems or devices without expensive modification.
Thus, a need exists for an improved speech extraction process that can separate a desired speech signal from interfering speech signals or background noise using a single sensor and that can also provide speech quality recovery better than that of multi-microphone solutions.