The ability to recognize a voiced sound pattern (VSP), such as a keyword or a phrase, as vocalized by a particular speaker is a basic function of the human auditory system. However, this psychoacoustic hearing task is difficult to reproduce using previously known machine-listening technologies because spoken communication often occurs in adverse acoustic environments that include ambient noise, interfering sounds, and background chatter of other speakers. The problem is further complicated because there is often some variation in how a particular speaker vocalizes multiple instances of the same VSP. Nevertheless, as a hearing task, the unimpaired human auditory system is able to recognize VSPs vocalized by a particular speaker effectively and perceptually instantaneously.
As a previously known machine-listening process, recognition of a VSP as vocalized by a particular speaker includes detecting a VSP and then matching it to the vocal characteristics of the particular speaker. Known processes that enable detection and matching are computationally complex and use large memory allocations, yet remain functionally limited and highly inaccurate. One persistent problem is the inability to sufficiently train a detection and matching system using previously known machine-listening technologies. For example, previously known technologies are limited to using a single vocalization instance in a training process, because the processes employed cannot jointly utilize multiple vocalization instances without excessive multiplicative increases in computational complexity and memory demand. However, a single vocalization instance does not provide a sufficient amount of information to reliably train a VSP detection module.
Moreover, due to their computational complexity and memory demands, previously known VSP detection and speaker matching processes are characterized by long delays and high power consumption. As such, these processes are undesirable for low-power, real-time, and/or low-latency devices, such as hearing aids and mobile devices (e.g., smartphones, wearables, etc.).