Embodiments of the present invention generally relate to speech processing. More specifically, embodiments of the present invention relate to processing a signal representing speech based on occurrence of events within the signal.
Various techniques for electronically processing human speech have been and continue to be developed. Generally speaking, these techniques involve reading and analyzing an electrical signal representing the speech, for example as generated by a microphone, and performing processing thereon such as trying to determine the spoken sounds represented by the signal. The spoken sounds are then assembled to replicate the words, sentences, etc. that are being spoken. However, such electrical signals created by human speech are considered to be extremely complex. Furthermore, determining exactly how such signals are interpreted by the human ear and brain to represent intelligible words, ideas, etc. has proven to be rather challenging.
Previous techniques of speech processing have sought to model the process performed by the human ear and brain by analyzing the entirety of the electrical signal representing the speech. However, the previous approaches have had somewhat limited success in accurately recognizing or replicating the spoken words or otherwise processing the signal representing speech. The previous techniques of speech processing have sought to improve accuracy by increasingly adding complexity to the algorithms used to process the spoken sounds, words, etc. However, as the resource overhead of these systems continues to grow, the improvements in accuracy and/or fidelity of speech processing systems seems to not improve to a corresponding level. Rather, various speech processing systems continue to evolve that require more and more resource overhead while providing only marginal improvements in accuracy, fidelity, etc. Hence, there is a need in the art for improved methods and systems for speech processing.