The present invention relates generally to the field of natural language processing, and more particularly to computational linguistics.
Speech recognition is a sub-field of computational linguistics that develops methodologies and technologies that enable recognition and translation of spoken language into text by computers. It is also known as “automatic speech recognition” (ASR), “computer speech recognition,” or just “speech to text” (STT). Some speech recognition systems require “training” (also called “enrollment”), in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses that analysis to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that use training are called “speaker dependent” systems. Systems that do not use training are called “speaker independent” systems.
Speech recognition applications include voice user interfaces, such as voice dialing (e.g., “Call home”), call routing (e.g., “I would like to make a collect call”), domotic appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft control (usually termed Direct Voice Input).
The terms “voice recognition” and “speaker identification” refer to identifying the speaker, rather than what the speaker is saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.