Speech recognition typically refers to a process in which an appropriately programmed computer system or circuit receives speech as input, such as an audio recording, and outputs text data corresponding to the words spoken in that input. Speech recognition might involve determining, guessing, and/or estimating what words a speaker is speaking when those words are not known to the computer system in advance. Speech recognition is useful for captioning video, making recorded audio and video searchable by the words spoken in the recording, automated transcription, and other uses. Typically, a speech recognition system has a stored model of speech that it uses to assess what words might have been spoken and to resolve the input speech into those words. That stored model is typically generated using some speech learning process.
Speech learning describes a process in which a computer system processes a recording of a speaker, knowing the words that the speaker is speaking, and builds a computer model that can be used for speech recognition and similar tasks. This processing is sometimes referred to as a training process. Once a computer system is “trained,” it might be expected to convert speech into text data or another representation of sequences of words. Speech learning is useful for making speech recognition more accurate, more efficient, and the like.
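The distinction between speech learning and speech recognition can be illustrated with a toy sketch. It is not a practical recognizer and is not taken from any particular system: it assumes each utterance has already been reduced to a short numeric feature vector, and it uses a simple nearest-centroid model as a stand-in for the stored model of speech. The names `train`, `recognize`, and the sample feature values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train(labeled_examples):
    """Speech learning: the word for each example is known in advance.

    Builds a model mapping each word to the mean (centroid) of its
    feature vectors. Feature extraction from audio is assumed to have
    happened elsewhere.
    """
    sums = {}
    counts = defaultdict(int)
    for features, word in labeled_examples:
        if word not in sums:
            sums[word] = [0.0] * len(features)
        sums[word] = [s + f for s, f in zip(sums[word], features)]
        counts[word] += 1
    return {w: [s / counts[w] for s in total] for w, total in sums.items()}


def recognize(model, features):
    """Speech recognition: the word is unknown and must be estimated.

    Returns the word whose stored centroid is closest to the input.
    """
    return min(model, key=lambda word: dist(model[word], features))


# Hypothetical training data: (feature_vector, known_word) pairs.
training_data = [
    ([0.9, 0.1], "yes"),
    ([1.1, 0.3], "yes"),
    ([0.1, 0.9], "no"),
    ([0.3, 1.1], "no"),
]

model = train(training_data)
guess = recognize(model, [0.8, 0.2])  # features of an unlabeled utterance
```

Here `train` corresponds to the training process, where the spoken words are known, and `recognize` corresponds to using the resulting stored model on input whose words are not known.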
Audio systems commonly used in the art generally require speech recognition training, speech learning, or other types of training in order to function adequately. Such systems may require several forms of training for each new user before they can be deployed and made available for use, which in turn requires extensive data collection. In batch training, for example, a new user speaks a known sequence and the system analyzes the recording to determine that user's phonemes and accent.