The present disclosure generally relates to performing speech recognition on board an aircraft. More particularly, the present disclosure relates to a method for performing speech recognition on board an aircraft, to a computer program for executing the method, and to a speech recognition unit for performing speech recognition on board an aircraft.
Speech recognition techniques, also known as “Speech-to-Text” (STT) techniques, have been developed over recent decades to provide computer-implemented assistance for translating spoken language into text and have been adopted in many fields as an effective means of improving work efficiency.
Speech recognition systems may generally be classified into speaker-independent and speaker-dependent systems. Speaker-independent systems are typically usable out of the box and do not require user-based training before they are ready to use. These systems generally support only a limited vocabulary. Speaker-dependent systems, on the other hand, require user-based training before they can be used effectively. In such training, a user's specific voice is analyzed and used to fine-tune the recognition of that particular user's speech, ultimately resulting in a more accurate transcription. Speaker-dependent systems generally support large vocabularies suitable for translating spoken natural language into full text.
Results of a user-based training may be stored in a user profile which may include, for example, voice and/or pronunciation characteristics of a particular user, a vocabulary characteristic of the particular user, as well as probabilities of occurrence of words in the language commonly used by the user.
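The contents of such a user profile can be illustrated with a minimal Python sketch. The names `UserProfile` and `train_profile`, the representation of the voice characteristics as a plain list of numbers, and the estimation of word probabilities as relative frequencies in training transcripts are assumptions made for illustration only and do not describe any particular speech recognition system.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical container for the results of a user-based training."""
    user_id: str
    # Voice and/or pronunciation characteristics, assumed here to be a feature vector.
    voice_features: list[float] = field(default_factory=list)
    # Vocabulary characteristic of this particular user.
    vocabulary: set[str] = field(default_factory=set)
    # Relative frequency of each word in the user's training transcripts.
    word_probabilities: dict[str, float] = field(default_factory=dict)


def train_profile(user_id, voice_features, transcripts):
    """Build a profile from training transcripts given as plain strings."""
    counts = Counter(word.lower() for text in transcripts for word in text.split())
    total = sum(counts.values())
    probabilities = {word: n / total for word, n in counts.items()} if total else {}
    return UserProfile(user_id, list(voice_features), set(counts), probabilities)
```

For example, `train_profile("pilot_a", [0.1, 0.2], ["gear down", "flaps down"])` yields a profile whose vocabulary is the three distinct words of the transcripts and in which the word "down" accounts for half of all observed words.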
In speaker-dependent speech recognition systems, the user is required to select a user profile before the actual translation of spoken language may begin. Using the information stored in the user profile, speech recognition may be performed taking into account the user-specific characteristics, which ultimately improves the recognition rate, i.e., the percentage of correctly recognized words from the speech signal.
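The recognition rate defined above can be estimated by comparing a recognized transcript against a reference transcript. The following Python sketch is one possible way of doing so; the function name `recognition_rate` is hypothetical, and the word alignment relies on `difflib.SequenceMatcher` from the Python standard library, which is a heuristic alignment and not necessarily the alignment a dedicated evaluation tool would compute.

```python
from difflib import SequenceMatcher


def recognition_rate(reference: str, hypothesis: str) -> float:
    """Percentage of reference words that appear, in order, in the hypothesis."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    # Align the two word sequences; autojunk is disabled so that frequent
    # words in long transcripts are not ignored by the matcher.
    matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * correct / len(ref_words)
```

With the reference "set heading two seven zero" and the recognized text "set heading two seven hero", four of the five reference words are matched, giving a recognition rate of 80 percent.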
“Speech recognition” is generally to be distinguished from “speaker recognition.”
Speaker recognition relates to the identification of a person from characteristics of the person's voice. Speaker recognition systems may be used for speaker verification or speaker identification. In speaker verification, the voice of a user who claims to be of a certain identity is used to verify the claimed identity. In speaker identification, on the other hand, a user's voice is used to determine a previously unknown identity of the user. Roughly speaking, therefore, speech recognition relates to recognizing “what” is being said and speaker recognition relates to recognizing “who” is speaking.
Similar to the user-based training applied in speaker-dependent speech recognition systems, speaker recognition systems typically enforce a so-called enrollment phase. During enrollment, the user's voice is recorded and a number of features are extracted to form a voice print. During verification, a speech sample is then compared against previously created voice prints.
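Enrollment, verification, and identification can be sketched in Python as follows. The sketch assumes that feature extraction has already taken place, so that each recording is represented by a fixed-length list of numbers; averaging these vectors into a voice print, comparing by cosine similarity, and the threshold of 0.9 are illustrative assumptions, and all function names are hypothetical.

```python
import math


def enroll(feature_vectors):
    """Average the feature vectors of the enrollment recordings into a voice print."""
    if not feature_vectors:
        raise ValueError("at least one enrollment sample is required")
    count = len(feature_vectors)
    return [sum(column) / count for column in zip(*feature_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample, voice_print, threshold=0.9):
    """Speaker verification: check a sample against the claimed identity's voice print."""
    return cosine_similarity(sample, voice_print) >= threshold


def identify(sample, voice_prints):
    """Speaker identification: return the enrolled user whose voice print is most similar."""
    return max(voice_prints, key=lambda user: cosine_similarity(sample, voice_prints[user]))
```

Verification thus compares one sample against one voice print, whereas identification compares one sample against all previously created voice prints and selects the best match.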
Speaker recognition systems may be classified into text-dependent systems, for which the text for enrollment and verification is the same (e.g., given by a common pass phrase), and text-independent systems, for which the text for enrollment and verification is generally different and the user's identity is thus determined based on common voice analysis techniques.
In both speech recognition and speaker recognition, various techniques may be used to process and store voice and pronunciation characteristics of a user, including frequency estimation, hidden Markov models, neural networks, pattern matching algorithms, Gaussian mixture models, or the like.