When designing a multilingual speech application that automatically classifies the language used by the user (“language identification”), the question arises as to how the user can be motivated to speak freely in his or her mother tongue. As will be described in the following section, various methods exist for classifying spoken language, each of which relies on a single method for processing the acoustic and/or phonetic speech stream. Since these technologies have their main application in the field of speech dialogue systems, other speech processing technologies are generally also available, such as speech recognition, which serves for recognizing spoken words and/or sentences, and speaker verification, which serves for verifying or identifying the speaker. Furthermore, speech dialogue systems may also have additional user recognition-related data which can explicitly or implicitly impart additional information about the language (e.g. German, English, French, Spanish, Turkish, etc.) that may be used by a particular user of a speech dialogue application. Used in isolation, technologies for classifying spoken language are still partly error-prone. An important aim must therefore be to reduce this error rate, possibly through the combined application of other technologies.
A general structure of dialogue systems that are used for linguistic and, similarly, multimodal interaction with at least one user is known. Monolingual speech dialogue applications only incompletely fulfill the requirements of a multilingual customer base. For this reason, speech dialogue applications are being developed that use language identification to determine the language spoken from an utterance by the caller, in order to switch speech output and speech recognition grammars to the relevant language, if possible directly following the first utterance. For this purpose, the user must be notified that the application can also be used in a language other than the base language.
A common procedure for notifying a user that a speech application can be used in a language other than the main language is to extend an introductory speech output prompt with appropriate informative statements. Following the greeting, for example, the option may be put to the user as follows: “To use the service in English, say English; pour le service en Français, dites Français; um den deutschen Dienst zu nutzen, sagen Sie Deutsch; . . . ” (Azevedo, J./Beires, N./Charpentier, Francis/Farrell, M./Johnston, D./LeFlour, E./Micca, G./Militello, S./Schroeder, K. (1998): “Multilinguality in voice activated information services: the P502 EURESCOM project”, In IVTTA'98, 49-54). Depending on the answer given by the user, the dialogue application is switched to the relevant language without further use of speech recognition technologies.
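The explicit selection described above can be sketched as a simple keyword lookup. This is a minimal illustration only: the keyword table, the language codes, and the function name select_language are assumptions made for this example and are not taken from the cited work, and the default base language is chosen arbitrarily.

```python
# Hypothetical mapping from the language name the caller says
# to a language code used by the dialogue application.
LANGUAGE_KEYWORDS = {
    "english": "en",
    "français": "fr",
    "deutsch": "de",
}


def select_language(recognized_word, base_language="de"):
    """Return the dialogue language for the keyword the caller said.

    Falls back to the (assumed) base language if the keyword is unknown.
    """
    key = recognized_word.strip().lower()
    return LANGUAGE_KEYWORDS.get(key, base_language)
```

For example, select_language("English") yields "en", while an unknown keyword leaves the application in the base language.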
If the technology of language identification is used, the user no longer has to name the language explicitly, but can directly answer, in his or her mother tongue, a question which corresponds to the aim of the dialogue application. The formulation of this question should contain an indication of this possibility.
The speech output can be generated either with the aid of recorded speech or by speech synthesis (text-to-speech). Modern text-to-speech systems also include methods for acoustically reproducing mixed-language texts in sequence, adapting the pronunciation to the phonetic characteristics of the different languages (“mixed language”).
Language identification (L-ID) usually takes place immediately following the first output prompt, based on the first utterance of the caller. The most important point is therefore to motivate the caller to make his first utterance in his mother tongue.
Various methods of language identification are described in:
        Muthusamy, Y. and Spitz, A. (1996). Automatic Language Identification. In Cole, R., Mariani, J., Uszkoreit, H., Varile, G., Zaenen, A., Zampolli, A., and Zue, V. (eds.): Survey of the State of the Art in Human Language Technology. Cambridge University Press (pp. 273-276);
        Zissman, M. (1996). Comparison of Four Approaches to Automatic Language Identification of Telephone Speech. IEEE Transactions on Speech and Audio Processing, 4(1);
        Muthusamy, Y. K., Barnard, E., and Cole, R. A. (1994). “Reviewing automatic language identification,” IEEE Signal Processing Mag., vol. 11, no. 4, pp. 33-41;
        Matejka, P., Szöke, I., Schwarz, P., and Cernocky, J. (2004). Automatic Language Identification using Phoneme and Automatically Derived Unit Strings. Lecture Notes in Computer Science, 2004 (3206), 8;
        Nagarajan, T., and Murthy, H. (2004). Language identification using parallel syllable-like unit recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'04). Montreal, Canada (pp. 401-404); and
        Ramus, F., Nespor, M., and Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73.
Following the utterance by the caller, the language identification system usually detects which language was spoken. If one of the possible foreign languages has been positively identified, then the required language-dependent settings of the dialogue application are automatically carried out and a changeover is made into the relevant dialogue language.
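The changeover just described can be sketched as follows. This is a minimal sketch under stated assumptions: the per-language scores, the threshold used to decide that a language was “positively identified”, the resource table RESOURCES, and the file and prompt-set names are all hypothetical and serve only to illustrate the control flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueSettings:
    language: str    # language code of the dialogue
    grammar: str     # speech recognition grammar to activate
    prompt_set: str  # recorded or synthesized output prompts


# Hypothetical language-dependent resources of the dialogue application.
RESOURCES = {
    "de": DialogueSettings("de", "grammar_de.grxml", "prompts_de"),
    "en": DialogueSettings("en", "grammar_en.grxml", "prompts_en"),
    "fr": DialogueSettings("fr", "grammar_fr.grxml", "prompts_fr"),
}


def switch_language(current, lid_scores, threshold=0.7):
    """Switch the dialogue settings if a foreign language was identified.

    lid_scores maps language codes to scores in [0, 1] as they might be
    delivered by a language identification module (assumed format).
    The threshold is an assumed criterion for a positive identification.
    """
    best = max(lid_scores, key=lid_scores.get)
    positively_identified = lid_scores[best] >= threshold
    if best != current.language and best in RESOURCES and positively_identified:
        return RESOURCES[best]
    # Otherwise stay in the current dialogue language.
    return current
```

With the base settings RESOURCES["de"], a clear result in favor of English switches the grammar and prompts to English, whereas an uncertain result leaves the dialogue in the base language.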
Also known are methods in which speech recognition and/or language identification are not only used in real time on an acoustic speech stream, but are also applied to a buffer region with digitized acoustic information in the form of memory regions or files.
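Processing from a buffer rather than from the live stream can be illustrated roughly as follows. The class UtteranceBuffer and the callables identify_language and recognize are placeholders assumed for this sketch; they stand in for whatever language identification and speech recognition components a system provides, and running both on the same buffered audio is only one possible use.

```python
import io


class UtteranceBuffer:
    """Collects digitized audio chunks of one utterance in memory."""

    def __init__(self):
        self._buf = io.BytesIO()

    def append(self, chunk: bytes) -> None:
        self._buf.write(chunk)

    def data(self) -> bytes:
        # Return everything written so far without consuming it.
        return self._buf.getvalue()


def process_buffered(buffer, identify_language, recognize):
    """Apply language identification, then recognition, to buffered audio.

    identify_language(audio) -> language code      (assumed interface)
    recognize(audio, language) -> recognized text  (assumed interface)
    """
    audio = buffer.data()
    language = identify_language(audio)
    text = recognize(audio, language)
    return language, text
```

Because the audio is retained, the same utterance can be evaluated more than once, for example first for its language and then with the grammar of that language.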
In principle, it is possible to activate the grammar for a speech recognizer and a speech dialogue in a plurality of languages simultaneously, so that answers can also be recognized simultaneously in a plurality of languages. However, where vocabularies are relatively large, this also increases the possibilities for confusion between individual elements of the grammar, so that multilingual grammars can only be used in selected cases.
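The confusion risk of a combined multilingual grammar can be made concrete with a small sketch. Note the assumption: similarity of the written forms, computed with Python's difflib, is used here only as a crude stand-in for acoustic confusability, which a real recognizer would assess on phonetic or acoustic grounds. The function name, the threshold, and the example vocabularies are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cross_language_confusions(grammars, threshold=0.8):
    """List pairs of grammar entries from different languages that look alike.

    grammars maps a language code to the list of words active in that
    language. Orthographic similarity is an assumed proxy for the
    acoustic confusability discussed in the text.
    """
    entries = [(lang, word) for lang, words in grammars.items() for word in words]
    confusable = []
    for (lang1, word1), (lang2, word2) in combinations(entries, 2):
        if lang1 == lang2:
            continue  # only cross-language pairs are of interest here
        similarity = SequenceMatcher(None, word1.lower(), word2.lower()).ratio()
        if similarity >= threshold:
            confusable.append((lang1, word1, lang2, word2))
    return confusable
```

The number of cross-language pairs to be checked grows with the product of the vocabulary sizes, which is consistent with the observation that large multilingual grammars are more prone to confusion.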