There are many applications in which it is desirable to determine an aspect of spoken sounds. This aspect may include, for instance, identifying a language being spoken, identifying a particular speaker, identifying a device, such as a helicopter or airplane, and the type of the device, or identifying a radar signature. For example, a user may have a tape recording of information that the user needs to understand. If this information is in a foreign language, it may need to be translated. However, without knowing what language the information is in, it will be difficult for the user to choose a proper translator.
Similarly, it may be useful, when processing tape recordings, to determine who is speaking at any particular time. This is especially useful in making transcripts of a recorded conversation, where it may be difficult to determine who is speaking and when.
It is well known that all language is made up of certain phonetic sounds. The English language, for example, has thirty-eight phonetic sounds that make up every single word. In average English continuous speech, there are approximately ten phonetic sounds which are uttered every second. Other languages are composed of other phonetic sounds.
Prior techniques for recognizing languages have attempted to identify a number of these phonetic sounds. When a determined number of phonetic sounds are identified, a match to the particular language which has these phonetic sounds is established. However, this technique takes a long time to determine the proper language, and may allow errors in the language determination.
The inventor of the present invention has recognized that one reason for this is that certain phonetic sounds are found in more than one language. It would therefore take a very long time to recognize any particular language, because many of the phonetic sounds, some of which are infrequently uttered, must be recognized before a positive language match can be determined.
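The difficulty with the prior technique can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the language names "LangA" and "LangB", the sound labels "s1" through "s5", the function name identify_language, and the threshold min_sounds are all hypothetical and do not come from any actual prior system. The sketch declares a match once a determined number of distinct sounds has been recognized and only one language contains all of them.

```python
def identify_language(stream, inventories, min_sounds=3):
    """Toy model of phonetic-inventory matching (hypothetical, illustrative only).

    Returns (language, sounds_consumed), or (None, sounds_consumed) when the
    stream ends before a single language can be singled out.
    """
    seen = set()
    consumed = 0
    for consumed, sound in enumerate(stream, start=1):
        seen.add(sound)
        # A language remains a candidate while it contains every sound seen so far.
        candidates = [lang for lang, inv in inventories.items() if seen <= inv]
        if len(seen) >= min_sounds and len(candidates) == 1:
            return candidates[0], consumed
    return None, consumed


# Hypothetical inventories: three sounds are shared, one is unique to each language.
INVENTORIES = {
    "LangA": {"s1", "s2", "s3", "s4"},
    "LangB": {"s1", "s2", "s3", "s5"},
}

# The distinguishing sound "s4" is assumed to be infrequent, so it arrives late.
stream = ["s1", "s2", "s1", "s3", "s2", "s1", "s4"]
result = identify_language(stream, INVENTORIES)
print(result)
```

In this toy case the threshold of three distinct sounds is reached after the fourth sound, yet both languages remain candidates until the infrequent sound "s4" finally occurs. A stream that never contains a distinguishing sound yields no match at all.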
The present invention makes use of this property of languages in a new way which is independent of the actual phonetic sounds which are being uttered.