Many organizations, or units within organizations, that mainly handle interactions, such as call centers, customer relations centers, trading floors, law enforcement agencies, homeland security offices or the like, receive interactions in multiple languages. For example, a call center may provide services to customers speaking English, French or Spanish. In some cases the language in which the customer speaks is known, for example from an Interactive Voice Response (IVR) system, or from the called number. In other cases, however, the language is not known a priori. Knowing the language used by a customer, a supplier, or another person such as an employee or a person providing outsourcing services to an organization (for example a freelance delivery person working with a delivery company) may enable transferring the call to a person who speaks that language. Alternatively, the call or a recording thereof may be transferred to an appropriate automated system for handling or analyzing it. For example, in order to categorize calls into categories characterized by keywords, it is important to identify the language spoken in the call; otherwise the extracted words may be altogether wrong, and so may the categorization of the call and any further processing of calls in that category.
Known techniques for language identification include acoustic language identification, i.e., matching speech samples against one or more acoustic models of the environment in order to identify the spoken language. However, acoustic language identification has a relatively high error rate, which can reach 60%. Generally, the more languages the system has to identify, the higher the error rate. When the error rate is high, for example around 60%, and there are only two languages, it may be better to choose the language arbitrarily than to activate the automatic language identification system, since an arbitrary choice between two languages is wrong only about 50% of the time. Acoustic language identification improves as the acoustic model is constructed upon a larger corpus, but constructing such a corpus, in which the language of each sample is known, is labor intensive.
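The comparison with an arbitrary choice can be made concrete. The following is a minimal sketch, assuming that the spoken language is always one of N candidate languages and that the "arbitrary" choice is uniform over those candidates; under that assumption a guess is correct with probability 1/N regardless of how often each language occurs. The helper names `chance_error_rate` and `better_to_guess` are hypothetical and used for illustration only.

```python
def chance_error_rate(num_languages: int) -> float:
    """Expected error rate of picking one of the candidate languages uniformly at random.

    Hypothetical helper: assumes the true language is always among the candidates.
    """
    if num_languages < 1:
        raise ValueError("need at least one candidate language")
    return 1.0 - 1.0 / num_languages


def better_to_guess(system_error_rate: float, num_languages: int) -> bool:
    """Hypothetical helper: True if a uniform guess is expected to err less often than the identifier."""
    return chance_error_rate(num_languages) < system_error_rate


if __name__ == "__main__":
    # Two candidate languages and an identifier erring about 60% of the time.
    print(chance_error_rate(2))        # 0.5
    print(better_to_guess(0.60, 2))    # True
    # With three candidates, guessing errs about 67% of the time, so a 60% identifier is no longer worse.
    print(better_to_guess(0.60, 3))    # False
```

If the relative frequencies of the languages are known, always choosing the most frequent language does at least as well as a uniform guess, which makes a weak identifier even harder to justify.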
Another known group of techniques relates to textual language verification, i.e., verifying whether an audio segment is in a particular language once the text of the audio is provided. When a speech-to-text (S2T) engine for the wrong language is activated, for example when English speech to text is performed on audio in which Spanish is spoken, the results are usually poor, and the resulting text is easily identified as meaningless and as unlikely to have originated from an English utterance. However, S2T engines and textual verification consume significant processing resources, including CPU, memory and time, and are thus not applicable to environments in which the spoken language is not known a priori and multiple tests, for example one per candidate language, may be required.
There is thus a need in the art for a method and apparatus for language identification, in order to improve and accelerate processes in a call center, such as directing a call, categorizing an interaction or further processing an interaction. The method and apparatus should provide a low error rate and require little manual labor and few processing resources.