The modern communications era has brought about a tremendous expansion of wireline and wireless networks. Computer networks, television networks, and telephony networks are experiencing an unprecedented technological expansion, fueled by consumer demand. Wireless and mobile networking technologies have addressed related consumer demands, while providing more flexibility and immediacy of information transfer.
Current and future networking technologies continue to make information transfer easier and more convenient for users. One area in which there is demand for easier information transfer is the delivery of services to a user of a mobile terminal. The services may be in the form of a particular media or communication application desired by the user, such as a music player, a game player, an electronic book, short messages, email, etc. The services may also be in the form of interactive applications in which the user may respond to a network device in order to perform a task, play a game, or achieve a goal. The services may be provided from a network server or other network device, or even from the mobile terminal itself, for example a mobile telephone, a mobile television, a mobile computer, a mobile gaming system, etc.
Many applications require the user to receive audio information, such as oral feedback or instructions, from the network or mobile terminal, or to give oral instructions or feedback to the network or mobile terminal. Such applications may provide a user interface that does not rely on substantial manual user activity; in other words, the user may interact with the application in a hands-free or semi-hands-free environment. Examples of such applications include paying a bill, ordering a program, and requesting and receiving driving instructions. Other applications may convert oral speech into text or perform some other function based on recognized speech, such as dictating an SMS message or email. To support these and other applications, speech recognition applications, applications that produce speech from text, and other speech processing devices are becoming more common.
Devices that produce speech from computer-readable text, such as text-to-speech (TTS) devices, typically analyze the text and perform phonetic and prosodic analysis to generate phonemes for output as synthetic speech relating to the content of the original text. However, such devices are used all over the world, and many geographic locations are becoming increasingly diverse in terms of the languages spoken by local inhabitants, so a device may encounter text involving several languages. Because pronunciation rules differ between languages, a direct conversion of such text to phonemes and then to synthetic speech may suffer from inaccuracies or fail to sound natural. Current mechanisms directed to curing these deficiencies may require large amounts of text in order to function properly and are therefore inflexible.
Accordingly, it may be desirable to provide flexible language identification for input data to ensure that an appropriate language model is used when performing text-to-speech conversion.
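To make the idea of language identification on short input concrete, here is a minimal sketch that assumes a character-bigram scoring approach with add-one smoothing. This is a common textbook technique, not necessarily the method this document goes on to describe. The names `bigrams`, `BigramLanguageModel`, `train_models`, `identify_language`, and the toy `SAMPLE_TEXTS` are all hypothetical, and the two one-sentence samples stand in for the much larger training corpora a real system would need.

```python
import math
from collections import Counter

# Hypothetical toy training samples; a real system would use large corpora.
SAMPLE_TEXTS = {
    "en": "the weather is nice today and the train leaves at three",
    "de": "das wetter ist heute schön und der zug fährt um drei",
}


def bigrams(text):
    """Return character bigrams, padding words with spaces so that
    word-initial and word-final letters are also modelled."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramLanguageModel:
    def __init__(self, sample_text, vocab_size):
        self.counts = Counter(bigrams(sample_text))
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def log_prob(self, text):
        # Add-one (Laplace) smoothing: unseen bigrams get a small non-zero probability.
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[bg] + 1) / denom) for bg in bigrams(text))


def train_models(samples):
    # Shared vocabulary (every bigram seen in any sample, plus one slot for
    # unseen bigrams) keeps the scores of different models comparable.
    seen = set().union(*(bigrams(text) for text in samples.values()))
    vocab_size = len(seen) + 1
    return {lang: BigramLanguageModel(text, vocab_size) for lang, text in samples.items()}


def identify_language(text, models):
    # Pick the language whose model gives the input the highest log-probability.
    return max(models, key=lambda lang: models[lang].log_prob(text))


if __name__ == "__main__":
    models = train_models(SAMPLE_TEXTS)
    for phrase in ["the train", "der zug"]:
        print(phrase, "->", identify_language(phrase, models))
```

Run directly, the script prints `the train -> en` and `der zug -> de`. Under these assumptions, a TTS front end could score each segment of its input this way and then apply the pronunciation rules of the winning language; with only toy samples, however, the scores are illustrative rather than reliable.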