Text-to-speech (TTS) software is well known. Typically, a TTS engine is capable of decoding or interpreting a simple text or word-processor-originated document (e.g. “.txt”, “.doc”, etc.) and converting what is essentially a binary representation of the text into an alternate binary representation in the form of instructions to a sound processor, which ultimately delivers the appropriate electrical signals to a conventional loudspeaker. The original text document may be short, containing only a brief phrase or name, or more expansive, containing one or more paragraphs of text. In either case, interpreting it may typically involve analysis at a granular level (e.g. consonants, vowels and syllables), and may also include grammar and punctuation analysis so that the resulting synthetic speech is produced with the correct inflections and intonation and thus sounds as realistic as possible.
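As a rough illustration of the punctuation analysis mentioned above, the sketch below splits text at punctuation and tags each phrase with an intonation label that a later prosody stage could act on. The function name `phrase_intonation` and the `CONTOURS` mapping are hypothetical. This is a toy heuristic, not how any particular TTS engine works.

```python
import re

# Hypothetical mapping from phrase-final punctuation to an intonation label;
# real TTS front ends use far richer prosody models than this.
CONTOURS = {",": "continuation", ".": "falling", "!": "falling", "?": "rising"}


def phrase_intonation(text: str) -> list[tuple[str, str]]:
    """Split text at punctuation and tag each phrase with an intonation contour."""
    phrases = []
    # Each chunk is a run of non-punctuation characters plus an optional
    # trailing punctuation mark.
    for chunk in re.findall(r"[^.!?,]+[.!?,]?", text):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip whitespace-only fragments
        if chunk[-1] in CONTOURS:
            phrases.append((chunk[:-1].strip(), CONTOURS[chunk[-1]]))
        else:
            # Unterminated final phrase: assume a statement-like fall.
            phrases.append((chunk, "falling"))
    return phrases


print(phrase_intonation("After 400 metres, turn left. Is this the road?"))
# [('After 400 metres', 'continuation'), ('turn left', 'falling'), ('Is this the road', 'rising')]
```

A real front end would also need text normalization, for example so that a decimal point in "4.5 km" is not treated as a sentence boundary. This sketch deliberately ignores that.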
In general, there are two principal methods of synthesizing speech using electronic hardware and software. In concatenative synthesis, speech is created by concatenating pieces of pre-recorded speech stored in a database. Systems differ in the size of the stored speech units. A system that stores only small units such as phones or diphones provides the largest output range but may lack clarity, whereas storing entire words or sentences allows high-quality output at the cost of a narrower range. Alternatively, in formant synthesis, the synthesizer incorporates a model of the vocal tract and other human voice characteristics to create a completely “synthetic” voice output. Parameters such as fundamental frequency, voicing and noise levels are varied over time to create a waveform of artificial speech. This method is sometimes called rules-based synthesis; however, many concatenative systems also have rules-based components.
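To make the formant approach concrete, the sketch below follows the source-filter idea described above. An excitation is generated per frame, either an impulse train at the fundamental frequency when voiced or random noise, or both. That excitation is then passed through a cascade of second-order resonators that stand in for the vocal-tract formants. The resonator coefficients follow the well-known Klatt-style digital resonator. The frame format, the function names `resonate` and `synthesize`, and the example formant values are assumptions for illustration only.

```python
import math
import random


def resonate(x, freq_hz, bw_hz, fs, state):
    """Second-order digital resonator (Klatt-style) modelling one formant.

    state is (y[n-1], y[n-2]) carried over from the previous frame.
    """
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1, y2 = state
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out, (y1, y2)


def synthesize(frames, fs=8000, frame_len=80, seed=0):
    """Formant synthesis over a sequence of parameter frames.

    Each frame is a dict with hypothetical keys: 'f0' (Hz), 'voiced' (bool),
    'noise' (noise amplitude) and 'formants' (list of (freq_hz, bw_hz)).
    All frames are assumed to use the same number of formants.
    """
    rng = random.Random(seed)
    states = None
    phase = 0.0
    output = []
    for frame in frames:
        # Source: impulse train at f0 when voiced, plus optional noise.
        excitation = []
        for _ in range(frame_len):
            sample = 0.0
            if frame["voiced"]:
                phase += frame["f0"] / fs
                if phase >= 1.0:
                    phase -= 1.0
                    sample += 1.0
            sample += frame["noise"] * rng.uniform(-1.0, 1.0)
            excitation.append(sample)
        # Filter: cascade of formant resonators (the vocal-tract model),
        # with filter state carried across frames to avoid discontinuities.
        if states is None:
            states = [(0.0, 0.0)] * len(frame["formants"])
        shaped = excitation
        for i, (freq, bw) in enumerate(frame["formants"]):
            shaped, states[i] = resonate(shaped, freq, bw, fs, states[i])
        output.extend(shaped)
    return output


# Two 20 ms voiced frames with rough, illustrative /a/-like formant values.
vowel = {"f0": 120, "voiced": True, "noise": 0.0,
         "formants": [(730, 90), (1090, 110), (2440, 170)]}
samples = synthesize([vowel, vowel], fs=8000, frame_len=160)
```

Varying `f0`, `voiced`, `noise` and the formant list from frame to frame is what the paragraph above means by parameters being "varied over time". A practical synthesizer would add a glottal pulse shape, radiation filtering and smooth parameter interpolation.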
One of the most common uses of speech synthesis since its inception has been to allow blind or partially sighted people to comprehend the written word. More recently, a plethora of modern devices offer some facility either for playing pre-recorded snippets of human voices or for executing TTS software that instantly interprets any text or word-processed document stored on the device. Indeed, almost any device with relatively modest processing power and memory can do so, including Personal Digital Assistants (PDAs), more advanced mobile phones such as so-called smartphones, games consoles and in-car satellite navigation systems (SNS).
This invention has particular application to in-car SNS devices, and although the following description is almost exclusively directed thereto, it will be readily appreciated by the skilled reader that the application of the invention may be of far wider scope, and should not be considered limited by the specific description.
In-car SNS devices have become widespread over the past five or so years. Most devices include one or more map databases for particular countries. They also have capacity for storing a number of pre-recorded phrases, possibly in a variety of different voices, e.g. male or female, at differing pitches, or with different levels of gravitas or jollity. Furthermore, many devices also permit users to record such phrases in their own voice. The operating software of the device may include a simple routine that instructs the user to record, one after another, each phrase required for the correct operation of the device. For instance, the user may be asked to record a variety of phrases or spoken-word snippets such as “Turn Left”, “Turn Right”, “After 400 metres”, etc. Once recording is complete, the operating software ensures that the user's voice snippets, rather than the default or previously selected pre-recorded snippets, are selected for playback at the appropriate time. Similar technology has been available on mobile phones for some time, albeit in a simpler form. There, a user may record his or her own voice and substitute this recording for the device's default ringtone when a particular person, or indeed any person, calls the mobile phone.
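The snippet-selection behaviour described above amounts to a lookup with a fallback. A user's own recording of a phrase takes precedence, and otherwise the default pre-recorded snippet is used. The sketch below shows one way this might be organised. The class `SnippetStore`, its methods and the file paths are hypothetical and are not taken from any real SNS product.

```python
from dataclasses import dataclass, field


@dataclass
class SnippetStore:
    """Maps phrase keys to audio files; user recordings override defaults.

    Names and file paths here are hypothetical, for illustration only.
    """
    defaults: dict[str, str]
    user: dict[str, str] = field(default_factory=dict)

    def record(self, phrase: str, path: str) -> None:
        """Store the user's own recording of a phrase."""
        self.user[phrase] = path

    def resolve(self, phrase: str) -> str:
        """Prefer the user's recording, fall back to the default snippet."""
        if phrase in self.user:
            return self.user[phrase]
        if phrase in self.defaults:
            return self.defaults[phrase]
        # Nothing pre-recorded exists for this phrase (e.g. a street name).
        raise KeyError(f"no recording for {phrase!r}")

    def playlist(self, phrases: list[str]) -> list[str]:
        """Resolve a spoken instruction into an ordered list of audio files."""
        return [self.resolve(p) for p in phrases]


store = SnippetStore(defaults={
    "After 400 metres": "default/after_400m.wav",
    "Turn Left": "default/turn_left.wav",
})
store.record("Turn Left", "user/turn_left.wav")
print(store.playlist(["After 400 metres", "Turn Left"]))
# ['default/after_400m.wav', 'user/turn_left.wav']
```

The `KeyError` branch reflects the limitation discussed next. Any phrase that was never recorded, such as a non-standard place name, simply has no audio available.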
The above pre-recorded systems are generally more than adequate for the majority of route navigation operations, but are limited in that they offer no facility for audibly identifying non-standard or country-specific information.
It is therefore an object of this invention to overcome this disadvantage, and provide a more comprehensive audio solution for, among other devices, in-car SNS.