Spoken dialogue systems are regularly used in a wide range of technical fields, for example, in mobile telephone devices to process the user's speech and transmit it to a receiving communications device. Spoken dialogue systems commonly employ a combination of speech synthesis and speech recognition techniques. Speech synthesis techniques are often used in entertainment productions such as video games and animated films, but they can also be a useful tool in assistive technology for people with a range of disabilities, for example, visual and speech impairments, and dyslexia.
A common speech synthesis technique is text-to-speech (TTS), wherein raw text is analysed, converted into phonetic representations and then converted into speech via waveform generation. However, TTS systems can have difficulty identifying the correct pronunciation for certain aspects of text, for example, numbers, abbreviations, and spellings whose pronunciation depends on the context of the word. As a result, TTS can be a lengthy and complex process, and it is therefore often necessary to know the text well in advance of the speech synthesis. In view of this, TTS systems are not suitable for real-time voice conversion.
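The pronunciation ambiguity described above can be illustrated with a minimal sketch of the text analysis stage. This is not taken from any particular TTS system: the function names (`normalise`, `read_number`, `two_digits`) and the two context rules are hypothetical, chosen only to show why the surrounding words must be inspected before a token can be converted into a spoken form.

```python
# Toy text normalisation for a TTS front end. All names and rules here are
# illustrative assumptions; a real system uses far larger lexicons and
# statistical models.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def read_number(token: str, previous_word: str) -> str:
    """Read a four-digit token as a year after 'in', else digit by digit.

    The year rule is deliberately crude: it splits the token into two pairs
    and does not handle forms such as 1905 or 2000 correctly.
    """
    if len(token) == 4 and previous_word.lower() == "in":
        return two_digits(int(token[:2])) + " " + two_digits(int(token[2:]))
    return " ".join(ONES[int(digit)] for digit in token)


def normalise(text: str) -> str:
    """Replace digits and the abbreviation 'Dr.' with spoken-form words."""
    tokens = text.split()
    spoken = []
    for i, token in enumerate(tokens):
        previous_word = tokens[i - 1] if i > 0 else ""
        next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token.isdigit():
            spoken.append(read_number(token, previous_word))
        elif token == "Dr.":
            # Assumed rule: a title precedes a capitalised name,
            # otherwise the abbreviation is read as a street suffix.
            spoken.append("doctor" if next_word[:1].isupper() else "drive")
        else:
            spoken.append(token)
    return " ".join(spoken)


print(normalise("Dr. Smith moved to Elm Dr. in 1990"))
print(normalise("dial 1990"))
```

Because each rule looks at the neighbouring words, the same token ("Dr." or "1990") yields different spoken forms in different sentences, which is one reason the text generally needs to be available before synthesis begins.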
The treatment of auditory hallucinations has been reported by a number of media publications. News articles published by the BBC and ‘The Guardian’ newspaper both discuss the possibility of using virtual reality (VR) techniques to produce an ‘avatar’ that represents the ‘voice’ heard by schizophrenic patients in order to help them gain control of their hallucinations. A further BBC news article reports how the technology used in the clinical trial is able to tune the voice of the virtual avatar, supplied by the therapist, to match the voice of the patient's auditory hallucination.