According to Wikipedia: Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
Since its invention, speech technology has steadily improved its abilities. Most efforts have focused on imitating a human voice and reading fluently, while the user interface and text navigation have been neglected. From the user's point of view, TTS is still complicated to use because current common user interfaces are limited, for example:
The existing products and applications are far from comfortable for end users:
a. In most cases, the user must select the text by marking it before listening to it.
b. If the user stops in the middle of reading, playing the text again starts from the beginning of the marked text.
c. During reading there are no text pointers, and users lose their orientation very quickly.
d. Device-specific input methods and apparatuses such as touchpads, touch screens and multitouch screens, which would make navigation easier and more intuitive, are not used.
e. Reading large amounts of content is almost impossible.
f. Navigation in current audio books is cumbersome.
There is a need in the art to provide new controls for text to speech navigation and reading orientation by adding new orientation abilities that will enable easy navigation through large documents, and will help readers to follow the text as it is being read by the TTS engine.
There is also a need in the art to provide a solution that works on any device, whether a Mac or PC, a smartphone or a tablet, and that can be operated by touch, voice, mouse or keyboard.
According to Wikipedia: A text-to-speech (TTS) system (or “engine”) is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound.
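The front-end step described above can be illustrated with a minimal sketch. The lookup tables and the function names `normalize` and `front_end` are hypothetical and chosen only for illustration; a real front-end would also handle full number reading, phonetic transcription and prosody.

```python
import re

# Hypothetical lookup tables; a real front-end would use far larger ones.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text: str) -> str:
    """Expand known abbreviations and spell out each digit as a word."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Simplification: digits are read one by one ("42" -> "four two").
    return re.sub(r"[0-9]+",
                  lambda m: " ".join(DIGITS[int(d)] for d in m.group()),
                  text)


def front_end(text: str) -> list[str]:
    """Return the written-out words that would be passed on for transcription."""
    return normalize(text).split()


print(front_end("Dr. Smith lives at 42 Oak St."))
```

In this sketch the back-end is omitted entirely; the list of words stands in for the symbolic linguistic representation that a synthesizer would consume.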
In one embodiment of the present invention, the engine provides portrayed text indications every time a new sentence, a new word or a new character (collectively referred to hereunder as the “text”) is output by the back-end. Based on these indications, the system marks the text being read, for example but not limited to by portraying a magnifying glass over it, thereby giving the user orientation as to the text currently being read.
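One possible way to model such text indications is as character ranges reported for each word as it is output. The following is only a sketch under that assumption; `TextIndication`, `word_indications` and `mark` are hypothetical names, and square brackets stand in for a visual marker such as the magnifying glass.

```python
import re
from typing import Iterator, NamedTuple


class TextIndication(NamedTuple):
    kind: str   # "word" here; "sentence" or "character" would work the same way
    start: int  # offset of the first character of the text being read
    end: int    # offset just past the last character


def word_indications(text: str) -> Iterator[TextIndication]:
    """Yield one indication per word, in reading order."""
    for match in re.finditer(r"\S+", text):
        yield TextIndication("word", match.start(), match.end())


def mark(text: str, indication: TextIndication) -> str:
    """Return the text with the indicated span wrapped in brackets."""
    return (text[:indication.start] + "["
            + text[indication.start:indication.end] + "]"
            + text[indication.end:])


sample = "Hello big world"
for indication in word_indications(sample):
    print(mark(sample, indication))
```

A real engine would emit these indications in step with the audio output rather than all at once.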
In a second embodiment of the present invention, the engine provides portrayed line indications every time the text being read is on the next line or the previous line relative to the text that was read immediately before it. A line indication can be, for example, a small needle portrayed at the beginning of the line that is currently being read.
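A line indication of this kind could be derived from the offsets of the text being read, assuming the display layer knows the character offset at which each rendered line starts. The names `line_index` and `line_indications` below are hypothetical, and the sketch also reports the very first line so that the marker has an initial position.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, Sequence


def line_index(offset: int, line_starts: Sequence[int]) -> int:
    """Return the index of the rendered line containing the given offset."""
    return bisect_right(line_starts, offset) - 1


def line_indications(offsets: Iterable[int],
                     line_starts: Sequence[int]) -> Iterator[int]:
    """Yield a line index each time the text being read moves to another line."""
    previous = None
    for offset in offsets:
        current = line_index(offset, line_starts)
        if current != previous:
            yield current  # the marker (e.g. a needle) should move to this line
            previous = current


# Lines start at offsets 0, 10 and 20 (assumed layout).
print(list(line_indications([0, 4, 10, 15, 22, 12], [0, 10, 20])))
```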
In a third embodiment of the present invention, the user may click, double click, drag, or use a single touch or a multitouch gesture applied on the portrayed text indicator in order to start or stop playback of the TTS engine.
In a fourth embodiment of the present invention the user may drag, use a single touch or a multitouch gesture applied on the portrayed text indicator in order to set a new reading point for playback of the TTS engine.
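Setting a new reading point could work, for instance, by snapping the position where the indicator is dropped to the start of a word and resuming playback from there. This is a sketch under that assumption; `snap_to_word_start` is a hypothetical helper, and the mapping from screen coordinates to a character offset is assumed to be done elsewhere.

```python
def snap_to_word_start(text: str, offset: int) -> int:
    """Return the start of the word at, or immediately before, the offset."""
    # Clamp so that a drop outside the text still gives a valid position.
    offset = max(0, min(offset, len(text)))
    while offset > 0 and not text[offset - 1].isspace():
        offset -= 1
    return offset


text = "Hello big world"
reading_point = snap_to_word_start(text, 8)  # dropped in the middle of "big"
print(text[reading_point:])                  # text that playback would resume with
```

Snapping to a word boundary avoids starting playback in the middle of a word; snapping to a sentence boundary instead would be an equally reasonable design choice.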
In a fifth embodiment of the present invention, the user may drag, or use a single touch or a multitouch gesture applied on the portrayed text indicator, in order to set a new reading point for playback of the TTS engine, where said reading point is not on the same page of the book.
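When the new reading point lies on a different page, one option is to translate between a position on a page and a position in the whole book. The sketch below assumes the book is available as a list of page strings; `page_starts`, `to_global` and `to_page` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate


def page_starts(pages: list[str]) -> list[int]:
    """Return the book-wide offset at which each page begins."""
    return [0] + list(accumulate(len(page) for page in pages))[:-1]


def to_global(pages: list[str], page: int, offset: int) -> int:
    """Convert (page index, offset within that page) to a book-wide offset."""
    return page_starts(pages)[page] + offset


def to_page(pages: list[str], global_offset: int) -> tuple[int, int]:
    """Convert a book-wide offset back to (page index, offset within page)."""
    starts = page_starts(pages)
    page = bisect_right(starts, global_offset) - 1
    return page, global_offset - starts[page]


pages = ["abc", "defgh", "ij"]
reading_point = to_global(pages, 1, 2)  # indicator dropped on the second page
print(reading_point, to_page(pages, reading_point))
```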