Current computer programs called “screen readers” use text-to-speech software to “read” the text displayed on a computer screen. (One example is the JAWS screen reader program, available from A.D.A. WorkLink, Berkeley, Calif. Another is Microsoft's Narrator accessibility software built into Windows 2000.) Some have been adapted for or incorporated into web browsers in order to “read” web pages or e-mail. Because this class of software has generally been designed for the blind or visually impaired, the reader must also provide aural signals of important non-text information, such as symbols, non-standard punctuation, and descriptions of pictures embedded in the text. When the screen reader is intended to read web pages, it also has to describe animations or videos and signal when a “button” or “link” can be activated, as well as what the button does and where the link navigates. To do this, the screen reader “parses” the digital code that makes up the text and formatting instructions for the page. The actual text is put in the proper form for the text-to-speech software, without the extra formatting codes needed for page display (e.g., margins, italics, etc.). Some of the formatting codes cause the parsing program to insert additional code for the text-to-speech software. For example, formatting code that places a word in boldface might be replaced with code that makes the text-to-speech program speak that word louder. In other instances, the parsing program inserts words to describe what the formatting code sought to accomplish. For example, an image tag in a web page may include not only the source of the image but also a textual description of what the image is or shows (the text of the tag's “alt” attribute). A screen reader would then indicate, through aural tones or spoken words, that the page contains an image, and would speak the description of the image.
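The parsing step described above can be illustrated with a minimal sketch in Python, using the standard library's HTML parser. It is not the method of any particular screen reader. The class name SpeechParser, the function to_speech_items, and the "text", "loud" and "image" labels are hypothetical names chosen for this illustration. Display-only formatting is dropped, boldface text is marked to be spoken louder, and an image is announced together with its alt text.

```python
from html.parser import HTMLParser


class SpeechParser(HTMLParser):
    """Turns HTML into (kind, text) items for a text-to-speech engine.

    Hypothetical illustration; the kinds "text", "loud" and "image" are
    assumptions, not the input format of any real speech engine.
    """

    def __init__(self):
        super().__init__()
        self.items = []      # output list of (kind, text) pairs
        self.bold_depth = 0  # greater than 0 while inside <b> or <strong>

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.bold_depth += 1
        elif tag == "img":
            # Announce the image and speak its alt text, if any.
            alt = dict(attrs).get("alt") or "no description"
            self.items.append(("image", "Image: " + alt))
        # Other formatting tags (i, font, p, ...) are simply dropped.

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.bold_depth > 0:
            self.bold_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse layout whitespace
        if text:
            kind = "loud" if self.bold_depth else "text"
            self.items.append((kind, text))


def to_speech_items(html):
    """Parse an HTML fragment and return its speech items in page order."""
    parser = SpeechParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

A speech engine would then walk the returned list in order, raising its volume for items marked "loud".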
Similarly, a screen reader that encounters a hyperlink would indicate that an image or text is acting as a link, in addition to reading the text or describing the image using its alt text. The screen reader might even read the address of the page to which the hyperlink links. (This is information that a sighted person would see on the browser's status line when the cursor is placed over the link.)
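The hyperlink handling described here can be sketched in the same way. The following Python fragment is an illustration only, under the assumption that the reader speaks a link's label followed by its destination address. The names LinkAnnouncer and announce, and the wording of the spoken phrases, are hypothetical.

```python
from html.parser import HTMLParser


class LinkAnnouncer(HTMLParser):
    """Collects spoken phrases, announcing each hyperlink and its address.

    Hypothetical illustration; the phrase wording is an assumption.
    """

    def __init__(self):
        super().__init__()
        self.phrases = []
        self.href = None      # address of the link being read, if any
        self.link_parts = []  # text and alt text gathered inside the link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.href = attrs["href"]
            self.link_parts = []
        elif tag == "img":
            self._add("image, " + (attrs.get("alt") or "no description"))

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            label = " ".join(self.link_parts) or "unlabeled"
            self.phrases.append("Link: " + label + ", goes to " + self.href)
            self.href = None

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self._add(text)

    def _add(self, text):
        # Inside a link, hold the text until the closing tag is seen.
        if self.href is not None:
            self.link_parts.append(text)
        else:
            self.phrases.append(text)


def announce(html):
    """Return the phrases a reader might speak for an HTML fragment."""
    parser = LinkAnnouncer()
    parser.feed(html)
    parser.close()
    return parser.phrases
```

The spoken address plays the role that the browser's status line plays for a sighted user.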
Some screen readers have also been developed as reading aids for the sighted, particularly sighted persons who have difficulty learning to read. Two examples are the CAST eReader, available from CAST, Peabody, Mass., and the HELPRead™ plug-in, available from the Hawaii Education Literacy Project (HELP), Honolulu, Hi.
The CAST eReader will read documents or web pages. The user places the cursor focus in front of the text that he or she wants the eReader to read. This is performed by placing the cursor at that location and then clicking the left mouse button. The eReader will then read the next letter, word or sentence, depending upon user settings. (For web pages, however, only whole sentences are read.) As the eReader vocalizes the text, it will “highlight” the letter, word or sentence being read, again depending upon user settings. (For web pages, however, only words are highlighted.) When a word is “highlighted,” its background shows a different color, as if it had been highlighted by a magic marker. The eReader can read one piece of text at a time, or automatically continue through an entire document. The user can also highlight a portion of text (by pointing and clicking with a cursor), and then click on a button for the eReader to read that text. The eReader can also be set to begin reading automatically from the top of any web page it encounters.
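The letter, word and sentence reading units can be illustrated with a short Python sketch. It is not the eReader's actual implementation. The function name reading_steps and the regular expressions are assumptions made for this example. For each unit, the generator yields the character range to highlight and the fragment to pass to a speech function.

```python
import re


def reading_steps(text, unit="word"):
    """Yield (start, end, fragment) for each unit to highlight and speak.

    Hypothetical helper; the patterns are simple assumptions and do not
    handle abbreviations or other hard cases.
    """
    patterns = {
        "letter": r"\S",
        "word": r"\S+",
        "sentence": r"[^.!?]+[.!?]*",
    }
    for match in re.finditer(patterns[unit], text):
        fragment = match.group()
        stripped = fragment.strip()
        if stripped:
            # Skip leading spaces so the highlight starts on the fragment.
            lead = len(fragment) - len(fragment.lstrip())
            start = match.start() + lead
            yield start, start + len(stripped), stripped
```

A user interface would color the background of text[start:end] while the fragment is being spoken, then move on to the next step.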
The HELPRead plug-in has a different interface but performs similar functions: user identification of text to be read by point-and-click or by highlighting, and highlighting text while it is being read. The HELPRead plug-in will also read any text placed in the clipboard.
Both of these readers either read fully automatically from the top to the bottom of a document, or require a two-step point-and-click operation.
There are other current uses for such parsing routines. Some websites for translation services allow the user to specify the address of a web page. The service then parses that entire page, translating all of the text but not the formatting code, and causes the translated page to appear in the user's web browser with the same or similar formatting, images, typeface, etc. as the original web page. (An example is the www.systransoft.com website of Systran S.A., France/Systran Software, San Diego, Calif.) However, unlike the previous example, the parsing is done at the translation website's server, rather than on the user's computer.
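Translating the text of a page while leaving its formatting code untouched can be sketched as follows. This Python fragment is an illustration only and does not describe how any actual translation service works. The names TextOnlyTranslator, translate_page and toy_translate are hypothetical, and toy_translate is a two-word stand-in for a real machine translation engine. Comments and document type declarations are dropped in this sketch.

```python
import re
from html import escape
from html.parser import HTMLParser


class TextOnlyTranslator(HTMLParser):
    """Rebuilds a page, translating text nodes and copying tags verbatim."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate  # callable: str -> str
        self.out = []
        self.in_code = False        # True inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw tag, unchanged
        if tag in ("script", "style"):
            self.in_code = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append("</" + tag + ">")
        if tag in ("script", "style"):
            self.in_code = False

    def handle_data(self, data):
        if self.in_code:
            self.out.append(data)  # never translate scripts or styles
        else:
            self.out.append(escape(self.translate(data), quote=False))


def translate_page(html, translate):
    """Return the page with its text translated and its markup preserved."""
    parser = TextOnlyTranslator(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def toy_translate(text):
    # Hypothetical stand-in for a real machine translation engine.
    words = {"Hello": "Bonjour", "world": "monde"}
    return re.sub(r"[A-Za-z]+", lambda m: words.get(m.group(), m.group()), text)
```

Because each start tag is copied exactly as written, attributes such as class names and image sources survive, which is what lets the translated page keep the layout of the original.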
Some “portal” websites like Octopus (Octopus.com, LLC, Palo Alto, Calif.) allow the user to create a personalized web page by identifying other web pages and specifying material in those other web pages. When the user next visits Octopus, Octopus creates the personalized web page in the background by parsing those other websites for the requested information and reconstituting it on an Octopus page before delivering it to the user.
Text-to-speech software has also been adapted as plug-ins for Internet browsers. These may be stand-alone speech synthesis programs, or may be coupled with an animation program, so that a “cartoon” will appear to speak the words. Two such programs are the Haptek Virtual Friend animation program (available from Haptek, Inc., Santa Cruz, Calif.), which in February 2001 was coupled with the DECtalk text-to-speech program (available from Fonix Corporation, Draper, Utah), and the Microsoft Agent animation program, which is frequently coupled with the Lernout & Hauspie TruVoice text-to-speech program. (Apple Computer also has a text-to-speech program called PlainTalk.) These various plug-ins can be accessed from web pages that have embedded the appropriate code, causing certain predesignated portions of the web page to be spoken. The web page designer/creator decides which portions of the web page will “talk”.
An authoring application that helps web designers use Microsoft Agent is Buddy Builder by Shelldrake Technologies, Concord, N.H. A web page that uses this software includes a link that, when activated, launches a new browser window. The new browser window displays a modified version of the web page. This web page will “speak” when the browser registers various events (e.g., onLoad, onMouseover, onClick) with respect to specific page elements. This program speaks only those page elements previously designated by the web page author.
Prior to Feb. 26, 2001, the Simtalk website (www.simtalk.com) allowed users to specify certain websites (such as news on Yahoo, or books in the Gutenberg Project). The Simtalk software parsed the website and placed it in a form compatible with text-to-speech software. An animated head appeared on the computer monitor, along with a new window with control buttons. When the user clicked on the “read” button, the text-to-speech software read portions of the website preselected by Simtalk, while the animated head moved its mouth in synchronization with the words (called “lip-syncing” the words). This process worked by executing an independent software program (i.e., the Simtalk software) which parsed sentences and text strings from web pages and loaded them into an array (a table). When the user clicked on the window of the Simtalk software reader, the sentences in the table were read sequentially, one by one, out of the array, loaded into a text-to-speech function, and spoken.
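The array-based process described above can be illustrated with a minimal Python sketch. It is a reconstruction from the description only, not the Simtalk code. The names load_sentences, Reader and on_read_clicked are hypothetical, the sentence-splitting rule is a simplifying assumption, and the speak argument stands in for a real text-to-speech function.

```python
import re


def load_sentences(text):
    """Split page text into an ordered array of sentences.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class Reader:
    """Reads sentences out of the array one by one (hypothetical sketch)."""

    def __init__(self, sentences, speak):
        self.sentences = sentences
        self.speak = speak  # callable standing in for a text-to-speech function
        self.position = 0   # index of the next sentence to speak

    def on_read_clicked(self):
        # Feed each remaining sentence to the speech function in order.
        while self.position < len(self.sentences):
            self.speak(self.sentences[self.position])
            self.position += 1
```

For a quick trial, the built-in print function can be passed as speak, so that each sentence is printed instead of spoken.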
In U.S. application Ser. No. 09/974,132 filed Oct. 9, 2001, entitled “METHOD OF PROCESSING INFORMATION EMBEDDED IN A DISPLAYED OBJECT,” incorporated herein by reference, text from one web page could be copied from one window (using drag-and-drop or copy-and-paste operations) to another window, where it would be put in the proper form to be read by text-to-speech software.
Many people have difficulty reading any specified text document, even if they are not blind. People have difficulty reading a document that is not written in their native or ethnic language. (In the United States, this literacy problem is attacked by the special educational programs and efforts referred to as “ESL” programs or “English as a Second Language.”) People have difficulty reading a document that is written with technical terms that they are not familiar with. People have difficulty reading a document that is written with more difficult words or sentence constructions than they are competent to decipher. (For example, in the United States, almost a quarter of the adult population reads at or below the fourth grade level and has difficulty reading and understanding the directions on the back of a medicine bottle.) Other people have difficulty reading any text because of dyslexia, mental retardation, or various developmental or cognitive disabilities. Other people have difficulty reading because of cultural or educational disabilities. Some of those who have difficulty reading may be sighted but have motor control disabilities which make drag-and-drop, point-and-click or copy-and-paste operations difficult.
Some electronic texts (such as some web sites) provide alternate texts in a few different languages. Some web sites provide automated machine translation of any text or web page that is submitted to them, displaying the text in the requested language. There are a variety of text-to-speech software packages that a user can install and submit text to, whereby the text is converted to the sound of a synthesized voice speaking the words. These applications generally require that the user be competent at reading and manipulating high school level text in at least one language. Text-to-speech browsers are also an expense for those in the lower socio-economic levels, frequently costing end users over $100. Use of such specialized browsers is also likely to stigmatize users who might otherwise effectively hide their reading difficulties.
Some electronic texts embed audio clips, such as songs, interviews, commentary, or audio descriptions of graphics. However, production time and storage capacity requirements limit their use.