Even though many analysts have regarded the public's enthusiasm for computer-based multimedia information as a threat to conventional hard-copy publishing, reading on paper cannot be compared with reading electronic media. While many electronic document systems, e.g., Web browsers or e-book readers, attempt to replace paper and printed publications, experience shows that reading on paper remains preferable, whether or not the readers are familiar with computers. For most people, paper has a number of advantages: paper publications are easy to read, to mark, and to manipulate. Paper publications are familiar, portable, and easily shared and distributed. Several publications compare electronic documents with printed documents, for example, J. Jacobson, B. Comiskey, C. Turner, J. Albert, and P. Tsao of the MIT Media Laboratory, “The Last Book”, IBM Systems Journal, Vol 36, No. 3, 1997; and Michael D. Levi, “Literature at the Human-Computer Seam”, presented at the Modern Language Association 2000 Annual Convention, on http://www.bls.gov/ore/htm_papers/st000100.htm.
With the development of electronic documents, a new mechanism for linking data resources has appeared: hyperlinks. A hyperlink is defined as a “[t]ext or graphical image associated with a URL such that when the user clicks on it, the browser displays the page at that location”. Hyperlinks simplify navigation among numerous large electronic documents by providing a one-click selection mechanism. In view of the hyperlink's efficiency, modified versions have been developed for printed documents, e.g., U.S. Pat. No. 6,771,283 entitled METHOD AND SYSTEM FOR ACCESSING INTERACTIVE MULTIMEDIA INFORMATION OR SERVICES BY TOUCHING HIGHLIGHTED ITEMS ON PHYSICAL DOCUMENTS to Carro dated 3 Aug. 2004.
Given the ease of accessing the Internet, anywhere and anytime, and the amount of information available on the Internet, an interface that gives direct and/or simple access to specific information becomes a key factor in providing valuable information. The traditional interface of personal computers, based on the combination of a screen, a keyboard, and a pointer such as a mouse, allows relatively fast and efficient access to the desired information on the Internet. Access becomes more problematic, however, for mobile computers, wearable computers, handheld devices, and the like, which have reduced screens, miniaturized keyboards, and basic selection mechanisms, or none at all. Such selection mechanisms include miniaturized mice or sensor pads with a pointing stylus combined with light beams to move a cursor on the smaller screens. The tedious process of pointing on a small screen or typing data on a miniaturized keyboard limits the ability to enter information through small display screens and portable keyboards, and this limitation can be crippling for persons who are physically challenged or who have a limited range of movement because of age or arthritis.
The development of speech recognition technology has opened up a new era of man-machine interaction. A speech user interface and automatic speech recognition (ASR) provide a convenient and highly natural method of data exchange between a user and a computer, particularly a mobile or handheld computer. For example, U.S. Pat. No. 6,101,472 entitled DATA PROCESSING SYSTEM AND METHOD FOR NAVIGATING A NETWORK USING A VOICE COMMAND to Giangarra et al. dated 8 Aug. 2000, discloses a voice command interface which allows a user to merely speak the name of a link to receive the desired corresponding web page from a communication network, such as the Internet. During operation, a client computer accesses a current web page from a server. When a new web page is accessed, a processing unit in the data processing system provides control signals to a speech recognition unit to clear a vocabulary list currently stored within the speech recognition unit. Subsequently, the processing unit begins parsing the HTML source code of the accessed web page. The processing unit then determines whether the accessed web page has any links therein. If hyperlinks are embedded within the web page, the processing unit detects those hyperlinks during the parsing operation and enables the speech recognition unit to store, in a special vocabulary list of the speech recognition unit, the text which is displayed to an external user and which corresponds to each link. A user is then able to provide a voice command to access a link by speaking the text stored within the special vocabulary list. Upon the user speaking that text, the processing unit accesses the web page corresponding to the link identified by the text.
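For illustration only, the link-vocabulary mechanism described above can be sketched in Python. This is a minimal sketch and not the implementation disclosed in the cited patent: the class and function names (`LinkVocabularyBuilder`, `build_vocabulary`, `resolve_voice_command`), the sample page, and its URLs are hypothetical, and the speech recognizer itself is assumed to have already produced the recognized text.

```python
from html.parser import HTMLParser


class LinkVocabularyBuilder(HTMLParser):
    """Collects the displayed text of each <a href=...> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.vocabulary = {}   # displayed link text (lowercased) -> URL
        self._href = None      # href of the anchor currently being parsed, if any
        self._text = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so the stored text matches what the user sees.
            label = " ".join("".join(self._text).split()).lower()
            if label:
                self.vocabulary[label] = self._href
            self._href = None


def build_vocabulary(html):
    """Start from an empty vocabulary for each new page, then add its links."""
    builder = LinkVocabularyBuilder()
    builder.feed(html)
    builder.close()
    return builder.vocabulary


def resolve_voice_command(recognized_text, vocabulary):
    """Return the URL for the recognized link text, or None if it is not listed."""
    return vocabulary.get(" ".join(recognized_text.split()).lower())


# Hypothetical page used only to exercise the sketch.
PAGE = """
<html><body>
<p>Read about <a href="/myth/agamemnon.html">Agamemnon</a> and
<a href="/myth/trojan-war.html">the Trojan  War</a>.</p>
</body></html>
"""

vocabulary = build_vocabulary(PAGE)
print(resolve_voice_command("agamemnon", vocabulary))
```

Because a fresh vocabulary is built for every page, the recognizer in this sketch only has to discriminate among the link texts of the current page, which corresponds to the clearing step described above.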
In FIG. 1, the prior art shows a user 100 pronouncing a link name written in a document 105 displayed on the screen of a handheld device 110. Alternatively, the user 100 can spell the letters of the link name. For the sake of illustration, the link name is “AGAMEMNON”. The pronunciation of the word “Agamemnon” or, alternatively, its spelling is analyzed by a speech recognition engine of the handheld device 110. When the link name is recognized, the corresponding data are accessed through the network 115 to which the handheld device 110 is connected, and are displayed on the handheld device 110. Alternatively, the accessed data can be “read” by the handheld device 110 using text-to-speech synthesis software. The connection between the handheld device 110 and the network 115 can be of any type, including but not limited to a wide area network, local area network, wire connection, wireless connection, etc.
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words. Speech recognition involves software and hardware that cooperate to detect human speech and translate the detected speech into a string of words. Speech recognition works by breaking down the sounds detected by the hardware into smaller, indivisible sounds called phonemes, i.e., distinct units of sound. The speech recognition software attempts to match the detected phonemes with known words from a stored vocabulary. In most cases, successful conversion of acoustic signals must be based upon an existing vocabulary of known words. Once recognized, words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, or command and control.
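As a toy illustration of matching detected phonemes against a stored vocabulary, the following Python sketch selects the vocabulary word whose pronunciation is closest, by edit distance, to a detected phoneme sequence. It rests on stated assumptions: the lexicon, its phoneme symbols, and the `max_distance` threshold are hypothetical, and practical recognizers generally rely on statistical acoustic and language models rather than a plain edit distance.

```python
# Hypothetical pronunciation lexicon: word -> phoneme sequence (illustrative symbols).
LEXICON = {
    "open": ("OW", "P", "AH", "N"),
    "close": ("K", "L", "OW", "Z"),
    "save": ("S", "EY", "V"),
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(previous[j] + 1,          # delete a phoneme from a
                               current[j - 1] + 1,       # insert a phoneme from b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def match_word(phonemes, lexicon, max_distance=1):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    best_word, best_distance = None, max_distance + 1
    for word, pronunciation in lexicon.items():
        distance = edit_distance(phonemes, pronunciation)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word


# One phoneme is misdetected ("S" instead of "Z"); the word is still recovered.
print(match_word(("K", "L", "OW", "S"), LEXICON))
```

A sequence that is far from every entry returns `None`, which reflects the dependence on an existing vocabulary of known words: sounds that correspond to no stored word cannot be converted.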
Even for limited vocabularies, traditional speech recognizers use complex algorithms which in turn need large storage systems and/or dedicated Digital Signal Processors (DSPs) with high-performance computers. Because of memory limitations and system processing requirements, in conjunction with power consumption limitations, embedded or local speech recognition engines recognize only a fraction of the audio inputs recognizable by a host, network-based speech recognition engine. ASR implemented on wearable, miniaturized, handheld computing devices generally cannot recognize speaker-independent, continuous speech in real time. It is not always feasible to predict every word which may be spoken by a user of a speech-enabled system and, furthermore, the speech recognizer must deal with environmental noise, e.g., an environment in which several persons are speaking simultaneously. Finally, there is great variability in how different speakers pronounce words, as well as variability in how an individual speaker pronounces words from one time to another.
To reduce such variability, another approach requires that each word to be recognized be spelled aloud. Although words can be specified by spelling, there are drawbacks, such as the number of letters to spell. Spelling long words is tedious for the user and prone to errors. The presence of “confusable” letters in spelled words, such as “p”, “t”, and “d”, whose spoken names are difficult to distinguish clearly, also introduces errors. As a consequence, there is a need for an efficient method and system that facilitate access to electronic data from a marked link in electronic and/or printed documents. The widespread use of the Internet and mobile communications offers new opportunities to combine electronic and printed media, in other words to create “media-adaptive multimedia” products. The philosophy behind the concept of media-adaptive multimedia is that information must be transferred to users in a form adapted to their needs. In fact, traditionally printed documents, digitally printed documents, and multimedia products must be complementary. The different components must be combined depending on the user's needs. To facilitate this evolution, the electronic content should be accessible directly from the printed medium. Thus, there is an additional need in the industry to provide a method and system to improve access to information resources available on networks such as the Internet through reduced audio commands. The method and system should also distinguish the information resources available on networks such as the Internet from electronic and/or printed documents, and improve access to information linked to electronic and/or printed documents using spelling discrimination audio commands.
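The effect of confusable letters on spelled input, mentioned above, can be illustrated with a short Python sketch. It is an illustration under stated assumptions only: the letter groupings in `CONFUSABLE_GROUPS` are assumed for the example rather than measured, and the function names and word list are hypothetical. Letters within a group are treated as indistinguishable to the recognizer, and the sketch reports which vocabulary words would then become ambiguous when spelled aloud.

```python
from collections import defaultdict

# Assumed, illustrative groups of letters whose spoken names sound alike.
CONFUSABLE_GROUPS = ["bcdegptvz", "mn", "fs"]

# Map every letter in a group to that group's first letter.
LETTER_CLASS = {letter: group[0] for group in CONFUSABLE_GROUPS for letter in group}


def confusion_key(word):
    """Replace each letter by its group representative (letters outside any group are kept)."""
    return "".join(LETTER_CLASS.get(ch, ch) for ch in word.lower())


def ambiguous_spellings(vocabulary):
    """Return the sets of words that collapse to the same key when spelled aloud."""
    buckets = defaultdict(list)
    for word in vocabulary:
        buckets[confusion_key(word)].append(word)
    return [sorted(words) for words in buckets.values() if len(words) > 1]


# Hypothetical vocabulary used only to exercise the sketch.
print(ambiguous_spellings(["pad", "tab", "map", "nap", "sun"]))
```

Under these assumed groupings, short words that differ only in confusable letters cannot be told apart from their spelling alone, which is one reason spelling-based input benefits from additional discrimination.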