The present invention relates generally to hand-held scanning dictionaries, and in particular, to a scanning dictionary that is optimized for teaching languages.
While dictionaries provide multiple meanings for words or word stems, a dictionary user requires the meaning in context and must sort and sift for himself through the plurality of meanings suggested to him. To students of a foreign language, this is not an easy task. Often, the meaning depends on the part of speech a word plays, but to analyze a sentence for its parts of speech, one must understand it sufficiently. For example, when confronted with "Name two reasons for the strength of the present economy," many students of English as a Foreign Language will gaze at what, in their view, is a sentence with no verb. Since students rarely look up words they believe they know, they are unlikely to look up "name" for a possible unfamiliar meaning.
An additional difficulty with using a dictionary is that often, a dictionary provides only the word stem, which may be, for example, a verb, and not the word as it appears in the sentence, which may be, for example, an adjective. For example, the meaning in context of the phrase "augmented costs" may not be found in a dictionary.
An old-fashioned language teaching method, known in Aramaic as "Shnaiim Mikra Ve'ahad Targum," or "read, translate, read," is designed to provide the meaning in context, averting the problems associated with independent study with a dictionary. However, it requires a teacher, close at hand.
Another problem that students of a foreign language encounter is pronunciation. When a person who was raised in a specific system of sounds and vowels moves into a different system, his difficulty is twofold: not only can he not pronounce the new sounds and vowels, but often, he does not hear their distinguishing features. A person whose mother tongue has a single "e" sound may not hear the difference between "it" and "eat". Yet, being able to hear this difference is a prerequisite to producing it.
Furthermore, written languages rarely provide unequivocal information with regard to pronunciation. In English, for example, there is "home," and "dome," but "come," and "some." There is "weight," and there is "height". The word "misled" is not pronounced like the word "fiddled," and the word "ear" is not pronounced like the word "bear." There are silent letters like "g" in "paradigm" or "c" in "scintillation." For students of a foreign language, pronouncing what they read may involve considerable guesswork.
Optical scanners are known. They convert objects such as pictures, barcodes, or portions of text to machine-readable data signals. Typically, the data signals are read by a user's computer to reproduce an image of the scanned object on a display device, such as a CRT, a display screen or a printer.
A hand-held optical scanner is manipulated by hand across the object that is being scanned. The hand-held scanner may be connected directly to the user's computer by a data cable, and may transfer image data to the computer as the data are collected. Alternatively, the hand-held scanner may be a stand-alone unit and may include a data storage component for storing the image data. The data may be downloaded to a separate computer after the scanning operation is complete.
A hand-held optical scanner generally includes an illumination system, for illuminating the region to be scanned, an optical system, for collecting and focusing light reflected by the illuminated, scanned region, a photosensitive detector, for detecting the light collected and focused thereon by the optical system, an analog amplifier, for amplifying the signals produced by the photosensitive detector, and an analog-to-digital converter, for converting the amplified signals to digitized machine-readable data signals. The illumination system may be, for example, a fluorescent or incandescent lamp or an array of light emitting diodes (LEDs). The optical system may include a lens or a lens-and-mirror assembly.
The photosensitive detector is generally a Charge-Coupled Device (CCD). A CCD includes an array of photosensitive cells, or pixels, each pixel collecting an electrical charge responsive to the light that falls upon it. Thus, a CCD may be used to detect light and dark spots of a scanned object. The charge from each pixel is converted to an analog voltage by an analog amplifier, and the analog voltage is digitized by an Analog-to-Digital Converter (ADC). The digitized signals are the machine-readable data signals, which can be stored or processed by the user on a computer or a similar device.
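The amplify-and-digitize chain described above can be illustrated with a minimal Python sketch. The function names, the 8-bit resolution, the 1.0 V reference, and the mid-scale threshold are assumptions chosen for illustration only; none of them are specified in the text.

```python
def quantize(voltages, v_ref=1.0, bits=8):
    """Map analog voltages in [0, v_ref] to integer ADC codes.

    v_ref and bits are illustrative assumptions, not values from the text.
    """
    levels = (1 << bits) - 1
    codes = []
    for v in voltages:
        # Clamp out-of-range inputs, as a real converter saturates.
        clamped = min(max(v, 0.0), v_ref)
        codes.append(round(clamped / v_ref * levels))
    return codes


def to_light_dark(codes, bits=8):
    """Classify each ADC code as light (1) or dark (0) at mid-scale."""
    threshold = (1 << bits) // 2
    return [1 if c >= threshold else 0 for c in codes]


if __name__ == "__main__":
    row = quantize([0.02, 0.25, 0.9, 1.0])
    print(row, to_light_dark(row))
```

A fixed mid-scale threshold is the simplest possible choice; a practical scanner would more likely adapt the threshold to the illumination and paper.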
Sometimes, a Contact Image Sensor (CIS) is used in place of the CCD. In a CIS scanner, the array of photosensitive cells is arranged in close proximity to the object to be scanned, so as to catch the reflected light directly; an optical system is not necessary.
U.S. Pat. No. 5,996,895 to Heiman et al., incorporated herein by reference, describes a scanning system with adjustable light output and/or scanning angle.
U.S. Pat. No. 6,033,086 to Bohn, incorporated herein by reference, describes a compact illumination system for a hand-held scanner.
U.S. Pat. No. 5,841,121 to Koenck, incorporated herein by reference, describes hand-held optical scanners having automatic focus control, for operation over a range of distances.
U.S. Pat. No. 5,019,699 to Koenck, incorporated herein by reference, describes a hand-held optical scanner, which includes a lens system having circular symmetry. The lens system focuses the full width of the object onto an array of photosensitive cells, with a single flash of a ring-type xenon flash tube, which surrounds the lens system and is symmetrically arranged relative to the optical axis. In this way, the object can be scanned at any angle relative to the array of photosensitive cells, and the scanned image, stored in digital form, can be electronically rotated to a desired orientation, before it is decoded.
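As a rough illustration of electronically rotating a stored image, the following Python sketch rotates a row-major pixel grid by multiples of 90 degrees. It is a simplified assumption, not the method of the cited patent, which is not detailed here; rotation by an arbitrary angle would additionally require resampling. The function name is hypothetical.

```python
def rotate_quarter_turns(image, turns):
    """Rotate a row-major image clockwise by turns * 90 degrees.

    Negative values of turns rotate counterclockwise.
    """
    result = [list(row) for row in image]
    for _ in range(turns % 4):
        # Reversing the row order and transposing gives one clockwise turn.
        result = [list(col) for col in zip(*result[::-1])]
    return result


if __name__ == "__main__":
    stored = [[1, 2],
              [3, 4]]
    print(rotate_quarter_turns(stored, 1))
```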
U.S. Pat. No. 5,834,749 to Durbin, incorporated herein by reference, describes a hand-held scanner for reading images at oblique angles, so that the scanning unit does not interfere with the user's view of the scanned image. The distortion to an obliquely scanned image, arising from the oblique scanning, can be corrected by any of several correction techniques, as follows:
1. a ratio of vertical to horizontal line densities of the array of photosensitive cells can be chosen to compensate for the vertical foreshortening of the scanned image;
2. the array of photosensitive cells can be oriented at an oblique angle with respect to the optical axis, to compensate for the distortion inherent in the oblique scanning;
3. a lens system can be configured to provide varying degrees of magnification along its surface; and
4. as taught by U.S. Pat. No. 5,019,699, to Koenck, described hereinabove, processing techniques can electronically re-orient the scanned image after storing it in the scanner's memory.
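A software counterpart to the vertical-foreshortening compensation listed above can be sketched in Python as follows. The sketch assumes a deliberately simple model that is not taken from the cited patents: viewing at a tilt angle from the surface normal shortens the vertical axis by the cosine of that angle, and perspective effects are ignored. The function name and the nearest-neighbor resampling are illustrative choices.

```python
import math


def stretch_rows(image, tilt_degrees):
    """Undo vertical foreshortening by nearest-neighbor row resampling.

    Assumed model: the image is compressed vertically by cos(tilt),
    so it is stretched back by 1 / cos(tilt). Valid for tilts below 90.
    """
    scale = 1.0 / math.cos(math.radians(tilt_degrees))
    out_height = round(len(image) * scale)
    out = []
    for y in range(out_height):
        # Sample at the center of each output row to avoid edge bias.
        src = min(int((y + 0.5) / scale), len(image) - 1)
        out.append(list(image[src]))
    return out


if __name__ == "__main__":
    squashed = [[1, 0],
                [0, 1]]
    for row in stretch_rows(squashed, 60):
        print(row)
```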
Hand-held, stand-alone, optical scanners that produce audio output are known. U.S. Pat. No. 5,945,656 to Lemelson et al., incorporated herein by reference, describes a pen-like stand-alone scanner for transducing coded data into pre-coded pieces of speech or music. Generally, a scanning guide is attached to a book, arranged for guiding the pen-like scanner vertically along an edge of the book, which contains coded information. Aided by the guide, children may scan the coded data and produce the sounds associated with them.
U.S. Pat. No. 5,767,494 to Matsueda et al., incorporated herein by reference, describes a system for reproducing multimedia information, recorded with an optically readable code. The code is a dot-code format described in U.S. Ser. No. 08/407,018 (PCT Publication No. WO 94/08314), and includes two-dimensional patterns that convey multimedia information, for example, audio information like speech and music, image information obtained from a camera or a video device, and digital code data obtained from the user's computer, for example, from a word processor. The system uses paper as a basic information-storage medium for the optically readable code, and includes a preferably pen-like scanner, arranged to read the code. The system may reproduce the original multimedia information by appropriate hardware such as a display screen, a printer, or a speaker, and includes a speech synthesizer. In some embodiments, the pen-like scanner is a stand-alone unit, and may include earphones.
The IRISPen of Image Recognition Integrated Systems Inc., of Rue Du Bosquest 10, 1348 Louvain-la-Neuve, Belgium, is a pen-like scanner that allows the user to scan text, bar codes and handwritten numbers into any Windows or Mac application. The IRISPen is hooked up to any desktop or portable computer without any additional interface boards. The IRISPen is not a stand-alone apparatus.
The IRISPen Executive™ integrates text-to-speech technology from Lernout and Hauspie, in six languages (English, French, German, Dutch, Spanish and Italian). It provides natural audio feedback of all recognized words and numbers as it scans the information. The purpose of the text-to-speech technology is to reduce the user's need to keep his eyes on the computer screen to verify recognition. The IRISPen Translator is further arranged to automatically translate text between English and German. Output may be in the form of written text in the translated language, displayed on a computer screen or printed. Alternatively, the output may be an audio output, in the translated language.
The IRISPen Executive™, the IRISPen Translator, and other IRISPen products are not stand-alone apparatus. Rather, they are arranged to operate with a computer, such as a desktop PC or a notebook computer, on which the IRISPen software has been installed. The output language is the language that has been installed on the computer, and cannot be changed during a scanning operation: an audio output may be provided only in the original language or only in the translated language. Furthermore, the automatic translation language is not intrinsic to the IRISPen Translator. Rather, it has to be installed on the computer that supports the IRISPen. Neither are the speaker or earphones for audio output intrinsic to the IRISPen. Instead, the computer speakers are used for the audio output. Therefore, the IRISPen is not a single product but a package of several products, which are sold together and are arranged to operate together.
Text-to-speech (TTS) synthesis systems, additional to the technology of Lernout and Hauspie in the IRISPen Executive™, are known. Bell Labs and Edinburgh University have developed a text-to-speech synthesis based on a Spoken Text Markup Language (STML) standard. STML later became SABLE. Sun Microsystems, Inc., in partnership with other speech-technology companies, has worked to define the specifications for a Java Speech API and a Java Speech Markup Language (JSML), incorporating many of the aspects of SABLE. JSML has been accepted by W3C (the organization responsible for WWW standards) as a standard. Bell Labs Lucent Technologies now offers a text-to-speech synthesis, which provides choices between voices of a man, a woman or a child and a speech rate that is fast, normal or slow. The University of Edinburgh has developed a generally multi-lingual system known as The Festival Speech Synthesis System, available in English (British and American), Spanish and Welsh. Additionally, Digital offers the DECtalk™ Speech Synthesizer, which converts ASCII text to natural-sounding speech output. IBM offers the V5.1 speech synthesizer. Apple offers "English Text-to-Speech" software with recent versions of the MacOS. The University of York has produced YorkTalk, and Oxford University offers an all-prosodic speech synthesizer entitled IPOX. Telcordia Technologies (formerly Bellcore) has developed the ORATOR and an improved version, the ORATOR II. Entropic Research Laboratory, Inc. offers TrueTalk 1.0, a software-only text-to-speech system based on a major research effort at AT&T Bell Laboratories. AT&T has developed Next-Generation TTS to convert machine-readable English text into audible speech. The Speech Technology Unit at BT has produced, and is continuing to develop, a sophisticated text-to-speech system called Laureate. Eurovocs is still another commercially available text-to-speech product.
BORIS is a high-quality, diphone-based text-to-speech converter for Spanish, developed by Universidad Politecnica de Madrid. Lycos Search offers a text-to-speech synthesizer, as do SoftVoice, Inc., Eloquent Technology, Inc., and many other companies.
Lernout and Hauspie, which developed the technology of the IRISPen Executive™, described hereinabove, offers a multi-lingual, text-to-speech system in British English, Dutch, French, German, Italian, Japanese, Korean, Portuguese (Brazilian), Russian and Spanish.
HMM-Based Trainable Speech Synthesis is a speech synthesis system that uses a set of decision-tree state-clustered Hidden Markov Models. The system automatically selects and segments a set of HMM-state sized sub-word units from a one-hour continuous-speech database of a single speaker, for use in a concatenation synthesizer, to produce highly intelligible, fluent speech. It can be retrained on a new voice in less than 48 hours.
Automatic translation systems, additional to the technology of the IRISPen Translator, are known. For example, Language Teacher© of Ectaco, 1205 E. Pike, Seattle, Wash. 98122, is a pocket, electronic dictionary and translator with 2 million words and phrases, which generally operates as a stand-alone unit. Some models may be connected to a user's computer and interact with Windows 95 or 98. It is available for translation between English and any of the following languages: Albanian, Arabic, Bulgarian, Chinese, Czech, French, German, Greek, Hebrew, Hungarian, Italian, Latvian, Polish, Portuguese, Romanian, Russian, Serbo-Croatian, Spanish, Turkish, Vietnamese, and Yiddish.
The Language Teacher© includes words as well as phrases, idioms, irregular verbs, and linguistic games and grammar. It further includes a built-in voice synthesizer, which produces an audio output in multiple languages. Additionally, the Language Teacher© includes an organizer. A digital voice recorder stores up to 15 minutes of human speech. Its model "Partner©" is designed to translate texts, and send and receive e-mail and faxes.
There is a wide selection of automatic translation software, for example, Deluxe Universal Translator, of LanguageForce, Easy Translator 3, of Transparent Language, L&H Power Translator Pro, of L&H Speech Products, and Translation Manager 2.0, of IBM.
Software for correcting the user's pronunciation is known. For example, "Talk to Me™," by Globalink, Inc., Fairfax, Va., is software arranged for a PC. The user may use the software to listen to a dialogue and to try to reproduce it. The software records the user's voice and compares its signals with those which would be produced by a native speaker, displaying to the user the differences in signal forms. However, the dialogues are provided by the software; the user cannot use the software to practice on sentences of his choice, for example, in order to prepare for a speech that he is to give.
The present invention relates to stand-alone, hand-held, scanning apparatus, which provides a user with exposures to a spoken form and an interpretation of a portion of text, simultaneously, or in sequence.
In accordance with a preferred embodiment of the present invention, the apparatus provides a text-to-speech synthesis of a portion of text, for example, a sentence, followed by an audible, automatic translation to a second language, of the same portion of text. Alternatively, the automatic translation may be displayed in parallel with the text-to-speech synthesis.
In accordance with other embodiments, interpretation includes translation of difficult words and phrases, in context, upon request, or rephrasing of difficult words and phrases, in context, upon request. These may be audible or displayed.
Alternatively, or additionally, the stand-alone, hand-held apparatus may be used for teaching correct pronunciation of a portion of text. Preferably, teaching correct pronunciation includes the steps of providing a text-to-speech synthesis of the portion of text, recording the user's pronunciation of the portion of text, and playing back the user's pronunciation, for the user to hear any differences between his pronunciation and that of the text-to-speech synthesis.
Additionally, in accordance with the present invention, the stand-alone, hand-held apparatus may be used for synthesizing written notes of a piece of music. Preferably, the music is synthesized in the sound of a desired instrument, for example, a cello. The user, who may be preparing to play the piece of music, may maintain visual contact with the written notes, as they are being synthesized.
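By way of illustration only, turning recognized note names into sound can be sketched in Python as below. The sketch assumes twelve-tone equal temperament with A4 tuned to 440 Hz and produces plain sine tones; it does not model the timbre of a cello or any other instrument, and the function names, note-name format, and sample rate are hypothetical choices not taken from the text.

```python
import math

# Semitone offsets of the natural notes within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_frequency(name):
    """Frequency in Hz of a note such as 'A4' or 'C#3'.

    Assumes equal temperament with A4 = 440 Hz and a one-digit octave.
    """
    letter, accidental, octave = name[0], name[1:-1], int(name[-1])
    semitone = SEMITONES[letter] + {"": 0, "#": 1, "b": -1}[accidental]
    midi = 12 * (octave + 1) + semitone  # MIDI note number, A4 = 69
    return 440.0 * 2 ** ((midi - 69) / 12)


def synthesize(notes, seconds_per_note=0.5, sample_rate=8000):
    """Return sine-wave samples in [-1, 1] for a sequence of note names."""
    samples = []
    count = int(seconds_per_note * sample_rate)
    for name in notes:
        freq = note_frequency(name)
        samples.extend(
            math.sin(2 * math.pi * freq * i / sample_rate) for i in range(count)
        )
    return samples


if __name__ == "__main__":
    tones = synthesize(["C2", "G2", "D3", "A3"])
    print(len(tones), round(note_frequency("C2"), 2))
```

The resulting samples could be written to an audio file with the standard `wave` module; approximating a particular instrument would require shaping the harmonics and the amplitude envelope of each note.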