The present invention relates generally to data processing systems and, more particularly, to a method and system for unambiguously inputting multi-byte characters into a computer from a Braille input device.
Nonsighted or visually-impaired people have had difficulty in being integrated into the workforce due in part to the difficulty of working with computers to perform such tasks as word processing. In order to integrate visually-impaired people into the workforce, conventional systems have been developed that receive Braille input, store it into a computer, and output it to the user.
One such conventional system 100 for inputting Braille into a computer is depicted in FIG. 1A. The Braille system 100 comprises a computer 102 with a video display 104 and with a Braille I/O device 106. The Braille I/O device 106 is responsible for inputting Braille to the computer 102 via input keys 108-119 and for outputting Braille to the user via the output array 120. As shown in FIG. 1B, each unit of Braille 130 is expressed as a Braille cell having six predefined locations 132-142. Information is conveyed using a Braille cell through the presence or absence of an elevation at the predefined locations 132-142. For example, when Braille is conveyed on a paper medium, a punch behind the paper causes the paper to be elevated at one or more of the predefined locations 132-142. It is the elevations and the absence of elevations at the predefined locations 132-142 that convey meaning to the reader. The depression of the input keys 108-118 causes the computer 102 to receive a signal that a corresponding location 132-142 should be construed to be in an elevated position. Input keys 112, 110, 108, 114, 116, and 118 correspond to predefined locations 132, 134, 136, 138, 140, and 142, respectively. Input key 119 is a space bar which is used to indicate that none of the predefined locations have an elevation. Therefore, by using keys 108-118, a visually-impaired user can input information into the computer 102.
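As a rough illustration of the key-to-location correspondence described above, the following Python sketch packs a set of depressed input keys into a six-bit value, one bit per predefined location. The bit ordering and the names `keys_to_cell`, `KEY_TO_LOCATION`, and `LOCATIONS` are assumptions made for illustration; the description does not state how the Braille I/O device 106 encodes its signal to the computer 102.

```python
# Correspondence stated in the description:
# keys 112, 110, 108, 114, 116, 118 -> locations 132, 134, 136, 138, 140, 142.
KEY_TO_LOCATION = {112: 132, 110: 134, 108: 136, 114: 138, 116: 140, 118: 142}

# Assumed bit order (hypothetical): location 132 is bit 0, location 142 is bit 5.
LOCATIONS = [132, 134, 136, 138, 140, 142]

SPACE_KEY = 119  # space bar: no location is elevated


def keys_to_cell(pressed_keys):
    """Return a 6-bit integer with one bit set per elevated location."""
    pressed = set(pressed_keys)
    if pressed == {SPACE_KEY}:
        return 0
    cell = 0
    for key in pressed:
        if key not in KEY_TO_LOCATION:
            raise ValueError(f"not a dot key: {key}")
        cell |= 1 << LOCATIONS.index(KEY_TO_LOCATION[key])
    return cell
```

Under this assumed encoding, the six locations give values from 0 to 63, that is, 64 distinct Braille cells.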
The output array 120 contains 20 output units (e.g., 122), where each output unit can output one Braille cell. As shown in FIG. 1C, each output unit (e.g., 122) contains six apertures 152-162, which correspond to the predefined locations 132-142 of a Braille cell 130, through which the system can provide a protrusion that is perceptible to the human touch. FIG. 1D depicts a left side elevational view of output unit 122 and further shows apertures 124 and 128 with a protrusion and aperture 126 without a protrusion. In this manner, the Braille system 100 can output up to twenty individual units of Braille (Braille cells) via the output array 120.
In order to read information from the video display 104, the user uses the arrow keys 122-128 in conjunction with the output array 120. The arrow keys 122-128 manipulate a reference cursor on the video display 104. The reference cursor highlights information on the video display 104 that is then output to the output array 120. For example, if the reference cursor were at the top of a document, the user could depress arrow key 126 to move down a line so that the user could read a line of information by feeling the output array 120. Similarly, the depression of arrow key 124 moves the reference cursor to the right, the depression of arrow key 128 moves the reference cursor to the left, and the depression of arrow key 122 moves the reference cursor up. By using the Braille system 100, a visually-impaired user is able to store Braille information onto the computer 102 and read Braille information from the computer.
When the Braille system 100 is used with the English language, the user can exactly indicate an English language expression because each Braille cell corresponds to exactly one letter of the English language. Therefore, the user can input one letter at a time and can read the output one letter at a time. However, such Braille systems are significantly less helpful when used with multi-byte languages. A “multi-byte language” is a language in which more than one byte is needed to uniquely identify each character of the language. In other words, there are more than 2^8 (or 256) characters in the language. The characters of a multi-byte language are referred to as multi-byte characters. Multi-byte languages, such as Kanji-based languages like Chinese, Japanese, and Korean, have approximately 40,000 characters.
In Kanji-based languages, the elements of grammar are known as “Kanji characters.” The phrase “elements of grammar” refers to units of a given natural language that are capable of comprising parts of speech. For example, the elements of grammar in the English language are words. As such, each Kanji character is a higher-order linguistic symbol that is analogous to a word in the English language. That is, natural languages tend to have three levels of linguistic elements. The lowest of these levels depends on the specific alphabet used and is associated with the sounds of the spoken language. For example, the first and lowest level of linguistic elements in the English language comprises letters. The third level of linguistic elements is the highest level and contains those linguistic elements conveying full creative expression. In the English language, the third level comprises sentences. It is the second level of linguistic elements to which the phrase “elements of grammar” refers. This second level is an intermediate level of linguistic elements and, in the English language, the second level comprises words. In Chinese, the second level comprises Kanji characters.
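The byte-count arithmetic behind the term can be checked with a short Python sketch. The function name `bytes_per_character` is hypothetical, and the sketch assumes a fixed-width encoding in which every character occupies the same number of bytes; the description does not prescribe any particular encoding.

```python
def bytes_per_character(num_characters):
    """Smallest whole number of bytes that can give each character a unique code.

    Assumes a fixed-width encoding (an illustrative simplification).
    """
    if num_characters < 1:
        raise ValueError("need at least one character")
    # Bits needed to number the characters 0 .. num_characters - 1.
    bits = max(1, (num_characters - 1).bit_length())
    # Round up to whole bytes.
    return (bits + 7) // 8


print(bytes_per_character(256))     # one byte covers up to 2^8 characters
print(bytes_per_character(40_000))  # roughly 40,000 characters need more
```

With 256 characters a single byte suffices, while approximately 40,000 characters require two bytes under this assumption.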
Because there are approximately 40,000 Kanji characters in Kanji-based languages and only 2^6 (or 64) characters can be uniquely identified by one Braille cell, well-known systems have been devised to map individual Braille cells onto the phonetics of the multi-byte language. The phonetics, usually three, are then combined to identify an intended character, although the identification is inexact. The intended character is inexactly identified because many different characters sound alike, but have different meanings. For example, a number of Chinese characters all sound like “wong” and thus are identified using the same Braille input, but each character has a different meaning.
Because many characters sound alike in multi-byte languages, when using Braille to input and output multi-byte characters, there is an inherent problem of ambiguity.
FIG. 2 depicts a well-known phonetic mapping scheme for mapping Braille onto the phonetics of the Chinese language spoken in the Cantonese dialect. This phonetic mapping scheme groups all phonetics into three categories: consonants, vowels, and tones. A number of Braille cells are defined to indicate specific consonants, some of which are depicted in Table 202. Table 202 indicates a specific Braille representation that corresponds to a particular consonant, such as “F as in Fay,” and indicates the particular representation stored in the computer (e.g., “F”). In this example, when a user inputs the corresponding Braille cell via the Braille I/O device, they intend the consonant “F as in Fay.” The Braille I/O device sends the input to the computer where it is stored as an F character to indicate the particular Braille input and the phonetic represented by it. Using this system, some sounds have representations in the computer that do not correspond with the sound. For instance, although the sound for one Braille input is “G as in Gay,” the representation in the computer is “K.”
Table 204 contains some sample phonetic mappings of vowels, where the Braille input corresponding to the specific sound and its representation within the computer are depicted. For example, one Braille input corresponds to the vowel “iy as in sight,” and is represented in the computer as “%.” Likewise, Table 206 depicts the phonetic mapping of various tones. One of these tones is the default tone, which is specified by the absence of a Braille cell. Another of the tones is the rising tone, which is similar to the tone used when the speaker wishes to indicate a question. Using this phonetic mapping scheme for mapping Braille onto Cantonese phonetics, a user specifies a Kanji character usually by using three Braille cells: one for the consonant, one for the vowel, and one for the tone. In some situations, the user may omit the Braille cell for the tone to indicate that the default tone is desired.
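The consonant, vowel, and optional tone grouping described above can be sketched in Python as follows. The function name `group_syllables` and the sample tone symbol used in the usage note are hypothetical. The sketch also assumes that every character is entered as a consonant followed by a vowel and then at most one tone, which simplifies the statement that a character "usually" takes three Braille cells.

```python
def group_syllables(symbols, consonants, vowels, tones):
    """Split a flat sequence of stored symbols into (consonant, vowel, tone) triples.

    A tone of None means the tone cell was omitted, i.e. the default tone.
    Assumes each group starts with a consonant followed by a vowel.
    """
    syllables = []
    i = 0
    while i < len(symbols):
        if (i + 1 >= len(symbols)
                or symbols[i] not in consonants
                or symbols[i + 1] not in vowels):
            raise ValueError(f"expected consonant then vowel at position {i}")
        consonant, vowel = symbols[i], symbols[i + 1]
        i += 2
        tone = None
        # The tone cell is optional; its absence selects the default tone.
        if i < len(symbols) and symbols[i] in tones:
            tone = symbols[i]
            i += 1
        syllables.append((consonant, vowel, tone))
    return syllables
```

For instance, with the stored consonants "F" and "K", the stored vowel "%", and a made-up tone symbol "t", the sequence `F % t K %` would be grouped into one syllable with an explicit tone and a second syllable with the default tone.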
When the Braille system 100 is used with the phonetic mapping scheme described above, the user inputs the Braille into the computer and the computer stores the phonetic representation (e.g., w;′, which is the computer representation of the phonetics for characters that sound like “wong”) and not the actual multi-byte character. Storing the data by its phonetic representation prevents the data from being used by a sighted user who does not understand these cryptic symbols and, therefore, does little to integrate the visually impaired into the workforce. Another problem with this system is that, since the phonetics are mapped onto the characters of the multi-byte language, the Braille does not exactly map to a specific character, because many characters have the same sound but mean completely different things. As such, there is a significant amount of ambiguity. Such ambiguity problems must be overcome to facilitate the use of computers by the visually impaired. Therefore, it is desirable to improve Braille input systems for multi-byte languages to resolve ambiguities.
An improved recognition system for translating Braille into multi-byte languages is provided that resolves ambiguities in the translation. By resolving ambiguities in the translation, the improved recognition system helps integrate visually-impaired users into the workforce. Such integration is achieved by providing visually-impaired users with both the means to input Braille for translation into a multi-byte language and the means to disambiguate the translation so that it reflects what the user intended. In this manner, the translation accurately reflects the intentions of the user. Furthermore, the translation is actually stored in the computer in the multi-byte language so that both sighted and nonsighted users alike can utilize the translation.
In accordance with a first aspect of the present invention, a method is provided for translating Braille input into characters of a multi-byte language in a computer system having the Braille input and having a database of entries containing mappings of Braille to phrases containing at least one character of the multi-byte language. In accordance with the first aspect, the method attempts to match the Braille input to at least one of the entries in the database to translate the Braille input into the multi-byte language. When the Braille input does not match at least one of the entries, the method reduces the Braille input by an amount sufficient to represent a character and attempts to match the reduced Braille input to at least one of the entries in the database. When the reduced Braille input does not match at least one of the entries in the database, the method repeatedly reduces the Braille input and attempts a match until the reduced Braille input matches at least one of the entries in the database to translate the reduced Braille input into the multi-byte language.
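The match-and-reduce loop of the first aspect can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation. The function name `match_with_reduction` is hypothetical. The sketch assumes that the input has already been split into units that each represent one character, that a reduction drops one such unit from the end of the input, and that the database is a dictionary keyed by tuples of units. The passage above does not specify which end of the input is reduced or how the database is organized.

```python
def match_with_reduction(units, database):
    """Find the longest leading run of units that matches a database entry.

    units:    sequence of per-character input units (hypothetical representation)
    database: dict mapping tuples of units to lists of candidate phrases
    Returns (phrases, n), where n is the number of units that were matched,
    or ([], 0) if even a single unit has no entry.
    """
    remaining = list(units)
    while remaining:
        phrases = database.get(tuple(remaining))
        if phrases:
            return phrases, len(remaining)
        # No match: reduce the input by one character's worth and retry.
        remaining.pop()
    return [], 0


# Placeholder data only; the unit names and phrase labels are made up.
sample_db = {
    ("A", "B"): ["PHRASE_AB"],
    ("A",): ["CHAR_A1", "CHAR_A2"],
}
print(match_with_reduction(["A", "B", "C"], sample_db))
```

Here the three-unit input has no entry, so it is reduced to two units, which match. A single unit that maps to more than one candidate illustrates the remaining ambiguity that the other aspects address.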
In accordance with a second aspect of the present invention, a method is provided for translating input containing portions into characters of a multi-byte language in a computer system. A portion of the input corresponds to a plurality of characters where only a single intended character is intended by a user to be identified by the portion. In accordance with the second aspect, the method receives the input for translation into the multi-byte language where the input contains a user-specified indication of a portion that corresponds to a plurality of characters. The method also utilizes the user-specified indication to unambiguously translate the portion into the single intended character.
In accordance with a third aspect of the present invention, a method is provided for translating input in a first language into a second language in a computer system having a database with entries containing mappings of portions of the input onto phrases of the second language. In accordance with the third aspect, the method receives the input for translation, translates the input into the second language by matching the portions of the input against the database entries to identify matching phrases, and outputs the matching phrases such that a user can discern a distinctness of each matching phrase to facilitate detection of translation errors.
In accordance with a fourth aspect of the present invention, a method is provided for translating phonic data representing spoken sounds of a language into text of the language in a computer system having a database with entries containing mappings of phonic data onto phrases of the text. The method receives portions of the phonic data and translates the phonic data to text by mapping the received portions of the phonic data to the phrases in the database entries.
In accordance with a fifth aspect of the present invention, a method for translating Braille input into characters of a multi-byte language in a computer system is provided. The method receives the Braille input, translates the Braille input into text of the multi-byte language, and stores the text into the computer such that the text is represented as individual characters of the multi-byte language so that the text is understandable to a user that understands the characters of the multi-byte language, but does not understand Braille.
In accordance with a sixth aspect of the present invention, a method is provided for translating input having elements of grammar from a first form into a second form in a computer system. The method receives input in the first form, translates the input into the second form to create translated elements of grammar, and outputs usages of the translated elements of grammar in the second form so that a user can identify translation errors.
In accordance with a seventh aspect of the present invention, a method is provided for phonetically inputting data into a computer system. The method receives input comprising groups of phonetics representing sounds made when a language is spoken where a group of phonetics corresponds to at least one element of grammar of the language. For each of the groups in the received input, the method identifies at least one element of grammar that corresponds to the group of phonetics and outputs usages of the identified element of grammar so that a user can determine if the identified element of grammar is an intended element of grammar.