This specification relates to input methods.
A writing system uses symbols (e.g., characters or graphemes) to represent sounds of a language. A collection of symbols in a writing system can be referred to as a script. For example, a Latin writing system, including a collection of Roman characters in one or more Roman scripts, can be used to represent the English language. The Latin writing system can include blocked Roman characters (e.g., capitalized character “B”), typed Roman characters (e.g., plain character “b”), and cursive Roman characters (e.g., cursive character “b”). Each visual representation of the character “b” represents the same grapheme in the Latin writing system.
As another example, the Chinese language can be represented by more than one writing system. For example, the Chinese language can be represented by a first writing system, e.g., Pinyin (or Romanized Chinese). As another example, the Chinese language can be represented using a second writing system, e.g., Bopomofo or Zhuyin Fuhao (“Zhuyin”). As yet another example, the Chinese language can be represented using a third writing system, e.g., Hanzi. In particular, Pinyin and Zhuyin are phonetic systems for representing Hanzi characters.
Some input methods allow a user to input text in a first writing system and provide output candidates in a second writing system. For example, a Pinyin input method allows a user to input a Pinyin string and can generate output candidates in Hanzi. The Pinyin string can include one or more Pinyin syllables. A Pinyin syllable can include a first sub-syllable (e.g., a portion of a syllable) followed by a second sub-syllable. Each Pinyin syllable corresponds to multiple Hanzi characters, and each sub-syllable includes one or more Roman characters. For example, a Pinyin syllable “zhang” can be segmented into a first sub-syllable “zh” and a second sub-syllable “ang”. Furthermore, both sub-syllables “zh” and “ang” can be combined with other sub-syllables to create other Pinyin syllables. For example, sub-syllables “zh” and “a” can be combined to create the Pinyin syllable “zha”, and sub-syllables “t” and “ang” can be combined to create the Pinyin syllable “tang”.
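As an illustration only, the division of a Pinyin syllable into a first sub-syllable and a second sub-syllable might be sketched as follows. The table FIRST_SUB_SYLLABLES and the function split_syllable are hypothetical names introduced for this sketch; the table is a simplified assumption (it treats “y” and “w” as first sub-syllables) and is not asserted to be complete or to reflect any particular input method.

```python
# Hypothetical table of first sub-syllables. Two-letter entries come first so
# that "zh" is tried before "z". This list is an assumption for illustration.
FIRST_SUB_SYLLABLES = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
)


def split_syllable(syllable):
    """Return (first_sub_syllable, second_sub_syllable) for one syllable.

    If no entry in the table is a proper prefix of the syllable, the first
    sub-syllable is returned as an empty string.
    """
    for first in FIRST_SUB_SYLLABLES:
        if syllable.startswith(first) and len(syllable) > len(first):
            return first, syllable[len(first):]
    return "", syllable


if __name__ == "__main__":
    for s in ("zhang", "zha", "tang"):
        print(s, split_syllable(s))
```

Under these assumptions, “zhang” splits into “zh” and “ang”, and recombining “zh” with “a” or “t” with “ang” yields the syllables “zha” and “tang” described above.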
Generating output candidates may require identification of morphemes (e.g., syllables) in the input text, e.g., by segmenting the input text.
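For illustration, one simple way to segment an input string into syllables is to enumerate every division of the string whose pieces all appear in a syllable table. The sketch below is a generic dictionary-based enumeration, not a method taken from this specification; the names SYLLABLES and segmentations are hypothetical, and the table holds only a handful of entries chosen for the example.

```python
# Hypothetical, deliberately tiny syllable table used only for this example.
SYLLABLES = {"xi", "an", "xian", "zha", "zhang", "tang"}


def segmentations(text, syllables=SYLLABLES):
    """Return every way to divide text into entries of the syllable table.

    Each result is a list of syllables that concatenate back to text.
    An empty list is returned when no complete segmentation exists.
    """
    if not text:
        return [[]]  # one way to segment the empty string: no syllables
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in syllables:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], syllables):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    print(segmentations("xian"))
    print(segmentations("zhangtang"))
```

With this table, the string “xian” has two segmentations (“xi” followed by “an”, and the single syllable “xian”), which shows why segmentation can be ambiguous, whereas “zhangtang” has only one (“zhang” followed by “tang”).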