The present invention relates to a pattern recognition apparatus for recognizing input patterns and displaying recognized results. More particularly, the invention relates to a pattern recognition apparatus into which predetermined character strings, such as addresses and fixed phrases, are input by handwriting.
A majority of applications that process slips, invoices, and other forms on so-called pen PCs (pen-input computers) primarily involve entering addresses and fixed phrases. Three representative methods have been proposed for entering predetermined character strings such as addresses and fixed phrases: (1) selecting from candidates presented in a menu format; (2) combining a menu with character recognition, in which the user inputs a ZIP code to bring up a menu of candidate addresses to choose from; and (3) writing the characters by hand and having them recognized, with the recognition candidates optimized by use of a word dictionary.
Method (1) is disclosed, for example, in "Recognition of Handwritten Addresses in Unframed Setup Allowing for Character Position Displacements" (Periodical D-2 of the Institute of Electronics, Information and Communication Engineers of Japan, January 1994). Given hierarchical data such as addresses, the method generally involves selecting candidate data successively from the top layer of the hierarchy down to the bottom. For example, "Ibaraki-ken" (a prefecture in Japan) may be followed by "Hitachi-shi" (a city), which in turn may be followed by "Oomika-cho" (a town). One disadvantage of this method is that a user who is not certain whether Hitachi-shi is located in, say, Tochigi-ken or Ibaraki-ken (i.e., the prefectural, or topmost, category) has difficulty selecting Hitachi-shi.
With method (2), the user need only input a ZIP code, and the system displays a menu of the addresses associated with that code to choose from. The procedure is relatively simple as long as the user remembers the necessary ZIP codes; however, ZIP codes are difficult to memorize, except perhaps for the user's own.
Method (3) allows handwritten characters to be recognized and their candidates to be optimized through the use of a word dictionary. This method is outlined below with reference to some of the accompanying drawings. FIG. 3 is a schematic block diagram of a conventional character recognition apparatus. In FIG. 3, a handwritten pattern input through a tablet a1 is pattern-matched against a recognition dictionary a2 in a character recognition process a3. The candidate characters thus obtained are matched against the words of a word dictionary a6 in a word correlation process a7. The matching words are then displayed on an LCD a8.
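The word correlation step a7 can be illustrated with a minimal sketch. It assumes, hypothetically, that the recognizer returns a list of (candidate character, confidence) pairs for each written position, uses Latin letters as stand-ins for Japanese characters, and scores a dictionary word by summing the confidences of its characters; none of these choices is taken from the source, and the example words are placeholders.

```python
from typing import List, Tuple

# Hypothetical recognizer output: one list of (candidate, confidence) per written position.
Lattice = List[List[Tuple[str, float]]]


def match_words(lattice: Lattice, word_dictionary: List[str]) -> List[Tuple[str, float]]:
    """Return dictionary words consistent with the lattice, highest total confidence first."""
    results = []
    for word in word_dictionary:
        if len(word) != len(lattice):
            continue  # word length must equal the number of written characters
        score = 0.0
        for ch, candidates in zip(word, lattice):
            confidence = dict(candidates).get(ch)
            if confidence is None:
                break  # this character was not among the recognizer's candidates
            score += confidence
        else:
            results.append((word, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


LATTICE: Lattice = [
    [("m", 0.9), ("n", 0.1)],
    [("i", 0.6), ("l", 0.4)],
    [("t", 0.8), ("f", 0.2)],
    [("o", 0.7), ("a", 0.3)],
]
WORDS = ["mito", "nita", "hitachi", "milo"]  # placeholder dictionary entries

for word, score in match_words(LATTICE, WORDS):
    print(word, round(score, 2))
```

Only words whose every character appears among the recognizer's candidates survive, so the dictionary both filters and ranks the recognition output.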
FIG. 4 is a schematic flow diagram showing how a conventional character recognition apparatus is used to input an address. For example, to input "Ibaraki-ken" (prefecture), "Hitachi-shi" (city), "Oomika-cho" (town), the user writes all of these characters by hand into a predetermined address input area b1. The handwritten characters are then recognized in the process a3. The candidate characters obtained from the recognition process are matched against the words of the word dictionary a6, starting from the highest layer of the hierarchy (i.e., the prefectural level). The candidate characters are thus optimized, and the results are output as candidate characters.
Conventionally, hierarchical data such as addresses are accessed from the highest hierarchical layer down. This is because the higher a layer is in the hierarchy, the less data it holds, and once the highest-layer candidate is determined, the lower-layer candidates are readily narrowed down from it. Suppose, however, that the conventional system receives the keyword "Oomika-cho" (a town) for a search through the word dictionary. Because the layer of the input keyword is unknown, the system has no choice but to search the entire word dictionary, which may be as large as 1.5 MB. This scheme is thus impractical in applications such as online character recognition, where a high degree of responsiveness is required.
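A small sketch of why a keyword of unknown layer is costly, assuming a hypothetical nested dictionary (prefecture, then city, then list of towns) that mixes the examples from the text with placeholder names such as `City-B`. A top-down lookup follows one branch when all higher layers are known, whereas a keyword of unknown layer has to be compared against every entry at every layer.

```python
from typing import Dict, List, Tuple

# Hypothetical three-layer dictionary: prefecture -> city -> towns.
# "City-B", "Town-A", etc. are placeholders, not real place names.
ADDRESS_TREE: Dict[str, Dict[str, List[str]]] = {
    "Ibaraki-ken": {"Hitachi-shi": ["Oomika-cho", "Town-A"], "City-B": ["Town-C"]},
    "Tochigi-ken": {"City-D": ["Town-E"]},
}


def top_down_lookup(prefecture: str, city: str, town: str) -> bool:
    """Follow a single branch; only possible when the higher layers are already known."""
    return town in ADDRESS_TREE.get(prefecture, {}).get(city, [])


def search_unknown_layer(keyword: str) -> Tuple[List[Tuple[str, ...]], int]:
    """Compare the keyword with every entry at every layer.

    Returns the full paths of the matches and the number of entries compared.
    """
    matches: List[Tuple[str, ...]] = []
    compared = 0
    for prefecture, cities in ADDRESS_TREE.items():
        compared += 1
        if prefecture == keyword:
            matches.append((prefecture,))
        for city, towns in cities.items():
            compared += 1
            if city == keyword:
                matches.append((prefecture, city))
            for town in towns:
                compared += 1
                if town == keyword:
                    matches.append((prefecture, city, town))
    return matches, compared


print(top_down_lookup("Ibaraki-ken", "Hitachi-shi", "Oomika-cho"))  # True
print(search_unknown_layer("Oomika-cho"))
```

In this toy tree the unknown-layer search compares all 9 entries to find a single town; in a full address dictionary of roughly 164,000 names, every name would be compared in the same way.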
A typical word dictionary that stores addresses in Japan may be constituted as follows:
Prefectural names: about 50 names × about 3 characters per name × 2 bytes per character = about 300 B
Cities and towns: about 4,000 names × about 3 characters per name × 2 bytes per character = about 24 kB
Subordinate municipalities: about 160,000 names × about 4 characters per name × 2 bytes per character = about 1.3 MB
The total volume of data in such a representative dictionary is about 1.5 MB.
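The size estimates can be checked with a few lines of arithmetic that use only the figures given in the text (name counts, characters per name, and 2 bytes per character). Worked through, the cities-and-towns category comes to about 24 kB, and the three itemized categories sum to roughly 1.3 MB; the text does not itemize what accounts for the remainder of the 1.5 MB total.

```python
# Figures from the text: (number of names, characters per name) per category.
CATEGORIES = {
    "Prefectural names": (50, 3),
    "Cities and towns": (4_000, 3),
    "Subordinate municipalities": (160_000, 4),
}
BYTES_PER_CHAR = 2  # two-byte character encoding, as stated in the text

sizes = {name: count * chars * BYTES_PER_CHAR for name, (count, chars) in CATEGORIES.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size:,} B")
print(f"Total: {total:,} B")  # Total: 1,304,300 B
```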
One disadvantage of the above conventional method is the burden on the user of writing the entire desired address by hand, which can be as long as, say, "Ibaraki-ken, Hitachi-shi, Oomika-cho".
One problem common to all three methods (1) through (3) outlined above is that, in character recognition applications, the user must perform the tedious task of writing out by hand entire character strings such as addresses and fixed phrases. Another common problem is that a search through the word dictionary for a word in any layer other than the topmost layer of the hierarchy can take a very long time. A further problem is that, in a menu-driven environment built on a hierarchical data structure such as addresses, lower-layer items cannot be selected unless the higher-layer items above them are known.
It is therefore an object of the present invention to provide a pattern recognition apparatus that accepts only key characters written by hand (e.g., "Oomika" or "~Mika-cho"), infers the remaining character string (e.g., "Ibaraki-ken, Hitachi-shi"), and outputs the entire recognized character string (e.g., "Ibaraki-ken, Hitachi-shi, Oomika-cho").
In carrying out the invention and according to one aspect thereof, there is provided a character recognition apparatus having recognition means for recognizing input character strings and display means for displaying recognized results, the character recognition apparatus comprising: a word dictionary storing word identification information and hierarchy information for layering a plurality of words into a hierarchy and for recognizing each of the words within the hierarchy; a character transition probability table storing at least probabilities of transitions from any one character to another, and those pieces of the word identification information which correspond to combinations of characters resulting from the transitions; optimization means for using the character transition probability table in optimizing candidate character strings obtained by the recognition means; and retrieval means for searching through the word dictionary for words defined by those pieces of the word identification information which correspond to the optimized candidate character string, thereby retrieving the searched words which are identified by the applicable pieces of the hierarchy information and which have yet to be input.
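The optimization means and the character transition probability table can be sketched as follows. This is a minimal illustration under stated assumptions: the table is built from bigram counts over a hypothetical two-word dictionary, unseen transitions receive a small floor probability, and the search is Viterbi-style dynamic programming, a technique swapped in here because the text does not specify how the optimization is performed. Each table entry also carries the IDs of the words containing that character pair, and intersecting those sets along the optimized string yields its word identification information.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Lattice = List[List[Tuple[str, float]]]
# (from_char, to_char) -> (transition probability, IDs of words containing the pair)
TransitionTable = Dict[Tuple[str, str], Tuple[float, FrozenSet[int]]]


def build_table(words: Dict[int, str]) -> TransitionTable:
    """Estimate P(next | current) from bigram counts and record which words contain each pair."""
    pair_counts: Counter = Counter()
    pair_ids: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for word_id, word in words.items():
        for pair in zip(word, word[1:]):
            pair_counts[pair] += 1
            pair_ids[pair].add(word_id)
    from_totals: Counter = Counter()
    for (first, _), n in pair_counts.items():
        from_totals[first] += n
    return {
        pair: (n / from_totals[pair[0]], frozenset(pair_ids[pair]))
        for pair, n in pair_counts.items()
    }


def optimize(lattice: Lattice, table: TransitionTable,
             floor: float = 1e-6) -> Tuple[str, FrozenSet[int]]:
    """Viterbi search maximizing sum(log confidence) + sum(log transition probability).

    Returns the optimized string and the intersection of the word-ID sets of its pairs.
    """
    # best[c] = (log score, best path ending in candidate c at the current position)
    best = {c: (math.log(conf), c) for c, conf in lattice[0]}
    for candidates in lattice[1:]:
        next_best = {}
        for c, conf in candidates:
            options = []
            for prev, (score, path) in best.items():
                prob = table.get((prev, c), (floor, frozenset()))[0]
                options.append((score + math.log(prob) + math.log(conf), path + c))
            next_best[c] = max(options)
        best = next_best
    _, path = max(best.values())
    ids: Optional[FrozenSet[int]] = None
    for pair in zip(path, path[1:]):
        pair_ids = table.get(pair, (0.0, frozenset()))[1]
        ids = pair_ids if ids is None else ids & pair_ids
    return path, ids if ids is not None else frozenset()


WORDS = {3: "mika", 7: "mito"}  # hypothetical word IDs and words
TABLE = build_table(WORDS)
LATTICE: Lattice = [
    [("m", 0.6), ("n", 0.4)],
    [("l", 0.6), ("i", 0.4)],  # the recognizer slightly prefers the wrong "l"
    [("k", 0.7), ("h", 0.3)],
    [("a", 0.8), ("o", 0.2)],
]
print(optimize(LATTICE, TABLE))  # ('mika', frozenset({3}))
```

Taking the recognizer's top candidate at each position would give "mlka"; the transition probabilities pull the result to "mika", whose word identification information is ID 3.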
When only characters of a low hierarchical level, such as "Oomika-cho", are input, the inventive character recognition apparatus outlined above first extracts "Oomika-cho" as the candidate character string optimized by the optimization means. The word dictionary is then searched for higher-level words on the basis of the word identification information corresponding to the optimized character string. The search yields the not-yet-input words "Ibaraki-ken" and "Hitachi-shi", which are higher in the hierarchy than the input "Oomika-cho". The recognized result is "Ibaraki-ken, Hitachi-shi, Oomika-cho", the entire character string made up of both the entered and the unentered words.
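The upward retrieval described above can be sketched by storing, with each word, a hypothetical parent link as its hierarchy information. Given the word ID of the optimized string, the sketch follows the parent links to collect the not-yet-input higher-layer words and returns the full string, highest layer first. The ID values and the `Entry` structure are illustrative assumptions, not the patent's data format.

```python
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    parent: Optional[int]  # hierarchy information: word ID one layer up (None at the top)


# Hypothetical word dictionary keyed by word identification information.
WORD_DICTIONARY: Dict[int, Entry] = {
    1: Entry("Ibaraki-ken", None),
    2: Entry("Hitachi-shi", 1),
    3: Entry("Oomika-cho", 2),
}


def complete(word_id: int, dictionary: Dict[int, Entry]) -> List[str]:
    """Collect the input word and all higher-layer words, highest layer first."""
    chain: List[str] = []
    current: Optional[int] = word_id
    while current is not None:
        entry = dictionary[current]
        chain.append(entry.word)
        current = entry.parent
    chain.reverse()
    return chain


print(", ".join(complete(3, WORD_DICTIONARY)))  # Ibaraki-ken, Hitachi-shi, Oomika-cho
```

With parent links, completing a low-layer word costs one step per layer (three here) rather than a scan of the whole dictionary.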
According to another aspect of the invention, there is provided a character recognition apparatus having recognition means for recognizing input character strings and display means for displaying recognized results, the character recognition apparatus comprising: a dictionary having each of a plurality of character strings stored beforehand at a specific address; a character transition probability table storing at least probabilities of transitions from any one character to another, the probabilities being stored in correspondence with the addresses of those of the character strings which include combined characters involved in the transitions; and optimization means for using the character transition probability table in optimizing candidate character strings obtained by the recognition means; wherein the dictionary is accessed for the addresses of the character strings corresponding to the optimized candidate character string, and wherein the character strings at the addresses in the dictionary are displayed as recognized results.
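For this second aspect, a minimal sketch assuming the dictionary is a Python list whose index serves as the "address" of each stored string, and that the transition table maps each character pair to its probability together with the set of addresses of the strings containing that pair. The probabilities would drive the optimization step, which this sketch treats as already done. The lookup intersects the address sets for the optimized string's character pairs and, as an added safeguard that is an assumption rather than part of the text, keeps only strings that actually contain the optimized string, since sharing every pair does not guarantee the pairs are contiguous. The stored strings are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical dictionary: the list index is the "address" of each stored string.
DICTIONARY: List[str] = [
    "ibaraki-ken hitachi-shi oomika-cho",
    "ibaraki-ken mito-shi",
]
AddressTable = Dict[Tuple[str, str], Tuple[float, FrozenSet[int]]]


def build_address_table(dictionary: List[str]) -> AddressTable:
    """Map each character pair to (P(next | current), addresses of strings containing it)."""
    pair_counts: Counter = Counter()
    pair_addresses: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for address, text in enumerate(dictionary):
        for pair in zip(text, text[1:]):
            pair_counts[pair] += 1
            pair_addresses[pair].add(address)
    from_totals: Counter = Counter()
    for (first, _), n in pair_counts.items():
        from_totals[first] += n
    return {
        pair: (n / from_totals[pair[0]], frozenset(pair_addresses[pair]))
        for pair, n in pair_counts.items()
    }


def lookup(optimized: str, table: AddressTable, dictionary: List[str]) -> List[str]:
    """Return the stored strings to display for an already-optimized candidate string."""
    address_sets = [table.get(pair, (0.0, frozenset()))[1]
                    for pair in zip(optimized, optimized[1:])]
    if not address_sets:
        return []
    common = frozenset.intersection(*address_sets)
    # Sharing every pair does not guarantee contiguity, so confirm the substring.
    return [dictionary[a] for a in sorted(common) if optimized in dictionary[a]]


ADDRESS_TABLE = build_address_table(DICTIONARY)
print(lookup("mika", ADDRESS_TABLE, DICTIONARY))  # ['ibaraki-ken hitachi-shi oomika-cho']
```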