This invention relates to a character recognition and communication system which can recognize various kinds of characters with high precision and at low cost. An optical character reader (abbreviated as "OCR") is very useful for a data transmission system. M. D. Freedman describes an OCR for reading printed characters ("Optical Character Recognition", IEEE Spectrum, March, 1974). An OCR for reading hand-written characters generally uses a different character-recognizing process from that used for reading printed characters, but may be represented broadly by the same block circuit diagram.
FIG. 1 is a schematic block circuit diagram of an ordinary character reader. A scanning unit 1 undertakes the transport of a document, the scanning and photo-electric conversion of a character appearing on the document, and normally the digitization (mainly binarization) of the scanning signal. A recognition unit 2 comprises three blocks, namely, a preprocessing unit 3 for detecting a character pattern requiring recognition from a signal received from the scanning unit 1, eliminating noise, and normalizing the size of the pattern and its inclination; a feature extraction unit 4 for extracting from the preprocessed character pattern the features effective for its correct recognition; and a discrimination unit 5 for determining the category to which an input character belongs by reference to a dictionary memory which is presumed to hold the corresponding character.
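The chain of blocks in FIG. 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the reader described above: the function names, the threshold of 128, the use of quadrant ink density as the feature, and nearest-neighbour comparison as the discrimination rule. Size and inclination normalization are reduced to a bounding-box crop.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Features = Tuple[float, ...]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Scanning unit 1: turn grey levels into 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_ink(binary: Image) -> Image:
    """Preprocessing unit 3 (simplified): cut the pattern out by its bounding box."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return [[0]]  # blank input: nothing to recognize
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def quadrant_density(binary: Image) -> Features:
    """Feature extraction unit 4 (hypothetical feature): ink fraction per quadrant."""
    h, w = len(binary), len(binary[0])
    mh, mw = (h + 1) // 2, (w + 1) // 2
    feats = []
    for r0, r1 in ((0, mh), (mh, h)):
        for c0, c1 in ((0, mw), (mw, w)):
            area = (r1 - r0) * (c1 - c0)
            ink = sum(binary[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(ink / area if area else 0.0)
    return tuple(feats)


def discriminate(feats: Features, dictionary: Dict[str, Features]) -> str:
    """Discrimination unit 5: pick the dictionary entry closest to the features."""
    def sq_dist(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, dictionary[label]))
    return min(dictionary, key=sq_dist)


def recognize(gray: Image, dictionary: Dict[str, Features]) -> str:
    """Run the four stages in the order shown in FIG. 1."""
    return discriminate(quadrant_density(crop_to_ink(binarize(gray))), dictionary)
```

The dictionary passed to `recognize` plays the role of the dictionary memory: a mapping from each category label to its stored reference features.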
The greatest problems now confronting the development of a character reader are its recognition performance and cost. The recognition performance can be expressed by reference to the font to which a character requiring recognition belongs and on the basis of recognition precision. The OCRs developed to date, however, have the following drawbacks:
(1) Where a relatively small amount of data (for example, a relatively small number of documents) is to be treated by an OCR, the apparatus must be of correspondingly low cost. Customary practice to meet this requirement is to slow down the processing speed and somewhat sacrifice the recognition performance. Consequently, characters typed or hand-written on a document are subject to rigid restrictions in respect of shape and density, so that the OCR remains practically usable despite its low recognition performance.
(2) The OCR has a far lower capacity to recognize characters than the human ability, though a great deal of effort has been directed to its improvement. Various character-recognizing processes adopted by the OCR have merits and demerits with respect to the shape of characters requiring recognition.
For resolution of the above-mentioned problems, the following processes have hitherto been contemplated or put into practice.
(i) "Pattern Recognition Techniques", by J. R. Ullmann, Butterworth & Co., Ltd., 1973, pp. 1 to 21.
This reference describes various forms of a mask-matching process with a character regarded as a sort of type. The method of said reference is adapted to recognize characters little varying in shape, for example, printed characters.
(ii) U.K. Pat. No. 1,180,290, February, 1970 (H. Genchi).
The method of this reference is based on the detection of the edges of a character pattern. Namely, the method recognizes a character by extracting the topological features of a character such as the loops, concave, convex and parallel segments of character strokes and examining the sequence in which these topological features appear when a character pattern is viewed from the top to the bottom.
(iii) Y. Fujimoto et al, Recognition of Handprinted Characters by Nonlinear Elastic Matching (3rd International Joint Conference on Pattern Recognition, November, 1976).
The method of this reference skeletonizes character strokes, expresses the skeletonized character strokes in a chain of direction codes, processes these chained direction codes to extract the topological features of a character, such as loops and the inclinations of limbs, and finally recognizes an input character by comparison with a standard pattern.
(iv) H. A. Glucksman, Multicategory Classification of Patterns Represented by Higher Order Vectors of Multilevel Measurements, IEEE Trans. on Computers (C-20, No. 12, December, 1971, pp. 1,593 to 1,598).
The method of this reference recognizes a character by encoding various points on the character background by reference to the frequency at which scanning lines extending from the points in every direction are intersected by character strokes, and comparing the coded points with those obtained from a standard pattern.
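The background-point coding described for the last of these references (Glucksman) can be illustrated with a simplified sketch. The assumptions are labeled in the code as well: only four directions are scanned rather than every direction, the crossing count is capped at 2, and the comparison with a standard pattern is reduced to a histogram difference. The function names are hypothetical and are not drawn from the cited paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Image = List[List[int]]          # 1 = stroke, 0 = background
Code = Tuple[int, int, int, int]

# Assumption: four scan directions (up, down, left, right) stand in for
# "every direction" in the description above.
DIRECTIONS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def crossings(img: Image, r: int, c: int, dr: int, dc: int, cap: int = 2) -> int:
    """Count how many strokes a scan line from (r, c) enters, capped at `cap`."""
    count, prev = 0, 0
    r, c = r + dr, c + dc
    while 0 <= r < len(img) and 0 <= c < len(img[0]):
        if img[r][c] == 1 and prev == 0:
            count += 1  # background-to-stroke transition: one more stroke crossed
        prev = img[r][c]
        r, c = r + dr, c + dc
    return min(count, cap)


def loci_codes(img: Image) -> Dict[Tuple[int, int], Code]:
    """Give every background point a code built from its four crossing counts."""
    codes = {}
    for r, row in enumerate(img):
        for c, px in enumerate(row):
            if px == 0:
                codes[(r, c)] = tuple(
                    crossings(img, r, c, dr, dc) for dr, dc in DIRECTIONS
                )
    return codes


def code_histogram(img: Image) -> Counter:
    """Summarize a pattern by how often each code occurs on its background."""
    return Counter(loci_codes(img).values())


def distance(a: Counter, b: Counter) -> int:
    """Hypothetical comparison rule: total absolute difference of code counts."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))
```

A point enclosed by a loop is crossed once in each of the four directions, whereas a point inside an open cup shape sees no stroke in the open direction, which is what lets the codes separate, say, a closed "O" from an open "U".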
Many other character recognition methods have already been proposed. For instance, various character recognition methods are set forth on pages 168 to 232 of the aforesaid book written by J. R. Ullmann. These prior art methods use different processes in observing and treating a character pattern. Therefore, it sometimes happens that some methods correctly recognize a given character, while other methods reject it or read it wrongly. With another character, as is naturally to be expected, the conventional recognition methods sometimes give the opposite results. U.S. Pat. No. 3,895,350 (Willem) describes an OCR of high recognition performance which comprises a plurality of scanning units using different scanning processes or a plurality of recognition units applying different recognition processes, and wherein the results of observation are compared with one another to elevate recognition performance.
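The idea of comparing the results of several recognition units can be illustrated with a minimal sketch. The combination rule shown here, a majority vote that rejects on ties or on too few agreeing units, is an assumption chosen for illustration; the comparison logic actually used in the cited patent is not described in this text, and the names `combine` and `min_votes` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def combine(results: Sequence[Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Merge the outputs of several recognition units.

    Each entry is the category one unit reported, or None if that unit
    rejected the character. Returns the agreed category, or None (reject)
    when too few units agree or when the top two categories tie.
    """
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None  # every unit rejected the character
    ranked = votes.most_common(2)
    label, count = ranked[0]
    if count < min_votes:
        return None  # not enough agreement to trust the reading
    if len(ranked) > 1 and ranked[1][1] == count:
        return None  # two categories tie: safer to reject than to misread
    return label
```

Under this rule a character that one method rejects can still be read if the other methods agree, while conflicting readings are turned into a rejection rather than a wrong reading.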
(3) U.S. Pat. No. 3,582,884 (David H. Shepard) discloses a character reader wherein a plurality of scanning units are connected to the corresponding expensive recognition units. According to this patent, a plurality of exclusive scanning units are connected to a character-reading center through the corresponding channels. Scanning of a document and transmission of a character pattern signal are carried out with the operation of the scanning units controlled by the center. A transmitted character pattern signal is further treated by a recognition unit disposed in the center and connected to a calculator.
Apart from a high speed OCR designed to read a large amount of data appearing, for example, on documents, an OCR which could read various kinds of characters with high precision would be very useful even in the case of treating a relatively small amount of data, though the processing speed might be low. Document data supplied to an OCR generally includes different kinds of characters (such as digits, English letters, square form of Japanese alphabet, printed or hand-written characters) and is further subject to different tolerances for wrong reading (recognition precision) according to the type of business in which the documents are handled. This tendency is particularly prominent where a small number of documents of various forms are handled.
Therefore, the known techniques of reading characters have the following drawbacks from the point of view of reading characters by an inexpensive process.
(a) A system previously described under item (1) is applicable where an input character has a relatively good form. Otherwise, rejection of reading or wrong reading often arises and increases the operator's work, making the OCR of little practical use or giving rise to higher operation cost.
(b) A system previously referred to under item (2) can indeed recognize characters with high precision, but presents problems from the economic standpoint, where only a small amount of data is supplied.
(c) A system previously discussed under item (3) comprises a recognition unit connected to a large number of scanning units. Therefore, it will be economically possible to provide a plurality of such recognition units capable of effecting considerably high recognition performance, as in the case of (2). In this case, however, character recognition operation is carried out uniformly with respect to all forms of input characters. Namely, the system fails to recognize characters in a manner adapted for the contents of a document, thus still raising problems in respect of cost. Further, the system must include a large number of terminals to be rendered economically feasible. Where, however, the terminals are distributed geographically over a broad area, then communication cost will increase, thus imposing limitations on the application of the system. If, in this case, the terminals and the corresponding centers were connected by, for example, the existing telephone switching network, then a large amount of image signals could be transmitted to the centers. Particularly, however, where these image signals are transmitted through plural stages of switching units, traffic unbalance will take place in said telephone switching network, exerting a harmful effect on communications between the other terminals, such as telephones or data terminals.
Any one of the known character recognition systems is capable of recognizing characters by a single, uniform process with high precision and at low cost, and of transmitting a recognized character to the center from the respective terminals. The prior art, however, fails to recognize various forms of characters economically and efficiently by applying different recognition processes in compliance with the demand of a terminal user.
It is accordingly the object of this invention to provide a character recognition and communication system which is improved in respect of the drawbacks accompanying the prior art and can read various forms of characters with high precision and at low cost.