Typical optical character-recognition (OCR) systems comprise three major subsystems: (a) Character Isolation, which handles paper detection, video scanner control, pre-recognition noise filtering, and the identification of character data fields suitable for recognition; (b) Character Recognition, which attempts to identify the character described by each character data field supplied by character isolation, and which may use some combination of mask correlation, feature analysis, decision trees, and other recognition techniques; and (c) Character Post Processing, which may include context and feature analysis, post-recognition noise filtering, text spacing and skew adjustment, and communications.
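The three-subsystem division can be illustrated with a deliberately minimal Python sketch. It is not taken from any particular OCR system: the names `Bitmap`, `isolate`, `recognize`, `post_process` and `ocr_line` are hypothetical, isolation simply splits a binary line image at all-white pixel columns (so it cannot separate touching characters), recognition is a bare-bones mask correlation that counts agreeing pixels, and post processing only joins the recognized characters.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of 0 (white) / 1 (black) pixels; hypothetical format


def isolate(line: Bitmap) -> List[Bitmap]:
    """Character isolation (sketch): split a line image at all-white columns."""
    height, width = len(line), len(line[0])
    fields, start = [], None
    for x in range(width + 1):
        dark = x < width and any(line[y][x] for y in range(height))
        if dark and start is None:
            start = x  # first column of a new character data field
        elif not dark and start is not None:
            fields.append([row[start:x] for row in line])
            start = None
    return fields


def recognize(field: Bitmap, masks: Dict[str, Bitmap]) -> str:
    """Character recognition (sketch): mask correlation by agreeing pixels."""
    def score(mask: Bitmap) -> int:
        if len(mask) != len(field) or len(mask[0]) != len(field[0]):
            return -1  # size mismatch: treat as no match at all
        return sum(m == f for mr, fr in zip(mask, field) for m, f in zip(mr, fr))

    best = max(masks, key=lambda ch: score(masks[ch]))
    return best if score(masks[best]) >= 0 else "?"


def post_process(chars: List[str]) -> str:
    """Character post processing (sketch): here it only joins the characters."""
    return "".join(chars)


def ocr_line(line: Bitmap, masks: Dict[str, Bitmap]) -> str:
    return post_process([recognize(f, masks) for f in isolate(line)])


masks = {"I": [[1], [1], [1]], "L": [[1, 0], [1, 0], [1, 1]]}
line = [[1, 0, 1, 0],
        [1, 0, 1, 0],
        [1, 0, 1, 1]]
print(ocr_line(line, masks))  # prints "IL"
```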
The relative complexity of these subsystems varies with the nature of the text being read and the accuracy desired. Early OCR systems for typed pages used OCR-specific fonts such as OCR-A and tolerated little or no random noise (dirt, copier marks, paper imperfections, etc.), line misalignment, or character touching. For these systems, simple isolation, recognition and post-processing techniques yielded acceptable results. Later OCR systems for typed pages used improved recognition schemes that allowed standard fonts, such as Prestige Elite, but were still intolerant of noise, page misalignment and character touching. These later systems were adequate for carefully prepared original documents, but were unacceptable for documents of uncontrolled quality or surface characteristics. In particular, character-isolation problems in such documents (caused by skewed lines, touching characters, excessive smudging, and non-recognizable fonts) are often a greater source of error than character-recognition problems.
Video data acquisition in an OCR system normally uses one of two schemes: X-axis scan or Y-axis scan. In an X-axis scan, a narrow scan region one to three characters high (provided by an array 80 to 150 photodiode elements high × 1 photodiode element wide) is swept across the page, normally from left to right. Vertical columns of picture elements ("pixels") are assembled in memory until a sufficiently wide data region, normally one to three characters wide, has been constructed. Characters are isolated within this region and passed to recognition. Scanning, isolation and recognition are normally performed as synchronous operations. Scanning and recognition schemes are described in the literature (see, for example, U.S. Pat. No. 4,379,282 and applications Ser. Nos. 452,494 and 470,241). The major advantage of the X-scan scheme is the relatively low cost of the video array and memory required. A major disadvantage is the cost and complexity of the horizontal shuttle required to create the X-axis scan. Another disadvantage is that uncertainties in the isolation and recognition of characters must be resolved within a limited time, because the shuttle moves on to adjacent characters.
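The column-assembly step of an X-axis scan can be sketched as a bounded buffer that fills one pixel column at a time as the shuttle moves. This is an illustrative assumption, not the method of the cited patents: `x_axis_regions`, `width` (region width in columns) and `step` (columns advanced between regions) are hypothetical names, and successive regions overlap by `width - step` columns so that a character straddling one region boundary is whole in the next region.

```python
from collections import deque
from typing import Iterable, Iterator, List

Column = List[int]  # one vertical pixel column read from the 1-element-wide array


def x_axis_regions(columns: Iterable[Column], width: int,
                   step: int) -> Iterator[List[List[int]]]:
    """Yield `width`-column data regions as the shuttle sweeps left to right.

    Hypothetical sketch: assumes 1 <= step <= width. Only `width` columns are
    ever held in memory. Trailing columns that arrive after the last full
    step are not emitted in this simplified version.
    """
    buffer = deque(maxlen=width)  # oldest column drops out automatically
    since_last = 0
    for col in columns:
        buffer.append(col)
        since_last += 1
        if len(buffer) == width and since_last >= step:
            since_last = 0
            # Transpose the stored columns into rows for the isolation stage.
            yield [list(row) for row in zip(*buffer)]
```

The bounded buffer reflects the cost advantage noted above: memory grows with the region width rather than with the length of a text line.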
In a Y-axis scan, a long, narrow horizontal scan region (provided by an array 1600 to 2500 elements wide × 1 element high) is swept down the page. Horizontal pixel rows are built up in memory until an entire line of text has been stored, and isolation and recognition then operate on the line image stored in memory. The major advantages of this scheme are the elimination of the X-axis shuttle mechanism and the availability of an entire line of stored data to the isolation routine. One disadvantage is the relatively high cost of the array and memory required.
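Row accumulation in a Y-axis scan might look like the following sketch, which buffers pixel rows and emits a complete line image once a text line has been stored. The name `y_axis_lines` is hypothetical, and the sketch assumes that text lines are separated by at least one all-white row. Skewed or touching lines violate that assumption, which is one reason line isolation is harder than this sketch suggests.

```python
from typing import Iterable, Iterator, List

Row = List[int]  # one horizontal pixel row read from the 1-element-high array


def y_axis_lines(rows: Iterable[Row]) -> Iterator[List[Row]]:
    """Accumulate rows as the array sweeps down the page; yield each text line.

    Hypothetical sketch: assumes text lines are separated by at least one
    all-white row.
    """
    line: List[Row] = []
    for row in rows:
        if any(row):
            line.append(row)  # still inside a text line
        elif line:
            yield line        # first blank row after text: line is complete
            line = []
    if line:
        yield line            # text that runs to the bottom of the scan
```

Unlike the X-axis case, the whole line image is available before isolation starts, so isolation is not forced to keep pace with a moving shuttle.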
Most low-cost OCR systems to date have employed the X-axis scan scheme because of its lower overall cost. Character-isolation techniques for X-axis scanners must take into account such problems as line-skew correction and synchronized processing. On an ideal page, an X-axis scanner need only scan the height of a single line, advance a known distance to the next line, scan that line, and so on until the page is completely processed. Unfortunately, variations in inter-line spacing, page skew with respect to the scan array, and text skew with respect to the page edge all force the use of a scan array covering a region at least two text lines high, so that an entire line can be scanned in one shuttle pass. Because data from at least two lines of text is then available to the character-isolation subsystem, it is usually necessary to select and forward to recognition only those characters that belong to the current line. Typical schemes for identifying the current line prescan the entire line to determine the baselines and/or skew angles of the whole line and/or of each word in it. The line-position information gained in such a prescan is then used to identify the regions passed to isolation on a second pass. Current line-identification schemes are subject to failure in the presence of underlines, "bowed" text lines, excessive skew, noise, and the like.
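One naive way a prescan could estimate a baseline and skew angle is a least-squares line fit through the lowest dark pixel of each non-blank column. This is an assumed technique chosen for illustration, not necessarily the one used by the schemes mentioned above, and `estimate_baseline` is a hypothetical name. Descenders (g, p, y) and underlines pull the fit downward, which illustrates why such schemes fail in the presence of underlines.

```python
import math
from typing import List, Tuple

Bitmap = List[List[int]]  # rows, top to bottom, of 0/1 pixels


def estimate_baseline(region: Bitmap) -> Tuple[float, float]:
    """Return (intercept in rows, skew angle in degrees) of a fitted baseline.

    Hypothetical sketch: the lowest dark pixel of each non-blank column is a
    baseline sample. Row indices grow downward, so a positive angle means the
    baseline descends from left to right.
    """
    xs, ys = [], []
    for x in range(len(region[0])):
        dark_rows = [y for y in range(len(region)) if region[y][x]]
        if dark_rows:
            xs.append(x)
            ys.append(max(dark_rows))  # lowest dark pixel in this column
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two non-blank columns")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # > 0 because the xs are distinct
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, math.degrees(math.atan(slope))
```

On a second pass, pixels lying within a band above the fitted baseline could be treated as the current line. That selection step is not shown here.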
In addition, the shuttle mechanism used in a low-cost X-axis scanner may be a typewriter-like carriage, a rotating drum holding the page, or a hand-held wand. All of these mechanisms normally sweep from left to right across the page in a continuous rather than incremental fashion. In such a system, character isolation and recognition normally proceed in synchronism with the shuttle, which limits the maximum time available for isolating or recognizing problem characters. Desirable but sophisticated isolation and recognition techniques may therefore be unusable, because they require excessive operating time when implemented in software and excessive cost when implemented in hardware.
Recent reductions in video array and memory costs have made Y-axis scanners cost-competitive with X-axis scanners in low-cost OCR systems. Although X-axis isolation techniques can be used in a Y-axis scanner, it is highly desirable to provide a low-cost, high-speed isolation scheme, suitable for use with a Y-axis scanner, that obviates the problems described above.