The invention relates to methods and apparatus for simplifying data obtained by line-by-line electronic scanning of graphic images and the like, and more particularly to apparatus and methods for compacting the scanned pixel data and assembling it into a format that can be utilized for subsequent vectorization and/or character recognition operations with a minimum number of memory access operations to fetch the data to be so processed.
Many techniques have been used for operating upon scanned image data, i.e., pixel data. Most techniques in common use include storing all of the pixels in the form of pixel codes that represent the darkness of color of a scanned point, each pixel code being stored in a memory location that corresponds to the location of the scanned point that produced that pixel code. Such techniques require a very large amount of memory to store all of the pixel codes representing an entire scanned document. To reduce the amount of memory required and the number of memory-intensive pixel manipulations required, various vectorization techniques have been developed, including so-called "line thinning" and "boundary tracing" vectorization techniques. Most, if not all, of these techniques have required extensive pixel manipulation, and hence have been slower than desirable and have required larger amounts of memory than desirable. Substantial loss of accuracy in the ultimately reproduced image also has been a major shortcoming of prior vectorization techniques. One prior art reference, "The Line Recognition of the Handwritten Schematics by Using Run-Length Information Only", by M. Okamoto and H. Okamoto, Faculty of Engineering, Shinshu University, Japan, describes a technique for line recognition of handwritten schematics wherein scanning of consecutive lines produces run-length data that is merged into "blocks" if certain connectivity conditions and other conditions are satisfied. The amount of direct pixel manipulation thereby is reduced. Simple breaks or insignificant gaps between pixels are eliminated.
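By way of illustration only, the general idea of converting scanned lines into run-length data and merging runs on consecutive lines into blocks can be sketched as follows. The sketch is not taken from the Okamoto article: the names row_to_runs, runs_touch, and group_runs are hypothetical, the connectivity condition is assumed to be simple column overlap between a run and the run directly above it, and branching or merging shapes are deliberately not handled.

```python
from typing import List, Tuple

Run = Tuple[int, int]          # (first_column, last_column), inclusive
Block = List[Tuple[int, Run]]  # list of (row_index, run) pairs


def row_to_runs(row: List[int]) -> List[Run]:
    """Convert one scanned line of 0/1 pixels into runs of dark pixels."""
    runs: List[Run] = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x                      # a dark run begins
        elif not pixel and start is not None:
            runs.append((start, x - 1))    # the dark run ended at x - 1
            start = None
    if start is not None:                  # run reaches the end of the line
        runs.append((start, len(row) - 1))
    return runs


def runs_touch(a: Run, b: Run) -> bool:
    """Assumed connectivity condition: the column ranges overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def group_runs(image: List[List[int]]) -> List[Block]:
    """Merge runs on consecutive lines into blocks (simplified sketch)."""
    blocks: List[Block] = []
    open_blocks: List[Block] = []  # blocks that had a run on the previous line
    for y, row in enumerate(image):
        next_open: List[Block] = []
        for run in row_to_runs(row):
            for block in open_blocks:
                last_y, last_run = block[-1]
                # Extend a block only once per line, and only if connected.
                if last_y == y - 1 and runs_touch(last_run, run):
                    block.append((y, run))
                    break
            else:
                block = [(y, run)]         # no connected block: start a new one
                blocks.append(block)
            next_open.append(block)
        open_blocks = next_open
    return blocks


image = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
blocks = group_runs(image)
```

In this sketch only the run endpoints are stored, rather than one pixel code per scanned point, which is the source of the memory savings attributed to run-length approaches in general.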
In the Okamoto article, the images scanned are not operated upon from a "global" perspective. That is, the total image seen by a viewer is not simulated in its entirety. Instead, the image is fragmented into a large number of blocks or trapezoids, which are then subjected to hundreds of "special case" rules in an attempt to synthesize the trapezoids or pieces into vector data that represent the original objects in the scanned image. Although this approach greatly reduces the amount of bit manipulation and memory required by most prior techniques for line-by-line scanning of graphic images, the vectorized data it produces does not accurately preserve the topography of many shapes of scanned images.
There remains an unfulfilled need for an improved technique for greatly compacting scanned pixel data into linked lists that accurately represent scanned images and can be utilized for subsequent operations without extensive bit manipulation to locate image data in memory. The needed technique must be fast, accurate, and inexpensive. Systems utilizing most of the prior techniques are too expensive (costing more than $100,000.00), too slow (requiring an hour or more to vectorize or digitize a single page-sized complex drawing), and/or inaccurate, in that hours of human editing may be required to fix the inaccuracies resulting from the vectorization/digitizing techniques.