Digital coding of graphic information is commonly called for in a wide variety of contexts from facsimile data transmission to computerized photograph analysis and pattern recognition, to computer-aided-design applications. The first step in such digitizing is to scan the document in a controlled fashion, measuring the graphic value of the image at each point. Currently available scanning devices are capable of substantially simultaneously delivering a binary output signal for each of n lines of resolution cells, each cell being approximately 0.01 mm square. Thus a one meter long scan line of an engineering drawing, for example, would contain 10^5 such resolution cells; a single square centimeter would contain 10^6 resolution cells.
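The cell counts quoted above follow directly from the 0.01 mm cell size. A minimal sketch of the arithmetic, in which the names CELL_UM and cells_along are hypothetical and lengths are held in integer micrometres only to avoid rounding:

```python
# Assumed cell size: 0.01 mm, expressed as 10 micrometres (integer units).
CELL_UM = 10


def cells_along(length_mm: int) -> int:
    """Number of resolution cells along a straight run of the given length."""
    return length_mm * 1000 // CELL_UM


scan_line_cells = cells_along(1000)      # one meter long scan line
square_cm_cells = cells_along(10) ** 2   # one square centimeter

print(scan_line_cells)   # 100000, i.e. 10^5
print(square_cm_cells)   # 1000000, i.e. 10^6
```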
In practice there is a high degree of repetition of graphical values in any image. Given the extreme volume of digitized graphical values produced by scanning even the simplest images, it is necessary to employ coding techniques, or, more colloquially, pattern-recognizing techniques, to reduce the volume of data that must be stored or transmitted. The simplest such technique is a one-dimensional information compression, such as run-length encoding, in which, for each scan line, a string length and starting coordinate are coded only when the value of a string of consecutive resolution cells changes. Where, as indicated above, the digitized information is in the form of raster output data from 0.01 mm resolution cells, a typical 80-character alphabetic line might then be coded as approximately 200 information signals for each 20 cm long scan line, a reduction of 99 percent compared to the 2 × 10^4 bits of raw raster output data. Considering that a sheet of A4 paper contains 6 × 10^8 such resolution cells, such a coding is still very cumbersome, requiring over a million information signals to code a single page of bi-tonal writing, scan line by scan line. The prior art addresses this inefficiency with a number of techniques which look for broader patterns by correlating the run-length-compressed data across a second dimension, typically by comparing adjacent scan line data and coding the difference.
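By way of illustration only, the one-dimensional run-length coding described above might be sketched as follows. The function names, the convention that 1 denotes a black cell, and the choice to code only the black runs are assumptions made for this sketch and are not taken from any of the cited devices.

```python
from typing import List, Tuple


def encode_scan_line(cells: List[int]) -> List[Tuple[int, int]]:
    """Code one scan line as (start, length) pairs, one per run of black cells."""
    runs = []
    start = None  # starting coordinate of the run currently open, if any
    for x, value in enumerate(cells):
        if value == 1 and start is None:
            start = x                        # white-to-black change opens a run
        elif value == 0 and start is not None:
            runs.append((start, x - start))  # black-to-white change closes it
            start = None
    if start is not None:                    # run extends to the end of the line
        runs.append((start, len(cells) - start))
    return runs


def decode_scan_line(runs: List[Tuple[int, int]], width: int) -> List[int]:
    """Rebuild the raster scan line from its (start, length) pairs."""
    cells = [0] * width
    for start, length in runs:
        cells[start:start + length] = [1] * length
    return cells


line = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
runs = encode_scan_line(line)
print(runs)  # [(2, 3), (7, 1)]
```

Here ten raw cells reduce to two coded pairs; nothing is coded for cells whose value does not change, which is the source of the reduction on long scan lines.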
Among such techniques are those shown in U.S. Pat. Nos. 3,937,871; 4,213,154; and 4,189,711. Another technique of two-dimensional encoding involves designing circuitry specifically to recognize efficiently the particular kinds of patterns encountered in a fixed use. U.S. Pat. No. 4,307,377, for example, shows a device which codes narrow straight lines, and which approximates narrow curved lines by a segmental approximation. That patent claims a 97 percent reduction in the amount of data required to be stored, although the baseline against which that reduction is measured is not clear.
Each of the above techniques, while offering a significant reduction in the amount of data required to be stored as compared to the raw raster output, has its drawbacks. Typically the coding techniques which correlate successive scan line intercepts require the coding of some data for each scan line intercept, and do not produce an output indexed to provide simple access for editing or for addressing portions of the stored image. The method shown in U.S. Pat. No. 4,307,377 avoids these problems for certain graphic elements, but does not offer as significant a data reduction in coding of images other than thin lines. As an example of the limitations of the prior art, when applied to a document containing only a vertically oriented black isosceles triangle, centered on the page, none of the foregoing coding techniques would give an output as compact as the intuitive mathematical description comprising two linear equations and the top and bottom y-coordinates; nor would the coding output of such prior art devices indicate that such a simple image was being scanned. What is needed is a single device which quickly recognizes patterns and compactly codes the information contained in general two-dimensional drawings containing both line drawings and shaded image portions, and which develops an output useful for the extraction of higher level information such as written characters.
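The triangle example can be made concrete with a rough count. The sketch below is illustrative only: the page coordinates, the function name triangle_runs, and the use of a simple count of coded numbers as the measure of compactness are all assumptions, and no particular prior art device is modeled.

```python
from typing import List, Tuple


def triangle_runs(cx: int, y_top: int, y_bottom: int,
                  half_base: int) -> List[Tuple[int, int, int]]:
    """Return (y, start, length) for each scan line crossing the triangle.

    The apex is at (cx, y_top); the base lies on y_bottom and extends
    half_base cells to either side of cx.
    """
    height = y_bottom - y_top
    runs = []
    for y in range(y_top, y_bottom + 1):
        # Half-width grows linearly from 0 at the apex to half_base at the base.
        half = half_base * (y - y_top) // height
        runs.append((y, cx - half, 2 * half + 1))
    return runs


runs = triangle_runs(cx=500, y_top=100, y_bottom=900, half_base=200)

# Scan-line coding: a starting coordinate and a length for every scan line.
run_length_values = 2 * len(runs)

# Intuitive description: slope and intercept for each of the two edges,
# plus the top and bottom y-coordinates.
analytic_values = 2 * 2 + 2

print(run_length_values, analytic_values)  # 1602 6
```

Under these assumptions the scan-line coding grows with the height of the figure, while the description by two linear equations and two y-coordinates stays at six numbers whatever the size of the triangle.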