The invention relates to apparatus and methods for vectorizing serial data received from scanning a document, especially a document with line drawings, hand-drawn technical drawings, or the like.
Those skilled in the art recognize that a single sheet of a document has a large number of pixel-sized areas that can be scanned line-by-line by a typical optical scanning device. As many as 60 million bits of information may be required to represent every such area or "point" on a single sheet of paper. Various techniques have been proposed to reduce the amount of data that must be processed and stored to allow computerized manipulation of scanned images and electronic transmission thereof. To this end, various character recognition, pattern recognition, and vectorization techniques have been developed. One type of character or pattern recognition technique is template matching, typically found in optical character readers. This approach requires very close matching of the scanned character to a stored "template". Another approach involves the recognition of simple shapes and generation of vectors to represent those shapes. Another approach involves complex statistical feature extraction processes that are performed on scanned data and statistical comparison of such features with stored samples. All of the known techniques require extensive pixel manipulation, and cannot be accomplished with satisfactory accuracy and speed on a low cost computer, such as an IBM AT personal computer.
The state of the art is generally indicated by the following references.
The article "Line Recognition of Hand-Written Schematics Using Run-Length Data", by Masayuki Okamoto and Hiroyuki Okamoto, discloses a method of vectorizing images from line drawings using only runs of connected black pixels. (The date of this article presently is unknown, and it is not known whether this reference is prior to the present invention.) Input vectors are compared to stored criteria to determine whether each input vector is vertical, horizontal, or a 45 degree vector. Runs on consecutive rows are merged into a block if they are connected and satisfy certain conditions; a number of run lengths are gathered to compose one block according to the length, the gradient between certain lines, and the line connectivities. Line recognition then is performed on those blocks that can be recognized as lines according to their gradients and shapes.
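The run-merging step described in this reference can be illustrated with a minimal sketch. It is not the method of the cited article: the function names, the `(row, start, end)` run representation, the 8-connectivity test, and the `max_len_diff` length condition are all assumptions chosen for illustration, since the reference's actual merging conditions are not stated here.

```python
from typing import List, Tuple

# Hypothetical run representation: (row, start_col, end_col), end inclusive.
Run = Tuple[int, int, int]


def extract_runs(image: List[List[int]]) -> List[Run]:
    """Return the runs of black (non-zero) pixels, row by row, left to right."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        start = None
        for c, px in enumerate(row):
            if px and start is None:
                start = c                      # a run begins
            elif not px and start is not None:
                runs.append((r, start, c - 1))  # a run ends
                start = None
        if start is not None:                  # run reaches the row's end
            runs.append((r, start, len(row) - 1))
    return runs


def merge_runs_into_blocks(runs: List[Run], max_len_diff: int = 2) -> List[List[Run]]:
    """Group runs on consecutive rows into blocks.

    Assumed conditions (illustrative only): a run joins a block when it
    touches the block's last run on the previous row (8-connectivity) and
    the two run lengths differ by at most max_len_diff pixels.
    """
    blocks: List[List[Run]] = []
    for run in runs:
        r, s, e = run
        for block in blocks:
            pr, ps, pe = block[-1]
            connected = pr == r - 1 and s <= pe + 1 and ps <= e + 1
            similar = abs((e - s) - (pe - ps)) <= max_len_diff
            if connected and similar:
                block.append(run)
                break
        else:
            blocks.append([run])               # start a new block
    return blocks


image = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
]
blocks = merge_runs_into_blocks(extract_runs(image))
```

In this small example the three single-pixel runs in column 1 form one block (a vertical stroke), while the run in columns 3 through 5 of the last row starts a separate block.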
U.S. Pat. No. 4,307,377 (Pferd et al.) discloses a system in which computer graphics material is raster scanned, digitized, and then examined for narrow width areas of similar darkness, typically black or white. Such areas are compactly coded by using the coordinates of the first and last scan lines defining each area and its thickness.
U.S. Pat. No. 4,493,105 (Beall et al.) discloses a technique of determining x and y coordinates of uniquely defined corner point vectors about the edge contours of a binary-valued image and then uniquely linking such encoded corner points into associated lists that together define the image.
U.S. Pat. No. 4,545,067 (Juvin et al.) discloses an automatic image recognition system in which electronically scanned data is processed to determine the coordinates of characteristic contour lines of the image, segmenting the respective contour lines, encoding the segments, attributing to each a pair of values relating to its length and angle, and comparing those with a stored reference contour.
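The idea of attributing a length and an angle to each contour segment can be sketched as follows. This is an illustrative assumption rather than the encoding of the cited patent: the function name `encode_segments`, the representation of a contour as a list of (x, y) vertices, and the use of degrees measured from the positive x axis are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def encode_segments(points: List[Point]) -> List[Tuple[float, float]]:
    """Encode each segment between successive vertices as (length, angle).

    The angle is in degrees, measured from the positive x axis with
    math.atan2, so it lies in the range (-180, 180].
    """
    encoded = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        encoded.append((math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))))
    return encoded


# Hypothetical contour with two segments.
contour = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
segments = encode_segments(contour)
```

A list of such (length, angle) pairs could then be compared, pair by pair and within some tolerance, against a stored reference contour encoded the same way.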
U.K. patent application No. 2131660A by H. Hashiyama et al. discloses a technique of storing data corresponding to characteristic points on contours of images to be recognized, reading out such data and subjecting it to magnification-changing processing such as enlargement, reduction, rotation processing, etc., and converting the thus-arranged pattern into one-dimensional time series data.
None of the known systems meets the existing need for a low cost scanner that can rapidly and accurately digitize documents such as hand-drawn technical drawings and the like in times as short as about five minutes.