This invention relates to a hardware system for compressing information taken from existing documents for entry into a data base. More particularly, the invention relates to a method and apparatus for the high speed processing of raster-scanned data to effectively reduce the data necessary to represent lines and characters of the original document prior to storage of a vectorized representation of the original document in the data base.
Computer-aided design and drafting is practiced extensively in the engineering and graphic arts; however, there exists an extensive archive of documents generated manually or otherwise in pictorial form, i.e., more suitable for human interpretation than interpretation by machine. The demand for conversion of such archives to machine script will grow as the transition to computer-based graphics progresses and becomes more complete.
Techniques for automatically converting drawings and other pictures to machine script for processing, storage or display are well known. One of the most efficient and compact machine-script data sets representative of a picture comprises vectors including data items representing spatial location of the vectors with respect to the original picture. Generally, techniques for converting a picture into such vectorial data fall into two categories, viz.: line-following and raster-to-vector conversion. Line-following schemes, while generating vectorial data directly, require large and expensive assemblies that are best suited for high production environments. Line-following is said to be advantageous because the original picture is used as the image memory, instead of a bit-map copy of the picture in the computer memory. A "bit-map" is a signal set in machine script representing a tessellation of small picture elements or pixels of the original document. Generally, line-following imaging systems having devices that can be directed randomly in two dimensions to detect and follow picture features are either expensive or slow. An example of the former comprises a device utilizing a scanning laser beam which is directed by moving mirrors, and having acousto-optical devices for detecting features of the picture. An example of the latter is an electromechanical device such as a plotter having a light sensor instead of a pen. In some implementations, an operator manually guides a carriage along a line to be acquired; a photosensor detects when the carriage is directly over a line and enables the system to store X and Y coordinates of the carriage. By moving the carriage on an irregular path over the line, the intersections of the path and the line are stored as end points of a string of vectors. 
A totally automatic line-following system must first scan the entire picture to locate lines and features, and maintain a data-storage bookkeeping system to preclude duplicate storage of data. Otherwise, an operator must locate lines and direct the process, line by line.
In raster-to-vector conversion systems, the original picture or its microfilm is scanned, e.g., optically, and the information thereon resolved into a bit-map. The optical characteristics of each pixel are used to control detection circuits that generate positionally defined signals of the bit-map. An advantage of raster-to-vector conversion systems is that raster-scan imaging devices are inexpensive and prevalent; however, this kind of system has commonly required storage of the entire image as a bit-map in a data store accessible by a computer, the computer then executing a program for converting the bit-map to a vectorial data set.
The storage of a bit-map copy of a picture requires a large data store. For example, a bit stream acquired from raster scanning an E-size drawing with a resolution of 0.1 millimeter comprises approximately 100 million bits of data. A "bit stream" means a sequence of electrical signals or pulses comprising a set of binary digits representing data in coded form wherein the significance of each bit is determined by its position in the sequence and its relation to other bits. Various data reduction algorithms based on information and coding theory have been utilized to achieve significant reduction in the storage requirement for scanned data. Unfortunately, however, the form of representation of data as coded messages generally lacks information necessary for reconstituting regular line drawings.
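The storage figure quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming an E-size sheet measures 34 by 44 inches (the ANSI E dimensions, which are not stated in the text) and that each pixel is stored as a single bit; the function name `bitmap_bits` is illustrative only.

```python
MM_PER_INCH = 25.4

def bitmap_bits(width_in, height_in, resolution_mm, bits_per_pixel=1):
    """Return (columns, rows, total bits) for a raster scan of a sheet.

    Sheet dimensions are given in inches and the pixel pitch in millimeters.
    """
    cols = round(width_in * MM_PER_INCH / resolution_mm)
    rows = round(height_in * MM_PER_INCH / resolution_mm)
    return cols, rows, cols * rows * bits_per_pixel

# Assumption: E-size taken as 34 in x 44 in, one bit per pixel.
cols, rows, bits = bitmap_bits(34, 44, 0.1)
print(f"{cols} x {rows} pixels = {bits / 1e6:.1f} million bits "
      f"({bits / 8 / 1e6:.1f} million bytes)")
```

Under these assumptions the scan yields 8636 by 11176 pixels, or roughly 96.5 million bits, which agrees with the approximate figure of 100 million bits.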
Previously, software has been used to process digitized drawings before vectorization by grouping adjacent pixels into small arrays and altering some of the pixels depending on the configuration of each array. This approach consumed considerable processing time and system resources.
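The kind of software preprocessing described above can be illustrated as follows. This is a minimal sketch only, not the prior-art software itself: the 3-by-3 window size, the lookup-table representation of the programmable patterns, and the two example rules (removing isolated black pixels and filling isolated white holes) are all assumptions chosen for illustration, as are the function names.

```python
def window_index(img, r, c):
    """Pack the 3x3 neighbourhood around (r, c) into a 9-bit integer.

    Pixels outside the image are treated as 0 (white). The centre pixel
    ends up in bit 4 of the result.
    """
    idx = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < len(img) and 0 <= cc < len(img[0])
            idx = (idx << 1) | (img[rr][cc] if inside else 0)
    return idx

def build_table():
    """Build a 512-entry table giving the new centre pixel for every pattern.

    Hypothetical rules: an isolated black pixel is erased, an isolated white
    hole is filled, and every other configuration is left unchanged.
    """
    table = []
    for idx in range(512):
        centre = (idx >> 4) & 1
        neighbours = bin(idx & 0b111101111).count("1")
        if centre == 1 and neighbours == 0:
            table.append(0)
        elif centre == 0 and neighbours == 8:
            table.append(1)
        else:
            table.append(centre)
    return table

def apply_table(img, table):
    """Return a new image; every output pixel is looked up from the original."""
    return [[table[window_index(img, r, c)] for c in range(len(img[0]))]
            for r in range(len(img))]

TABLE = build_table()
speck = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(apply_table(speck, TABLE))
```

Because every pixel requires its own window to be assembled and looked up, the work grows with the total pixel count of the drawing, which is consistent with the time and resource cost noted above. Changing the rules only requires rebuilding the table, which is the sense in which the patterns are programmable in this sketch.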
It would be desirable to have a compact, high-speed, lower-cost hardware system capable of varying the raster length for drawings of various sizes, while retaining the benefit of the software approach, namely programmable patterns, through the use of readily available and inexpensive components such as TTL and CMOS devices.