Often, it is desirable to reproduce a book that has gone out of print and for which printing plates are no longer available. One way to accomplish this is to photograph the printed pages and use the resulting film images to create new printing plates. This method has the disadvantage of introducing noise into the reproduction process, which in turn degrades the quality of the reproduced pages. Other reproduction methods, such as photocopying, produce pages of even poorer quality and hence are not acceptable under most circumstances.
Yet another reproduction method relies on optical character recognition (OCR), in which the pages to be reproduced are electronically scanned to produce an electronic file representing the characters on each page. Modern OCR techniques, however, cannot process nontext images, are limited in their recognition capability, and require knowledge of the font in which the page characters are printed in order to achieve sufficient accuracy. This OCR reproduction method is thus restricted to books or other printed material set in fonts that can be recognized, a restriction that severely limits the types of source material that can be reproduced. In addition, such a method does not retain information concerning the format or style of each page.
In recent years, page description languages (PDL's), such as PostScript from Adobe Systems, Inc., of Mountain View, Calif., have been introduced in an attempt to provide a standardized way of describing a printed page.
Methods and systems for converting data expressed in a PDL into bitmap form are known. Typically, the PDL expresses page elements, such as images, line art or characters, as a series of shorthand expressions indicating the location of each page element and its appearance. The bitmap representation, on the other hand, comprises a series of digital values defining the page on a pixel-by-pixel basis. Such converters, otherwise known as raster image processors (RIP's), are used to drive printers or other output devices that do not include an interpreter for the page description language.
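The difference between a shorthand page description and its bitmap can be illustrated with a minimal Python sketch of the rasterizing step. The element types `FillRect` and `HLine` and the function `rasterize` are hypothetical, chosen for illustration only. They are not PostScript operators and not the interface of any actual RIP, and a practical RIP must also handle characters, curves and continuous-tone images. Each element is stated once by location and size, and the converter expands it into individual pixel values.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical shorthand page elements (illustrative only; NOT PostScript operators).
@dataclass
class FillRect:
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


@dataclass
class HLine:
    x: int       # starting column, in pixels
    y: int       # row, in pixels
    length: int


PageElement = Union[FillRect, HLine]


def rasterize(elements: List[PageElement], width: int, height: int) -> List[List[int]]:
    """Expand shorthand page elements into a bitmap of 0 (white) / 1 (black) pixels."""
    bitmap = [[0] * width for _ in range(height)]

    def set_pixel(px: int, py: int) -> None:
        # Clip any part of an element that falls outside the page.
        if 0 <= px < width and 0 <= py < height:
            bitmap[py][px] = 1

    for el in elements:
        if isinstance(el, FillRect):
            for py in range(el.y, el.y + el.height):
                for px in range(el.x, el.x + el.width):
                    set_pixel(px, py)
        elif isinstance(el, HLine):
            for px in range(el.x, el.x + el.length):
                set_pixel(px, el.y)
        else:
            raise ValueError(f"unsupported page element: {el!r}")
    return bitmap


if __name__ == "__main__":
    # Two shorthand expressions describe the whole page...
    page = [FillRect(1, 1, 3, 2), HLine(0, 4, 6)]
    # ...and the bitmap spells out every pixel. Prints:
    # ......
    # .###..
    # .###..
    # ......
    # ######
    for row in rasterize(page, width=6, height=5):
        print("".join("#" if p else "." for p in row))
```

The two-element description stays the same size regardless of output resolution, whereas the bitmap grows with the number of pixels. This is why a device lacking a PDL interpreter must be fed the converted bitmap rather than the description itself.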