1. Field of the Invention
The present invention is directed to a method and an apparatus for electronic archiving of the data stream output by a computer.
2. Description of the Related Art
Such a method and such an apparatus are disclosed by German Patent Document DE 4408327 A1. In that system, computer-generated documents that contain graphic as well as numerical and alphanumerical information are transmitted to an archiving system via an interface, for example a standard printer interface. Within the archiving system, the data belonging to the documents are stored long-term on bulk storage devices such as magnetic tapes, magnetic or optical storage disks or the like. This type of storage is increasingly replacing earlier procedures wherein documents or originals present on paper were acquired with an optical scanner, and the image obtained in this way was converted into electrical signals and then deposited in archives. The previously standard microfilming of documents is likewise being increasingly replaced by this new technique when the originals are already present in the form of electronic signals or computer data streams.
Electronic archiving systems of the type initially cited usually convert a data stream output by the computer into a data stream having a specific data format matched to the archive. In many applications, the data stream of the computer is matched to specific output systems, particularly to printers. Examples of such print data streams are the IPDS (Intelligent Printer Data Stream) format developed by IBM and the PCL (Printer Command Language) format developed by Hewlett Packard.
An archiving system converts these data streams into a format that corresponds to the archiving system. Directly storing the data output by the computer system proves very disadvantageous because an extremely large number of system parameters, for example character fonts, would have to be stored as well. The reproduction of data stored in this way would then also prove very complex. Archiving systems therefore store the data on a pixel-oriented basis, for example in what is referred to as the TIFF (Tagged Image File Format) format. Such a point-by-point storing technique opens up the possibility of reducing the data volume with standard compression methods. The volume of such compressed data, however, grows with the number of black-and-white transitions to be processed. At the same time, printed pages are increasingly being designed in a visually more complex fashion. For example, gray scale grids more frequently form the background of forms in order to make them more visually attractive and to make emphases more recognizable. The result of this development is that printed pages require more and more memory when archived.
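The dependence of the compressed data volume on the number of black-and-white transitions can be illustrated with a simple run-length coding of a single scan line. The following is only a minimal sketch under stated assumptions: the function name `run_lengths` and the two 16-pixel scan lines are hypothetical, and run-length coding merely stands in for the standard compression methods mentioned above, which are not specified in detail here.

```python
from itertools import groupby


def run_lengths(row):
    """Return (pixel_value, run_length) pairs for one scan line of 0/1 pixels."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(row)]


# Hypothetical 16-pixel scan lines (0 = white, 1 = black).
plain_row = [0] * 6 + [1] * 4 + [0] * 6   # one black bar on a white background
gray_row = [0, 1] * 8                      # gray scale background rendered as alternating pixels

plain_runs = run_lengths(plain_row)
gray_runs = run_lengths(gray_row)

# Every black-and-white transition starts a new run, so the scan line with
# the gray scale background needs far more entries than the plain one.
print(len(plain_runs), "runs for the plain scan line")
print(len(gray_runs), "runs for the gray scale scan line")
```

Both scan lines hold the same number of pixels, yet the one with the gray scale background cannot be shortened at all by this kind of coding, because every pixel begins a new run.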
In another known system, what is referred to as the COLD (Computer Output to Laser Disk) system, data for archiving are separately deposited as mainly graphic data and mainly encoded data (line data). In the expanded COLD method, raw data and resources are likewise separately deposited, and the entire printing process is simulated in the reproduction. This requires complex resource management.
German Patent Document DE 195 15 981 A1 discloses a method for acquiring manually written documents wherein the documents are scanned and subsequently further-processed at picture screens while blanking out pre-print information. Since the pre-print information is no longer available in the further-processing, this method is only suitable when the information printed on the original is still known or available at the time of the further-processing. This method is therefore hardly suited for a long-term archiving system.
European Patent Document EP 654 746 A2 discloses a method for archiving forms that corresponds to the initially mentioned procedure of optically scanning documents. Blank forms are scanned first and the data of the blank forms are deposited in a computer. Filled-out forms to be archived are likewise scanned later and the data thereby acquired are compared to the stored data of the blank forms. First, it is determined to which blank form the filled-out form corresponds; the variable, filled-out data are then extracted from the filled-out form. The extracted data are then stored together with a reference to the data of the blank form. In this method, the filled-out forms must be present in printed form so that they can be scanned and archived. For the comparison, it is also necessary that the blank forms have already been scanned and stored before a filled-out form can be archived.
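The sequence of steps just described (identify the matching blank form, extract the variable data, store them with a reference to the blank form) can be sketched as follows. This is a purely illustrative sketch and not the procedure of the cited document: the bitmap representation as strings of `#` and `.`, the overlap score, and the names `black_pixels`, `archive_filled_form`, `form_a` and `form_b` are all assumptions made for the example, and every blank form is assumed to contain at least one black pixel.

```python
def black_pixels(bitmap):
    """Return the set of (row, column) positions marked '#' in a bitmap."""
    return {
        (r, c)
        for r, line in enumerate(bitmap)
        for c, ch in enumerate(line)
        if ch == "#"
    }


def archive_filled_form(filled, blank_forms):
    """Return (name of best matching blank form, sorted variable pixels)."""
    filled_px = black_pixels(filled)

    def score(name):
        # Fraction of the blank form's pixels that reappear in the filled form.
        blank_px = black_pixels(blank_forms[name])
        return len(blank_px & filled_px) / len(blank_px)

    best = max(blank_forms, key=score)
    # Variable data: pixels of the filled form not explained by the blank form.
    variable = filled_px - black_pixels(blank_forms[best])
    return best, sorted(variable)


# Hypothetical previously scanned blank forms and one scanned filled-out form.
blank_forms = {
    "form_a": ["####", "#...", "####"],
    "form_b": ["#..#", "#..#", "#..#"],
}
filled = ["####", "#.#.", "####"]

reference, variable_data = archive_filled_form(filled, blank_forms)
print(reference, variable_data)
```

Only the reference and the extracted pixels would need to be stored for the filled-out form. The sketch also makes the stated limitation visible: the dictionary of blank forms must already exist before any filled-out form can be processed.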
The publication of Wong, K. Y. et al., “Document Analysis System”, in IBM J. Res. Develop., Vol. 26, No. 6, November 1982, pages 647–656 describes a method for distinguishing between text data and graphics data. It is suitable for the manual processing of scanned documents but cannot be directly employed for the archiving of print data.