1. Field of the Invention
The present invention relates to a data management system configured to store a job that a user has executed on a digital multifunction peripheral or a printer.
2. Description of the Related Art
In recent years, digital multifunction peripherals (MFPs) and printers have been widely used. Because MFPs and printers are readily available, users can easily print, copy, or send a document, regardless of their skills. The widespread use of digital MFPs and printers enhances the convenience of users, but also allows confidential documents to be easily printed, copied, and transmitted, which increases the risk of information leakage.
To address this risk, a conventional document management system stores, in a server (a storage device), all job data (image data and text data) processed when performing operations such as printing, copying, and data transmission via facsimile or e-mail. In such a system, an administrator can search the stored data for a desired document to trace what processing was performed by which apparatus at what date and time. Accordingly, if information leakage occurs, the leaked document can be searched for to identify (trace) how it was leaked. Information leakage can thus be deterred.
Furthermore, the processed image data and text data can be stored in association with each other so that data stored in the server can be easily searched. A full-text search can be performed on the text data to find the job data. For example, a system auditor can perform a full-text search on the stored data using the term “confidential information” as a search term. As a result, job data including the search term can be obtained.
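The full-text search described above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index; the names `JobRecord`, `JobArchive`, `store`, and `search` are hypothetical and are not taken from any described system.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical record: text data stored in association with a job."""
    job_id: int
    job_type: str   # e.g. "print", "copy", "fax"
    device: str     # apparatus that processed the job
    timestamp: str  # date and time of processing
    text: str       # text data associated with the job's image data


def _tokens(text):
    # Lowercased word tokens; punctuation is ignored.
    return re.findall(r"\w+", text.lower())


class JobArchive:
    """Illustrative in-memory stand-in for the server-side job store."""

    def __init__(self):
        self.records = {}
        self.index = defaultdict(set)  # token -> ids of jobs containing it

    def store(self, record):
        self.records[record.job_id] = record
        for token in _tokens(record.text):
            self.index[token].add(record.job_id)

    def search(self, phrase):
        """Return the records whose text contains the phrase."""
        words = _tokens(phrase)
        if not words:
            return []
        # Narrow down with the index, then confirm the exact word sequence.
        candidates = set.intersection(*(self.index.get(w, set()) for w in words))
        needle = " " + " ".join(words) + " "
        hits = []
        for job_id in sorted(candidates):
            haystack = " " + " ".join(_tokens(self.records[job_id].text)) + " "
            if needle in haystack:
                hits.append(self.records[job_id])
        return hits
```

Each hit carries the job type, apparatus, and time of processing, which is the information an auditor would need to trace how a document was handled.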
Job data (document data) stored in the server can be of various formats such as data for a print job, copy job, and facsimile job.
Japanese Patent Application Laid-Open No. 11-120202 discusses a method for inputting and centrally managing a plurality of documents of different data formats to enable a seamless search. In the method discussed in Japanese Patent Application Laid-Open No. 11-120202, documents of different data formats (such as an application document, a World Wide Web (WWW) document, and a facsimile document) are processed, a predetermined document structure file is generated from each document, and the generated document structure file is stored. The document structure file includes an original document file, a text file generated from the original document file, a thumbnail file, and a document management file for managing each document structure file. By using such a document structure file, the method enables integrated management of a plurality of documents of different data formats.
In the method discussed in Japanese Patent Application Laid-Open No. 11-120202, the document structure file including the text file generated from the original document file is stored in a document structure file storage unit. Furthermore, a single search text file, generated by extracting only the text file from the document structure file stored in the document structure file storage unit, is stored in a document management unit.
In the above-described method, when a text file is input as an original document, a document structure file including the original document file and a text file generated from the original document is stored. Additionally, a text file for searching is generated and stored. Consequently, the text file is generated redundantly, and the same text is stored in duplicate.
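The redundancy noted above can be made concrete with a small sketch. It only models the storage layout as summarized in this section; `DocumentStructureFile`, `extract_text`, and `register` are hypothetical names, and the conversion of non-text formats is left as a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStructureFile:
    """Hypothetical model of the document structure file described above."""
    original: bytes    # original document file
    text: str          # text file generated from the original
    thumbnail: bytes   # thumbnail file (left empty in this sketch)
    management: dict = field(default_factory=dict)  # document management data


def extract_text(original, fmt):
    # Assumption: a text original is simply decoded; other formats would need
    # a format-specific converter, which is omitted here.
    if fmt == "text":
        return original.decode("utf-8")
    return ""


def register(original, fmt, structure_store, search_store):
    """Store a document structure file and a separate search text entry."""
    text = extract_text(original, fmt)
    dsf = DocumentStructureFile(original, text, b"", {"format": fmt})
    structure_store.append(dsf)    # document structure file storage unit
    search_store.append(dsf.text)  # search text kept by the management unit
    return dsf
```

For a text original, the same content ends up in `original`, in `text`, and again in the search store, which is the duplication the paragraph above points out.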
In a system that stores a job that a user has executed on a digital MFP or a printer, when a printer prints application data, a printer driver can extract and store text data based on text rendering commands. With this method, however, text cannot be extracted when an application prints text as an image instead of issuing a text rendering command. Consequently, in some cases, more appropriate text data can be obtained by performing optical character recognition (OCR) on the generated image data to extract the text.
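The trade-off between the two extraction routes can be sketched as follows. This is an illustration under stated assumptions, not a described implementation: the command types `TextCommand` and `ImageCommand` are hypothetical, the OCR engine is passed in as a plain callable rather than tied to a real library, and the policy of falling back to OCR only when no text rendering command is present is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TextCommand:
    """Hypothetical text rendering command issued by an application."""
    text: str


@dataclass
class ImageCommand:
    """Hypothetical image rendering command; the text is baked into pixels."""
    pixels: bytes


Command = Union[TextCommand, ImageCommand]


def extract_job_text(commands: List[Command], ocr: Callable[[bytes], str]) -> str:
    """Prefer text from text rendering commands; otherwise fall back to OCR."""
    parts = [c.text for c in commands if isinstance(c, TextCommand)]
    if parts:
        return " ".join(parts)
    # Assumed policy: with no text rendering commands, recognize the image data.
    return " ".join(ocr(c.pixels) for c in commands if isinstance(c, ImageCommand))
```

Passing the OCR step as a parameter keeps the sketch independent of any particular recognition engine, so a stub can stand in for it during testing.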