1. Field of the Invention
The present invention relates to a document management system that converts documents from different sources, such as scanners, fax machines, and workstations, into electronic documents and provides a file management facility to manage the storage and retrieval of such documents.
2. Background of the Invention
With the availability of computers and workstations came a desire to convert documents into electronic form for efficient editing, storage, and management. Since the arrival of the Internet and E-commerce, this desire has become more apparent. Converting and managing documents are complex tasks because documents will continue to be presented and exchanged in both hard-copy and electronic forms. To complicate matters further, electronic documents may be generated on different platforms with different formats and presentation software. A commonly used process is to scan hard-copy documents and store the images as TIF files before converting them to PDF format, which provides additional system flexibility. A PDF file can be an Image Only PDF that encapsulates the TIF files, or an image-and-text PDF that shows the image and also provides access to the OCRed text. In general, all electronic documents, whether produced by scanning hard copies or by electronic processes, can be converted to PDF or any other format by a parsing process, which places the document presentation data into a print stream and feeds it through one or more parsers. Each parser parses the document data and converts it into the format for which that parser is built before sending the output data to a repository engine. Typically, PDF format is used for complete document image presentation, and XML format is useful for data processing and analysis. However, converting a complete document to a single presentation format has several drawbacks, such as inflexibility and insufficiency: at a minimum, a document must present a complete and comprehensive image of the information it contains, provide data for analysis, and provide an avenue for correction or modification.
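By way of illustration only, the parsing process described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular system: the print stream is modeled as a list of text lines, the names `parse_to_text`, `parse_to_xml`, `PARSERS`, `Repository`, and `convert` are hypothetical, and a plain-text parser stands in for a real PDF parser to keep the example self-contained.

```python
from typing import Callable, Dict, List, Tuple
from xml.sax.saxutils import escape

# Assumption: the print stream is modeled as a list of text lines.
PrintStream = List[str]


def parse_to_text(stream: PrintStream) -> str:
    """Hypothetical parser: emits the stream as plain text (stand-in for PDF)."""
    return "\n".join(stream)


def parse_to_xml(stream: PrintStream) -> str:
    """Hypothetical parser: wraps each line of the stream in an XML element."""
    lines = "".join(f"<line>{escape(line)}</line>" for line in stream)
    return f"<document>{lines}</document>"


# Each parser converts the stream into the one format it is built for.
PARSERS: Dict[str, Callable[[PrintStream], str]] = {
    "TXT": parse_to_text,
    "XML": parse_to_xml,
}


class Repository:
    """Hypothetical stand-in for the repository engine."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, str]] = []

    def store(self, fmt: str, data: str) -> None:
        self.items.append((fmt, data))


def convert(stream: PrintStream, repository: Repository) -> None:
    """Feed the print stream through every parser and store each output."""
    for fmt, parser in PARSERS.items():
        repository.store(fmt, parser(stream))


stream = ["Purchase Order", "Qty < 5"]
repo = Repository()
convert(stream, repo)
```

In this sketch every parser receives the same complete print stream, so each stored output is a whole-document conversion into a single format.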
TIF format provides compact storage of image data, and viewers for TIF files are available on most workstations, but it is not suitable for data extraction or image modification. PDF is more flexible: it can provide both image and text presentation formats, but modification is difficult and requires additional software. A document may include several presentation zones, each possessing different characteristics, functionalities, and requirements; it is therefore desirable to convert each presentation zone to the format most suitable for its requirements. Converting different zones of a document into different formats can overcome the shortcomings of converting a complete document into a single format. With the growth of E-commerce across the Internet, users must be able to use personal computers to access documents such as application or purchase order forms and to modify said forms before submitting them. Such documents typically contain static presentation zones, including the merchant logo, instructions, and general information, and dynamic presentation zones where users fill in other required information. It is advantageous to convert the static zones to an image format such as TIF, which is best suited for viewing only, and the dynamic zones to a text format such as XML, or any other format that can be opened in a text editor, which is best for editing and data extraction; a text editor is available on most personal computers and workstations.
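The zone-based approach described above can be illustrated with a short sketch. It assumes a hypothetical `Zone` record with a `dynamic` flag and hypothetical zone names for a purchase order form; it only selects a target format per zone and serializes the dynamic zones as XML, and it does not perform any actual image conversion.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Zone:
    name: str      # hypothetical zone label, e.g. "logo"
    dynamic: bool  # True if the user fills this zone in
    content: str   # text for dynamic zones, image reference for static ones


def target_format(zone: Zone) -> str:
    """Pick the output format for one zone: XML if editable, TIF if view-only."""
    return "XML" if zone.dynamic else "TIF"


def dynamic_zones_to_xml(zones: List[Zone]) -> str:
    """Serialize only the dynamic zones as editable XML fields."""
    root = ET.Element("form")
    for zone in zones:
        if zone.dynamic:
            field = ET.SubElement(root, "field", name=zone.name)
            field.text = zone.content
    return ET.tostring(root, encoding="unicode")


# Hypothetical purchase order form with two static and two dynamic zones.
order_form = [
    Zone("logo", dynamic=False, content="logo.tif"),
    Zone("instructions", dynamic=False, content="instructions.tif"),
    Zone("customer_name", dynamic=True, content=""),
    Zone("quantity", dynamic=True, content="1"),
]

plan = {zone.name: target_format(zone) for zone in order_form}
xml_text = dynamic_zones_to_xml(order_form)
```

Here `plan` maps each static zone to TIF and each dynamic zone to XML, and `xml_text` contains only the fields a user would edit, so the static zones never need to pass through an editable format.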