1. Field of the Invention
The present invention relates generally to an improved data processing system and, more specifically, to a computer-implemented method, an apparatus, and a computer program product for form attachment metadata generation.
2. Description of the Related Art
A typical use of forms in the workplace involves attachment of supporting documents, such as images and productivity documents, as enclosures. Digital signatures are often applied to the entire document, including any attachments, to provide a secure and tamper-proof transaction record.
Currently, there is no insight into the attachments that are stored within a forms document. This lack of information can be a significant issue when thousands or millions of forms are involved, each of which may contain one or more attachments. The forms may be stored within data repositories such as a content manager or a database manager product.
Currently, obtaining information that describes each attachment, such as the information that may form attachment metadata, typically requires writing custom code in the application tier to programmatically extract each attachment upon form submission. This custom code is required to capture the attachment information before the document is stored in the data repository. An alternative is to write a custom application that crawls through the repository after form submission, sequentially extracting and processing the form attachments and establishing links between the extracted information and the identifiers and/or other data within the form. An additional challenge with the crawl approach is the need to store the extracted information separately in the database or repository, which does not conform to a document-centric architecture. As another option, a user may manually open and inspect each form, sequentially extracting and examining each attachment with an appropriate tool, such as an image viewer for image attachments. As a result, valuable information has been locked away inside binary and proprietary attachments enclosed within the forms documents. Additionally, so that the form is considered valid extensible markup language (XML), these attachments are maintained in a “gzip” and “base64encoded” format, which adds further layers of abstraction over the original data.
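By way of illustration only, the custom extraction approach described above might resemble the following minimal Python sketch. The element name "attachment", its "name" attribute, the sample form layout, and the helper function names are hypothetical and are assumed here for illustration; an actual forms product may use a different schema. The sketch assumes each attachment is gzip-compressed and then base64-encoded, as described above.

```python
import base64
import gzip
import hashlib
import xml.etree.ElementTree as ET


def encode_attachment(raw: bytes) -> str:
    """Compress with gzip, then base64-encode, so the data can sit inside XML."""
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def extract_attachment_metadata(form_xml: str) -> list:
    """Walk a form document and describe each enclosed attachment.

    Assumes a hypothetical <attachment name="..."> element whose text
    is the base64-encoded, gzip-compressed attachment content.
    """
    root = ET.fromstring(form_xml)
    records = []
    for node in root.iter("attachment"):
        # Undo the two layers in reverse order: base64 first, then gzip.
        raw = gzip.decompress(base64.b64decode(node.text.strip()))
        records.append({
            "name": node.get("name"),
            "size_bytes": len(raw),
            "sha256": hashlib.sha256(raw).hexdigest(),
        })
    return records


def build_sample_form() -> str:
    """Build a tiny hypothetical form with one attachment, for demonstration."""
    payload = encode_attachment(b"hello attachment")
    return f'<form id="F-1"><attachment name="note.txt">{payload}</attachment></form>'


if __name__ == "__main__":
    for record in extract_attachment_metadata(build_sample_form()):
        print(record["name"], record["size_bytes"], record["sha256"][:12])
```

A sketch of this kind would have to be written and maintained separately for each application, and its output would still need to be stored and linked back to the form, which is the burden described above.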
The software application “gzip” is used for file compression and refers to the GNU zip program, a free replacement for the compress program previously used in Unix-based operating systems. The encoding method “base64encoded” converts binary data into American Standard Code for Information Interchange (ASCII) text, and vice versa, and is one of the encoding methods used by multipurpose Internet mail extensions (MIME). Base64 divides each three bytes of the original data into four 6-bit units, each of which is represented as one 7-bit ASCII character. The encoded output is therefore typically about one third larger than the original file.
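The three-byte to four-character mapping can be illustrated with the following minimal Python sketch. The function names encode_group and size_ratio are hypothetical helpers introduced here for illustration; the result is checked against the base64 module of the Python standard library, and padding of inputs whose length is not a multiple of three is deliberately not handled.

```python
import base64
import string

# The standard base64 alphabet: 64 printable ASCII characters.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(group: bytes) -> str:
    """Encode exactly three bytes as four base64 characters."""
    if len(group) != 3:
        raise ValueError("this sketch handles only complete 3-byte groups")
    value = int.from_bytes(group, "big")  # 24 bits in total
    # Take the four 6-bit units from most significant to least significant.
    return "".join(ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0))


def size_ratio(n_bytes: int) -> float:
    """Return encoded length divided by original length for n_bytes of input."""
    return len(base64.b64encode(bytes(n_bytes))) / n_bytes


if __name__ == "__main__":
    print(encode_group(b"Man"))                          # TWFu
    print(base64.b64encode(b"Man").decode("ascii"))      # TWFu
    print(size_ratio(3000))                              # 4000 / 3000
```

For 3000 bytes of input the encoded text is 4000 characters long, which corresponds to the increase of about one third noted above.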