The present application describes systems and techniques relating to document formats and format conversion.
Organizations are increasingly using machine-based networking technology to manage their relationships with clients and affiliates. These technologies are used by private enterprise and public institutions to effect many types of transactions that involve information collection and communication, typically using electronic documents. For example, an electronic form is an electronic document used to capture, present, transport, process and output information associated with a transaction. An electronic form differs from a paper form in that it includes not only aspects of visual presentation and information collection, but also processing rules to be used by a machine that handles the form.
Many electronic documents are in binary formats, which provide a great deal of flexibility in how information is represented in the document, but typically result in limited flexibility in how the information is accessed, including by human readers. Processing information from a document in a binary format typically requires the use of an Application Program Interface (API) to access the information, either to extract information of interest or to convert the binary format into another format. Format conversion typically lacks versatility and can result in a loss of information. Conversion filters are frequently lossy, even when they are reciprocal, because they frequently rely on a lowest common denominator document object model. Extracting information from a binary format document using an API adds another layer of programming complexity, and this added layer of complexity can be of particular concern when the binary format is a proprietary format that is defined and controlled by a particular vendor. In response to the difficulties created by binary formats, markup languages have been developed to allow document-based information to be shared and re-used across software applications and computer platforms in an open, vendor-neutral manner, and to make the electronic document more readily human readable.
Markup languages generally use text-based encoding schemes to represent information (e.g., ASCII or Unicode), markup rules to specify document structure, and metalanguage rules to specify document semantics. A commonly used markup language is Hypertext Markup Language (HTML). HTML forms are frequently used to enable initiation of network-based transactions; a user may fill out an HTML form available over the Internet using any Web browser. HTML is a standardized, non-proprietary markup language format that uses defined tags to specify document semantics. However, HTML does not generally separate tags relating to data type semantics from tags relating to data presentation semantics. Thus traditional HTML forms mix data collection information with data presentation information in the same document.
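The mixing of data collection and data presentation in a traditional HTML form can be illustrated with a minimal sketch. The form markup, the field names (`customer_name`, `quantity`), and the set of tags treated as presentational are all hypothetical and chosen only for illustration; the sketch uses Python's standard `html.parser` module to show that both kinds of tags are interleaved in one document.

```python
from html.parser import HTMLParser

# Hypothetical HTML form: data-collection tags (input) are interleaved
# with presentation tags (b, font, i, br) in a single document.
HTML_FORM = """
<form action="/submit" method="post">
  <b><font color="red">Name:</font></b>
  <input type="text" name="customer_name"><br>
  <i>Quantity:</i>
  <input type="text" name="quantity" size="4"><br>
  <input type="submit" value="Send">
</form>
"""

# Illustrative assumption, not an exhaustive list of presentational tags.
PRESENTATION_TAGS = {"b", "i", "font", "br"}


class FormScanner(HTMLParser):
    """Collects named input fields and presentational tags in document order."""

    def __init__(self):
        super().__init__()
        self.field_names = []
        self.presentation_tags = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and "name" in attr_map:
            # Data collection: a named field the form submits.
            self.field_names.append(attr_map["name"])
        elif tag in PRESENTATION_TAGS:
            # Data presentation: styling and layout only.
            self.presentation_tags.append(tag)


scanner = FormScanner()
scanner.feed(HTML_FORM)
print("fields:", scanner.field_names)
print("presentation tags:", scanner.presentation_tags)
```

Because both lists are drawn from the same markup, changing only the appearance of such a form still requires editing the document that defines its fields.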
In contrast, XForms is a forms specification being developed by the World Wide Web Consortium (W3C) that uses Extensible Markup Language (XML) to separate form data into sections that describe what the form does (data type semantics stored in XML), and sections that describe how the form should look (data presentation semantics stored in XHTML (Extensible HTML)). When data in the form needs to change, a Web server may send new data to the Web browser in the XML format. When presentation of the form needs to change, the Web server may send a new XHTML document.
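The separation described above can be sketched as follows. The markup is a simplified, hypothetical XForms-style document (the `order`, `customer`, and `quantity` elements are invented for illustration); the namespace URIs are those of the published XHTML and XForms specifications. The sketch, which assumes only Python's standard `xml.etree.ElementTree` module, reads the form data and the presentation controls independently of each other.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XFORMS_NS = "http://www.w3.org/2002/xforms"

# Hypothetical, simplified form: the data lives in the model (head),
# while the controls that present it live in the body.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <xf:model>
      <xf:instance>
        <order xmlns="">
          <customer>Ada</customer>
          <quantity>3</quantity>
        </order>
      </xf:instance>
    </xf:model>
  </head>
  <body>
    <p><xf:input ref="customer"><xf:label>Name</xf:label></xf:input></p>
    <p><xf:input ref="quantity"><xf:label>Quantity</xf:label></xf:input></p>
  </body>
</html>"""

root = ET.fromstring(DOC)
ns = {"h": XHTML_NS, "xf": XFORMS_NS}

# Data section: the instance data, with no presentation details.
instance = root.find("h:head/xf:model/xf:instance", ns)
data = {child.tag: child.text for child in instance[0]}

# Presentation section: controls that refer to the data by name.
controls = [
    (control.get("ref"), control.find("xf:label", ns).text)
    for control in root.iterfind("h:body//xf:input", ns)
]

print("data:", data)
print("controls:", controls)
```

In this arrangement, new data can be supplied by replacing only the instance content, and the appearance can be changed by replacing only the body, without touching the other section.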
Additionally, the XML Forms Architecture (XFA) is a forms specification submitted to the W3C by JetForm Corporation. XFA builds on XForms by adding features that address the needs of organizations that use electronic forms, and the applications that process them. XFA includes an XFA-Template specification and an XFA-FormCalc specification. The XFA-Template specification describes open and extensible modeling of secure forms, including automated calculation and validation, pluggable user-interface components, and flexible data handling. The XFA-FormCalc specification describes a scripting language used in creating logic and calculations tailored to electronic forms.
In general, an HTML document may be thought of as a single virtual page, regardless of whether the document is formatted as traditional HTML or XHTML. An HTML document generally has no ability to specify which portions of the document will appear on which physical pages when printed. In fact, a publisher of an HTML document has no final control over how the document will appear to an end user, because presentation of an HTML document, either by display on a monitor or by printing, is determined by the application that interprets the HTML tags.
In contrast, a final format document is an electronic document describing one or more virtual pages having a predetermined final format. The predetermined final format defines a specific visual appearance for the electronic document when displayed or printed. A final format document generally provides a device-independent and resolution-independent format for publishing and/or distributing electronic documents. Thus, a final format document defines an appearance of the document and can readily be stored as a file that lies between a layout program and the typical raster image processors, which drive traditional printing apparatus.
The final format generally allows a publisher to control the look and feel of the document as seen by an end user, including the specific physical page or pages on which information appears when printed or displayed. Thus, the final format should generally support and preserve all visual formatting features (e.g., fonts, graphics, color, etc.) of any source document, regardless of the source computer platform and/or software application used to create the source document. The ability to control final appearance, or look-and-feel, of an electronic document as viewed by a reader can be a critical branding issue for businesses and other publishing organizations, and is particularly useful when available across various computer platforms.
An example of a final format is the PORTABLE DOCUMENT FORMAT™ (PDF™) developed by Adobe Systems Incorporated of San Jose, Calif. PDF is an example of a binary format. Example software for creating and reading PDF documents is the ADOBE ACROBAT® software, also from Adobe Systems Incorporated. The ACROBAT® software is based on Adobe's POSTSCRIPT® technology, which describes formatted pages of a document in a device-independent fashion.
Final format documents may be used in client-server environments. For example, on the World Wide Web, PDF documents are commonly used to present various types of information to, and also to obtain information from, Web users. Adobe Systems Incorporated provides a PDF reader tool, including a plug-in to Web browsers, for free. This allows anyone to read the binary format after installing the PDF reader tool and encourages use of the PDF final format. A final format document may represent a form, which users can fill in and/or print out, or which a server can modify to include data specific to a user. For example, Forms Data Format (FDF) is a data representation format developed by Adobe Systems Incorporated to allow importing of data into an existing PDF document. FDF files may be used to submit data to a server, or to receive data from a server. FDF Toolkit is an API developed by Adobe Systems Incorporated to facilitate the writing of server applications to generate and/or parse FDF data from a form created by the Adobe ACROBAT™ Forms plug-in.
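As a rough illustration of FDF as a data representation, the following sketch builds a minimal FDF file body as text. It is a simplified example under stated assumptions: it does not use the FDF Toolkit API, it handles only plain ASCII field names and values, and the helper names (`escape_pdf_string`, `build_fdf`) and the field name `customer_name` are hypothetical.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for use in a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields: dict) -> str:
    """Return a minimal FDF document carrying field name/value pairs.

    Simplified sketch: ASCII text values only, no file specification
    for a target PDF, and no other optional FDF entries.
    """
    entries = "\n".join(
        f"<< /T ({escape_pdf_string(name)}) /V ({escape_pdf_string(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


# Hypothetical field name and value for illustration.
fdf = build_fdf({"customer_name": "Ada (test)"})
print(fdf)
```

Each `/T` entry names a form field and each `/V` entry carries its value, which is the kind of data a server application might generate for, or parse from, a form.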