1. Field of the Invention
The present invention relates to techniques for representing and manipulating documents within a computer system. More specifically, the present invention relates to a method and an apparatus for storing document-wide structure information within the document components (such as pages of the document) that contain the content items for the document.
2. Related Art
Adobe® Portable Document Format (PDF) is a publicly available specification used by millions of people worldwide to facilitate secure and reliable distribution of electronic documents. Adobe PDF is widely used by enterprises and governments to effectively streamline document management, increase productivity, and reduce reliance on paper.
PDF files can contain a number of different types of information. In particular, a PDF file can contain: (1) the page content itself; and (2) the structure information that describes the logical organization of content that appears on the pages. This structure information is presently represented as a “structure tree,” which is a single, document-wide data structure that represents the logical structure of the entire document, and which is stored separately from the page contents. (Note that some structure information can also appear on the individual pages.) Within the document pages, content items related to the structure tree are stored in “marked content containers,” which wrap content items and facilitate connections between the structure tree and page content through the use of marked content ids.
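For illustration only, the relationship between a document-wide structure tree and marked content containers can be sketched as follows. This is a simplified in-memory model in Python, not the PDF file format itself; the names `MarkedContent`, `Page`, `StructElem`, and `collect_text` are hypothetical and are chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MarkedContent:
    """A marked content container: wraps a content item and carries an id."""
    mcid: int
    text: str


@dataclass
class Page:
    """A page holds its content items in the order they appear on the page."""
    items: List[MarkedContent] = field(default_factory=list)

    def find(self, mcid: int) -> MarkedContent:
        for item in self.items:
            if item.mcid == mcid:
                return item
        raise KeyError(mcid)


@dataclass
class StructElem:
    """A node of the document-wide structure tree (stored apart from pages)."""
    tag: str
    kids: List["StructElem"] = field(default_factory=list)
    # References into page content: (page index, marked content id).
    refs: List[Tuple[int, int]] = field(default_factory=list)


def collect_text(elem: StructElem, pages: List[Page]) -> List[str]:
    """Walk the tree in logical order, resolving each reference via its id."""
    out = [pages[p].find(m).text for p, m in elem.refs]
    for kid in elem.kids:
        out.extend(collect_text(kid, pages))
    return out


# Content order on the first page differs from the logical order in the tree.
pages = [
    Page([MarkedContent(0, "body text"), MarkedContent(1, "Title")]),
    Page([MarkedContent(0, "body continued")]),
]
tree = StructElem("Document", kids=[
    StructElem("H1", refs=[(0, 1)]),
    StructElem("P", refs=[(0, 0), (1, 0)]),
])
logical_order = collect_text(tree, pages)
```

In this sketch the heading is resolved before the body text even though it appears later on the page, and a single paragraph element spans two pages. The tree alone determines the logical order.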
This separation of the structure tree from page content allows the ordering and nesting of logical elements to be entirely independent of the order and location of content items on document pages. However, when documents are assembled or disassembled at the page level (that is, when pages are added or extracted), there is presently no way to carry the associated structure information along with those pages or to join it to the structure tree of the resulting document. Consequently, the associated structure information is typically lost.
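The loss described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which the structure information is a document-level list whose entries refer to content by page index and marked content id; `Document`, `extract_pages_naive`, and the field names are assumptions made for illustration and do not correspond to any particular PDF library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    # Each page maps a marked content id to a content item.
    pages: List[Dict[int, str]]
    # Document-wide structure: (tag, [(page index, marked content id), ...]).
    structure: List[Tuple[str, List[Tuple[int, int]]]] = field(default_factory=list)


def extract_pages_naive(doc: Document, indices: List[int]) -> Document:
    """Page-level extraction that copies only page content.

    The structure lives outside the pages, so nothing on a page records
    which structure elements it belongs to; the result has no structure.
    """
    return Document(pages=[dict(doc.pages[i]) for i in indices])


source = Document(
    pages=[{0: "Title", 1: "First paragraph"}, {0: "Second paragraph"}],
    structure=[("H1", [(0, 0)]), ("P", [(0, 1)]), ("P", [(1, 0)])],
)
extracted = extract_pages_naive(source, [1])
```

Under these assumptions the extracted page keeps its marked content ids, but no structure element refers to them any longer, so the logical role of the content (here, a paragraph) cannot be recovered from the extracted document.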
Hence, what is needed is a method and an apparatus that facilitate adding or extracting pages from a document, such as a PDF document, without losing the associated structure information.