1. Technical Field
The present disclosure relates generally to electronic document management, and more particularly, to generating, automatically without user intervention, unique document page identifiers from content within a selected page region.
2. Related Art
The creation, distribution, and management of information are core functions of business. Information or content can be presented in a variety of different ways, including word processing documents, spreadsheets, graphics, photographs, engineering drawings, architectural plans, and so forth. In electronic form, these are generally referred to as documents, and may be generated and manipulated by computer software applications that are specific thereto. The workflows of creating, reviewing, and/or editing electronic documents have evolved to accommodate the specific requirements of various fields, though the need for a device-independent, resolution-independent file format led to the widespread adoption of the Portable Document Format (PDF), amongst other competing formats. Accordingly, different platforms having a wide variety of operating systems, application programs, and processing and graphic display capabilities can be accommodated regardless of the particulars of the workflow.
The PDF standard is a combination of a number of technologies, including a simplified PostScript interpreter subsystem, a font embedding subsystem, and a storage subsystem. As those in the art will recognize, PostScript is a page description language for generating the layout and the graphics of a document. Further, per the requirements of the PDF storage subsystem, all elements of the document, including text, vector graphics, and raster (bitmap) graphics, collectively referred to herein as graphic elements, are encapsulated into a single file. The graphic elements are not encoded to a specific operating system, software application, or hardware, but are designed to be rendered in the same manner regardless of the specificities relating to the system writing or reading such data. The cross-platform capability of PDF aided in its widespread adoption, and PDF is now a de facto document exchange standard. Although originally proprietary, PDF has been released as an open standard published by the International Organization for Standardization (ISO) as ISO 32000-1:2008. Currently, PDF is utilized to encode a wide variety of document types, including those composed largely of text, and those composed largely of vector and raster graphics. Because of its versatility and universality, files in the PDF format are often preferred over more particularized file formats of specific applications.
In technical fields such as engineering and architecture, one project typically involves multiple aspects with numerous professionals spanning a wide range of disciplines. The planning documents, e.g., drawings, are specific to each discipline, though a change in one aspect may require a corresponding change in another aspect, and so on. For example, in a building construction project, there may be one set of plans for the structural aspect, another set for the heating/ventilation/air conditioning (HVAC) aspect, another set for plumbing, another set for electrical, etc. A high level of detail is necessary in the planning documents to accurately convey the specifications of the project so that it can be correctly implemented. Although the ability to zoom in and zoom out of an electronic document makes densely detailed pages easier to work with to a certain degree, the size and the amount of information contained in any one page must nevertheless remain manageable while retaining all the necessary detail, so that viewing, editing, and annotating do not require complicated inputs/interface manipulations. Thus, the contents are separated into multiple pages.
In a typical set of drawings, whether stored in a PDF or otherwise, a standard convention is utilized to present, in an organized fashion, header information such as the title, drawing number, project name/identifier, facility identifier and/or address, measurement units, and so forth. This convention is typically the title block, which is usually positioned at the same location on each of the pages in the document. While this header information is useful when viewing the particular page on which it is located, it is a part of the document content itself, and cannot be used by the viewing/editing application to catalog and organize the document. The extent of any metadata that is stored in connection with a page is oftentimes limited to the page number relative to the other pages in the document, without any further descriptors.
Adding such descriptive information to label each page is a painstaking, error-prone, and time-consuming process that requires human intervention. After visually searching for and ascertaining the desired header information from the contents of each page, conventional processes require the manual keying of the same into a form field via the user interface of the editing application. The added metadata can thereafter be used for subsequent searching and organization purposes. The aforementioned procedure is required regardless of whether a bookmark is being created for a particular location on a page or a label is being applied to the page. Although labeling/bookmarking a document having only one or two pages may be trivial, typical project planning documents span many tens to hundreds of pages. Furthermore, information from multiple different parts of the page may be needed for generating precise descriptors. For such larger, more complex documents, the time necessary to complete this task can increase to several hours.
Therefore, there is a need in the art for methods to generate, automatically without user intervention, unique document page identifiers from content within a selected page region.