1. Technical Field
The present invention relates generally to methods for electronic document revision tracking and control. More particularly, the present invention relates to a method for calibrating a modified document and an original document for difference identification.
2. Related Art
Advancements in high speed data communications and computing capabilities have increased the use of remote collaboration for conducting business. While real-time collaboration using videoconferencing and the like is gaining popularity, the vast majority of collaboration occurs over e-mail in the exchange of documents incorporating incremental modifications, comments, and the like. A local user may create an initial version of a document, and transmit the same to remotely located colleagues. These remote users may then make their own changes or add comments in the form of annotations appended to the document, and then transmit the new version back to the local user.
Such collaboration may involve the exchange of documents generated with word processing applications, desktop publishing applications, illustration/graphical image manipulation applications, Computer Aided Design (CAD) applications, and so forth. As utilized herein, the term “document” may refer to data produced by any of the aforementioned software applications. Furthermore, the term “content” may refer to data that is particular to the software application that generated it and that is stored in the document produced by the same. Due to the existence of many different computing platforms having a wide variety of operating systems, application programs, and processing and graphic display capabilities, however, it has been recognized by those in the art that a device-independent, resolution-independent file format is necessary to facilitate such exchange. In response to this need, the Portable Document Format (PDF), amongst other competing formats, has been developed.
The PDF standard is a combination of a number of technologies, including a simplified PostScript interpreter subsystem, a font embedding subsystem, and a storage subsystem. As those in the art will recognize, PostScript is a page description language for generating the layout and the graphics of a document. Further, per the requirements of the PDF storage subsystem, all elements of the document, including text, vector graphics, and raster (bitmap) graphics, collectively referred to herein as graphic elements, are encapsulated into a single file. The graphic elements are not encoded to a specific operating system, software application, or hardware, but are designed to be rendered in the same manner regardless of the specificities relating to the system writing or reading such data. The cross-platform capability of PDF aided in its widespread adoption, and PDF is now a de facto document exchange standard. Currently, PDF is utilized to encode a wide variety of document types, including those composed largely of text, and those composed largely of vector and raster graphics. Due to its versatility and universality, files in the PDF format are often preferred over more particularized file formats of specific applications. As such, documents are frequently converted to the PDF format.
The exchange of documents according to the workflow described above may take place numerous times, with the content of the document evolving over time. For example, in various engineering projects utilizing CAD drawings such as in architecture or product design, a first revision of the document may include only a basic outline or schematic. Subsequent revisions may be generated for review and approval as further features or details are added prior to construction or production. On a more extended timeline, multiple iterations of designs may be produced. In another example, an author or a graphics designer may produce an initial draft of a document, with editors and reviewers adding comments or otherwise marking the document and resubmitting it to the author or graphics designer. The changes are incorporated into a subsequent version. While in some instances the review and approval process is performed directly on the electronic document, there are many instances where a printed hard copy of the document is utilized. As such, the reviewer may annotate, comment upon, edit, or otherwise supplement the document with information written directly upon the hard copy.
When it is necessary to send the printed copy of the document to another party electronically, a scanner is typically utilized to capture the document. More particularly, the scanner converts an “analog” image, which consists of continuous features such as lines and areas of color, to a digitized encoding that represents the analog image. A rasterized image, or a bitmap, is generated comprising rows and columns of pixels, with each pixel representing one point in the image. Viewed separately, a single pixel does not convey useful visual information, but when the entire field of pixels is viewed at an appropriate distance, a facsimile of the analog image can be recognized. As is generally known, each pixel is represented by the intensity values of its primary color components. Digital representation typically uses the RGB (Red, Green, Blue) color space, while print typically uses the CMYK (Cyan, Magenta, Yellow, Black) color space.
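By way of illustration only, the bitmap described above may be sketched as rows and columns of RGB pixels. The function name `rgb_to_cmyk` and the conversion formula are assumptions introduced for this example: the formula is the common device-independent approximation, not a color-managed conversion and not a method taken from the present disclosure.

```python
# A bitmap as rows and columns of pixels; each pixel is an (R, G, B) tuple
# with components in the range 0..255. The 2x2 image is illustrative.
bitmap = [
    [(255, 0, 0), (255, 255, 255)],
    [(0, 0, 0), (0, 0, 255)],
]

def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK approximation (assumed formula, no color profile)."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)          # black is set by the brightest channel
    if k == 1:                       # pure black: avoid division by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return (c, m, y, k)

# Convert every pixel, preserving the row/column layout of the bitmap.
cmyk_bitmap = [[rgb_to_cmyk(*pixel) for pixel in row] for row in bitmap]
```

In such a sketch, no single tuple carries meaning by itself; only the full grid, taken together, reproduces the image.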
In acquiring the digital image, some distortion with respect to scale and rotation may be introduced. A correction filter may be applied to the data, though this can correct distortions only to a certain degree. Correction filters may also attempt to correct distortions introduced during the analog-to-digital conversion process. Due to the existence of numerous other variables that affect the capture and conversion of images, acquiring an exact digital replica of the printed copy is difficult.
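As a minimal sketch, the scale and rotation distortion mentioned above may be modeled as a rotation about the origin followed by a uniform scaling of each point. The function names `distort` and `undistort`, the single-angle, single-factor model, and the numeric values are assumptions made for illustration; an actual scanner may introduce other distortions that this model does not capture.

```python
import math

def distort(point, angle_deg, scale):
    """Rotate a point (x, y) about the origin, then scale it uniformly."""
    x, y = point
    t = math.radians(angle_deg)
    return (scale * (x * math.cos(t) - y * math.sin(t)),
            scale * (x * math.sin(t) + y * math.cos(t)))

def undistort(point, angle_deg, scale):
    """Invert distort() when the angle and scale are known exactly."""
    x, y = point[0] / scale, point[1] / scale
    t = math.radians(angle_deg)
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

# Hypothetical values: a page corner scanned 1.5 degrees askew at 102% size.
corner = (100.0, 50.0)
scanned = distort(corner, 1.5, 1.02)
restored = undistort(scanned, 1.5, 1.02)
```

The inverse is exact here only because the distortion parameters are known. In practice they would have to be estimated from the scanned image, which is one reason a correction can recover the original only to a certain degree.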
During collaboration, it is often desirable to review earlier versions of a document and compare the same to a current version of the document. By doing so, the evolution of the content may be better appreciated, and each change made to the content may be tracked for approval and other purposes. Various techniques exist for emphasizing differences, but each such technique requires that the two documents being compared be properly aligned. Otherwise, unchanged portions of the document may be identified as being different, when it is only pixel noise, rotation, scale, offset or other like distortion that is different. Where one version of the document is generated directly from the application and another version of the document is scanned from a printed copy, either or both of the documents may be distorted.
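The effect of misalignment on a comparison may be illustrated with a simple pixel-by-pixel difference. The following is a sketch under stated assumptions: the helper names `diff_count` and `shift`, the one-bit bitmap, and the one-pixel offset are hypothetical, and the comparison shown is a naive one rather than any particular existing technique.

```python
def diff_count(a, b):
    """Count positions at which two equally sized bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def shift(bitmap, dx, fill=0):
    """Offset a bitmap horizontally; positive dx moves content right."""
    if dx >= 0:
        return [[fill] * dx + row[:len(row) - dx] for row in bitmap]
    return [row[-dx:] + [fill] * (-dx) for row in bitmap]

# One-bit bitmap of a small outlined square (1 = ink, 0 = background).
original = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# The same, unmodified content, offset one pixel to the right by scanning.
scanned = shift(original, 1)

misaligned = diff_count(original, scanned)         # differences from offset alone
aligned = diff_count(original, shift(scanned, -1)) # after undoing the offset
```

Although the content is unchanged, the misaligned comparison reports eight differing pixels, while the aligned comparison reports none. Each of those eight pixels is a false positive attributable solely to the offset.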
Accordingly, there is a need in the art for a method for aligning a modified document and an original document where such documents are being compared to accentuate differences therebetween. There is a need for automatically aligning the documents and minimizing the distortions of the documents so that a comparison tool does not generate false positives.