The present invention relates generally to methods and systems for linking documents and more specifically to methods and systems for linking document images with corresponding indexed data.
Currently, data-image linking is mainly a manual process that relies on interactive software tools that enable keyed or indexed data sets to be viewed, inspected, and linked or synced up with source images. The process often involves a digitized image being presented to a user along with corresponding keyed data in a sequential manner, requiring the user to verify sequential matches and make manual adjustments where either the data or the images are misaligned. There are several problems with current processes. For example, linking or syncing image sets to keyed data sets in this manner is a tedious and error-prone process.
In the simplest case, where keyed or indexed data is produced from a given set of images, it should not be necessary to link the data to the appropriate images, at least as long as the indexing process is careful to maintain the proper association between the index data and the corresponding image from which that data was keyed. In practice, however, it often becomes advantageous or even necessary to link or sync up indexed datasets with corresponding images.
In industries working with historical documents, which may span hundreds of years, efforts to produce, duplicate, preserve, print, and digitize the historical documents have increased dramatically over time. Due to the efforts of numerous libraries, archives, and other organizations to preserve these documents from generation to generation, multiple copies of the documents usually exist. Furthermore, the documents often exist in multiple formats. The documents may have originally been handwritten on hand-drawn or machine-printed paper forms. The documents may then have been photographed or microfilmed/microfiched, and duplicated through any number of copies before being ultimately scanned or imaged (i.e., digitized) using a wide variety of modern digital imaging devices. Hence, there are typically many “sources” or copies of the documents or images for a given collection. The quality of the source images in terms of size, resolution, legibility, and the like can vary widely. Furthermore, the possibility of duplicate images, missing images, damaged images, and other such variations between image sets may lead to situations where even the count and sequence of images in these collections are inconsistent.
Meanwhile, multiple organizations continue to work to preserve and provide access to these collections. Therefore, in addition to multiple sources of documents or images, there are often multiple keyed or indexed “datasets” produced from various sets of images for a given collection. Consequently, there frequently exists a many-to-many relationship between “image-sets” and “datasets” for any given collection.
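By way of illustration only, the many-to-many relationship described above may be sketched as a pair of lookup tables, one keyed by image-set and one keyed by dataset. The sketch below is a minimal example under stated assumptions and is not a description of any particular system: the class name `CollectionLinks`, its methods, and the identifiers such as `microfilm_scan` and `index_org_a` are hypothetical and are introduced solely for this example.

```python
from collections import defaultdict


class CollectionLinks:
    """Hypothetical registry of image-set/dataset associations for one collection."""

    def __init__(self):
        # Two mirrored maps so the relationship can be queried from either side.
        self._datasets_by_image_set = defaultdict(set)
        self._image_sets_by_dataset = defaultdict(set)

    def link(self, image_set_id, dataset_id):
        # Record the association in both directions.
        self._datasets_by_image_set[image_set_id].add(dataset_id)
        self._image_sets_by_dataset[dataset_id].add(image_set_id)

    def datasets_for(self, image_set_id):
        # Sorted list of datasets associated with an image-set (empty if none).
        return sorted(self._datasets_by_image_set.get(image_set_id, ()))

    def image_sets_for(self, dataset_id):
        # Sorted list of image-sets associated with a dataset (empty if none).
        return sorted(self._image_sets_by_dataset.get(dataset_id, ()))


# Hypothetical identifiers: two image sources and two independently keyed indexes.
links = CollectionLinks()
links.link("microfilm_scan", "index_org_a")
links.link("microfilm_scan", "index_org_b")
links.link("paper_rescan", "index_org_a")
```

In this sketch, one image-set may be associated with several datasets and one dataset with several image-sets, which is why a single one-to-one mapping between images and keyed data is generally insufficient for such collections.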