Although print media continues to be digitized and made available in electronic form, a large portion of print media remains disconnected from related and useful electronic data. One significant disconnect occurs when information is embodied as electronic image data. Before the image data can be of any use in an electronic environment, the pixel data contained within the image data must be identified, logically grouped into related segments, and associated with meaningful electronic data structures that are recognized within the electronic environment.
For example, handwritten data residing on print media is of little use if it is scanned into an electronic environment and represented only as a single electronic image consisting of a plurality of pixels. However, if the pixel data is logically grouped into segments and processed by a set of optical character recognition (OCR) executable instructions operating within the electronic environment, then the electronic image becomes a series of electronic character data structures that can be integrated with and linked to other electronic media within the electronic environment. These electronic characters can then be further processed by additional executable instructions within the electronic environment to provide an integrated use for the handwritten data. For example, the handwritten data could be loaded into a word processor or email, indexed or stored in a data store for later retrieval, or linked with other valuable electronic data related to the handwritten data.
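The grouping step described above can be illustrated with a minimal sketch. It assumes a binary image held as a list of rows, where 1 marks an ink pixel and 0 marks background, and it groups pixels by 4-connectivity. The function name `group_pixels_into_segments` and the sample `SCAN` grid are hypothetical, chosen only for illustration; connected-component labeling is one common way to form segments, not necessarily the approach used by any particular technique discussed here.

```python
from collections import deque


def group_pixels_into_segments(image):
    """Group foreground pixels (value 1) into 4-connected segments.

    Returns a list of segments in scan order; each segment is a sorted
    list of (row, col) coordinates.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    segments = []

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            segment = []
            while queue:
                y, x = queue.popleft()
                segment.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            segments.append(sorted(segment))
    return segments


# Hypothetical 3x5 scan containing three separate ink regions.
SCAN = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]

segments = group_pixels_into_segments(SCAN)
```

Each resulting segment could then be handed to a recognizer, such as a set of OCR instructions, to produce a character data structure.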
Yet, even if pixel data is properly translated into a useful electronic media format, the translated format can still be difficult to retrieve when the original print media is later used as a search request for the originally captured electronic image. Unless a proper identifier or tag associated with the print media is obtained from the original print media used as the search request, a search to retrieve the desired electronic image will fail. For example, suppose an electronic image is initially scanned, translated, indexed, and stored in a data store, and a subsequent request for that electronic image is made using the original print media that the electronic image represents. Unless a proper identifier or key is associated with the original print media, the retrieval request will be unable to locate the translated and indexed electronic image within the electronic environment.
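The retrieval problem described above can be made concrete with a small sketch. It assumes a data store keyed by an identifier string; the class `ImageStore`, its methods, and the identifier `"doc-0001"` are hypothetical names introduced only for illustration. The point is simply that a lookup succeeds when an identifier can be obtained from the print media and fails when it cannot.

```python
class ImageStore:
    """Toy data store mapping an identifier to translated image content."""

    def __init__(self):
        self._records = {}

    def index(self, identifier, translated_text):
        # Store the translated content under the identifier.
        self._records[identifier] = translated_text

    def retrieve(self, identifier):
        # Without an identifier there is nothing to look up.
        if identifier is None:
            return None
        return self._records.get(identifier)


store = ImageStore()
store.index("doc-0001", "meeting notes, 3 pm")

# An identifier was obtained from the print media: retrieval succeeds.
found = store.retrieve("doc-0001")

# No identifier could be obtained from the print media: retrieval fails.
missing = store.retrieve(None)
```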
Some existing techniques attempt to uniquely identify or tag electronic images within the electronic environment by manually placing an electronic bar code label on the original print media. When a subsequent request to retrieve the electronic image is made, a scan of the bar code label on the print media yields a unique identifier that can then be used to retrieve the translated and indexed electronic image and any related electronic data. However, bar code labels can become damaged, and they require manual intervention and maintenance. Furthermore, placing a bar code label on the print media alters the print media, which then includes a permanently affixed label.
Additionally, if the print media comprises a plurality of printed pages assembled as a single document, more complicated techniques must be employed so that a request for a page occurring after the first page of the document can be properly satisfied, since often only the first page of the document will include a bar code label. Therefore, if the document includes a large number of printed pages, a request for a page occurring near the end of the document may result in the first page of the document being retrieved, forcing a user to serially traverse a series of electronic images to locate the electronic page representing the desired printed page. As one of ordinary skill in the art will readily appreciate, bar coding techniques explicitly tag electronic image data by manually inserting a bar code label, and these techniques have a number of limitations and problems.
Other techniques to uniquely identify print media within an electronic environment require a special paper to be used, such that the special paper transparently includes a unique electronic identifier that is recognized when scanned into the electronic environment. These techniques are capable of uniquely identifying each page of a multi-page document, but they require users to buy and use the special paper for all print media scanned into the electronic environment. Still further techniques use a handwritten signature affixed to print media as a bar-code-like identifier. However, these techniques identify and retrieve a class of electronic images associated with a particular author, not a specific electronic image associated with that author. As a result, the user must filter through numerous retrieved and possibly unrelated electronic images to locate the desired image.
Furthermore, conventional techniques to translate pixel data have used OCR techniques when the print media is text data (e.g., alphabetic characters, numeric characters, or symbol characters) and image pattern matching techniques when the print media is graphical data (e.g., pictures, graphical symbols, shapes, and the like). Both techniques facilitate the translation of the scanned print media into meaningful electronic data structures, but neither addresses how the content of the print media can be uniquely identified and tagged for efficient indexing and retrieval within the electronic environment. To address this issue, some techniques use the entire originally provided print media as a search request to retrieve the desired electronic image. Yet providing the entire print media as a search request is often not feasible, since it is processor-intensive, memory-intensive, and time-consuming.
Accordingly, current pixel data indexing and retrieval techniques are not flexible enough to truly integrate print media within an electronic environment, which may have useful additional electronic data. Therefore, there exists a need for improved pixel data indexing and retrieval techniques.