The number of paper and electronic documents that businesses must manage continues to grow. Many large corporations and other organizations maintain document management systems to handle the large volumes of documents they have to process. Such document management systems may, for example, transform documents from one set of formats (e.g., paper documents or books, e-book formats, emails, word processor output files) to others (e.g., searchable Portable Document Format (PDF) files), index the content of documents to facilitate searches, and store documents in various formats in a repository. In some environments, optical character recognition (OCR) tools may be invoked to extract the text content of input documents that are not already in searchable electronic formats. In some cases, a given document (such as a report or a contract) may go through several different versions before it is finalized, and several of those versions may be saved in the document management system over time. Regulations and laws may also impose requirements on how long certain types of documents must be retained, further adding to the sizes and workloads of document repositories.
Traditional document management systems are often resource-intensive, requiring substantial processing and storage capabilities. Furthermore, because of the large number of documents and the proliferation of different versions and formats in which the documents may have to be stored, finding documents similar to a given document, or searching for specific text within a library of documents, can become expensive, especially if compute-intensive traditional optical character recognition techniques have to be used.
At the same time, in recent years, more and more business functions have been conducted using mobile devices such as smart phones and tablets (e.g., because of the quick response times needed for various business tasks, and/or because of the geographically dispersed workforces of many types of organizations). Such devices are usually equipped with cameras capable of generating images of reasonable quality (e.g., of business documents), and with screens capable of displaying fairly high-resolution images. Although the imaging and processing capabilities, as well as the memory and storage capacities, of tablets and smart phones have improved rapidly, most such devices are still likely to become unresponsive if compute-intensive document management operations, such as traditional optical character recognition or full-text comparisons, are attempted on them for all but the simplest documents.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.