Collections of visual media files (e.g., images and video) are growing in size and are often stored in multiple locations. Media repositories may exist on local storage for mobile and desktop devices, on dedicated network-attached storage (NAS), or on remote cloud services. It is particularly difficult to search media files. Whereas textual queries can be matched to the text content of ordinary documents, an image or video does not include text that can be directly matched. In addition, because of the vast quantity of media files, a manual scan of the media file universe is generally not productive. Furthermore, brute force approaches, such as performing OCR on an entire image, do not necessarily capture critical characteristics that would be relevant to a search query.
Similarly, even though information is increasingly digitized, documents continue to be printed (e.g., for offline review). There is also a large amount of legacy information that is only available in paper form. Old printed matter tends to be damaged and is not amenable to traditional scanning techniques. Moreover, organizing printed documents is particularly difficult due to the large number of document types. For example, a spreadsheet and a map that are printed and subsequently scanned together require very different analysis for digitization and organization.