Desktop search technology has attracted considerable interest in recent years. The standard technology behind desktop search engines is a text retrieval engine: text is extracted from files, the extracted text is indexed, and a query term is matched against the indexed terms. Text is typically extracted without pagination information (this may be due to the historical adaptation of web search technology to desktop files). In a typical search scenario, a list of matched results is returned, sorted by the score assigned by the search engine, alphabetically by file name, or by application. As the number of files on a personal desktop grows, the lists of returned results can become less informative. Often the user asks “why was this document retrieved?” and the presentation of the search results offers no answer.
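The extract, index, and query pipeline described above can be illustrated with a minimal sketch. This is not the implementation of any particular desktop search product; the function names (`tokenize`, `build_index`, `search`), the sample file contents, and the use of raw term counts as the score are all assumptions made for illustration. Note that the index records only which file a term occurs in, not which page, which mirrors the loss of pagination information.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Build an inverted index: term -> {file name -> occurrence count}.

    `files` maps a file name to its extracted text. The text is one flat
    string per file, so no page boundaries survive in the index.
    """
    index = defaultdict(dict)
    for name, text in files.items():
        for term in tokenize(text):
            index[term][name] = index[term].get(name, 0) + 1
    return index


def search(index, term):
    """Return (file name, score) pairs, highest score first.

    The score here is a plain occurrence count (an assumption); ties are
    broken alphabetically by file name.
    """
    postings = index.get(term.lower(), {})
    return sorted(postings.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical extracted text for two files.
files = {
    "report.pdf": "Quarterly budget report. Budget details follow on later pages.",
    "memo.doc": "Meeting memo about the budget.",
}

index = build_index(files)
results = search(index, "budget")
for name, score in results:
    print(name, score)
```

The result list identifies matching files and their scores, but it cannot say where in a multi-page file the term occurs, because that information was never stored.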
One important class of documents is paginated documents (i.e., formatted documents), represented by file formats such as “pdf” or “doc”. Such documents may have been created electronically, or they may have been sent to the desktop or a local file storage system through a scanner. In the latter case, an optical character recognition (OCR) process may have to be performed in order to create a searchable index.
Furthermore, it is currently not possible to navigate through multi-page documents directly in the list of results returned for a search query. The user must select a document, open the application in which the document was written or formatted, and navigate through the pages using the application controls, perhaps performing a second search for the same term inside the application. For a search engine embedded into a multi-function peripheral (MFP), this would mean that state-of-the-art document processing applications would have to be implemented in addition to the search engine.