The continued growth and popularity of the Internet and of company Intranets and Extranets as sources of information has resulted in an information explosion for users. This has led to a demand from users for ways to easily search for, and verify the relevancy of, the particular information they are looking for. Typically, when a user is looking for information on a particular subject from the Internet, he or she will use a public search engine such as Google or Microsoft Live. In the case of a company Intranet or Extranet search, the user often uses the company's internal search engine. The term user in this context can mean a human user who makes manual searches or a machine-based user, for instance a process that makes automatic searches after an alarm has occurred in an industrial process.
Generally speaking, a search engine is a program that performs a search based on a user's search query (e.g. one or more keywords or a phrase) and sends the results back to the user. These results typically include a listing of hyperlinks to the web pages or other documents found by the search, together with additional information such as the file type of each result document and an excerpt of the text on the page that relates to the keywords entered by the user. Techniques such as a Boolean query language may be used to create a search phrase and to narrow down the number of search hits.
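By way of illustration only, the following minimal Python sketch shows the kind of keyword search described above: a Boolean query (all required keywords present, no excluded keyword present) evaluated over a small in-memory document collection, returning a title, a file type and a text excerpt for each hit. The names Document, excerpt and search, the substring matching and the excerpt width are assumptions made for this example and do not describe any particular search engine.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record for one searchable document.
    title: str
    file_type: str
    text: str


def excerpt(text, keyword, width=30):
    """Return a snippet of up to `width` characters either side of the first match."""
    pos = text.lower().find(keyword.lower())
    if pos < 0:
        return ""
    start = max(0, pos - width)
    end = min(len(text), pos + len(keyword) + width)
    return text[start:end]


def search(documents, required, excluded=()):
    """Boolean search using case-insensitive substring matching.

    A document is a hit when it contains every keyword in `required`
    and none of the keywords in `excluded`.
    """
    hits = []
    for doc in documents:
        lowered = doc.text.lower()
        has_all = all(k.lower() in lowered for k in required)
        has_none = not any(k.lower() in lowered for k in excluded)
        if has_all and has_none:
            hits.append({
                "title": doc.title,
                "file_type": doc.file_type,
                # The excerpt is taken around the first required keyword, if any.
                "excerpt": excerpt(doc.text, required[0]) if required else "",
            })
    return hits
```

For example, search(docs, ["pump", "alarm"], excluded=["valve"]) would list only those documents that mention both "pump" and "alarm" and do not mention "valve". The user would still have to read each excerpt to judge relevancy.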
After the initial keyword-based relevancy matching done by the search engine, the user must determine the relevancy and importance of each result document by reviewing the text excerpt from the document presented on the result page of the search. FIG. 1 shows a typical prior art search user interface including a search query area and a search results listing area with document titles, file type descriptions and text excerpts. The larger the result set, the longer it takes the user to review the document titles and text excerpts of the search results.
In most cases, to be sure whether a document is an ideal match for the search query, the user still has to view the original document. When the user clicks on a hyperlink (URL) and accesses the document resource with a web browser, the document content must be downloaded from the server to the client. If the document type is not supported by the web browser, an external viewer has to be launched to access the document. As a result, a considerable amount of time is spent, because the user typically has to download and review several documents before finding a good match for the original search query.
In some cases the search listing contains visual presentations (also known as thumbnails) of web pages, still images and the first frame or multiple frames of video content. In the case of Web (HTML) document thumbnails, the HTML pages are rendered into bitmap graphics and resized in order to create visual abstracts of the pages. It is well known to those skilled in the art that rendering means processing a document for visual representation. The rendering engine of the web browser essentially processes formatting instructions and converts them into graphical elements, determines the layout and calculates the overall appearance of the document. The thumbnail presentation may work well for web documents if the content is short enough to fit into a standard screen size and resolution. This content is then scaled according to the thumbnail dimensions, providing a very high-level preview of the web page.
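The scaling step can be illustrated with a short Python sketch that computes the size of a thumbnail fitted into a bounding box while preserving the aspect ratio of the rendered bitmap. The function name thumbnail_size and the pixel dimensions used are assumptions chosen for this example only.

```python
def thumbnail_size(width, height, max_width, max_height):
    """Scale (width, height) to fit inside (max_width, max_height).

    The aspect ratio is preserved and the image is never enlarged.
    Each returned dimension is rounded and kept to at least 1 pixel.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A page that fits a 1024 x 768 screen scales to a usable thumbnail.
print(thumbnail_size(1024, 768, 160, 160))   # (160, 120)

# A very long page rendered at 1024 x 6000 pixels becomes a narrow strip.
print(thumbnail_size(1024, 6000, 160, 160))  # (27, 160)
```

The second call shows why the approach depends on content length: a long page scaled into the same thumbnail box is only about 27 pixels wide, which is too small to serve as a meaningful preview.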
However, a single thumbnail presentation is not practical for documents containing multiple pages, i.e. paginated content such as Microsoft Word, Microsoft PowerPoint or PDF documents. To ensure visual accuracy and reproduction of the original layout characteristics, the rendering should follow the document-specific pagination of the original document as closely as possible. The process should produce previews from the original document following the pagination logic, creating at least one new representation for each page or slide of the document.
Besides identifying the document that matches the search query, it would also be helpful for the user to see instantly which pages of the document match the search query, in order to quickly determine whether the document is relevant for further investigation.
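As a purely illustrative sketch of this idea, the following Python function takes the text of a paginated document as a list of page strings and returns the numbers of the pages that contain any of the query keywords. The function name matching_pages, the list-of-strings page model and the case-insensitive substring matching are assumptions made for this example; no particular implementation is implied.

```python
def matching_pages(pages, keywords):
    """Return the 1-based numbers of pages containing at least one keyword.

    `pages` is a list of page texts in document order; matching is a
    case-insensitive substring test.
    """
    wanted = [k.lower() for k in keywords]
    return [
        number
        for number, text in enumerate(pages, start=1)
        if any(k in text.lower() for k in wanted)
    ]


pages = ["Introduction to pumps", "Valve maintenance", "Pump alarm handling"]
print(matching_pages(pages, ["pump"]))  # [1, 3]
```

Page numbers obtained in this way could be used to select which per-page previews to present to the user alongside a search result.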