With improved display technology and increasing mobile computing and data storage capabilities, there is a trend toward showing more image information on physical and virtual displays of smaller and smaller size, such as handheld devices, cell phones, browser windows and thumbnails. Displaying image information in this way requires reformatting the originally stored images, a process often called repurposing.
The increasing popularity of handheld devices with constrained display sizes, such as PDAs and cell phones, demands a technology that represents documents in a way that lets users retrieve them effectively and efficiently. A significant problem is that traditional thumbnails are created only by scaling the original documents down to a small size, without considering the readability of the information contained in those documents. Thus, given a scanned document, resizing the entire document to a target display (hereinafter referred to as the canvas) by downsampling often results in loss of readability of text and recognizability of image features.
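The loss of readability caused by downsampling can be illustrated with a rough calculation. The following is a minimal sketch only; the page size, scan resolution, text size and canvas size are illustrative assumptions that do not come from the description above, and the function names are hypothetical.

```python
def fit_scale(page_w, page_h, canvas_w, canvas_h):
    """Uniform scale factor that fits the whole page inside the canvas."""
    return min(canvas_w / page_w, canvas_h / page_h)


def scaled_text_height(point_size, scan_dpi, scale):
    """Approximate nominal text height in pixels after downsampling.

    Uses 1 point = 1/72 inch; actual glyph heights are somewhat smaller
    than the nominal point size.
    """
    return point_size / 72.0 * scan_dpi * scale


# Assumed values: an 8.5 x 11 inch page scanned at 300 dpi,
# 10-point body text, and a 160 x 160 pixel canvas.
page_w, page_h = 2550, 3300
canvas_w, canvas_h = 160, 160

scale = fit_scale(page_w, page_h, canvas_w, canvas_h)
text_px = scaled_text_height(10, 300, scale)
print(f"scale factor: {scale:.4f}, text height: {text_px:.2f} px")
```

Under these assumptions the body text ends up about 2 pixels tall on the canvas, which is far too small to read and motivates approaches that go beyond uniform scaling.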
There are many prior art techniques related to reformatting documents and images to fit small display devices. The prior art includes Breuel's method, which reflows a single page of a document image onto a PDA by filling the content into the width of the PDA display and allowing the content to be scrolled in the vertical direction. For more information, see Breuel, “Paper to PDA,” Proceedings of IAPR 16th Int'l Conf. on Pattern Recognition, pp. 476-479, 11-15 August, Quebec City, Canada. However, the technique is problematic for thumbnail creation for a number of reasons. First, the technique cannot be used to generate fixed-size thumbnails, since it may require scrolling through a document. Allowing scrolling means that no constraint is placed on the vertical direction; as a consequence, all content can be shown and there is no need to select components. Second, the technique does not incorporate the results of a semantic document analysis.
On the other hand, some researchers have focused on representing semantic document information. Woodruf developed an enhanced thumbnail technology that locates keyword information and pastes color-coded text, rendered at a large point size, onto the keyword locations in the thumbnail, changing the appearance of the original image segments in which the keywords appeared. However, Woodruf did not solve the readability problem for image segments cropped out of the original document image that contain keywords or sentences surrounding keywords. For more information, see Woodruf, “Using Thumbnails to Search the Web,” Proceedings of SIGCHI'2001, Mar. 31-Apr. 4, 2001, Seattle, Wash.
Building on Woodruf's technology, Suh introduced an overview-plus-detail interface for presenting documents. With this interface, users are given both a downsampled document image as an overview and pop-up windows as a detailed view that zooms into the specific portions containing keywords. Keywords are also pasted onto the document overview image. Therefore, a transition from an overview representation to a detailed representation is achieved. Readability of the overview image itself is not addressed, because the detailed pop-up view allows the user to read the portions of the document containing keywords. No reformatting of documents is performed. Problems with Suh's technology are that too many pop-ups clutter the display and that it is not suitable for creating fixed-size thumbnails. For more information, see Suh, “Popout Prism: Adding Perceptual Principles to Overview+Detail Document Interfaces,” Proceedings of CHI'2002, Minneapolis, Minn., Apr. 20-25, 2002.
Chen et al. extracted semantic information from document images by forming summary sentences based on a statistical model. For more information, see Chen et al., “Extraction of Indicative Summary Sentences from Imaged Documents,” Proceedings of ICDAR'97, Aug. 18-20, 1997, Ulm, Germany. Similarly, Maderlechner et al. extracted information from document images based on layout information. For more information, see Maderlechner et al., “Information Extraction from Document Images using Attention Based Layout Segmentation,” Proceedings of DLIA'99, Sep. 18, 1999, Bangalore, India. However, neither approach can be used to generate thumbnails, since both lack the ability to reformat image components to fit a constrained display.
A technology referred to as SmartNail technology was developed in response to this problem. See U.S. patent application Ser. No. 10/435,300, entitled “Resolution Sensitive Layout of Document Regions,” filed May 9, 2003, assigned to the corporate assignee of the present invention, published Jul. 29, 2004 (Publication No. 20040145593). SmartNails use image-based analysis and ignore semantics such as document topics. Selected components of a document image are shrunk to a readable size according to a visual importance measure, without incorporating any important semantic information extracted from the document.
In a document retrieval application, users usually submit queries, which are typically combinations of keywords, and wait for the matching documents to be displayed. From the perspective of information retrieval, users must quickly judge how relevant each resulting document is to their queries. Therefore, purely image-based generation of thumbnails, without incorporating query information for the documents, is not sufficient.