With the increased ubiquity of wireless networks, mobile work, and personal mobile devices, more people browse and view web pages, photos, and even documents using small displays and limited input peripherals. One current solution for web page viewing using small displays is to design simpler, low-graphic versions of web pages. Photo browsing problems are also partially solved by simply showing a low resolution version of photos and giving the user the ability to zoom in and scroll particular areas of each photo.
Browsing and viewing documents, on the other hand, is a much more challenging problem. Documents may be multi-page, have a much higher resolution than photos (requiring much more zooming and scrolling at the user's side in order to observe the content), and have highly distributed information (e.g., focus points on a photo may be only a few people's faces or an object in focus where a typical document may contain many focus points, such as title, authors, abstract, figures, references). The problem with viewing and browsing documents is partially solved for desktop and laptop displays by the use of document viewers and browsers, such as Acrobat and MSWord. These allow zooming in a document, switching between document pages, and scrolling thumbnail overviews. Such highly interactive processes can be acceptable for desktop applications, but considering that mobile devices (e.g., phones and PDAs) have limited input peripherals, with limited input and even smaller displays, a better solution for document browsing and viewing is needed for document browsing on these devices.
Ricoh Innovations of Menlo Park, Calif. developed a technology referred to herein as SmartNail Technology. SmartNail Technology creates an alternative image representation adapted to given display size constraints. SmartNail processing may include three steps: (1) an image analysis step to locate image segments and attach a resolution and importance attribute to them, (2) a layout determination step to select visual content in the output thumbnail, and (3) a composition step to create the final SmartNail image via cropping, scaling, and pasting of selected image segments. The input, as well as the output of SmartNail processing, is a still image. All information processed during the three steps results in static visual information. For more information, see U.S. patent application Ser. No. 10/354,811, entitled “Reformatting Documents Using Document Analysis Information,” filed Jan. 29, 2003, published Jul. 29, 2004 (Publication No. US 2004/0146199 A1) and U.S. patent application Ser. No. 10/435,300, entitled “Resolution Sensitive Layout of Document Regions,” filed May 9, 2003, published Jul. 29, 2004 (Publication No. US 2004/0145593 A1).
Web page summarization, in general, is well-known in the prior art to provide a summary of a webpage. However, the techniques to perform web page summarization are heavily focused on text and usually does not introduce new channels (e.g., audio) that are not used in the original web page. Exceptions include where audio is used in browsing for blind people as is described below and in U.S. Pat. No. 6,249,808.
Maderlechner et al. discloses first surveying users for important document features, such as white space, letter height, etc and then developing an attention based document model where they automatically segment high attention regions of documents. They then highlight these regions (e.g., making these regions print darker and the other regions more transparent) to help the user browse documents more effectively. For more information, see Maderlechner et al., “Information Extraction from Document Images using Attention Based Layout Segmentation,” Proceedings of DLIA, pp. 216-219, 1999.
At least one technique in the prior art is for non-interactive picture browsing on mobile devices. This technique finds salient, face and text regions on a picture automatically and then uses zoom and pan motions on this picture to automatically provide close ups to the viewer. The method focuses on representing images such as photos, not document images. Thus, the method is image-based only, and does not involve an audio for image thumbnails. For more information, see Wang et al., “MobiPicture—Browsing Pictures on Mobile Devices,” ACM MM'03, Berkeley, November 2003 and Fan et al., “Visual Attention Based Image Browsing on Mobile Devices,” International Conference on Multimedia and Exp, vol. 1, pp. 53-56, Baltimore, Md., July 2003.
Conversion of documents to audio in the prior art mostly focuses on aiding visually impaired people. For example, Adobe provides a plug-in to Acrobat reader that synthesizes PDF documents to speech. For more information, see Adobe, PDF access for visually impaired, http://www.adobe.com/support/salesdocs/10446.htm. Guidelines are available on how to create an audiocassette from a document for blind or visually impaired people. As a general rule, information that is included in tables or picture captions is included in the audio cassette. Graphics in general should be omitted. For more information, see “Human Resources Toolbox,” Mobility International USA, 2002, www.miusa.org/publications/Hrtoolboxintro.htm. Some work has been done on developing a browser for blind and visually impaired users. One technique maps a graphical HTML document into a 3D virtual sound space environment, where non-speech auditory cures differentiate HTML documents. For more information, see Roth et al., “Auditory browser for blind and visually impaired users.” CHI'99, Pittsburgh, Pa., May 1999. In all the applications for blind or visually impaired users, the goal appears to be transforming as much information as possible into the audio channel without having necessarily constraints on the channel and giving up on the visually channel completely.
Other prior art techniques for use in conversion of messages includes U.S. Pat. No. 6,249,808, entitled “Wireless Delivery of Message Using Combination of Text and Voice,” issued Jun. 19, 2001. As described therein, in order for a user to receive a voicemail on a handheld device, a voicemail message is converted into a formatted audio voicemail message and formatted text message. The portion of the message that is converted to text fills the available screen on the handheld device, while the remainder of the message is set as audio.