In the physical world, we can get the overall gist of a book by rapidly riffling through its pages. This task is even easier when the book is illustrated. The same technique can be used to locate a known target within the book, i.e., a page that a reader has seen before and is now trying to locate again. Since getting the gist of an offering and searching for a known target are tasks commonly encountered in electronic information applications as well, there is a need for techniques in the digital world similar to riffling the pages of a book.
The standard practice in electronic media is to present information statically on “pages.” Controls are provided to allow users to change to a different page, but not to flip rapidly forwards or backwards through a set of “pages.”
The activities that most closely resemble the riffling of book pages are fast-forwarding or rewinding through a video, or “surfing” through channels of television signals. However, these controls do not allow users to control the speed and direction of the presentation to maximum advantage, and they are not generally available for overviewing or targeting information other than video.
For years psychologists have studied human visual perception through a type of presentation known as rapid serial visual presentation (RSVP). It is known that humans can process briefly presented images extremely quickly. There is a long history of experiments investigating cognitive processes involved in reading and visual perception where images or text are flashed quickly. A recent edited volume of papers provides a summary and historical overview of this work, see Coltheart (Ed.), “Fleeting Memories: Cognition of Brief Visual Stimuli,” MIT Press, 1999.
It is believed that people process visual information in a series of brief discrete fixations of the eyes, typically in the range of 150 to 300 milliseconds. Between these fixations, the eyes make rapid saccadic movements. Perception and comprehension of details occur at the center of discrete fixations, whereas fuzzy perception at the periphery of vision is used in a process that determines the target of the next saccade.
In general, it is believed that visual perception progresses in stages that can lead to long-term retention in memory. However, it is possible for visual information to be seen and then quickly forgotten. Subsequent stages of cognitive processing leading to memory retention require resources that can interfere with visual perception and vice versa.
Of the prior art RSVP methods used in human-computer interfaces, the most basic uses a temporal sequence of single images that roughly corresponds to conditions studied in the psychology literature. Each successive image displaces a previously displayed image. That method of presentation has been referred to as slide-show or keyhole mode, see Tse et al., “Dynamic Key Frame Presentation Techniques for Augmenting Video Browsing,” Proceedings of the Working Conference on Advanced Visual Interfaces (AVI '98), pp. 185–194, 1998, and Spence et al., “Rapid, Serial and Visual: A Presentation Technique with Potential,” Information Visualization, 1, 1, pp. 13–19, 2002.
FIGS. 1–4 show other variations including carousel mode 100, see FIG. 1, dynamic collage mode 200, see FIG. 2, floating mode 300, see FIG. 3, and shelf mode 400, see FIG. 4. Those modes all use additional movement or displacement of the images.
To date there are only preliminary findings regarding the efficacy of RSVP methods in human-computer interfaces. It seems that the experiments thus far have simply confirmed that humans can extract visual information presented rapidly in slide-show mode. Tse et al. investigated fixed-rate slide-show methods for video browsing. Users were able to extract the gist of a movie, even when images were presented extremely rapidly, e.g., eight frames per second.
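The timing of a fixed-rate slide-show presentation can be illustrated with a minimal sketch. It assumes a constant rate and that each image fully displaces the previous one; the function names `slide_show_schedule` and `image_at` are hypothetical and do not come from the cited work.

```python
def slide_show_schedule(num_images, frames_per_second):
    """Return (image_index, onset_time_in_seconds) pairs for a fixed-rate show.

    Assumption: every image is displayed for the same interval and is
    replaced entirely by its successor (slide-show / keyhole mode).
    """
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    interval = 1.0 / frames_per_second
    return [(i, i * interval) for i in range(num_images)]


def image_at(t, num_images, frames_per_second):
    """Return the index of the single image visible at time t, or None.

    None is returned before the presentation starts and after the last
    image's interval has elapsed.
    """
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    if t < 0:
        return None
    index = int(t * frames_per_second)
    return index if index < num_images else None
```

At the rate of eight frames per second mentioned above, each image is visible for 125 milliseconds, which is shorter than the typical 150 to 300 millisecond fixation.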
However, it has been hypothesized that the other RSVP methods might provide advantages by allowing the user more flexibility and control over their attention. The user could reject irrelevant images sooner, and focus longer on relevant images. Nevertheless, preliminary experiments with more complex 2D spatial/temporal layouts, such as the carousel mode 100 and the dynamic collage mode 200, have not been able to show any advantage over slide-show mode. For example, a pilot experiment comparing dynamic collage mode with slide-show mode is described in Wittenburg et al., “Browsing Through Rapid-Fire Imaging: Requirements and Industry Initiatives,” Proceedings of Electronic Imaging '2000: Internet Imaging, pp. 48–56, 2000. They describe an experiment involving tasks in Internet shopping where users had full control over the speed and direction of presentation. Images of products were shown in three modes: slide-show RSVP, dynamic collage, and a more conventional web page presentation. In the dynamic collage mode, the images are placed successively and semi-randomly around a center point 201 until the images are occluded by subsequent images or are cleared from the display. Unlike the carousel mode 100, no image movement or scale changes are involved. Users were asked to perform two tasks. The first was a gist extraction task. The second task was to determine the presence or absence of a target product. The relevant findings were that users preferred the slide-show mode over the dynamic collage and the web page modes, although no performance differences were observed.
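The placement rule of the dynamic collage mode can be sketched as follows. The cited work does not specify how the semi-random positions are chosen, so this sketch assumes a uniformly random angle and a uniformly random distance up to a hypothetical `max_radius` from the center point; the function name and parameters are illustrative only.

```python
import math
import random


def dynamic_collage_positions(num_images, center, max_radius, seed=0):
    """Return one fixed (x, y) position per image, scattered around center.

    Assumptions (not from the cited work): the angle is uniform in
    [0, 2*pi) and the distance is uniform in [0, max_radius]. Images do
    not move or change scale after placement; later images may simply
    occlude earlier ones.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    cx, cy = center
    positions = []
    for _ in range(num_images):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, max_radius)
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle)))
    return positions
```

Because each new image appears at a different location, a viewer must shift gaze for every image, in contrast to the single fixed location of slide-show mode.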
The prior art methods, other than slide-show mode, require too much cognitive processing by the user since the user must attend simultaneously to many rapidly changing images and/or shift gaze to images at new 2D locations. Some of these methods require tracking in 2D, e.g., the carousel mode 100, and others require shifting gaze to focus on different locations, e.g., the dynamic collage mode 200. It should not be surprising that variants in which images move or in which images pop up at new locations require additional cognitive overhead.
De Bruijn et al. describe eye-tracking experiments connected with RSVP interface methods, see De Bruijn et al., “Patterns of Eye Gaze during Rapid Serial Visual Presentation,” Proceedings of the Working Conference on Advanced Visual Interfaces (AVI 2002), pp. 209–217, 2002. They compare a number of RSVP alternatives that require tracking, e.g., carousel, or focusing on different positions, e.g., dynamic collage. They make a number of observations regarding the patterns of eye gaze for the RSVP variant modes tested. They observed that different users adopted different eye-gaze strategies for the shelf mode 400. One user seemed to focus only on the area in which new images appeared before the images moved off to background portions of the presentation. Another user seemed to track the images as they were moving. We hypothesize from these observations that new methods are needed to support users in changing the focus of attention in order to adjust to their task, e.g., searching for a target image versus extracting the gist of a sequence.
Therefore, it is desired to exploit human visual and cognitive capabilities to improve the presentation and browsing of electronic multimedia content.