Several emerging applications are continuous in nature. Their processing typically involves combining time-stamped inputs from spatially distributed sensors with contextual data about the specific application environment. In a continuous application, upstream computational modules continuously generate data with a certain periodicity and pass the data to downstream modules for analysis and action.
FIG. 1 illustrates the software architecture of a prior art continuous application for processing time-stamped inputs. An example of a continuous application is a Smart Kiosk, an interactive multimedia public user interface. The Smart Kiosk 100 interacts with customers in a natural, intuitive fashion using a variety of input and output devices, such as video cameras 102, 104, microphones 106, loudspeakers, touch screens 108, and infrared and ultrasonic sensors.
The Smart Kiosk uses computer vision techniques to track, identify and recognize one or more customers in the field of view. The Smart Kiosk may initiate and conduct conversations with customers. Recognition of customer gestures and speech may be used for customer input. Synthetic emotive speaking faces and sophisticated graphics, in addition to Web-based information displays, may be used for the Smart Kiosk's responses. The input analysis hierarchy 150 attempts to understand the environment immediately in front of the Smart Kiosk. At the lowest level, sensors provide regularly-paced streams of data, such as images at 30 frames per second from the video cameras 102, 104. In the quiescent state, a blob tracker 110 performs simple repetitive image differencing to detect activity in the field of view. When such activity is detected, a color tracker 112 can be initiated. The color tracker 112 checks the color histogram of the interesting region of the image to refine the hypothesis that an interesting object, for example a human, is in view. If successful, the color tracker can invoke higher-level analyzers such as a face detector 114 to detect faces and an articulated body detector 116 to detect human (articulated) bodies. Still higher-level analyzers, such as a gaze detector 120 and a gesture detector 122, look for gaze and gestures, respectively. Similar hierarchies can exist for audio and other input modalities, and these hierarchies can merge as multiple modalities are combined to further refine the understanding of the environment in front of the Smart Kiosk.
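The escalating structure of the input analysis hierarchy can be sketched in code. The following is a minimal illustration, not an implementation from the application itself: the detector functions, their names, and their heuristics (pixel differencing over synthetic integer "frames", a color-range count, a fixed-pattern face check) are all hypothetical stand-ins. What it demonstrates is the control flow described above, where each higher-level analyzer runs only when the level below it succeeds.

```python
# Sketch of the escalating analyzer hierarchy: each stage is invoked only
# when the stage below it reports a positive result. All detectors are
# hypothetical stand-ins operating on synthetic "frames" (lists of ints).

def blob_tracker(prev_frame, frame, threshold=10):
    """Simple image differencing: activity if total pixel change exceeds threshold."""
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame))
    return diff > threshold

def color_tracker(frame, target=200, tolerance=30):
    """Refine the hypothesis: enough pixels fall within a color range of interest."""
    hits = sum(1 for p in frame if abs(p - target) <= tolerance)
    return hits >= len(frame) // 4

def face_detector(frame):
    """Placeholder for a higher-level analyzer; triggers on a fixed pattern."""
    return frame[:2] == [200, 201]

def analyze(prev_frame, frame):
    """Run the hierarchy bottom-up, invoking each level only on demand."""
    result = {"activity": False, "object": False, "face": False}
    if blob_tracker(prev_frame, frame):          # quiescent-state check
        result["activity"] = True
        if color_tracker(frame):                 # initiated on activity
            result["object"] = True
            result["face"] = face_detector(frame)  # invoked on success
    return result

quiet = [0, 0, 0, 0, 0, 0, 0, 0]
busy = [200, 201, 195, 210, 0, 0, 0, 0]
print(analyze(quiet, quiet))   # no activity: higher levels never run
print(analyze(quiet, busy))    # activity -> color -> face
```

The key property is that the expensive analyzers consume no cycles while the scene is quiescent; only the cheap blob tracker runs continuously.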
The parallel structure of applications such as the Smart Kiosk is highly dynamic. The environment in front of the Smart Kiosk, that is, the number of customers and their relative positions, and the state of the Smart Kiosk's conversation with those customers affect which threads are running, their relative computational demands, and their relative priorities. For example, threads that are currently part of a conversation with a customer are more important than threads searching the background for more customers.
One problem in implementing an application such as the Smart Kiosk is memory management. FIG. 2 illustrates a simple vision pipeline 200 for the prior art continuous application shown in FIG. 1. The digitizer 202 produces digitized images every 1/30th of a second. The Low-fi tracker 206 and the Hi-fi tracker 208 analyze the frames 204 produced by the digitizer 202 for objects of interest and produce their respective tracking records 210, 212. The algorithmic complexity of the tracker modules 206, 208 usually prevents them from keeping up with the digitizer's rate of frame production. It is common to have a Low-fi tracker 206 that uses a heuristic such as color for tracking operate at about 15–20 frames/second, and a Hi-fi tracker 208 that uses a more sophisticated algorithm such as face detection operate at about 1–2 frames/second. The decision module 214 combines the analyses of such lower-level processing to produce a decision output 216, which drives the Graphical User Interface (“GUI”) 218 displayed on the display 220 that converses with the customer in front of the Smart Kiosk.
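The rate mismatch between the digitizer and the trackers can be illustrated with a small simulation. This is a sketch under stated assumptions, not the pipeline's actual scheduling logic: the per-frame tracker costs below are hypothetical values chosen to mimic the 15–20 and 1–2 frames/second rates mentioned above, with the digitizer producing one frame per tick at 30 frames/second. Each tracker simply takes the newest available frame whenever it becomes free, implicitly dropping the frames that arrived while it was busy.

```python
# Sketch of the rate mismatch in the vision pipeline: the digitizer emits one
# frame per tick (30 frames/second), while each tracker grabs the *latest*
# frame when it becomes free, skipping whatever arrived in the meantime.
# Tracker costs (in ticks per frame) are hypothetical illustrative values.

def simulate(num_frames, tracker_cost):
    """Return the frame numbers processed by a tracker with the given cost."""
    processed = []
    busy_until = 0
    for frame_no in range(num_frames):   # one new frame per tick
        if frame_no >= busy_until:       # tracker is free: take newest frame
            processed.append(frame_no)
            busy_until = frame_no + tracker_cost
    return processed

# Simulate one second (30 frames) of the pipeline.
low_fi = simulate(30, tracker_cost=2)    # ~15 frames/second
hi_fi = simulate(30, tracker_cost=20)    # ~1-2 frames/second
print(len(low_fi), low_fi)  # 15 frames: 0, 2, 4, ...
print(len(hi_fi), hi_fi)    # 2 frames: 0, 20
```

The simulation makes the memory-management problem concrete: most frames produced by the digitizer are never consumed by the Hi-fi tracker, yet each frame must be buffered until the system can determine that no downstream module still needs it.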