The present invention relates to video presentation, and in particular to association of properties, such as hyperlinks, with video frames and portions thereof.
Hyperlinked video is video in which specific objects are made selectable by some form of user interface, and the user's interactions with these objects modify the presentation of the video. Users of the World Wide Web have become accustomed to hyperlinks, in which "clicking" on a word or graphic selects another page, or perhaps modifies the current one. The idea of hyperlinked video, in which objects are selectable, has often been discussed as a desirable possibility, enabling, for example, a fashion program in which clicking on an article of apparel provides information about it, or a nature program that allows children to go on "safari" and collect specimens. Creating such video has posed a challenge because of the tediousness of identifying the clickable regions in every frame manually, the difficulty of segmenting and tracking them automatically, and the need to painstakingly associate each clickable object with the linked information (e.g., an ID number, text string, or "action").
The problems of identification, segmentation, and tracking have recently been addressed using easily defined "segmentation masks"; see Bove et al., "Adding Hyperlinks to Digital Television," Proc. 140th SMPTE Tech. Conf. (1998) (hereafter "Bove et al."); and Chalom et al., "Segmentation of an Image Sequence Using Multi-Dimensional Image Attributes," Proc. ICIP-96 (1996) (hereafter "Chalom et al."). In accordance with this approach, during editing the author uses a computer mouse to scribble roughly on each desired object in a frame of video. In response, the system generates a filled-in segmentation mask for that frame and for temporally adjacent frames in the sequence, until a scene change occurs or new objects enter. These masks assign every pixel in every frame of the video to one of the author's defined regions. The author may then associate each region with a particular action (e.g., a graphical overlay, display of information, switching to a different video data stream, or transmission of data on a back channel). The viewer of the video segment can select objects by any of a variety of means, including ordinary remote controls, remote controls with inertial sensors, or a laser pointer aimed at the screen.
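Because the masks assign every pixel to an authored region, resolving a viewer's selection reduces to a table lookup. The following is a minimal sketch under stated assumptions: each mask is stored as a grid of region IDs per frame, actions are plain strings keyed by region ID, and the names `select_action` and `show_jacket_info` are hypothetical illustrations, not part of the cited systems.

```python
from typing import Dict, List, Optional

# Hypothetical representation: mask[row][col] holds the region ID of that pixel.
Mask = List[List[int]]


def select_action(masks: Dict[int, Mask], actions: Dict[int, str],
                  frame: int, x: int, y: int) -> Optional[str]:
    """Return the action linked to the region under pixel (x, y) of a frame, if any."""
    mask = masks.get(frame)
    if mask is None or not (0 <= y < len(mask)) or not (0 <= x < len(mask[0])):
        return None  # no mask for this frame, or the pointer is outside the picture
    region = mask[y][x]
    return actions.get(region)  # None when the author linked no action to the region


# Toy two-frame shot: region 1 is a jacket, region 0 is background.
masks = {
    0: [[0, 0, 1], [0, 1, 1]],
    1: [[0, 1, 1], [0, 1, 1]],
}
actions = {1: "show_jacket_info"}  # hypothetical action name
print(select_action(masks, actions, 0, 2, 0))  # show_jacket_info
print(select_action(masks, actions, 0, 0, 0))  # None
```

The same lookup works regardless of the pointing device, since a remote control, inertial sensor, or laser pointer only has to yield a frame number and pixel coordinate.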
In operation, color, texture, motion, and position are employed for object segmentation and tracking of video. While the author is indicating regions of importance by scribbling with the mouse, the system uses a combination of these features to develop multi-modal statistical models for each region. The system then creates a segmentation mask by finding entire regions that are statistically similar and tracking them throughout a video scene. A scripting language for object-based media enables the association of the mask with the video frames, and the association of actions with selected regions at particular times.
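The step of building a statistical model from each region's scribbled pixels and then labeling the remaining pixels by statistical similarity can be sketched as below. This is a deliberate simplification, not the method of Chalom et al.: it assumes a single diagonal Gaussian per region (a unimodal stand-in for the multi-modal models described above, which could instead use a mixture) and only two features per pixel. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Sequence[float]  # e.g. (luminance, one chrominance value); assumed features
Model = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def fit_gaussian(samples: List[Feature]) -> Model:
    """Fit a diagonal Gaussian to the pixels the author scribbled on one region."""
    n, dims = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dims)]
    # A small floor keeps each variance positive when all samples agree.
    var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, 1e-6) for d in range(dims)]
    return mean, var


def log_likelihood(x: Feature, model: Model) -> float:
    """Log-density of x under a diagonal Gaussian (dimensions treated as independent)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def classify(pixels: List[Feature], scribbles: Dict[int, List[Feature]]) -> List[int]:
    """Label each pixel with the region whose model gives it the highest likelihood."""
    models = {rid: fit_gaussian(s) for rid, s in scribbles.items()}
    return [max(models, key=lambda rid: log_likelihood(p, models[rid])) for p in pixels]


scribbles = {
    1: [(200.0, 10.0), (210.0, 12.0), (205.0, 11.0)],  # bright region
    2: [(50.0, 100.0), (55.0, 98.0), (45.0, 102.0)],   # dark region
}
pixels = [(207.0, 9.0), (52.0, 99.0)]  # unscribbled pixels to label
print(classify(pixels, scribbles))  # [1, 2]
```

Tracking through a scene would repeat the classification on each subsequent frame, typically with the motion and position features updated frame to frame.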
Not addressed by this system, however, are the tasks of relating objects in different shots to each other and establishing the hyperlinks themselves, i.e., the associations between each object in a segmented shot and the information specified by the link. In other words, while the segmentation approach allows objects to be defined within the sequence of frames constituting a shot or scene, it cannot relate those objects to the objects in a different scene. The user of such a system would be forced to manually identify each object in each scene: a highly redundant, time-consuming chore, but a necessary one if hyperlinks are to remain associated with the same objects from scene to scene.
The present invention automates the process of identifying and associating information with objects defined in a video sequence. In particular, a system in accordance with the invention creates an accessible list of object information, including semantic representations, which updates with the identification of new objects. Because objects appear in more than one shot in many video sequences, the system makes guesses about the identification of objects in a newly segmented sequence. If it guesses the object correctly, the author is relieved of the need to manually search a database of object information to make the association.
In a first aspect, therefore, the invention comprises a method of identifying objects in a new, as-yet unclassified video frame based on previously identified objects. First, the video frame is analyzed to locate objects therein (preferably using the segmentation-mask approach). Located objects are modeled in terms of probability density functions ("PDFs") with respect to one or more features of the object, such as color (which may be represented as luminance and chrominance parameters), texture, motion from frame to frame, and position within the current frame. The previously identified objects are similarly modeled as PDFs with respect to one or more of their features, and the invention compares these PDFs to locate previously identified objects corresponding to the new objects. If such corresponding objects are found, their identities are assigned to the new objects. To facilitate fast and accurate comparison, the PDFs associated with new and previously identified objects may be organized hierarchically, with higher-level PDFs representing composites of the PDFs associated with individual object occurrences.
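The text does not specify how two PDFs are compared, so the sketch below makes its assumptions explicit: each PDF is a normalized histogram over quantized values of one feature, similarity is measured with the Bhattacharyya coefficient (a swapped-in, commonly used measure), a composite PDF is the average of an object's per-occurrence PDFs, and a match is accepted only above a hypothetical threshold of 0.9. All names are illustrative.

```python
import math
from typing import Dict, List, Optional

PDF = List[float]  # normalized histogram over quantized feature values


def histogram_pdf(values: List[int], bins: int) -> PDF:
    """Estimate a PDF as a normalized histogram of quantized values in [0, bins)."""
    counts = [0.0] * bins
    for v in values:
        counts[v] += 1
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya(p: PDF, q: PDF) -> float:
    """Similarity in [0, 1]; 1 means the two distributions are identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def composite(pdfs: List[PDF]) -> PDF:
    """Higher-level PDF for one object: the mean of its per-occurrence PDFs."""
    return [sum(col) / len(pdfs) for col in zip(*pdfs)]


def identify(new_pdf: PDF, known: Dict[str, List[PDF]],
             threshold: float = 0.9) -> Optional[str]:
    """Return the known object whose composite best matches, or None if none is close enough."""
    best_id: Optional[str] = None
    best_score = threshold
    for obj_id, occurrences in known.items():
        score = bhattacharyya(new_pdf, composite(occurrences))
        if score >= best_score:
            best_id, best_score = obj_id, score
    return best_id


known = {
    "jacket": [histogram_pdf([0, 0, 1], 4), histogram_pdf([0, 1, 1], 4)],
    "sky": [histogram_pdf([3, 3, 3, 2], 4)],
}
print(identify(histogram_pdf([0, 1], 4), known))  # jacket
print(identify(histogram_pdf([2, 2], 4), known))  # None
```

Comparing against one composite per object keeps the first pass cheap; a fuller hierarchy could then refine the match against the individual occurrence PDFs of the best-scoring objects.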
In another aspect, the invention utilizes a database to organize various values of feature parameters associated with the objects in order to assist in classifying a new, as-yet unidentified object. Again, using color as an exemplary feature, the database may be organized into sets of data "bins" corresponding to values (or value ranges) for the parameters according to which color is modeled. Color in digital video is ordinarily represented by three components according to any of various color coordinate systems (sometimes called "color spaces"); in accordance herewith, each such component represents a separate feature parameter, i.e., one of the constituents used to model the feature "color." Thus, a separate set of data bins is defined for each color component, and the individual bins of a set represent defined values of the component. Additional sets of bins can be established for other features (texture, motion, position, etc.) useful in distinguishing among objects.
Once an object is segmented, it is analyzed to derive overall values for the feature parameters for which bin sets have been defined. The object is then associated with the bins whose values accord with the values found for the object; for example, a pointer to the object may be placed in each bin whose value matches (or is closest to) the corresponding feature-parameter value of the object, or whose value range encompasses that feature-parameter value. This scheme facilitates rapid index searching to identify previously classified objects as candidate matches for a new object. When a new object is encountered, its feature parameters are derived, and previously identified objects within the bins indexed by the new object's feature parameters represent the strongest possibilities for a match, particularly objects that appear in all of the bins indexed by the new object. In general, however, exact matches are not sought. Rather, the bins within a selected proximity of (i.e., within a defined number of bins of) the bins indexed by the new object are considered, and previously identified objects that fall within this allowed range in every bin set are considered the strongest possibilities for a match. This technique is useful as a supplement to the PDF search method discussed above, or may be used alone or in conjunction with other search methodologies.
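The bin-set index and its proximity search can be sketched as follows. The assumptions are labeled: bins are of uniform width per parameter (16 in the example), the "pointer" stored in a bin is simply the object's ID, the "defined number of bins" is a hypothetical `radius` parameter, and the class name `BinIndex` is illustrative. Candidates are the objects found within the allowed range in every bin set, i.e., the intersection across sets.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class BinIndex:
    """Hypothetical index with one set of bins per feature parameter."""

    def __init__(self, bin_widths: Dict[str, float]):
        # e.g. {"Y": 16, "U": 16, "V": 16}: assumed uniform bin width per parameter
        self.bin_widths = bin_widths
        self.bins: Dict[str, Dict[int, Set[str]]] = {p: defaultdict(set) for p in bin_widths}

    def _bin(self, param: str, value: float) -> int:
        return int(value // self.bin_widths[param])

    def add(self, obj_id: str, features: Dict[str, float]) -> None:
        """Place the object's ID in the bin whose range holds each feature value."""
        for param, value in features.items():
            self.bins[param][self._bin(param, value)].add(obj_id)

    def candidates(self, features: Dict[str, float], radius: int = 1) -> Set[str]:
        """Objects lying within `radius` bins of the new object's bin in every bin set."""
        result: Optional[Set[str]] = None
        for param, value in features.items():
            center = self._bin(param, value)
            nearby: Set[str] = set()
            for b in range(center - radius, center + radius + 1):
                nearby |= self.bins[param].get(b, set())
            result = nearby if result is None else result & nearby
        return result or set()


idx = BinIndex({"Y": 16, "U": 16, "V": 16})
idx.add("jacket", {"Y": 40, "U": 120, "V": 200})
idx.add("scarf", {"Y": 60, "U": 125, "V": 190})
idx.add("sky", {"Y": 180, "U": 150, "V": 100})
print(sorted(idx.candidates({"Y": 45, "U": 118, "V": 205})))            # ['jacket', 'scarf']
print(sorted(idx.candidates({"Y": 45, "U": 118, "V": 205}, radius=0)))  # ['jacket']
```

Widening `radius` trades precision for recall; the surviving candidates could then be passed to a finer comparison such as the PDF search.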
In still another aspect of the invention, the space of possible object matches is narrowed by organizing an "occurrence database" that tracks which objects co-occur within a frame. This approach usefully supplements searches based on object features, since many disparate objects possess similar feature properties such as colors and textures; for example, a woman's neck and arm (which may desirably be represented by two separate objects) will likely yield very similar visual information. The object-occurrence database of the present invention may keep track of which other objects each object has appeared with in a frame, and of how many frames each object appears in. This information can be used to help choose among candidate objects, so that objects that have appeared in frames or shots with the as-yet unclassified object are favored as possible matches. Once again, this technique can be used alone as a rough guide to possible object matches, or before or after application of another search technique to further narrow the possibilities.
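A minimal sketch of such an occurrence database follows, assuming objects are identified by string IDs and that "context" means the objects already identified in the frame containing the unclassified object. The ranking rule (most shared frames with the context first, then overall frame count, then name as a deterministic tie-break) is an illustrative assumption, as are the class and method names.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


class OccurrenceDB:
    """Hypothetical occurrence database: per-object frame counts and pairwise co-occurrences."""

    def __init__(self) -> None:
        self.frame_count: Counter = Counter()  # object ID -> number of frames it appears in
        self.pair_count: Dict[str, Counter] = defaultdict(Counter)  # a -> {b: shared frames}

    def add_frame(self, objects: Iterable[str]) -> None:
        """Record one classified frame and every pair of objects appearing in it together."""
        objs = sorted(set(objects))
        for o in objs:
            self.frame_count[o] += 1
        for a, b in combinations(objs, 2):
            self.pair_count[a][b] += 1
            self.pair_count[b][a] += 1

    def rank(self, candidates: Set[str], context: Set[str]) -> List[str]:
        """Order candidates: most co-occurrence with `context` first, then most frames, then name."""
        def key(c: str):
            shared = sum(self.pair_count.get(c, Counter())[o] for o in context)
            return (-shared, -self.frame_count[c], c)
        return sorted(candidates, key=key)


db = OccurrenceDB()
db.add_frame(["woman_neck", "woman_face", "necklace"])
db.add_frame(["woman_arm", "tennis_racket"])
db.add_frame(["woman_neck", "woman_face"])
# A skin-colored object could be a neck or an arm; the face is already identified.
print(db.rank({"woman_neck", "woman_arm"}, {"woman_face"}))  # ['woman_neck', 'woman_arm']
```

Here the feature-based search would supply `candidates`, and the occurrence counts break the tie that similar colors and textures leave unresolved.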