1. Technical Field
This disclosure relates to authoring, extracting and linking video objects and more particularly, to authoring video by defining objects of interest in the video.
2. Description of the Related Art
Multimedia information is often very complex, drawing on a number of sources and containing large amounts of data. To make the multimedia information usable, it is preferable to create relevant and appropriate subject matter by employing all of the available sources, for example, through the use of hyperlinks. This gives the user a way to navigate a multimedia document based on present needs. The user can thus extract and visualize relevant information without actually having to look at all the information present. This is especially useful for videos, which have become very popular and are being generated at an ever-increasing rate by a variety of sources such as defense and civilian satellites, scientific experiments, biomedical imaging, industrial inspections, home entertainment systems, etc. Typically, in these applications, video clips are needed along with other media forms such as audio, text and images. For example, in an electronic manual describing the characteristics of a machine part, it may be appropriate to permit the user to view a video clip of a relevant subpart. In that clip, the subpart may be highlighted so that clicking on it takes the user either to another relevant source of information or back to the original text.
It would be advantageous to create links between an object that is visible for a certain duration in the video and other related information. Also, a single video clip may contain several linked objects, present either simultaneously or in different time windows, that link to different destinations depending on the content. To use this information in a meaningful way in conjunction with all the other media types, i.e., text, images, audio, etc., it is important to segment and structure the video and to create appropriate links between objects in different sections of a video and pertinent information in other media forms.
In concept, this is related to hypertext. It offers users a path to follow based on the user's interest and the content of the video. Just as in a web page, several static and dynamic links can be available simultaneously within the video space at any instant. There is, however, one crucial difference: unlike on a web page, the link opportunities exist only in a fixed temporal window, which closes when the object of interest disappears. If the user stops the video player, rewinds and plays the video again, the link opportunities reappear. In other words, links in these cases have an extra dimension, that of time.
As mentioned above, the concept of hyperlinked video, or hypervideo, originated from hyperlinked text, or hypertext. Early work in this genre includes, for example, Storyspace, described in J. D. Bolter, Writing Space: The Computer, Hypertext and the History of Writing, Lawrence Erlbaum Associates, Hillsdale, N.J., 1991, a hypertext writing environment from Eastgate Systems that employs a spatial metaphor in displaying links and nodes. Users create writing spaces, or containers for text and images, which are then linked to other writing spaces. The writing spaces form a hierarchical structure that users can visually manipulate and reorganize. Synthesis, described in C. Potts et al., "Collaborative pre-writing with a video based group working memory", Tech-Report, Graphics Usability and Visualization Center, Georgia Institute of Technology, pp. 93-95, 1993, is a tool based on Storyspace that allows one to index and navigate analog video content associated with text in writing spaces. Synthesis may be used in the design and prototyping stages of hypervideo production, and it provided an early demonstration of text-to-video linking. Video-to-video linking was first demonstrated in the hypermedia journal Elastic Charles, described in H. P. Brondmo et al., "Creating and Viewing the Elastic Charles - A Hypermedia Journal", in Hypertext: State of the Art, Intellect, Oxford, UK, 1991, developed at the Interactive Cinema Group of the MIT Media Laboratory. In it, micons, or miniaturized movie loops, briefly appear to indicate video links. This prototype relied on analog video and laser disc technology and required two screens. Today, digital video allows much greater sophistication.
In the interactive Kon-Tiki Museum, described in G. Liestol, "Aesthetic and rhetorical aspects of linking video in hypermedia", Proc. Hypertext-94, ACM Press, New York, pp. 217-223, 1994, there is continuous linking from video to text and from video to video via the exchange of basic qualities between the media types. Time dependence was added to text, and spatial simultaneity to video.
Videobook, as described in R. Ogawa et al., "Design strategies for scenario-based hypermedia: description of its structure, dynamics and style", Proc. Hypertext-92, ACM Press, New York, pp. 71-80, 1992, demonstrated time-based, scenario-oriented hypermedia. Here, multimedia content was organized using a nodal representation, and timer-driven links were automatically activated to present the content based on its time attributes. L. Hardman et al., "The Amsterdam hypermedia model: Adding time and context to the Dexter model", Communications of the ACM, 37:50-62, 1995, used timing to explicitly state the source and destination contexts when links were followed. In M. C. Buchanan et al., "Specifying temporal behavior in hypermedia documents", Proc. Hypertext-92, ACM Press, New York, pp. 71-80, 1992, the authors created hypermedia documents by manipulating temporal relationships among media elements at a high level, rather than as explicit timings.
Vactive(trademark) from Ephyx Technologies and HotVideo(trademark) from International Business Machines allow a limited set of links so that, upon user interaction, either another section of the same video or another video starts playing, or a web browser is directed to a specified URL. These systems allow elementary tracking, permitting one to track objects undergoing simple motion in which the object doesn't change shape. However, the user has to go through the video to find the start and end frames for such tracking, and if there is a mistake, the user has to redraw the outline. These systems provide no way to semi-automatically organize the video, and the links they permit are limited. Authoring of these links must be done manually. This limits the flexibility and usability of these systems.
While hypertext and hypervideo are similar in concept, several of the ideas must be reformulated in actual realization to accommodate the dynamic scope of video. Thus, the links need to be both temporal and spatial, and the authoring needs to encode this information. The same holds for navigating these links. Therefore, a need exists for a system and method that simplify the authoring of a video for hyperlinking, in which the user is not required to go through the entire video to identify objects of interest. A further need exists for the capability to interpolate between the start and end frames of the locations of the objects of interest, giving precise location information without excessive computational overhead. A still further need exists for a motion analysis method that further breaks up the shots into subshots and uses automatic hyperlinking to link the video clips, or the objects in them, to different parts of a document system.
A method for authoring video documents includes the steps of inputting video data to be processed, segmenting the video data into shots by identifying breaks between the shots, subdividing the shots into subshots using motion analysis to provide location information of objects of interest undergoing motion, describing boundaries for the objects of interest in the video data such that the objects of interest are represented by the boundaries in the shots and creating an anchorable information unit file based on the boundaries of the objects of interest such that objects of interest are used to identify portions of the video data.
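Segmenting the video into shots by identifying breaks between them can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a normalized intensity histogram, and it swaps in a twin-comparison style detector: a large frame-to-frame difference marks an abrupt cut, while a run of moderate differences whose end frames differ strongly marks a gradual transition. The function names and the thresholds `t_high` and `t_low` are illustrative assumptions, not values taken from this disclosure.

```python
def hist_diff(h1, h2):
    """L1 distance between two normalized histograms: 0 = identical, 2 = disjoint."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_breaks(hists, t_high=0.8, t_low=0.2):
    """Twin-comparison style shot-break detector (hypothetical helper).

    hists: one normalized histogram per frame. Returns (first_frame, last_frame, kind)
    tuples, where kind is "cut" (abrupt) or "gradual". Thresholds are illustrative.
    """
    # Metric computed as a time series: difference between consecutive frames.
    diffs = [hist_diff(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]
    breaks = []
    i = 0
    while i < len(diffs):
        if diffs[i] >= t_high:
            # Large jump between frame i and frame i + 1: abrupt cut.
            breaks.append((i, i + 1, "cut"))
            i += 1
        elif diffs[i] >= t_low:
            # Moderate change: follow the run of moderate differences, then compare
            # the frames at both ends; a large total change means a gradual transition.
            start = i
            while i < len(diffs) and t_low <= diffs[i] < t_high:
                i += 1
            if hist_diff(hists[start], hists[i]) >= t_high:
                breaks.append((start, i, "gradual"))
        else:
            i += 1
    return breaks


# Two-bin histograms: a hard cut at frame 5, then a fade from frame 9 to frame 13.
frames = ([[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
          + [[0.25, 0.75], [0.5, 0.5], [0.75, 0.25]] + [[1.0, 0.0]] * 3)
print(detect_breaks(frames))  # [(4, 5, 'cut'), (9, 13, 'gradual')]
```

Comparing the end frames of a moderate run, rather than only summing consecutive differences, keeps a brief flicker that returns to the original content from being reported as a transition.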
In other methods of the present invention, the step of segmenting the video data may include the steps of defining time segments for the video data, computing metrics as time series data for the time segments, comparing the video data between the time segments, and identifying abrupt and gradual changes between the time segments of the video data to define the shots. The step of subdividing the shots into subshots using motion analysis may include the steps of estimating motion for objects of interest by computing optical flow, observing the motion of the objects of interest, computing an error between the estimated motion and the observed motion, and, if the error is above a threshold value, creating an extra node to further define the motion of the objects of interest. The motion analysis may include an affine transform. The step of describing boundaries for the objects of interest may include the steps of assigning object types to the objects of interest for each shot, the object types including vertices, and interpolating corresponding vertices of the object types between frames of shots to define either a spline or a line, such that the spline or line defines the motion of the objects of interest between the frames. The method may include the step of linking the objects of interest to other objects to provide an interactive video document. This linking step may include the step of providing automatic hyperlinking between the video document and other documents. The automatic hyperlinking may be provided by a hyperlinker and may further include the step of providing link specification processing, pattern matching, and link establishment between sources and destinations.
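The three hyperlinker stages named above (link specification processing, pattern matching, and link establishment) can be pictured with a small sketch. Here a link specification is assumed to be a regular expression over AIU names paired with a destination template; the `LinkSpec` class, the `hyperlink` function, the naming scheme and the first-match-wins policy are all hypothetical choices for illustration, not the hyperlinker of this disclosure.

```python
import re
from dataclasses import dataclass


@dataclass
class LinkSpec:
    """Hypothetical link rule: AIU names matching `pattern` link to `target`,
    which may refer to regex groups such as \\1."""
    pattern: str
    target: str


def hyperlink(aiu_names, specs):
    # 1. Link specification processing: compile every rule once.
    rules = [(re.compile(spec.pattern), spec.target) for spec in specs]
    links = []
    for name in aiu_names:
        for regex, target in rules:
            # 2. Pattern matching against the whole AIU name.
            match = regex.fullmatch(name)
            if match:
                # 3. Link establishment: expand group references in the target.
                links.append((name, match.expand(target)))
                break  # first matching rule wins (an assumption)
    return links


specs = [LinkSpec(r"part-(\d+)", r"manual/section-\1"),
         LinkSpec(r"video-(\w+)", r"clips/\1.mpg")]
print(hyperlink(["part-12", "video-assembly", "operator"], specs))
# [('part-12', 'manual/section-12'), ('video-assembly', 'clips/assembly.mpg')]
```

Because the rules are data rather than hand-drawn links, re-running the hyperlinker after new objects are authored links them without manual editing, which is the point of making the hyperlinking automatic.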
The method may further include the step of providing keyframes for representing shots and subshots of the video data such that the keyframe representation is used to identify the objects of interest included in the shots and subshots. The video is preferably specified in a video AIU specification language that follows SGML syntax, and the method may further include the step of defining syntax for the video specification.
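To make the keyframe and SGML points concrete, the sketch below picks the middle frame of a shot as its keyframe and serializes one object of interest as an SGML-style AIU element. The middle-frame rule and every element and attribute name (`AIU`, `Boundary`, `KeyFrame`, `Link`, `Linkend`) are hypothetical; the actual syntax is whatever the video AIU specification language defines. The `Link` element is written without an end tag, which is only valid SGML if the DTD declares it EMPTY (also an assumption).

```python
from xml.sax.saxutils import quoteattr


def keyframe(start, end):
    """Representative keyframe for a shot or subshot: its middle frame (an assumption)."""
    return (start + end) // 2


def aiu_record(aiu_id, shot, obj_type, vertices, linkend):
    """Serialize one object of interest as an SGML-style AIU element (hypothetical tags)."""
    start, end = shot
    points = " ".join(f"{x},{y}" for x, y in vertices)
    return "\n".join([
        f"<AIU Id={quoteattr(aiu_id)} Type={quoteattr(obj_type)}>",
        f"  <Boundary Start={quoteattr(str(start))} End={quoteattr(str(end))} "
        f"KeyFrame={quoteattr(str(keyframe(start, end)))}>{points}</Boundary>",
        f"  <Link Linkend={quoteattr(linkend)}>",
        "</AIU>",
    ])


record = aiu_record("subpart-7", (100, 250), "polygon",
                    [(10, 10), (80, 10), (45, 60)], "manual/subpart-7")
print(record)
```

An author browsing keyframes rather than the full video can select the object on frame 175 and emit a record like this one, leaving the in-between frames to interpolation.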
Another method for authoring video documents includes the steps of providing capable of hyperlinking to objects included in a browser, interpolating boundaries of the object types between frames to define motions of the objects of interest, and playing the video and displaying anchorable information units associated with the object types in the video to provide interactive objects of interest for linking the objects of interest with other media upon selection of one of the objects of interest and objects of the other media.
In other methods, the other media may include one of audio, hypertext, stored information and video. The step of interpolating may include subdividing identified shots of the video into subshots using motion analysis, which may include the steps of estimating motion for the objects of interest by computing optical flow, observing the motion of the objects of interest, computing an error between the estimated motion and the observed motion, and, if the error is above a threshold value, creating an extra node to further define the motion of the objects of interest. The motion analysis may include an affine transform. The step of interpolating may include the steps of assigning object types to the objects of interest, the object types including vertices, and interpolating corresponding vertices of the object types between frames of shots of the video to define a spline such that the spline defines the motion of the objects of interest between the frames. The method may further include the step of linking the objects of interest to other objects to provide an interactive video document, which preferably includes the step of providing automatic hyperlinking between the objects of interest in the video and the objects of the other media. The automatic hyperlinking may be provided by a hyperlinker and may further include the step of providing link specification processing, pattern matching, and link establishment between sources and destinations. The method may include the step of providing keyframes for representing shots and subshots of the video data such that the keyframe representation is used to identify the objects of interest included in the shots and subshots. The video is preferably specified in a video AIU specification language that follows SGML syntax, and the method may further include the step of defining syntax for the video specification.
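Interpolating corresponding vertices between frames can be sketched as follows, assuming each node stores the frame number and the object's vertex list in a consistent order. For simplicity this sketch uses piecewise-linear interpolation between consecutive nodes (the "line" case); the spline case would replace the linear blend with a spline through the same nodes. The function name and data layout are hypothetical.

```python
from bisect import bisect_right


def interpolate_boundary(nodes, frame):
    """Boundary of one object at `frame`, by piecewise-linear vertex interpolation.

    nodes: list of (frame, [(x, y), ...]) sorted by frame; every node has the same
    number of vertices, listed in corresponding order. Frames before the first node
    or after the last are clamped to that node's boundary.
    """
    frames = [f for f, _ in nodes]
    if frame <= frames[0]:
        return list(nodes[0][1])
    if frame >= frames[-1]:
        return list(nodes[-1][1])
    k = bisect_right(frames, frame)          # frames[k - 1] <= frame < frames[k]
    (f0, v0), (f1, v1) = nodes[k - 1], nodes[k]
    s = (frame - f0) / (f1 - f0)             # fraction of the way from node k-1 to node k
    return [(x0 + s * (x1 - x0), y0 + s * (y1 - y0))
            for (x0, y0), (x1, y1) in zip(v0, v1)]


nodes = [
    (0, [(0, 0), (10, 0), (10, 10)]),
    (10, [(10, 0), (20, 0), (20, 10)]),
    (20, [(10, 20), (20, 20), (20, 30)]),
]
print(interpolate_boundary(nodes, 5))   # [(5.0, 0.0), (15.0, 0.0), (15.0, 10.0)]
```

Only the nodes are stored in the AIU file; the player recomputes the boundary for any frame on demand, which is why interpolation gives precise locations without per-frame annotation.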
A system for authoring and viewing videos includes a video editor for creating an anchorable information unit (AIU) file for objects of interest in a video, and a video device for playing the video, the video having the AIU file associated therewith, the AIU file including object types associated with objects of interest within the video. A browser is included for interacting with the objects of interest, wherein playing the video and displaying the AIUs associated with the video provides interactive objects of interest for linking the objects of interest with other media upon selection of one of the objects of interest and objects of other media types, such as other videos, images, text documents, etc. The video editor includes means for interpolating vertices of the objects between frames to define motions of the objects of interest so that the objects of interest are tracked during video play. The video is preferably specified in a video AIU specification language that follows SGML syntax.
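When the user clicks during playback, the browser must decide whether the click falls on an object of interest, and that test has both a temporal and a spatial part. The sketch below assumes each AIU is held as a start frame, an end frame, a function returning the object's polygon at a given frame (for instance, an interpolated boundary) and a link destination; this layout and the `pick_object` name are hypothetical. The spatial part uses the standard even-odd (ray casting) point-in-polygon test.

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule: cast a ray to the right of (x, y) and count edge crossings."""
    inside = False
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        if (y0 > y) != (y1 > y):                      # edge straddles the ray's height
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def pick_object(aius, frame, x, y):
    """Return (aiu_id, destination) for the first object under the click, else None.

    aius maps a hypothetical AIU id to (start_frame, end_frame, boundary_at, destination),
    where boundary_at(frame) returns the object's polygon at that frame.
    """
    for aiu_id, (start, end, boundary_at, destination) in aius.items():
        # Temporal check (object on screen) and spatial check (click inside its boundary).
        if start <= frame <= end and point_in_polygon(x, y, boundary_at(frame)):
            return aiu_id, destination
    return None


triangle = [(0, 0), (10, 0), (0, 10)]
aius = {"gear": (0, 50, lambda f: [(px + f, py) for px, py in triangle], "manual/gear")}
print(pick_object(aius, 0, 2, 2))    # ('gear', 'manual/gear')
print(pick_object(aius, 20, 2, 2))   # None: the object has moved away
print(pick_object(aius, 60, 22, 2))  # None: outside the object's time window
```

The same click position can therefore resolve to different destinations, or to none, depending on the frame being shown, which is the extra time dimension of video links.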
In other embodiments, the other media may include one of audio, hypertext, stored information and video. The means for interpolating may further include a processor for subdividing identified shots of the video into subshots using motion analysis, the processor preferably including means for estimating motion for the objects of interest by computing optical flow, means for observing the motion of the objects of interest, means for computing an error between the estimated motion and the observed motion, and means for creating an extra node to further define the motion of the objects of interest if the error is above a threshold value. The motion analysis may include an affine transform. The means for interpolating further includes means for assigning object types to the objects of interest, the object types including vertices, the vertices between frames of shots of the video defining a spline such that the spline defines the motion of the objects of interest between the frames. In other words, each object has a type and is defined by its vertices. The system preferably includes an automatic hyperlinker for automatically hyperlinking the objects of interest in the video and the objects of the other media. The automatic hyperlinker may provide link specification processing, pattern matching, and link establishment between sources and destinations. The video device may include one of a disk player, a processor and a tape player. The system may further include an input device for selecting the objects of interest in the video. The browser preferably includes a processor. The system may further include a keyframe for representing each shot, and subshots may be defined by boundary frames such that the shots and subshots are individually identifiable by the system. The keyframes may be employed for authoring video documents without viewing the entire video.
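The error-driven node creation described above can be sketched under several stated assumptions. The observed vertex positions for every frame are taken as given (for instance, from an optical flow tracker, which is not implemented here). Between two nodes, the estimated motion is an affine map fitted by least squares from the first node's vertices to the last node's, blended linearly in time. Wherever the estimate misses the observation by more than the threshold, an extra node is placed at the worst frame and both halves are checked again, a recursive split similar in spirit to Douglas-Peucker simplification. All names are hypothetical, and the fit needs at least three non-collinear vertices.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve m @ p = rhs for p by Cramer's rule."""
    d = _det3(m)
    return [_det3([row[:c] + [rhs[i]] + row[c + 1:] for i, row in enumerate(m)]) / d
            for c in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map src -> dst via the normal equations (>= 3 non-collinear points)."""
    rows = [(x, y, 1.0) for x, y in src]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    px = _solve3(normal, [sum(r[i] * q[0] for r, q in zip(rows, dst)) for i in range(3)])
    py = _solve3(normal, [sum(r[i] * q[1] for r, q in zip(rows, dst)) for i in range(3)])
    return lambda x, y: (px[0] * x + px[1] * y + px[2], py[0] * x + py[1] * y + py[2])


def place_nodes(track, threshold):
    """track[t]: observed (x, y) vertices of one object at frame t (assumed precomputed).

    Returns the sorted node frames; consecutive nodes bound the subshots.
    """
    nodes = {0, len(track) - 1}
    pending = [(0, len(track) - 1)]
    while pending:
        a, b = pending.pop()
        if b - a < 2:
            continue
        warp = fit_affine(track[a], track[b])        # affine motion model for the segment
        end = [warp(x, y) for x, y in track[a]]
        worst_t, worst_err = None, 0.0
        for t in range(a + 1, b):
            s = (t - a) / (b - a)                     # linear blend in time
            err = max(math.hypot(x0 + s * (x1 - x0) - ox, y0 + s * (y1 - y0) - oy)
                      for (x0, y0), (x1, y1), (ox, oy) in zip(track[a], end, track[t]))
            if err > worst_err:
                worst_t, worst_err = t, err
        if worst_err > threshold:                     # estimate too far from observation
            nodes.add(worst_t)
            pending += [(a, worst_t), (worst_t, b)]
    return sorted(nodes)


base = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
# Object moves right for 10 frames, then up for 10 frames.
track = [[(x + min(t, 10), y + max(t - 10, 0)) for x, y in base] for t in range(21)]
print(place_nodes(track, threshold=1.0))  # [0, 10, 20]
```

A single straight movement needs no extra node, while the change of direction at frame 10 produces one, so the subshot boundary falls where the motion actually changes.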
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.