This invention relates to user interfaces and interactive processing environments for video editing, and more particularly to an interactive processing environment for video object segmentation, tracking and encoding.
Graphical user interfaces having windows, buttons, dialogue boxes and menus are known, such as those available with the Apple Macintosh Operating System and the Microsoft Windows-based operating systems. The inventions disclosed herein relate to a graphical user interface adapted for video editing tasks, such as segmentation, tracking and encoding.
Segmentation is the division of an image into semantically meaningful non-overlapping regions. When segmenting video, the regions are referred to as video objects. Tracking is a method for identifying a video object across a series of video frames. Encoding is the compression and formatting of video according to some conventional or proprietary encoding scheme, such as the MPEG-4 encoding scheme.
According to the invention, a processing environment for video processing includes a user interface and processing shell from which various video processing 'plug-in' programs are executed. The user interface allows an operator to load a video sequence, define and view one or more video objects on any one or more of the frames of the video sequence, edit existing video object segmentations, view video objects across a series of video frames, and encode video objects among a video sequence in a desired format (e.g., MPEG-4 encoding). Various encoding parameters can be adjusted, allowing the operator to view the video sequence encoded at the various parameter settings. One of the advantages of the processing environment is that an operator is able to do automatic segmentations across a sequence of video frames, rather than time-consuming manual segmentations for each video frame.
According to one aspect of the invention, the user interface includes a main window from which subordinate windows are selectively displayed. Among the selectable subordinate windows are a video window, a time-line window, a zoom window, and an encoding window. The user interface also includes a set of menus including a menu of plug-in programs, and a set of dialogue boxes, including encoding parameter dialogue boxes. The video sequence is viewed and played in the video window using VCR-like controls. Video frames may be viewed in sequence or out of sequence (e.g., full motion video, stepping, or skipping around). The time-line window allows the operator to determine where within the sequence the current video frame is located.
According to another aspect of the invention, an operator may define an object by selecting a command button from the time-line window. The operator clicks on points in the video window to outline the portion of the displayed image which is to be the desired video object.
According to another aspect of this invention, the zoom window is concurrently active with the video window while the operator defines the object. In particular, the pointing device cursor location is tracked concurrently in both the video window and the zoom window. Scrolling in the zoom window is automatic to track the pointing device cursor. One advantage of this is that the operator is able to view a location within the video frame, while also viewing a close-up of such location in the zoom window. This allows the operator to precisely place a point on a semantically correct border of the object (e.g., at the border of an object being depicted in video). In some embodiments the zoom window shows a close-up of the pixels of the video window in the vicinity of the pointing device cursor.
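By way of illustration only, the automatic scrolling of the zoom window may be sketched as follows. The function name `zoom_viewport`, its parameters, and the policy of centering on the cursor and clamping at the frame edges are assumptions made for this example; the description does not specify how the visible region is computed.

```python
def zoom_viewport(cursor_x, cursor_y, frame_w, frame_h, view_w, view_h):
    """Return (left, top, right, bottom) of the frame region shown in the zoom window.

    The region covers view_w x view_h source pixels, centered on the cursor
    where possible and shifted to stay inside the frame (hypothetical policy).
    """
    # Never request a region larger than the frame itself.
    view_w = min(view_w, frame_w)
    view_h = min(view_h, frame_h)
    # Center on the cursor, then clamp so the region stays within the frame.
    left = min(max(cursor_x - view_w // 2, 0), frame_w - view_w)
    top = min(max(cursor_y - view_h // 2, 0), frame_h - view_h)
    return left, top, left + view_w, top + view_h
```

Calling such a function on every pointer-move event would keep the close-up aligned with the cursor in the video window, including near the frame borders, where the region slides rather than running off the image.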
According to another aspect of this invention, a segmentation plug-in program processes the video frame and selected outline to refine the object along semantical border lines of the object being depicted. The result is a video object.
According to another aspect of the invention, a defined video object is highlighted by one or more of the following schemes: overlaying a translucent mask which adds a user-selectable color shade to the object; outlining the object; viewing the rest of the frame in black and white, while viewing the object in color; altering the background to view the object alone against a solid (e.g., white, black, gray) background; applying one filter to the object and another filter to the background.
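As one possible illustration of the first highlighting scheme, a translucent mask can be produced by alpha-blending a tint color over the pixels belonging to the object. The names `blend_pixel` and `highlight_object`, the nested-list frame representation, and the default tint and opacity are assumptions for this sketch, not details taken from the description.

```python
def blend_pixel(pixel, tint, alpha):
    """Alpha-blend an (r, g, b) pixel toward a tint color; alpha lies in [0, 1]."""
    return tuple(round((1 - alpha) * p + alpha * t) for p, t in zip(pixel, tint))


def highlight_object(frame, mask, tint=(255, 0, 0), alpha=0.5):
    """Return a copy of frame with the tint blended over pixels where mask is True.

    frame: list of rows of (r, g, b) tuples (assumed representation).
    mask:  list of rows of booleans with the same dimensions as frame.
    """
    return [
        [blend_pixel(px, tint, alpha) if inside else px
         for px, inside in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]
```

The other schemes listed above follow the same pattern: the object mask selects which pixels receive one treatment and which receive another.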
According to another aspect of the invention, an operator is able to select time points in the time-line window and a tracking algorithm from a plug-ins menu. The tracking algorithm identifies/extracts the defined object across a sequence of video frames. Thus, the operator is able to view the video sequence with the highlighted object from a selected starting time point to a selected end time point. Alternatively, the operator may view just the video object (without the remaining portion of the video frames) from such selected starting to ending time points.
According to another aspect of the invention, the operator may step through the video frames from the starting time point onward. The operator may stop or pause the stepping to adjust or redefine the video objects. An advantage of this aspect is that as the tracking algorithm begins to lose the ability to accurately track an object, the object can be redefined. For example, as some of the background begins to be included as part of the video object during tracking over multiple frames, the boundaries of the video object can be redefined. Further, the object can be redefined into one or more sub-objects, with each sub-object being tracked and displayed from frame to frame. An advantage of the plug-in interface is that the same or different segmentation plug-ins may be used to segment different objects. For example, one segmentation plug-in may be well adapted for segmenting objects in the presence of affine motion, while another segmentation plug-in is better where the object deforms. Each segmentation plug-in may be applied to the objects for which it is most effective.
According to another aspect of the invention, the time-line window indicates which frames of a sequence have been processed to track/extract a video object.
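The relationship between tracking over a selected range and the time-line indication may be sketched as follows. This is a minimal illustration only: `track_object`, `timeline_flags`, and the caller-supplied `track_step` callback are hypothetical names, and `track_step` merely stands in for whichever tracking plug-in the operator selects.

```python
def track_object(frames, initial_mask, start, end, track_step):
    """Propagate a mask from frame `start` through frame `end` (inclusive).

    track_step(prev_frame, prev_mask, next_frame) -> next_mask is supplied by
    the caller and stands in for the selected tracking plug-in (assumption).
    Returns {frame_index: mask} for every processed frame.
    """
    masks = {start: initial_mask}
    for index in range(start + 1, end + 1):
        masks[index] = track_step(frames[index - 1], masks[index - 1], frames[index])
    return masks


def timeline_flags(frame_count, masks):
    """One boolean per frame: True where the object has been tracked."""
    return [index in masks for index in range(frame_count)]
```

A time-line display could then mark each frame whose flag is set, so the operator sees at a glance which part of the sequence has already been processed.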
According to another aspect of the invention, where sub-objects are being tracked, the sub-objects can be combined into a single object just before video encoding. The operator is able to select among a variety of encoding parameters, such as encoding bit rate, motion vector search range, and fidelity of the encoded shape.
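One simple way to picture the combining of sub-objects is as a pixel-wise union of their masks. The sketch below assumes each sub-object is represented as a boolean mask of identical dimensions; the function name `combine_sub_objects` and that representation are assumptions, since the description does not state how objects are stored.

```python
def combine_sub_objects(masks):
    """Merge per-sub-object boolean masks (same dimensions) into one object mask."""
    if not masks:
        raise ValueError("at least one sub-object mask is required")
    return [
        # A pixel belongs to the combined object if any sub-object covers it.
        [any(pixels) for pixels in zip(*rows)]
        for rows in zip(*masks)
    ]
```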
According to another aspect of the invention, an encoding status of each object is displayed, showing the peak signal-to-noise ratio for each color component of each frame and the total number of bits encoded for each frame. An advantage of such a display is that the operator is able to visualize how the peak signal-to-noise ratio varies between video objects over a sequence of frames, or how the total number of bits affects the peak signal-to-noise ratio of each color component of an object. When the image quality is unsatisfactory, these displays enable the operator to identify a parameter in need of adjustment to balance peak signal-to-noise ratio against bit rate. For example, an operator is able to select a higher number of bits to encode one object and a lesser number of bits to encode another object to optimize image quality for a given number of bits.
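For reference, the peak signal to noise ratio of a color component is conventionally computed as 10·log10(peak² / MSE), where MSE is the mean squared error between original and decoded samples. The following sketch computes that figure per component and pairs it with the bit count for a frame. The function names, the dictionary layout, the component labels "Y", "U" and "V", and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical samples: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


def frame_status(original, decoded, bits):
    """Per-frame status: PSNR for each color component plus the bits spent.

    original/decoded: dicts mapping a component name (e.g. "Y", "U", "V",
    assumed labels) to a flat sequence of samples.
    """
    status = {name: psnr(original[name], decoded[name]) for name in original}
    status["bits"] = bits
    return status
```

Collecting one such record per object and per frame would give the data needed to plot quality against bit count, which is the comparison the status display is described as supporting.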
According to an advantage of this invention, various processing needs can be met using differing plug-ins. According to another advantage of the invention, the processing shell provides isolation between the user interface and the plug-ins. Plug-ins do not directly access the video encoder. The plug-ins accomplish segmentation or tracking or another task by interfacing through an API (application program interface) module. For example, a segmentation plug-in defines an object and stores the pertinent data in a video object manager portion of the shell. The encoder retrieves the video objects from the video object manager. Similarly, plug-ins do not directly draw segmentations on the screen, but store them in a central location. A graphical user interface module of the user interface retrieves the data from the central location and draws the objects in the video window. As a result, the various plug-ins are insulated from the intricacies of reading various file formats. Thus, data can even be captured from a camcorder or downloaded over a network through the user interface and shell, without regard for plug-in compatibilities.
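The isolation described above may be pictured with the following minimal sketch. All class, method, and function names here (`VideoObjectManager`, `ShellAPI`, `threshold_plugin`, and so on) are hypothetical, and the toy threshold segmentation merely stands in for a real plug-in; the sketch shows only the flow of data, in which a plug-in sees nothing but the API and stores its result centrally.

```python
class VideoObjectManager:
    """Central store for segmentation results (hypothetical shell component)."""

    def __init__(self):
        self._objects = {}

    def store(self, name, frame_index, mask):
        self._objects.setdefault(name, {})[frame_index] = mask

    def get(self, name, frame_index):
        return self._objects[name][frame_index]

    def names(self):
        return sorted(self._objects)


class ShellAPI:
    """The only surface a plug-in sees; it hides the GUI, encoder and file I/O."""

    def __init__(self, frames, manager):
        self._frames = frames
        self._manager = manager

    def get_frame(self, index):
        return self._frames[index]

    def store_object(self, name, frame_index, mask):
        self._manager.store(name, frame_index, mask)


def threshold_plugin(api, frame_index, name, threshold=128):
    """Toy segmentation plug-in: marks samples at or above a threshold."""
    frame = api.get_frame(frame_index)
    mask = [[value >= threshold for value in row] for row in frame]
    api.store_object(name, frame_index, mask)


manager = VideoObjectManager()
api = ShellAPI(frames=[[[0, 200], [130, 5]]], manager=manager)
threshold_plugin(api, 0, "ball")
# The GUI or encoder would read results from the manager, never from the plug-in.
mask_for_display = manager.get("ball", 0)
```

Because the plug-in receives frames only through `get_frame`, the source of those frames (a file, a capture device, a network download) can change without any change to the plug-in.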