This invention relates generally to video editing systems, and more particularly to the composing of an image to be used in a video in a video editing system.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000, Microsoft Corporation, All Rights Reserved.
Conventional video editing systems (VESs) compose video from a number of different media types, such as video objects, text objects, and image objects. A VES has a separate rendering subsystem for each of the media types, which renders the object and then passes the object to a composition engine. The composition engine combines video objects, text objects and image objects into a combined image. After the composition engine combines all of the objects, it renders the combined image back to the composition space, a file, or a device. Because the components of the VES, such as the rendering subsystems, composition engine and composition space, are developed as separate pieces, the system is inherently closed, making it difficult to add, extend, or enhance the features, functions and/or capabilities of the video editing system. More specifically, it is difficult to add a chart, texture, or some not-yet-thought-of control to a video.
Furthermore, each rendering subsystem supports an effects engine to change the look of an object at a specified or predetermined point in time. For example, an effect can be added to a piece of text to fade or scroll away when the life span of the text expires.
The composition space and the composition engine require position information and timing information for each object. Timing information specifies an amount of time that an object is displayed. The VES must also be able to save or embed this information so that the information can be edited at a future point in time.
Furthermore, there are no standards for the layout and position of objects in a video. The systems are closed, so all of the layout must be done within the tool itself. The layout cannot be machine generated or automated, nor can it be localized, version controlled, and so on.
In addition, each rendering subsystem supports an effects engine, the functionality of which is duplicated in browsers. This is problematic because the duplication of functionality requires additional disk space to store the software component, slows performance of the application, and adds complexity to the development of the software components that could reduce the quality of the software components.
Lastly, conventional VESs are installed and executed locally on computers. Installing, deploying and maintaining VESs locally on computers is expensive because the installation, version control, and multiple system configurations are managed on numerous physically separate machines.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
The present invention uses a browser as the composition space in a video editing system (VES), thereby providing a video editing system that is open and extensible. A standard format or language, such as hypertext markup language (HTML), is used to define the image layout. Therefore, the present invention enables image composition using conventional HTML authoring tools. Furthermore, in one embodiment, the present invention is implemented as an application service provider (ASP) service offered through the Internet.
In one aspect of the invention, a method for composing a video stream uses a standard composition environment as the composition space for image editing. The method includes initializing control of the timing of a display-language-renderer, such as an HTML browser, attaching a frame grabber service to the display-language-renderer, progressing the timing of the display-language-renderer, rendering an image from the display engine into a screen buffer including at least one multimedia object, invoking the frame grabber service, and combining the image with at least one other image, yielding a video stream.
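The sequence of steps in this aspect can be illustrated with a minimal sketch. It is not the disclosed implementation: the class `SimulatedRenderer`, the function `compose_video`, the `Frame` type, and the brightness-ramp "document" are hypothetical stand-ins chosen only to show the order of operations (take timing control, attach the frame grabber, then advance, render, and grab once per frame).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for a screen buffer: rows of grayscale pixel values.
Frame = List[List[int]]


@dataclass
class SimulatedRenderer:
    """Hypothetical stand-in for a display-language-renderer (e.g. an HTML browser)."""
    width: int = 4
    height: int = 2
    clock: float = 0.0
    externally_timed: bool = False
    frame_grabber: Optional[Callable[[Frame], None]] = None
    screen_buffer: Frame = field(default_factory=list)

    def take_timing_control(self) -> None:
        # The editing system, not the wall clock, now drives the renderer's time.
        self.externally_timed = True
        self.clock = 0.0

    def attach_frame_grabber(self, grabber: Callable[[Frame], None]) -> None:
        self.frame_grabber = grabber

    def advance(self, dt: float) -> None:
        if not self.externally_timed:
            raise RuntimeError("timing control has not been initialized")
        self.clock += dt

    def render(self) -> None:
        # Toy "document": a uniform image whose brightness ramps up with time.
        value = min(255, int(self.clock * 255))
        self.screen_buffer = [[value] * self.width for _ in range(self.height)]

    def grab(self) -> None:
        if self.frame_grabber is None:
            raise RuntimeError("no frame grabber attached")
        # Hand the grabber a copy so later renders cannot alter captured frames.
        self.frame_grabber([row[:] for row in self.screen_buffer])


def compose_video(renderer: SimulatedRenderer, frame_count: int, fps: float) -> List[Frame]:
    """Collect frame_count frames; the returned list stands in for the video stream."""
    stream: List[Frame] = []
    renderer.take_timing_control()
    renderer.attach_frame_grabber(stream.append)
    for _ in range(frame_count):
        renderer.advance(1.0 / fps)  # progress the renderer's timing
        renderer.render()            # render into the screen buffer
        renderer.grab()              # invoke the frame grabber service
    return stream


if __name__ == "__main__":
    frames = compose_video(SimulatedRenderer(), frame_count=4, fps=4)
    print([frame[0][0] for frame in frames])
```

Because the renderer's clock advances only when the loop advances it, each captured frame corresponds to a known point on the video timeline regardless of how long rendering takes.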
In another aspect of the invention, the method includes asserting control of the timing of a display-language-renderer, a display-language document, and a source of multimedia information. The method also includes removing external audio and video sources from the display-language document and attaching the external audio and video sources to a video compositor engine. Furthermore, the method includes attaching a frame grabber service to the display-language-renderer. Subsequently, the method includes progressing the timing of the display-language-renderer, the display-language document, and the multimedia information. Thereafter, an image is rendered into a screen buffer of the display-language-renderer from the document, and the multimedia information is composited with the rendered image. Thereafter, the method includes invoking the frame grabber service, and combining the composited images with other images in a video stream.
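The step of removing external audio and video sources from the document and attaching them to a compositor can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the document is modeled as a flat list of dictionaries, and the names `EXTERNAL_MEDIA_TAGS`, `detach_external_media`, and `VideoCompositor` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a document element: a tag name plus optional attributes.
Element = Dict[str, str]

# Assumption: these tag names mark external audio and video sources.
EXTERNAL_MEDIA_TAGS = {"audio", "video"}


def detach_external_media(elements: List[Element]) -> Tuple[List[Element], List[Element]]:
    """Split a document into (elements left for the renderer, detached media sources)."""
    remaining: List[Element] = []
    detached: List[Element] = []
    for element in elements:
        if element["tag"] in EXTERNAL_MEDIA_TAGS and "src" in element:
            detached.append(element)
        else:
            remaining.append(element)
    return remaining, detached


class VideoCompositor:
    """Hypothetical compositor engine that owns the detached media sources."""

    def __init__(self) -> None:
        self.sources: List[str] = []

    def attach(self, source: Element) -> None:
        self.sources.append(source["src"])

    def composite(self, rendered_image: str, time: float) -> str:
        # Describe the composite as text: the rendered image plus each source at `time`.
        layers = [f"{src}@{time:.2f}s" for src in self.sources]
        return " + ".join([rendered_image] + layers)


if __name__ == "__main__":
    document = [
        {"tag": "h1", "text": "Title"},
        {"tag": "video", "src": "clip.avi"},
        {"tag": "img", "src": "logo.png"},
    ]
    remaining, detached = detach_external_media(document)
    compositor = VideoCompositor()
    for source in detached:
        compositor.attach(source)
    print(compositor.composite("rendered-page", 0.5))
```

In this sketch the still image stays with the renderer, while the time-based video source moves to the compositor, which samples it at the same time value used to render the page.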
In yet another aspect of the invention, an apparatus includes a display-language-renderer that receives an HTML document. The renderer generates a composed image from the HTML document. The composed image is generated in a compositor of the renderer. The apparatus also includes a timing service attached to the renderer. The timing service is controlled by a video editing system. The apparatus also includes a video compressor that communicates with the renderer, receives the composed image from the renderer, and combines the composed image with other composed images, yielding a video stream.
In still another aspect of the invention, an apparatus includes a display-language-renderer that receives an HTML document. The renderer generates a composed image from the HTML document. The apparatus also includes a compositor that is external to the renderer. The compositor is coupled to the renderer. The compositor receives the composed image and multimedia data. The compositor has a timing service. The compositor timing service is attached to the multimedia resource and the HTML document.
A second timing service is attached to the compositor, and the second timing service is controlled by a VES.
When the second timing service is incremented by the VES, the compositor timing service that is attached to the multimedia resource and the HTML document is thereby incremented, and the compositor generates a second image from the composed image and the resource.
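The cascade between the two timing services can be sketched as a simple chain of listeners. The names `TimingService`, `TimedItem`, and `Compositor` are hypothetical, and a text string stands in for the second image; the sketch only shows how one increment from the VES propagates to the document and the multimedia resource.

```python
from typing import Callable, List


class TimingService:
    """Hypothetical timing service that forwards each increment to its listeners."""

    def __init__(self) -> None:
        self.time = 0.0
        self.listeners: List[Callable[[float], None]] = []

    def attach(self, listener: Callable[[float], None]) -> None:
        self.listeners.append(listener)

    def increment(self, dt: float) -> None:
        self.time += dt
        for listener in self.listeners:
            listener(dt)


class TimedItem:
    """Stand-in for anything with its own timeline (an HTML document or a media resource)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.time = 0.0

    def on_tick(self, dt: float) -> None:
        self.time += dt


class Compositor:
    """Compositor whose own timing service drives the document and the resource."""

    def __init__(self, document: TimedItem, resource: TimedItem) -> None:
        self.document = document
        self.resource = resource
        self.timing = TimingService()
        self.timing.attach(document.on_tick)
        self.timing.attach(resource.on_tick)

    def on_tick(self, dt: float) -> None:
        # Called by the second (VES-controlled) timing service.
        self.timing.increment(dt)

    def compose(self) -> str:
        # Text description standing in for the second image.
        return (f"{self.document.name}@{self.document.time:.2f} + "
                f"{self.resource.name}@{self.resource.time:.2f}")


if __name__ == "__main__":
    ves_timing = TimingService()  # the second timing service, controlled by the VES
    compositor = Compositor(TimedItem("page.html"), TimedItem("clip.avi"))
    ves_timing.attach(compositor.on_tick)
    ves_timing.increment(0.5)
    print(compositor.compose())
```

A single increment issued by the VES keeps the document and the multimedia resource at the same time value, which is what allows the compositor to combine them into a consistent frame.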
The present invention is suitable for use by application service provider (ASP) systems and/or a web-based implementation of a video editing system where the rendering portion of the present invention is distributed among rendering components that are distributed across communication lines.
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.