Computer-based systems for editing and composing audiovisual work have existed for over a decade. Recent systems convert audiovisual signals from a diverse set of external sources, such as camcorders, VCRs, DVDs, MPEG streams, digital satellite signals and streaming web video, into specific digital formats that are saved onto mass storage devices, such as hard disks, in anticipation of further processing.
Analog video signals are received by the computer for conversion one image (i.e. frame) at a time at various rates. For example, standard NTSC television video comes in at a rate of 29.97 frames per second. Other standards support different frame rates as well as different frame sizes. Each of these frames is converted into a digital representation and stored in a file containing a time-sequential group of frames (a video sequence or clip). A video sequence can be identified by the name of the file in which it is held or via additional descriptive information (metadata). Metadata can be any data which relates to the individual frames in an audio-video sequence or to the entire sequence itself. For example, the original “tape name”, comments, location information, and even global positioning system data can be stored with the resultant video sequence and used to help organize it and to make future editing decisions. A video editor may be given a metadata reference to identify particular footage useful for a specific program. Metadata can also be embedded with the audio-video frames to provide additional information. Frame-accurate “time codes” can be associated with individual frames within a clip to precisely identify the point where an event takes place. For example, tape name: SuperbowlXX, timecode 00:12:41:15 can be used to identify the exact point in the video clip where the kick off in Superbowl 20 occurs. See FIG. 1.
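The relationship between a time code and a frame position can be illustrated with a minimal sketch. It assumes non-drop-frame counting at a nominal 30 frames per second, which is a simplification (NTSC material at 29.97 frames per second is often labeled with drop-frame time code instead). The names ClipMetadata, markers and timecode_to_frame are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption: non-drop-frame counting at a nominal 30 frames per second.
NOMINAL_FPS = 30


def timecode_to_frame(timecode: str, fps: int = NOMINAL_FPS) -> int:
    """Convert an HH:MM:SS:FF time code into a zero-based frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError("frame field must be smaller than the frame rate")
    total_seconds = (hours * 60 + minutes) * 60 + seconds
    return total_seconds * fps + frames


@dataclass
class ClipMetadata:
    """Hypothetical container for metadata stored with a video sequence."""
    tape_name: str
    comments: str = ""
    markers: Dict[str, str] = field(default_factory=dict)  # label -> time code


# The example from the text: the kick off on tape "SuperbowlXX".
clip = ClipMetadata(tape_name="SuperbowlXX", markers={"kick off": "00:12:41:15"})
kickoff_frame = timecode_to_frame(clip.markers["kick off"])
print(clip.tape_name, kickoff_frame)  # SuperbowlXX 22845
```

Under these assumptions, the time code 00:12:41:15 corresponds to frame 22845 counted from the start of the tape.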
The video editor relies on visual, audio and metadata cues during the editing process to identify the exact points in time at which to make editing decisions. These decisions include trimming, deleting and positioning clips, as well as adding effects, overlaying graphics and incorporating sound effects into the resultant video.
A common method used to help identify clips employs small reference pictures (or picture icons, aka picons) from the video. However, since North American television transmits a standard definition video signal at a rate of almost 30 frames every second, even a short video clip of several seconds may contain hundreds of frames. Clips that are several minutes in duration will contain thousands of frames. In order to physically fit these pictorial frame representations of a clip on a computer display, only a small subset of the actual frames is shown. The example in FIG. 2 shows a 30 second clip consisting of 900 actual frames represented by only 6 frames (i.e. only one out of every 150 frames is used). Typically, the user interface for video editing utilizes computer displays to represent media and their relative temporal positions within a timeline metaphor. Clips can be placed in a sequential fashion from left (earlier in time) to right (later in time), representing the flow of the particular story being told. For example, the following parts of the video story will typically be placed sequentially from left to right on a timeline: Title, Introduction, Scenes 1, 2, 3 . . . N, Ending, and Credits. See FIG. 3. Note that spaces can also be present between clips. These spaces are typically filled with black video frames.
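The picon subsampling described above can be sketched as follows. This is only an illustration under the assumption that picons are taken at evenly spaced frame positions starting from the first frame; an actual editing system may choose representative frames differently. The function name picon_frame_indices is hypothetical.

```python
from typing import List


def picon_frame_indices(total_frames: int, picon_count: int) -> List[int]:
    """Pick evenly spaced frame indices to stand in for a whole clip."""
    if total_frames <= 0 or picon_count <= 0:
        raise ValueError("both arguments must be positive")
    # Never ask for more picons than there are frames.
    picon_count = min(picon_count, total_frames)
    step = total_frames // picon_count
    return [i * step for i in range(picon_count)]


# A 900-frame clip shown with 6 picons: one frame out of every 150.
indices = picon_frame_indices(900, 6)
print(indices)  # [0, 150, 300, 450, 600, 750]
```

With 900 frames and 6 picons the step is 150, so 149 out of every 150 frames are never visible in the clip's pictorial representation.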
FIG. 3 represents a single track timeline where clips are simply arranged in tandem sequential order on the horizontal or X axis.
Current state of the art computer editing systems employ what is commonly known as a Preview window to provide feedback to the operator during the editing process. The Preview window displays the frame of video at the point where the Scrubhead is located. Note that typical editing systems utilize a combined Scrubhead/Playhead control which serves a dual role: it displays the current position on the timeline during playback and the current position on the timeline for editing. Since we are describing editing systems which allow editing during playback, we shall separate these two functions such that the Scrubhead describes the current edit position while the Playhead describes the current point in time on the timeline where video is being output (or played) from the video editing system. In FIG. 3, if the Scrubhead is at position x, the Preview window will display a single frame from the Intro clip referenced at time=x. Similarly, if the Scrubhead is at position y, the Preview window will display a single frame from the Scene 2 clip. Typically, Preview windows are separate windows on the computer display interface. They are effective in providing single point feedback, especially on single track timelines.
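The single track behavior described above can be sketched as a lookup from a Scrubhead position to the clip and frame the Preview window would show. The clip names follow FIG. 3, but the start times, durations, the nominal rate of 30 frames per second and the names Clip, clip_at and preview_frame are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clip:
    name: str
    start: float     # position on the timeline, in seconds
    duration: float  # length of the clip, in seconds

    def contains(self, t: float) -> bool:
        return self.start <= t < self.start + self.duration


def clip_at(track: List[Clip], t: float) -> Optional[Clip]:
    """Return the clip under timeline position t, or None for a gap."""
    for clip in track:
        if clip.contains(t):
            return clip
    return None


def preview_frame(track: List[Clip], scrubhead: float,
                  fps: float = 30.0) -> Tuple[str, int]:
    """Return (clip name, frame offset within that clip) at the Scrubhead."""
    clip = clip_at(track, scrubhead)
    if clip is None:
        return ("black", 0)  # gaps between clips are shown as black frames
    return (clip.name, int((scrubhead - clip.start) * fps))


# Hypothetical single track timeline with a gap before Scene 2.
track = [
    Clip("Title", 0.0, 5.0),
    Clip("Intro", 5.0, 10.0),
    Clip("Scene 1", 15.0, 20.0),
    Clip("Scene 2", 40.0, 20.0),
]

scrubhead = 7.0   # current edit position
playhead = 45.5   # current playback position, tracked independently

print(preview_frame(track, scrubhead))  # ('Intro', 60)
print(preview_frame(track, playhead))   # ('Scene 2', 165)
print(preview_frame(track, 37.0))       # ('black', 0)
```

Keeping the Scrubhead and the Playhead as two independent positions, as in this sketch, is what allows an edit position to be inspected while playback continues elsewhere on the timeline.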
Modern video editing systems support multiple tracks consisting of video, audio, graphics, titles, effects etc. In a multi-track timeline paradigm, the vertical or Y axis is used to represent layers of video, audio, graphics, title and effect clips. See FIG. 4.
In a multi-track timeline, different clips can be played at the same point in time using layering effects. These effects include transitions, picture in picture, transparency, overlays etc. In FIG. 4, at time=a, video clip X may be transitioning to video clip Y. Common transitions include wipes, fades and complex 3D effects. At time=b, title A is placed in front of (i.e. overlaid on top of) video clip Y. At time=c, graphics A is placed in front of video clip Z. These are simple examples of the many overlay possibilities in a multi-track timeline. In addition to the visual clips shown in FIG. 4, audio clips, metadata clips and virtual placeholder clips can be combined in a similar fashion.
Again, a Preview window is utilized to provide feedback to the operator in a multi-track timeline. However, since there can be multiple clips at any one point in time, the Preview window provides feedback consisting of the combined output. Using the example in FIG. 4, at time=a, the Preview window will provide a visual frame consisting of both video clip X and video clip Y part way through a transition. At time=b, the Preview window will display Title A overlaid on top of video clip Y. At time=c, the Preview window will display graphics A in front of video clip Z. FIG. 5 shows the Preview window at the above three points in time.
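The layering described for FIG. 4 and FIG. 5 can be modeled with a minimal sketch that lists, for a given point in time, which clip is active on each track from the bottom layer to the top layer. The clip names follow the figures, but the track names, the start and end times and the names Clip and layers_at are hypothetical; the times a, b and c are arbitrarily set to 11, 17 and 37 seconds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    name: str
    start: float  # seconds on the timeline (inclusive)
    end: float    # seconds on the timeline (exclusive)


# Tracks are ordered from the bottom layer to the top layer.
Track = Tuple[str, List[Clip]]

tracks: List[Track] = [
    ("Video 1", [Clip("video clip X", 0.0, 12.0), Clip("video clip Z", 30.0, 45.0)]),
    ("Video 2", [Clip("video clip Y", 10.0, 30.0)]),
    ("Titles", [Clip("title A", 15.0, 20.0)]),
    ("Graphics", [Clip("graphics A", 35.0, 40.0)]),
]


def layers_at(tracks: List[Track], t: float) -> List[Tuple[str, str]]:
    """Return (track name, clip name) for every clip active at time t."""
    return [
        (track_name, clip.name)
        for track_name, clips in tracks
        for clip in clips
        if clip.start <= t < clip.end
    ]


for label, t in (("a", 11.0), ("b", 17.0), ("c", 37.0)):
    print(label, layers_at(tracks, t))
# a [('Video 1', 'video clip X'), ('Video 2', 'video clip Y')]
# b [('Video 2', 'video clip Y'), ('Titles', 'title A')]
# c [('Video 1', 'video clip Z'), ('Graphics', 'graphics A')]
```

A Preview window flattens each of these lists into a single composite frame, so the identity and content of the individual layers in the list is no longer visible in the displayed result.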
Although effective in providing “combined” feedback consisting of the sum of all the layers at a point in time, the Preview window does not provide precise information at each particular layer of the composite.
Accordingly, it is an object of this invention to provide better feedback for each individual layer at any single point in time on a timeline used in the field of video editing.