With the proliferation of digital video camcorders (DVCs), there is a growth in the number of DVC users who wish to edit and process their captured video images, and also to communicate the product of their efforts to others. DVCs can capture video data, audio data, and in some cases, still images as well. These various types of data are referred to in this specification as “media data” and items of data are referred to as “media items” or “media clips” or the like. Technological elements associated with this editing activity include the editing interface, hardware accessories and editing software required, as well as communication hardware and software.
Digital Disc reCording devices (DDCs), i.e., digital video recording devices which utilise magnetic or magneto-optical discs (MODs) as a recording medium, offer even greater flexibility to users wishing to edit, process and communicate their media data. However, greater flexibility typically exacerbates the problems facing enthusiastic yet technically untutored users.
A generic DDC system architecture comprises a number of different functional and/or structural elements. One such element is communication, both between intra-system elements, and between the system and external devices and data. Another element is infrastructure for supporting editing and incorporation of effects. A user wishing to effectively edit and process captured “media data” thus has a relatively complex system to manipulate.
Users who do edit media data typically wish to improve on the internal DDC recording, editing and effect-adding features. However, very few consumers actually edit media data with software or specialised hardware. This derives in part from the fact that typical consumer users of digital video devices find it difficult to interconnect the various elements, both hardware and software, in a system architecture. This hinders the growth of DDC architectures, which inherently offer advantageous editing and processing capabilities. Furthermore, very few users attempt to gain skills in media data capture, editing or post-production. Even those consumers who do attempt to master the technology find that this is not enough, because media data editing and post-production is an art, and the required hardware and/or software is typically expensive.
Many basic editing techniques are time consuming and repetitive. Although software packages can provide assistance in the form of interactive GUIs, the tedium of becoming familiar with the media and making edit decisions remains.
In some instances, a fully edited video may have tens or hundreds of clips, each clip typically having 25 to 30 frames per second. Even for a professional using high-end equipment, the task of editing such a video can take many hours or days. For a consumer video photographer, performance of this task is prohibitive in time, expensive in money terms and demanding in skill terms.
In a typical editing task, once selection of clips has been performed from the raw footage, the clips are placed in sequence. Current tools available for this process include software that provides a linear time-line, or alternatively, hardware such as dual Video Cassette Recorders (VCRs) for sequencing from the original source to another. This again is a time consuming task, involving manually slotting each video clip into place in the sequence. The current dual-VCR or camera-and-VCR solutions are slow and tediously technical for a consumer to control, and should the consumer want to amend any part of the video, the whole process must often be started again. Although some of the aforementioned process can be substituted by more capable hardware and software, the dual-VCR, or camera-and-VCR process is still used by many consumers.
Transitions such as dissolves or cross-fades are often beyond the capability of consumers' equipment unless they can use computer software. The actual implementation of video transitions and effects often places heavy storage and processing demands on editing computer resources, including requiring capture and export format decoding and encoding hardware attached to a consumer's computer. Consumer video photographers typically do not have a basic appreciation of the nature of transitions, or where they should be used. This typically leads to incorrect or excessive use thereof, which constitutes a drain on resources and produces less than pleasing results.
Consumers generally have high expectations of video because of the general availability of high-quality television programs. Home video production rarely comes close to the quality of professionally-made television programs, and this is evident in the disdain with which the general public holds home videos. It is very difficult for consumers to compete with the quality of professional television programs when producing their home videos. For instance, generating titles and correctly placing them in order to produce an entertaining result requires typographical and animation skills often lacking in consumers. It is also not fully appreciated that unprofessionally made titles often spoil the result of many hours of editing. Specialised software and/or additional title-generation resources are often required, thereby adding to the final cost of the production.
Current methods of sound editing are highly specialised, and the principles governing the process of embellishing a final edited rhythm with beat synchronisation are well beyond the scope of most consumer video makers. The time required to analyse the wave form of a chosen sound track in order to synchronise video cuts is prohibitive, and the cost of equipment is unjustified for most consumers. These techniques are typically unavailable in the dual-VCR editor context.
Video typically contains much content that is rarely if ever used, often being viewed only once. Users typically capture more content than is ultimately of interest to them. Finding and viewing the content that is of interest can be carried out in various ways.
Considering an analog tape deck system, the user must shuttle through the linear tape, log the timecode of a frame sequence of interest, and/or record these segments to another tape. Logging timecode is generally only a practice of professional video editors. The practice generates log sheets, which constitute a record of locations of useful content on the tape. The case of tape-to-digital capture is similar. Here, the user shuttles through the content, marking the timecode via a keyboard and mouse using a computer software application. The interesting/useful segments are then digitised to a hard disk. It is apparent that in both of the above cases, the user makes a duplicate record of desired footage.
Once the content is used in an edited production, further trimming takes place. Should the user want to use the interesting content in another, different production, the analog tape deck system requires the user to carry out the same rewriting-to-tape process. Any content captured to disk requires that the user search through the file system to find the relevant shots. Once again, the final edited production consists of trimmed down, interesting sequences of frames.
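By way of illustration only, the timecode logging described above can be sketched as follows. The sketch assumes non-drop-frame timecode written as HH:MM:SS:FF at 25 frames per second; the function names and the log sheet entries are hypothetical and are not taken from any particular logging tool.

```python
def timecode_to_frames(timecode: str, fps: int = 25) -> int:
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def segment_lengths(log_sheet, fps: int = 25):
    """Return (label, length in frames) for each logged segment."""
    return [
        (label, timecode_to_frames(out_tc, fps) - timecode_to_frames(in_tc, fps))
        for in_tc, out_tc, label in log_sheet
    ]


# Hypothetical log sheet: (in point, out point, description of the segment).
log_sheet = [
    ("00:01:10:00", "00:01:25:12", "birthday cake"),
    ("00:07:42:05", "00:08:03:05", "beach walk"),
]

for label, length in segment_lengths(log_sheet):
    print(label, length)
```

Each entry of such a log sheet is a second record of footage that already exists on the tape, which is the duplication noted above.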
A large number of video editing packages are available for Personal Computer users. High-end products are available, these being intended for professional video editing users, and such products have high functionality and high complexity. Low-end packages are also available, these having limited functionality, but typically retaining considerable complexity, intended for video camera enthusiasts or even children. A common need of video Editors (the term “Editor” denoting a person performing the editing function), be they professionals or enthusiastic consumers, is to trim the length of video clips that they wish to include in any current editing project. High-end and low-end video editing products take differing approaches to this clip-trimming task, but both approaches have significant usability failings.
Low-end video editors (the term “editor” denoting a product or a device), such as Apple iMovie, typically provide clip-trimming facilities only in the edit-timeline or storyline through the use of some kind of time-unit marker referenced to the apparent length of a clip in the timeline. Alternatively, a numerical start time, and either a clip duration measure or a stop time, entered into a dialogue box in units of frames or seconds or similar, is used. This user-interface facility does not allow actual concurrent viewing of the clip while trimming in and out points.
High-end video editors typically provide a trimming viewer that combines the ability to play or step a clip at the whim of a user, often using a “scrubber” or slider control, while also allowing setting of in and out trim points at desired locations of the clip. The trim points are often displayed and controlled in a summary bar which represents the original length, or duration of the clip, and the trim markers appearing in this summary bar represent proportional positions of the actual trim points set by the user relative to the original clip duration. The scrubber or slider control also represents a proportional position within the clip length, this time of the viewed frame or heard audio.
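The proportional relationship between a frame of the clip and a marker in the summary bar can be sketched as below. This is a minimal illustration that assumes a summary bar measured in whole pixels and a clip measured in whole frames; the function names are hypothetical and do not correspond to any named product.

```python
def marker_position(frame: int, clip_frames: int, bar_width_px: int) -> int:
    """Map a frame index to a pixel column in a summary bar of fixed width."""
    if clip_frames < 2 or bar_width_px < 2:
        return 0
    frame = max(0, min(frame, clip_frames - 1))  # clamp to the clip
    return round(frame / (clip_frames - 1) * (bar_width_px - 1))


def frame_at(pixel: int, clip_frames: int, bar_width_px: int) -> int:
    """Inverse mapping: the frame shown when the scrubber sits at a pixel."""
    if clip_frames < 2 or bar_width_px < 2:
        return 0
    pixel = max(0, min(pixel, bar_width_px - 1))  # clamp to the bar
    return round(pixel / (bar_width_px - 1) * (clip_frames - 1))


# A 301-frame clip drawn in a 601-pixel bar, trimmed to frames 75..150.
in_marker = marker_position(75, 301, 601)
out_marker = marker_position(150, 301, 601)
print(in_marker, out_marker)
```

Because both the trim markers and the scrubber use the same mapping, they remain relative to the original clip duration and carry no information about the timeline into which the clip is placed.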
High-end video editors often provide a trimming window that is disconnected from the information held within the edit timeline. Thus, any clip already imported to a timeline must be dragged, or otherwise imported, into the trimming window where it may be modified by the user. It is the norm that such modifications have no effect on the edit timeline during or after performance of the modifications by the user, until the user explicitly exports the trimmed clip back into the timeline. In this event, the trimmed clip is understood by the timeline portion of the editing application not to be the same clip as was originally imported into the trimmer. This identification of two separate clips adds to the list of workflow and usability problems for a user, even if that user is an expert. Exemplary high-end applications include Apple's Final Cut Pro and Adobe Premiere.
The types of usability problems encountered by a user in the above context include the need to replace the original clip (i.e., the clip prior to trimming) in the timeline with the newly trimmed clip. This forces the user to take extra steps to make the replacement. Furthermore, the user is unable to obtain any information from the timeline or the trimmer regarding the effect of the trimming on the final edited result, as is represented by the timeline. That is, only the local effect of a trim is available to a user in this context, whereas the global effect of a trim is not available until the user actually commits the trimmed clip back into the timeline. This represents an absence of synchronism between the user's trimming action and the editor's currently held state for the project. Furthermore, the user cannot easily move to another clip within the edit timeline and trim that clip. This limitation impairs the undertaking of related trimming operations between clips and the appreciation of their overall effect on the current project in the timeline. In addition, the edit timeline is often represented as having an arbitrary length, due to a strong interest in providing a fixed resolution representation for every clip and/or frame within the timeline. This often causes a timeline's contents to scroll beyond the boundary of the available window and out of visibility. This is a limitation when multiple clips need to be trimmed that cannot all be visible within the timeline at the same time without scrolling. Furthermore, previewing of the resultant production, to view the results of any one or more trimming operations, is provided in a further, separate viewer window and is unconnected and unsynchronised with any current or recent trimming operation.
Further workflow and usability problems are encountered when automatic editing is employed to generate the edit timeline. Automatic editing has the ability to modify an EDL (often represented graphically by a timeline) based on a number of factors beyond the selected sequence of clips provided as its input. Some of these factors include (i) user metadata such as template selection, where a template contains a characteristic set of editing instructions or operations aimed at producing an expected theme or genre result for the output EDL, and (ii) various scene or content metadata such as user-set highlights, pan-and-zoom metadata, and so on. When a user trims an input clip to an auto-editor, their actions can result in significant changes to the output EDL because of the potential non-linear behaviour of the auto-editing template. For example, if the user trims a clip to a significantly short period, then it might be discarded by the auto-editor altogether. Or, if the user adds a highlight flag to a frame of the clip while in the trimmer (the highlight being a form of user metadata), then the auto-editor may trim the clip automatically around the position of the highlight. With current systems, the user has a poor and delayed appreciation of the effects of changes they might make within the trim window, in regard to the overall result of the auto-edit. This is a disadvantage in regard to workflow and usability for a user of an auto-editor.
A user wishing to add an animated message or sprite to a video clip must have access to a video or still-image compositing tool such as Apple Quicktime. Typically such an operation or effect is performed by defining or declaring a sprite layer or animation layer within a streaming compositor, and providing a matte or transparency signal for the sprite to allow it to be overlaid on the desired video content.
Users are provided with sprite animation facilities by various current software applications such as Macromedia Flash (often these applications utilise a streaming video compositor such as Apple Quicktime). However, the application and motion definition for sprite animations is typically a manually intensive process, requiring per-frame sprite application (known as rotoscoping), a steady hand, and an appreciation of object dynamics in a video frame for accurate and pleasing placement of the sprite. Alternatively, very basic automated sprite application capability is provided by some software applications. Such capabilities include definition of a fixed spatial coordinate or a spatial path to which the sprite is “attached”, both of which have no continuous association or reference to a tracked feature to which the user might wish to relate the sprite.
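The non-linear behaviour described above can be illustrated with a toy auto-editor. The two template rules below (a minimum clip length and a fixed window around a highlight), their thresholds, and all names are assumptions made purely for this sketch; they do not describe any actual auto-editing template.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clip:
    name: str
    in_frame: int                     # first frame kept by the user's trim
    out_frame: int                    # frame after the last kept frame
    highlight: Optional[int] = None   # user-set highlight frame, if any


def auto_edit(clips: List[Clip], min_frames: int = 50,
              highlight_window: int = 75) -> List[Tuple[str, int, int]]:
    """Produce a toy EDL as (name, start, end) under two assumed rules."""
    edl = []
    for clip in clips:
        start, end = clip.in_frame, clip.out_frame
        # Assumed rule 1: a highlight narrows the clip to a window around it.
        if clip.highlight is not None and start <= clip.highlight < end:
            start = max(start, clip.highlight - highlight_window // 2)
            end = min(end, start + highlight_window)
        # Assumed rule 2: clips that end up too short are dropped entirely.
        if end - start < min_frames:
            continue
        edl.append((clip.name, start, end))
    return edl


before = auto_edit([Clip("beach", 0, 300), Clip("cake", 0, 200)])
# The user trims "beach" to 40 frames and flags frame 100 of "cake".
after = auto_edit([Clip("beach", 0, 40), Clip("cake", 0, 200, highlight=100)])
print(before)
print(after)
```

In this sketch a small action in the trimmer removes one clip from the output and shortens another, and neither consequence is visible to the user until the auto-edit is run again.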
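The shortcoming of a fixed coordinate or a fixed spatial path can be sketched numerically. The object track below is a hypothetical set of per-frame positions for a content object, invented for this illustration; the sketch measures how far a sprite drifts from that object when the sprite follows a straight path or stays at a fixed point.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def straight_path(start: Point, end: Point, n_frames: int) -> List[Point]:
    """Sprite positions along a straight spatial path, one per frame."""
    if n_frames < 2:
        return [start]
    return [
        (start[0] + (end[0] - start[0]) * i / (n_frames - 1),
         start[1] + (end[1] - start[1]) * i / (n_frames - 1))
        for i in range(n_frames)
    ]


def max_drift(sprite_path: List[Point], object_track: List[Point]) -> float:
    """Largest per-frame distance between the sprite and the object."""
    return max(math.dist(s, o) for s, o in zip(sprite_path, object_track))


# Hypothetical object positions over five frames (the object moves in an arc).
object_track = [(0, 0), (10, 8), (20, 12), (30, 8), (40, 0)]

path_sprite = straight_path((0, 0), (40, 0), len(object_track))
fixed_sprite = [(20, 0)] * len(object_track)

print(max_drift(path_sprite, object_track))
print(max_drift(fixed_sprite, object_track))
```

Neither placement consults the object positions, so any agreement between the sprite and the object depends on the user choosing the path by eye.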
The current consumer-level sprite application solutions understand nothing about the content of any video to which they might be applied. This content-sprite relationship must be provided entirely by the user's frame-by-frame observation of the video content or alternatively, must be completely ignored and some content-unrelated, typically pre-determined, animation track is provided instead.
Per-frame application of a sprite by a user typically involves specification of a spatial location for the sprite on a per-frame basis, with best results being provided where the user accounts for the position of one or more content objects within the frame to which she wishes to associate the sprite in some higher semantic context. Such operations suffer from human error, in which spatial placement can jitter or jump because of the difficulty in creating smooth animations from what is effectively stop-motion photography. The user is, in such cases, being asked to provide movement dynamics and thus requires trajectory-building skills of similar degree to those of animation experts. Even systems that provide auto-smoothing of a user's animation trajectory, or that provide a range of predetermined and possibly adjustable trajectories, do not provide any assistance as to the correct placement of a sprite in any and every frame based on the location of the content objects with which the user desires to associate the sprites. This lack of automatic connection of the sprite's trajectory with the desired associated content object therefore requires the user to check and/or correct the sprite trajectory per-frame, or to accept an inaccurate animation trajectory.
It can be seen that the application of a sprite and the definition or declaration of its animation trajectory suffers from significant limitations.
It is thus apparent that when the user either wishes to perform trimming operations using current video editing applications, or wishes to incorporate sprite animation or feature-associated effects operations in current video composition or editing applications, the user must tolerate severe preparation, contrivance, cost, skill, workflow and usability limitations, and thus suffers reduced efficiency and accuracy as a result.