Conventionally, at the content production sites of television broadcasting stations and other production companies, it is ordinary practice to cut out part of an audio-visual material (to be referred to as clip hereinafter) obtained by shooting a scene by means of a video camera to prepare a new clip, and to produce a set of contents by linking a plurality of clips prepared in the same way (see, for example, Patent Document 1).
FIG. 104 of the accompanying drawings schematically illustrates an example of a GUI (Graphical User Interface) image that can be displayed on the display unit of an editing apparatus adapted to such editing operations (to be referred to as editing image hereinafter). As clearly shown in FIG. 104, editing image 2001 includes a clip synopsis display section 2002, a monitor section 2003, a story board section 2004, a timeline section 2005 and effect information display sections 2006A through 2006C.
The clip synopsis display section 2002 is designed to select a desired bin or file from the various bins and files registered in the editing apparatus and display a synopsis of the clips contained in the bin or file.
The operator of the apparatus can select a desired clip from the clips that are synoptically displayed in the clip synopsis display section 2002 and drag and drop it to the monitor section 2003 so as to have the leading image of the clip displayed in the monitor section 2003.
Then, under this condition, the operator can start replaying the dragged and dropped clip and have the reproduced image displayed in the monitor section 2003 by clicking the start button in the group of buttons 2007 displayed in a lower part of the monitor section 2003. Additionally, the operator can fast-forward or rewind the clip by clicking the corresponding one of the buttons. Furthermore, by operating the mouse of the apparatus, the operator can move the scrub cursor 2008, which is displayed above the group of buttons 2007 and indicates the position of the currently displayed image within the entire clip, to the left or right so as to have the monitor section 2003 display the image that corresponds to the position of the scrub cursor 2008.
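The relation between the position of the scrub cursor and the displayed frame can be illustrated by the following minimal sketch. It assumes a simple linear mapping; the function name `scrub_to_frame` and the parameters `cursor_x`, `bar_width` and `total_frames` are hypothetical names introduced only for illustration and do not appear in the described apparatus.

```python
def scrub_to_frame(cursor_x: float, bar_width: float, total_frames: int) -> int:
    """Map a scrub cursor position to a frame index (assumed linear mapping).

    cursor_x     -- horizontal cursor position, measured from the left end of the bar
    bar_width    -- width of the scrub bar, in the same units as cursor_x
    total_frames -- number of frames in the clip
    """
    if bar_width <= 0 or total_frames <= 0:
        raise ValueError("bar_width and total_frames must be positive")
    # Clamp so that dragging past either end of the bar stays inside the clip.
    fraction = min(max(cursor_x / bar_width, 0.0), 1.0)
    # The left end maps to the first frame, the right end to the last frame.
    return int(fraction * (total_frames - 1))
```

With such a mapping, every small movement of the mouse requests a different and possibly distant frame, which is why a scrubbing action depends on the apparatus being able to display arbitrary frames quickly.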
In this way, the operator can search for the desired frame by operating the replay button in the group of buttons 2007 and/or the scrub cursor 2008, visually confirming the image reproduced and displayed in the monitor section 2003. Then, the operator can specify the starting point (to be referred to as in point hereinafter) and the ending point (to be referred to as out point hereinafter) of the video/audio part to be cut out from the clip by clicking respectively an in point button 2009IN and an out point button 2009OUT arranged in a lower part of the monitor section 2003, while having the image of the frame displayed in the monitor section 2003.
Thus, by means of a drag and drop operation, the operator can paste the video/audio part of the clip between the in point and the out point that he or she specified onto the story board section 2004. The operator arranges the clips to be used for the current editing operation in the story board section 2004 of the editing image 2001 in the above-described manner so that he or she can imagine the results of the editing operation with ease. Note that a thumbnail of a representative image, which may be the leading image, of each of the pasted clips is also displayed in the story board section 2004 along with detailed information on it.
Then, the operator sequentially drags and drops the clips pasted on the story board section 2004 and pastes them on respective video tracks 2010V in the timeline section 2005. At this time, a band 2012V having a length that corresponds to the material length of each of the pasted clips is displayed on the video track 2010V of the clip according to the time scale 2011 that is also shown in the timeline section 2005. If any of the clips contain sound, a band 2012A having a length equal to that of the corresponding band 2012V is displayed at the same position on the corresponding audio track 2010A according to the time scale 2011.
A band 2012V displayed on a video track 2010V of the timeline section 2005, with or without a band 2012A displayed on an audio track 2010A, indicates that, in an operation of outputting the edited images and sounds, the image of the clip that corresponds to the band 2012V is displayed, with or without the sound of the clip that corresponds to the band 2012A as the case may be, at the time shown on the time scale 2011. Thus, with the above-described process, it is possible to prepare an editing list that sequentially specifies the images that are to be displayed as edited images and the sounds that are to be output as edited sounds.
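The editing list described above can be pictured as a sequence of bands laid out along the time scale. The following is a minimal sketch under stated assumptions, not the data format of any actual editing apparatus: the names `Clip`, `Band`, `build_edit_list` and `source_frame_at` are hypothetical, positions are counted in frames, and the out point is treated as exclusive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Clip:
    name: str
    in_point: int           # first source frame used (inclusive)
    out_point: int          # end of the used part (exclusive; an assumption)
    has_audio: bool = True

    @property
    def length(self) -> int:
        return self.out_point - self.in_point


@dataclass(frozen=True)
class Band:
    clip: Clip
    track: str              # "V" for a video track, "A" for an audio track
    start: int              # position on the time scale, in frames

    @property
    def end(self) -> int:
        return self.start + self.clip.length


def build_edit_list(clips: List[Clip]) -> List[Band]:
    """Place the clips one after another, as in a simple cut edit."""
    bands: List[Band] = []
    cursor = 0
    for clip in clips:
        bands.append(Band(clip, "V", cursor))
        if clip.has_audio:
            # The audio band has the same length and position as the video band.
            bands.append(Band(clip, "A", cursor))
        cursor += clip.length
    return bands


def source_frame_at(bands: List[Band], track: str, t: int) -> Optional[Tuple[str, int]]:
    """Return (clip name, source frame) output on `track` at timeline time t."""
    for band in bands:
        if band.track == track and band.start <= t < band.end:
            return band.clip.name, band.clip.in_point + (t - band.start)
    return None  # nothing is output on this track at time t
```

In this sketch, a clip without sound simply produces no audio band, so a lookup on the audio track over that interval returns nothing.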
When preparing such an editing list, the operator may want to execute a video special effect process at a certain time, for example, when the image of the first clip is switched to the image of the second clip. In that case, the operator selects an icon 2013 that corresponds to the desired video special effect (to be referred to as effect icon hereinafter) out of the effects listed and displayed in an effect list display section 2006C of the effect information display sections 2006A through 2006C, and pastes it by a drag and drop operation on a transition track 2010T of the timeline section 2005 at the position that corresponds, according to the time scale 2011, to the switch from the first clip to the second clip.
As a result, it is possible to input a command to execute the video special effect that corresponds to the effect icon 2013 pasted on the transition track 2010T at the position linking the image of the first clip and the image of the second clip in the edited images.
Patent Document 1: Jpn. Pat. Appln. Laid-Open Publication No. 2000-251451
In an editing operation using an editing image 2001 as described above, the operator searches for a frame in order to specify an in point and an out point either by reproducing the image of the selected clip at high speed, repeating a fast-forward action and a rewinding action a number of times, or by a scrubbing action of moving the scrub cursor 2008 left and right by means of a mouse, until the desired frame is detected.
However, such an operation of searching for a desired frame (to be referred to as an image searching operation whenever appropriate hereinafter), which involves repeated fast-forwarding and rewinding or scrubbing as described above, is time consuming unless the operator is well trained and highly skilled in such operations. Additionally, there can be cases where the video/audio material to be handled is subjected to compression coding in the long GOP format conforming to the so-called MPEG (Moving Picture Experts Group) Standards, which is a format where each GOP (Group Of Pictures) has a plurality of frames for the purpose of raising the compression efficiency, or in an open GOP format where the video/audio material is compressed by using the data of preceding and succeeding GOPs. In such cases, a plurality of frames, or even more than one GOP, have to be decoded in order to decode a single frame, which makes it difficult to reproduce an image at random at high speed. As a result, the displayed image responds poorly to the operation of the operator, which makes the image searching operation even more difficult.
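The cost of random access to a long GOP material can be illustrated by the following rough sketch. It is a simplified and pessimistic model and not an actual decoder: it assumes that frames are indexed in decoding order, that every GOP has the same length and starts with an intra-coded frame, and that each frame depends on all the frames that precede it in its GOP. For an open GOP it further assumes, as an upper bound, that the whole preceding GOP must also be decoded. The function name `frames_to_decode` and its parameters are hypothetical.

```python
def frames_to_decode(target: int, gop_size: int, open_gop: bool = False) -> int:
    """Estimate how many frames must be decoded to display frame `target`.

    target   -- index of the wanted frame in decoding order (0-based)
    gop_size -- number of frames per GOP (assumed constant)
    open_gop -- True if frames may refer to the preceding GOP
    """
    if target < 0 or gop_size <= 0:
        raise ValueError("target must be >= 0 and gop_size must be positive")
    # Decoding has to start from the first frame of the GOP containing the
    # target and run through to the target itself.
    cost = (target % gop_size) + 1
    # Pessimistic assumption for an open GOP: the preceding GOP, if there is
    # one, has to be decoded as well.
    if open_gop and target >= gop_size:
        cost += gop_size
    return cost
```

Under these assumptions, displaying the last frame of a 15-frame GOP costs 15 decoding operations instead of one, so a scrubbing action that jumps between arbitrary frames multiplies the decoding load accordingly.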
Furthermore, a so-called cut editing operation of linking clips has hitherto been conducted by pasting clips to the video tracks 2010V and the audio tracks 2010A of the timeline section 2005 in the editing image 2001, and the image before the in point and the image after the out point have been confirmed by subsequent replaying and scrubbing. All of the above-described operation steps have therefore been required for a cut editing operation, which makes the operation cumbersome.
Additionally, while the operator needs to recognize the images and the sounds to be edited in an editing operation, he or she is required either to rely on the sound being output from a speaker while visually confirming the corresponding image, or to check the levels and the waveform of the sound displayed on the corresponding audio track 2010A in the timeline section 2005 of the editing image 2001 (see, for example, the audio track 2010A of “audio 3” in the timeline section 2005 in FIG. 104). Thus, it has been difficult to perform an editing operation while coordinating images and sounds.