The present invention relates to audio/video reproducing apparatus and methods of reproducing audio/video material.
The present invention also relates to video processing apparatus, audio processing apparatus and methods of processing video signals and audio signals.
The present invention also relates to editing systems for combining items of audio/video material to form audio/video productions. The present invention also relates to methods of generating audio/video productions.
Editing is a process in which items of audio/video material are combined to form an audio/video production. Generally, audio/video material items are captured from a source in accordance with a predetermined plan. However, many of the audio/video material items captured are typically not used in the edited version of the audio/video production. For example, a television program, such as a high quality drama, may be formed from a combination of takes of audio/video material items from a single camera. To form the program, several takes are combined to create the flow required by the story of the drama. Furthermore, several takes may be generated for each scene, but only a selected number of these takes are combined to form the scene.
The term audio and/or video will be referred to herein as audio/video and includes any form of information representing sound or visual images or a combination of sound and visual images.
In a post production process the items of audio/video material are selectively combined by the editor to form the audio/video production. However, in order to select the audio/video material items required to form the production, the editor must review the items of audio/video material that have been generated. This is a time consuming and arduous task, particularly when a linear recording medium, such as video tape, has been used to record the audio/video material items.
In general, the quality of the images and/or sound represented on the recording medium, that is the extent to which they represent the original source, is arranged to be as high as possible. This means that the amount of information that must be stored to represent these images and/or sound is relatively high. As a result, the images and/or sound cannot be readily accessed, so that the content of the audio/video material items cannot easily be ascertained once recorded. This is particularly so if the format in which the images and sounds are represented is compressed in some way. For example, video cameras and camcorders are conventionally arranged to record video signals representing moving images on a video tape. Once the video signals have been recorded on to the video tape, a user cannot determine the content of the video tape without reviewing the entire tape. Furthermore, because video tape is a linear recording medium, the task of navigating through the medium to locate particular items of video material is time consuming and labour intensive. As a result, during an editing process in which selected items from the contents of the video tape are combined in an order which may differ from that in which they were recorded, it may be necessary to review the entire contents of the video tape in order to identify the selected items.
According to the present invention there is provided an audio/video reproducing apparatus connectable to a communications network for selectively reproducing items of audio/video material from a recording medium in response to a request received via said communications network.
By providing an audio/video reproducing apparatus which is connectable to a communications network, an editing facility is provided for reproducing audio/video material items, in which the items may be remotely selected. A network connection provides a facility for the audio/video material items to be accessed separately by more than one editing terminal.
The content of video material generated by a camera is typically stored in a form which facilitates high quality reproduction. In general, the quality of the images represented by the video signal, to the extent that the images reflect an original image source falling within the field of view of the camera, is arranged to be as high as possible. This means that the amount of information that must be stored to represent these images is relatively high. This in turn requires that the video signal is stored in a format that does not readily allow access to the content of the video signals. This is particularly so if the video signal is compressed in some way. For example, video cameras and camcorders are conventionally arranged to record video signals representing moving images on to a video tape. Once the video signals have been recorded on to the video tape, a user cannot readily determine the content of the video tape without reviewing the entire tape. Alternatively, the contents of the recording medium may be ingested to provide substantially non-linear access to the audio/video material. However, ingestion is time consuming, particularly for a linear recording medium. Therefore, by providing a facility for accessing the audio/video material items via a network, the items may be selectively accessed without being ingested and without having to review the entire tape.
In preferred embodiments the audio/video reproducing apparatus may comprise a control processor which is arranged in use to receive data representing requests for audio/video material items via the communications network, and a reproducing processor coupled to the control processor and arranged in response to signals identifying the audio/video material items from the control processor to reproduce the audio/video material items, which are communicated via the communications network.
The task of navigating through the media to locate particular content items of video material is time consuming and labour intensive. As a result during an editing process in which selected items from the contents of the video tape are combined in an order which may be different to that in which they were recorded, it may be necessary to review the entire contents of the video tape in order to identify the selected items. Hence by identifying the audio/video material items required and reproducing only the items identified, an advantage is provided in respect of the time taken to edit an audio/video production.
In order to receive commands identifying the audio/video material items and to communicate the audio/video material items, the audio/video reproducing apparatus may comprise a first network interface connectable to a first communications network for receiving the data representing the requests for audio/video material, and a second network interface connectable to a second communications network for communicating the items of audio/video material. By providing a first network interface adapted to receive data representing requests for audio/video material and a second network interface for communicating the items of audio/video material, each interface can be optimised for the type of data it carries. This is particularly important for the audio/video material items, because the network connection must stream audio/video, which requires a relatively high bandwidth. As such, in preferred embodiments, the first network interface may be arranged to operate in accordance with a data communications standard such as Ethernet, RS-232 or RS-422 or the like. Furthermore, the second network interface may be arranged to operate in accordance with the Serial Digital Interface (SDI) or the Serial Digital Transport Interface (SDTI).
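To give a rough sense of why the two kinds of traffic benefit from different interfaces, the short calculation below compares the active-picture bit rate of uncompressed standard-definition 4:2:2 video with the size of a single control request. It assumes 625-line, 25 frame/s video with 720 active luma samples per line, 576 active lines and 10-bit samples; the request size is a hypothetical figure chosen only for illustration.

```python
# Rough comparison of streamed essence vs. control-request traffic.
# Assumptions: 625/25 4:2:2 video, 720 x 576 active picture, 10-bit samples.

ACTIVE_LUMA_SAMPLES = 720   # luma samples per active line
ACTIVE_LINES = 576          # active lines per frame
FRAME_RATE = 25             # frames per second
BITS_PER_SAMPLE = 10

# Hypothetical size of one request (tape ID plus in/out time codes, with framing).
REQUEST_BYTES = 64


def active_video_bit_rate() -> int:
    """Bits per second needed to carry the active picture alone."""
    # 4:2:2 sampling: Cb and Cr each have half as many samples as luma.
    samples_per_line = ACTIVE_LUMA_SAMPLES + 2 * (ACTIVE_LUMA_SAMPLES // 2)
    return samples_per_line * ACTIVE_LINES * FRAME_RATE * BITS_PER_SAMPLE


if __name__ == "__main__":
    video_bps = active_video_bit_rate()
    print(f"active video: {video_bps / 1e6:.2f} Mbit/s")  # active video: 207.36 Mbit/s
    print(f"one request: {REQUEST_BYTES * 8} bits")       # one request: 512 bits
```

Standard-definition SDI runs at 270 Mbit/s, which leaves room for blanking and ancillary data on top of this figure, whereas the control link need only carry occasional messages of a few hundred bits.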
A particular advantage is provided by identifying the content of the audio/video material items so that appropriate items may be selected and ingested via the network. Meta data is data which serves to describe either the content of audio/video material or parameters present or used to generate the audio/video material or any other information associated with the audio/video material.
In preferred embodiments, the data representing requests for audio/video material items includes meta data indicative of the audio/video material items. The meta data may be at least one of a UMID (Unique Material Identifier), a tape ID and time codes, and a Unique Material Reference Number.
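A request carrying meta data might, for example, identify an item by tape ID together with in and out time codes. The sketch below assumes a 25 frame/s non-drop-frame time code in HH:MM:SS:FF form; the `MaterialRequest` class and its fields are hypothetical, and a UMID could be carried in the optional field instead.

```python
from dataclasses import dataclass
from typing import Optional

FPS = 25  # assumed non-drop-frame rate


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert an 'HH:MM:SS:FF' time code to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not (0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < fps):
        raise ValueError(f"invalid time code: {tc}")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


@dataclass(frozen=True)
class MaterialRequest:
    """Hypothetical request for one audio/video material item."""
    tape_id: str
    tc_in: str
    tc_out: str
    umid: Optional[str] = None  # alternative identifier, if available

    def duration_frames(self) -> int:
        """Number of frames between the in and out points."""
        start = timecode_to_frames(self.tc_in)
        end = timecode_to_frames(self.tc_out)
        if end < start:
            raise ValueError("out point precedes in point")
        return end - start
```

For example, `MaterialRequest("TAPE_0042", "10:00:05:00", "10:00:12:10").duration_frames()` gives 185 frames, or 7.4 seconds at 25 frame/s.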
Although the reproducing apparatus may be arranged to reproduce items of audio/video material from a single recording medium, the reproducing processor may comprise a plurality of audio/video recording/reproducing apparatus, each of which is coupled to the control processor via a local data bus. This provides a further improvement, in that a plurality of recording media can be accessed from the control processor so that, for example, the entire contents of a shoot from which the audio/video production is to be generated can be accessed via the network. Access may also be arranged in parallel. The recording media may also differ, so that some of the plurality of audio/video recording/reproducing apparatus reproduce the audio/video items from tape and some from disc.
In order to access the audio/video material present on the recording media, in preferred embodiments, the local bus may include a control communications channel for communicating control data to and/or from the control processor, and a video data communications channel for communicating the items of audio/video material from the plurality of audio/video recording/reproducing apparatus to the communications network.
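One way the control processor might route an incoming request to the appropriate recording/reproducing apparatus is to keep a table of which tape is loaded in which deck. The sketch below works under that assumption; the class and method names are hypothetical, and the actual commands sent on the control channel (for example a deck-control protocol carried over RS-422) are only recorded, not modelled.

```python
from typing import Dict, List, Tuple


class ControlProcessor:
    """Hypothetical dispatcher mapping loaded tapes to reproducing decks."""

    def __init__(self) -> None:
        self._deck_for_tape: Dict[str, int] = {}
        # Commands that would be sent on the control channel: (deck, tape, in, out).
        self.control_log: List[Tuple[int, str, str, str]] = []

    def load(self, deck: int, tape_id: str) -> None:
        """Record that tape_id has been loaded into the given deck."""
        self._deck_for_tape[tape_id] = deck

    def handle_request(self, tape_id: str, tc_in: str, tc_out: str) -> int:
        """Queue a play command for the deck holding tape_id and return its number."""
        deck = self._deck_for_tape.get(tape_id)
        if deck is None:
            raise LookupError(f"tape {tape_id!r} is not loaded")
        # The deck would cue to tc_in and play to tc_out, placing the
        # reproduced material on the video data channel of the local bus.
        self.control_log.append((deck, tape_id, tc_in, tc_out))
        return deck
```

Keeping the tape-to-deck table in the control processor lets a remote editor name material only by tape ID and time code, without knowing how many decks are present.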
To provide an indication of the contents of the audio/video material, the audio/video reproducing apparatus may have a display device which is arranged in operation to display images representative of the audio/video material items present on the recording medium. Furthermore, to facilitate access to the audio/video material items, the display device may be a touch screen coupled to the control processor and arranged in use to receive touch commands from a user for selecting the items of audio/video material.
According to another aspect of the present invention there is provided a video processing apparatus for processing video signals representing images comprising an activity detector which is arranged in operation to receive the video signals and to generate an activity signal indicative of an amount of activity within the images represented by the video signal, and a meta data generator coupled to the activity detector which is arranged in operation to receive the video signal and the activity signal and to generate meta data representing the content of the video signals at temporal positions within the video signal, which temporal positions are determined from the activity signal.
In preferred embodiments the meta data generator is an image generator, the meta data generated being sample images at the temporal positions within the video signal determined by the activity signal.
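As a sketch of how temporal positions might be derived from an activity signal so that busier passages receive more sample images, the function below accumulates per-frame activity and emits a sample position each time the running total reaches a fixed budget. The budget rule is an assumption made for illustration, not a method prescribed by the text.

```python
from typing import List, Sequence


def sample_positions(activity: Sequence[float], budget: float) -> List[int]:
    """Return frame indices at which to take sample images.

    A sample is always taken at the first frame; thereafter a sample is taken
    whenever the activity accumulated since the previous sample reaches
    `budget`, so frames with high activity are sampled more densely.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    positions: List[int] = []
    accumulated = 0.0
    for index, value in enumerate(activity):
        if not positions:
            positions.append(index)  # always represent the start of the item
            continue
        accumulated += value
        if accumulated >= budget:
            positions.append(index)
            accumulated = 0.0
    return positions
```

With five static frames followed by five busy ones (activity 0.0 then 1.0) and a budget of 2.0, samples fall at frames 0, 6 and 8: none during the static stretch, two during the busy one.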
The present invention provides a particular advantage in providing an indication of the content of video signals at temporal positions within those signals at which there is activity. As a result, an improvement is provided to an editing process, or to a process in which the video signals are being ingested for further processing, by providing from the sample images a visual indication of the content of the video signals at the temporal positions which may be of most interest to an editor or user.
The sample images can provide a static representation of the moving video images which facilitates navigation by providing a reference to the content of the moving video images.
The activity signal may be generated by forming a color histogram of the color components within an image and determining activity from the rate of change of the histogram or, for example, from motion vectors for selected image components. The activity signal may therefore be representative of a relative amount of activity within the images represented by the video signal, and the image generator may be arranged in operation to produce more of the sample images during periods of greater activity indicated by the activity signal. By arranging for more sample images to be generated during periods of greater activity, the information provided to an editor about the content of the video signals is increased, or alternatively the available resources for generating the sample images are concentrated on the periods within the video signal of most interest.
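The histogram approach could be sketched as follows. Each frame is assumed to be a list of (R, G, B) tuples with 8-bit components; each component is quantised into a few bins, and the activity for a frame is half the L1 distance between its normalised histogram and that of the previous frame, which lies between 0 and 1. The bin count and the distance measure are illustrative choices only.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Histogram = Dict[Tuple[int, int, int], float]


def colour_histogram(frame: Sequence[Pixel], bins: int = 4) -> Histogram:
    """Joint histogram of quantised (R, G, B) values, normalised to sum to 1."""
    if not frame:
        raise ValueError("frame contains no pixels")
    step = 256 // bins  # bin width for 8-bit components; bins should divide 256
    hist: Histogram = {}
    for r, g, b in frame:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0.0) + 1.0
    return {key: count / len(frame) for key, count in hist.items()}


def histogram_activity(frames: Sequence[Sequence[Pixel]], bins: int = 4) -> List[float]:
    """Per-frame activity in [0, 1]: half the L1 distance between successive histograms."""
    if not frames:
        return []
    activity: List[float] = [0.0]  # the first frame has no predecessor
    previous = colour_histogram(frames[0], bins)
    for frame in frames[1:]:
        current = colour_histogram(frame, bins)
        keys = previous.keys() | current.keys()
        distance = sum(abs(current.get(k, 0.0) - previous.get(k, 0.0)) for k in keys)
        activity.append(distance / 2)
        previous = current
    return activity
```

A cut from an all-red frame to an all-blue frame scores 1.0, while a frame that is half red and half blue following an all-blue frame scores 0.5.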
In order to reduce an amount of data capacity required to store and/or communicate the sample images, the sample images may be represented by a substantially reduced amount of data in comparison to the images represented by the video signal.
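A simple way to obtain sample images with substantially less data is to reduce their resolution. The sketch assumes a greyscale image held as a list of rows and averages non-overlapping square blocks; a practical system would probably also compress the result, which is not shown.

```python
from typing import List, Sequence


def downsample(image: Sequence[Sequence[int]], factor: int) -> List[List[int]]:
    """Shrink a greyscale image by averaging factor x factor blocks.

    Rows or columns that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    height = len(image) // factor
    width = (len(image[0]) // factor) if image else 0
    out: List[List[int]] = []
    for by in range(height):
        row: List[int] = []
        for bx in range(width):
            block = [
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out
```

With a factor of 8, a 720 x 576 frame becomes a 90 x 72 sample image holding one sixty-fourth of the original samples.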
Although the video processing apparatus may receive the video signals from a separate source, advantageously the video processing apparatus may further comprise a reproduction processor which is arranged in operation to receive a recording medium on which the video signals are recorded and to reproduce the video signals from the recording medium. Furthermore, in preferred embodiments the image generator may be arranged in operation to generate, for each of the sample images, a material identification representative of the location on the recording medium where the video signals corresponding to that sample image are recorded. This provides the advantage of not only giving a visual indication of the contents of a recording medium but also providing, with the visual indication, the location at which this content is stored, so that the video signals at this location can be reproduced for further editing.
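Each sample image could be stored alongside a material identification giving the location of its source frames on the recording medium. The sketch below pairs each thumbnail with a tape ID and a time code derived from its frame index; the 25 frame/s rate, the `SampleImage` record and its fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

FPS = 25  # assumed non-drop-frame rate


def frames_to_timecode(frame: int, fps: int = FPS) -> str:
    """Format an absolute frame count as 'HH:MM:SS:FF'."""
    if frame < 0:
        raise ValueError("frame must be non-negative")
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


@dataclass(frozen=True)
class SampleImage:
    """Hypothetical record linking a sample image to its location on tape."""
    tape_id: str
    timecode: str
    pixels: List[List[int]]


def label_samples(tape_id: str, start_frame: int,
                  positions: Sequence[int],
                  thumbnails: Sequence[List[List[int]]]) -> List[SampleImage]:
    """Attach the tape ID and time code to each thumbnail taken at positions[i]."""
    return [
        SampleImage(tape_id, frames_to_timecode(start_frame + pos), thumb)
        for pos, thumb in zip(positions, thumbnails)
    ]
```

Selecting a displayed sample image then yields a tape ID and time code that can be sent back as a reproduction request.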
According to another aspect of the present invention there is provided an audio processing apparatus for processing an audio signal representing sound, the apparatus comprising an activity detector which is arranged in operation to receive the audio signal and to generate an activity signal indicative of an amount of activity within the sound represented by the audio signal, and a meta data generator coupled to the activity detector which is arranged in operation to receive the audio signal and the activity signal and to generate meta data representing the content of the audio signals at temporal positions within the audio signal, which temporal positions are determined from the activity signal.
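For audio, one simple activity signal is the short-term level of the signal. The sketch assumes mono samples as floats between -1.0 and 1.0, computes the root-mean-square level over fixed non-overlapping windows, and then marks the windows where the level rises past a threshold as candidate temporal positions. The window length and threshold are arbitrary illustrative parameters.

```python
import math
from typing import List, Sequence


def rms_activity(samples: Sequence[float], window: int) -> List[float]:
    """RMS level of each complete, non-overlapping window of `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    levels: List[float] = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(x * x for x in chunk) / window))
    return levels


def onsets(levels: Sequence[float], threshold: float) -> List[int]:
    """Indices of windows where the level rises to or above threshold."""
    result: List[int] = []
    previous_active = False
    for index, level in enumerate(levels):
        active = level >= threshold
        if active and not previous_active:
            result.append(index)
        previous_active = active
    return result
```

Multiplying an onset index by the window length gives the sample offset at which meta data, such as a short audio excerpt, could be generated.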
According to a further aspect of the present invention there is provided an audio processing apparatus for processing audio signals representative of sound, the audio processing apparatus comprising a speech analysis processor which is arranged in operation to generate speech data identifying speech detected within the audio signals, an activity processor coupled to the speech analysis processor and arranged in operation to generate an activity signal in response to the speech data, and a content information generator, coupled to the activity processor and the speech analysis processor and arranged in operation to generate data representing the content of the speech at temporal positions within the audio signal determined by the activity signal.
As for video signals, the present invention finds application in generating an indication of the content of speech present in audio signals, whereby navigation through the content of the audio signals is facilitated. For example, in preferred embodiments, the activity signal may be indicative of the start of a spoken sentence, so that the data representing the content of the speech provides an indication of the content at the start of each sentence.
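Assuming the speech analysis processor yields recognised words with start and end times (the `Word` structure below is hypothetical and no particular recogniser is implied), a sentence start could be approximated as any word preceded by a pause longer than a threshold. The sketch groups words into pause-delimited sentences and returns, for each, its start time and its first few words as the content data; the 0.7 s pause is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the audio signal
    end: float


def sentence_starts(words: Sequence[Word], min_pause: float = 0.7,
                    max_words: int = 4) -> List[Tuple[float, str]]:
    """Return (start time, leading text) for each pause-delimited sentence."""
    sentences: List[List[Word]] = []
    for i, word in enumerate(words):
        if i == 0 or word.start - words[i - 1].end >= min_pause:
            sentences.append([word])  # a long enough pause starts a new sentence
        else:
            sentences[-1].append(word)
    return [
        (sentence[0].start, " ".join(w.text for w in sentence[:max_words]))
        for sentence in sentences
    ]
```

Each returned start time can be converted to a position on the recording medium, so the short text acts as a navigable index into the audio.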
The content data can provide a static structural indication of the content of the audio signals which can facilitate navigation through the audio signals by providing a reference to the content of those signals.
Although the audio processing apparatus may receive the audio signal from a separate source, in preferred embodiments a reproduction processor may be arranged in operation to receive a recording medium on which the audio signals are recorded and to reproduce the audio signals from the recording medium. Furthermore, the content information generator may be arranged in operation to generate, for each of the content data items, a material identification representative of the location on the recording medium where the audio signals corresponding to the content data are recorded. An advantage is thereby provided to an editor, because the content data is associated with a material identifier giving the location of the corresponding audio signals on the recording medium, which can be used to navigate through the recording medium. The content data may be any convenient representation of the content of the speech; however, in preferred embodiments the content data represents text corresponding to the content of the speech.
According to another aspect of the present invention there is provided a system for editing audio/video productions comprising an ingestion processor having means for receiving a recording medium and arranged in use to reproduce audio/video material items from the recording medium, a data base operable to receive and to store meta data describing the contents of audio/video material items loaded into the ingestion processor, and an editing processor coupled to the ingestion processor and the data base, the editing processor having a graphical user interface for displaying a representation of the meta data stored in the data base and for selecting the audio/video material items from the displayed representation of the meta data, the editing processor being arranged to combine user selected items of audio/video material, which are selectively reproduced by the ingestion processor in response to meta data corresponding to the selected items of audio/video material being communicated to the ingestion processor by the editing processor.
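The interaction between the data base and the editing processor might resemble the following sketch, which uses an in-memory SQLite table as a stand-in for the data base. The table layout, column names and keyword search are assumptions; what the sketch illustrates is that the editor selects items from the meta data and only the selected items become reproduction requests for the ingestion processor.

```python
import sqlite3
from typing import List, Tuple


def build_catalogue(rows: List[Tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Store (tape_id, tc_in, tc_out, description) meta data rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items (tape_id TEXT, tc_in TEXT, tc_out TEXT, description TEXT)"
    )
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def requests_for(db: sqlite3.Connection, keyword: str) -> List[Tuple[str, str, str]]:
    """Return (tape_id, tc_in, tc_out) for items whose description mentions keyword."""
    cursor = db.execute(
        "SELECT tape_id, tc_in, tc_out FROM items "
        "WHERE description LIKE ? ORDER BY tape_id, tc_in",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()
```

Fixed-width HH:MM:SS:FF time codes sort correctly as text, so the returned requests are already in tape order and can be sent to the ingestion processor as they are.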
As already explained, once the signals representing the audio/video material items have been recorded on to the recording medium during acquisition, a user cannot readily determine the content of the audio/video material items without reproducing the items from the recording medium. Alternatively, the contents of the recording medium may be ingested to provide substantially non-linear access to the audio/video material, but this is time consuming, particularly for a linear recording medium. However, by providing access to meta data which may be generated at acquisition of the audio/video material and which describes the content of the material, an editing system may select and reproduce from the recording medium only those items of audio/video material which are required for the edited audio/video production. The editing process is thereby made more efficient, because only the audio/video material items required for the audio/video production are ingested.
Advantageously, the editing processor may be coupled to the data base and to the ingestion processor via a data communications network. The communications network provides a facility for accessing the meta data and the audio/video material items remotely. Additionally, more than one editing processor may be coupled to the communications network, thereby allowing the meta data in the data base and the audio/video material to be selectively accessed, whereby more than one audio/video production may be edited contemporaneously.
In preferred embodiments, the data communications network may comprise a first communications network coupled to the editing station, the data base and the ingestion processor for communicating the meta data, and a second communications network coupled to the editing station, the data base and the ingestion processor for communicating the items of audio/video material. By providing a first communications network adapted to carry data representing requests for audio/video material and a second communications network for communicating the items of audio/video material, each network can be optimised for the type of data it carries. This is advantageous for the audio/video material items because the network connection must stream audio/video, which requires a relatively high bandwidth. As such, in preferred embodiments, the first communications network may be arranged to operate in accordance with a data communications standard such as Ethernet, RS-232 or RS-422 or the like. Furthermore, the second communications network may be arranged to operate in accordance with the Serial Digital Interface (SDI) or the Serial Digital Transport Interface (SDTI).
In preferred embodiments, the meta data may be one of a UMID, tape ID and time codes, and a Unique Material Reference Number, identifying the material items.
As mentioned above, the meta data may be generated with the audio/video material items during acquisition. As such, the recording medium may include the meta data describing the content of the audio/video material items recorded on to the recording medium, and the ingestion processor may be arranged in operation to reproduce the meta data and to communicate the meta data via the network to the data base, the data base operating to receive and to store the meta data.
The term meta data as used herein refers to and includes any form of information or data which serves to describe either the content of audio/video material or parameters present or used to generate the audio/video material or any other information associated with the audio/video material. Meta data may be, for example, “semantic meta data”, which provides contextual/descriptive information about the actual content of the audio/video material. Examples of semantic meta data are the start of periods of dialogue, changes in a scene, the introduction of new faces or face positions within a scene, or any other items associated with the source content of the audio/video material. The meta data may also be syntactic meta data, which is associated with items of equipment or parameters which were used whilst generating the audio/video material such as, for example, an amount of zoom applied to a camera lens, an aperture and shutter speed setting of the lens, and a time and date when the audio/video material was generated. Although meta data may be recorded with the audio/video material with which it is associated, either on separate parts of a recording medium or on common parts of a recording medium, meta data in the sense used herein is intended for use in navigating and identifying features and essence of the content of the audio/video material, and may therefore be separated from the audio/video signals when the audio/video signals are reproduced. The meta data is therefore separable from the audio/video signals.
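Because meta data is separable from the audio/video signals, a reproducing device can route it away from the essence on playback. The sketch assumes a hypothetical stream of tagged packets and splits it into essence (video and audio) and meta data, further sorting the meta data into the semantic and syntactic kinds described above. The packet format, tags and key names are invented for illustration.

```python
from typing import Dict, List, Tuple

Packet = Tuple[str, str, object]  # (kind, key, payload); hypothetical format

# Hypothetical keys mirroring the examples of semantic and syntactic meta data.
SEMANTIC_KEYS = {"dialogue_start", "scene_change", "new_face"}
SYNTACTIC_KEYS = {"zoom", "aperture", "shutter_speed", "recorded_at"}


def separate(stream: List[Packet]) -> Tuple[List[Packet], Dict[str, List[Packet]]]:
    """Split packets into essence and meta data; any non-video/audio packet is meta data."""
    essence: List[Packet] = []
    meta: Dict[str, List[Packet]] = {"semantic": [], "syntactic": [], "other": []}
    for packet in stream:
        kind, key, _payload = packet
        if kind in ("video", "audio"):
            essence.append(packet)
        elif key in SEMANTIC_KEYS:
            meta["semantic"].append(packet)
        elif key in SYNTACTIC_KEYS:
            meta["syntactic"].append(packet)
        else:
            meta["other"].append(packet)
    return essence, meta
```

The essence list can be passed on unchanged for reproduction, while the meta data lists can be sent to a data base for navigation.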
Various further aspects and features of the present invention are defined in the appended claims.