The invention relates to a method for processing video and/or audio signals. In particular, the invention relates to a method according to claim 1 for processing video and/or audio signals and to a system according to claim 8.
Live video productions such as TV productions are realized today using vision mixers. Vision mixers are commercially available e.g. from the companies Grass Valley, Sony, Snell & Wilcox, and Ross.
A vision mixer (also called video switcher, video mixer, production switcher or simply mixer) is a device used to select between different video input signals to generate a video output signal. Besides switching directly between two input signals the vision mixer can also generate different kinds of transitions. Direct switching means that frame N is from a first input signal and frame N+1 is from a second input signal. Transitions between two input signals include simple dissolves and various kinds of effect transitions. Most mixers are equipped with keyers and matte generators to perform keying operations and to generate background signals which are also called mattes.
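The difference between direct switching and a simple dissolve can be illustrated with a minimal sketch. It is not taken from any actual vision mixer; frames are modelled, purely as an assumption, as short lists of pixel values, and the function names `cut` and `dissolve` are hypothetical.

```python
def cut(src_a, src_b, switch_frame):
    """Direct switch: frames before switch_frame come from src_a, all others from src_b."""
    return [a if n < switch_frame else b
            for n, (a, b) in enumerate(zip(src_a, src_b))]


def dissolve(src_a, src_b, start, length):
    """Simple dissolve: linear cross-fade from src_a to src_b over `length` frames."""
    out = []
    for n, (a, b) in enumerate(zip(src_a, src_b)):
        # Weight of src_b: 0 before the transition, 1 after it, linear ramp in between.
        w = min(max((n - start + 1) / (length + 1), 0.0), 1.0)
        out.append([(1 - w) * pa + w * pb for pa, pb in zip(a, b)])
    return out


# Hypothetical two-pixel frames: source A is black (0), source B is grey (100).
src_a = [[0, 0] for _ in range(6)]
src_b = [[100, 100] for _ in range(6)]

cut_out = cut(src_a, src_b, switch_frame=3)
dissolve_out = dissolve(src_a, src_b, start=1, length=3)
print([frame[0] for frame in cut_out])       # frame N from A, frame N+1 from B
print([frame[0] for frame in dissolve_out])  # gradual ramp from A to B
```

In the cut, the output changes from one source to the other between two consecutive frames; in the dissolve, each output frame during the transition is a weighted mix of both inputs.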
The vision mixer also performs the routing and switching of audio signals accompanying the video signals. However, since the processing of video signals is more complex than the processing of audio signals, the present patent application is focused on the video signal. It is to be understood that in the context of the present patent application the processing of the video signal also implies a corresponding processing of an accompanying audio signal. Solely to keep the description of the present invention readable, audio signals are not always mentioned in addition to the video signals. In the following, the term “channel” will also be used for video signals originating from a specific source.
In order to provide their many functions, vision mixers consist of a large number of hardware components for processing the video signals. The processing hardware components are located in one housing and are connected by local bus solutions so that all video processing hardware can be controlled in real time, meeting the fast control requirements of live productions. In today's vision mixers there is a latency of approximately 40 ms between the moment when a user pushes a button and the moment when the associated function is executed. A latency of 40 ms is still called “real-time” processing.
The vision mixer comprises a central mixing electronic, several input channels and at least one output channel, a control unit and a user interface. A vision mixer of this kind is described, for example, in DE 103 36 214 A1.
The mixing electronic is provided with up to 100 or even more video input signals at the same time. The input signals are live video signals from cameras, recorded video clips from a server such as archived material, slow-motion clips from dedicated slow-motion servers, synthetic images, animations and alphanumeric symbols from graphic generators.
Devices external to the vision mixer are also controlled from the vision mixer by the user.
In the broadcast industry it is common practice to transfer video at full video bandwidth over Serial Digital Interface (SDI) connections and also to process it (mix or modify it) at full bandwidth and in real time. This means that the video is processed frame by frame as it arrives periodically. The big advantage of this environment is that the processed video always shows the highest quality at all processed outputs with a minimum of signal and control latency.
Today, video distribution technology is undergoing a change. Video is increasingly distributed as data streams over data networks based on conventional IT technologies instead of over SDI cable arrays. In the following, data networks based on conventional IT technologies will be referred to briefly as data networks. The term data network shall not include SDI cable arrays. Using IT technology for video streaming is standard for consumer products and other consumer-level applications, which are lower quality applications. Video streaming over data networks has also begun to be introduced in broadcast-like industries. However, today it is in general not possible to port the full bandwidth that is available in SDI broadcast studios to a data network. The bandwidth is not yet available, and even the available, insufficient bandwidth is very expensive to rent, e.g. for local and wide area networks (LAN, WAN). Today the problem is solved by making compromises. The compromise consists in distributing video channels in compressed quality and, in addition, keeping the number of distributed channels to a minimum, especially when high quality transfer is requested. Due to this bottleneck it is simply not realistic to provide a broadcast vision mixer with 50 to 100 uncompressed high definition (HD) video streams at the same time. This is the reason why the first attempts to move broadcast applications to IT data networks are very limited at present. On a 10 Gbit/s network one can transport at most 3 full HD 1080p50 video signals simultaneously. Alternatively, the video is transferred as a compressed signal at a compression factor of up to 25, which compromises the quality of the final processed video output. There is always a trade-off between one situation where all sources are transferred in a compressed format and another situation where only a few channels are transferred in an uncompressed format.
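The figure of three uncompressed 1080p50 signals on a 10 Gbit/s link can be reproduced with a rough calculation. The sketch below assumes, as a figure not stated in the text, that one uncompressed 1080p50 signal occupies the nominal 3G-SDI rate of 2.97 Gbit/s, and it ignores packet and protocol overhead; the function name `max_channels` is hypothetical.

```python
# Assumed figures (not from the text): nominal 3G-SDI rate for one 1080p50 signal
# and a link that is fully usable for video payload.
SDI_1080P50_MBPS = 2970
LINK_MBPS = 10_000


def max_channels(compression_factor, link_mbps=LINK_MBPS, source_mbps=SDI_1080P50_MBPS):
    """Number of whole channels that fit on the link at a given compression factor."""
    # Integer arithmetic: link capacity divided by the per-channel rate after compression.
    return (link_mbps * compression_factor) // source_mbps


# Compression factor 1 means uncompressed.
for factor in (1, 4, 25):
    print(f"compression factor {factor}: {max_channels(factor)} channels")
```

Under these assumptions the uncompressed case yields 3 channels, which agrees with the figure given above, while a compression factor of 25 makes room for several dozen channels at reduced quality.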
The compromise between the aforementioned extreme situations lies somewhere between consumer and broadcast video output quality, depending on the application. The more sources are requested, the higher the compression that is selected to cope with bandwidth limitations. In an environment with limited bandwidth, the number of channels that can be transferred decreases as the requested quality of the channels increases. In this context, higher quality means a higher data rate of the transferred channel.
In the following, the terms “high quality” and “high quality signal” will be used for uncompressed signals and for signals compressed with a low compression factor, e.g. 4. High quality is a relative property of a signal in comparison with other signals which are compressed with a high compression factor, e.g. 25, 30 or even 50.
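The relationship between the number of requested sources and the required compression can be sketched as follows. This is only an illustration under stated assumptions: a 10 Gbit/s link with no protocol overhead, an assumed uncompressed rate of 2.97 Gbit/s per 1080p50 source (the nominal 3G-SDI rate, not stated in the text), and the same compression factor for all sources. The function name `min_compression_factor` is hypothetical.

```python
def min_compression_factor(num_sources, link_mbps=10_000, source_mbps=2970):
    """Smallest whole compression factor at which all sources fit on the link."""
    total_mbps = num_sources * source_mbps
    # Ceiling division with integers; a factor of 1 means no compression is needed.
    return max(1, -(-total_mbps // link_mbps))


for n in (3, 4, 50, 100):
    print(f"{n} sources: compression factor >= {min_compression_factor(n)}")
```

With these assumed rates, three sources fit uncompressed, while 50 to 100 sources require compression factors of roughly 15 to 30.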
WO 2009/014716 A1 discloses a full duplex network-based system and a corresponding method. The known duplex communications system provides full duplex audio and video communications between a first location and a second location. At the first location there is a reporter and a camera person and at the second location a broadcast studio. The audio and video data are transmitted in a compressed format via a wireless network between the first and second location. Quality of service statistical information is used to enhance or optimize the quality of the signal being transmitted based on varying performance measurements.
WO 2005/122 025 A2 describes a personal media broadcasting system that enables video distribution over a computer network and allows a user to view and control media sources over a computer network from a remote location. The central component is a personal broadcaster which is connected by a local area network with local clients and by a remote network with remote clients on the one hand and with audio/video source devices on the other hand. The personal broadcaster compresses audio and video before converting it into network packets for transmission over the local network and the remote network. The known system optimizes the audio and video compression based on available network bandwidth and capabilities of client devices to cope with variable data throughput to local and remote clients.